New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-16275: Improve PipelineTask sub-classing support #66
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few minor comments. I suspect this will break some untested PipelineTask implementations in pipe_tasks, but the fix should be trivial, and I'm content for @natelust to fix that after he gets back from vacation; it may provide an opportunity to revisit our conventions for how to declare auxiiliary DatasetTypes.
@@ -46,6 +46,65 @@ def __init__(self, key, numDataIds): | |||
"received {} DataIds").format(key, numDataIds)) | |||
|
|||
|
|||
class DatasetTypeDescriptor: | |||
"""Describe DatasetType and its option for PipelineTask. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
option
-> options
datasetType : `DatasetType` | ||
scalar : `bool`, optional | ||
`True` if this is a scalar dataset, `None` for dataset types that | ||
do not need scalar option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "dataset types that do not need scalar option", do you mean InitInput
and InitOutput
datasets (which are always scalar
, I think?). If so, it might be better to just let those say scalar == True
and not make this optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's InitInput
and InitOutput
, their configs don't have scalar
field so I though it should be more consistent to have None
for them. I think it should be safe to use scalar=True
for those, currently we don't examine scalar
value for those dataset types anywhere.
|
||
Parameters | ||
---------- | ||
datasetConfig : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure sphinx will be happy with the type being missing here. If you're not sure either, could you try building the sphinx docs to check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what is going on but sphinx refuses to generate documentation for any of the class attributes/methods (though it makes docs for class and constructor). I'm sure missing type is not a problem here but I will add object
as a type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or lsst.pex.config.Config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because of __slots__
, I'll remove __slots__
, they are not super-useful here. Sphinx does OK without type, still I'll leave lsst.pex.config.Config
as a type.
fields are used as keys in returned dictionary. Subclasses can | ||
override this behavior. | ||
uses them for constructing `DatasetTypeDescriptor` instances. The | ||
keys of these fields are used as keys in returned dictionary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-existing, but I'd say "names of these fields" rather than "keys of these fields", since the fields aren't really in a dict.
dataRefs = dataRefs[0] | ||
dataIds = dataIds[0] | ||
outputDataRefs[key] = dataRefs | ||
outputDataIds[key] = dataIds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block for outputs and the one above for inputs are awfully similar. Could you most of this into a separate method? Could even be a local function.
I think |
Changed return type of the get*DatasetTypes methods so that they can be used by PipelineTask class itself (and can be meaningfully redefined in subclasses). This adds new class `DatasetTypeDescriptor` which holds dataset type instance and corresponding configuration options. Unit tests were updated and extended to test new behavior.
8546190
to
eae5d35
Compare
Changed return type of the get*DatasetTypes methods so that they can be
used by PipelineTask class itself (and can be meaningfully redefined in
subclasses). This adds new class
DatasetTypeDescriptor
which holdsdataset type instance and corresponding configuration options. Unit tests
were updated and extended to test new behavior.