
DM-16275: Improve PipelineTask sub-classing support #66

Merged

merged 1 commit into master from tickets/DM-16275 on Oct 26, 2018

Conversation

andy-slac (Contributor)

Changed the return type of the get*DatasetTypes methods so that they can be
used by the PipelineTask class itself (and can be meaningfully redefined in
subclasses). This adds a new class, `DatasetTypeDescriptor`, which holds a
dataset type instance and its corresponding configuration options. Unit tests
were updated and extended to test the new behavior.
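To make the intent concrete, here is a minimal sketch of what a descriptor pairing a dataset type with its options might look like. This is an illustration only: the names and constructor are assumptions, not the actual `lsst.pipe.base.DatasetTypeDescriptor` API added in this PR.

```python
class DatasetTypeDescriptor:
    """Illustrative stand-in: pair a dataset type with its options.

    The real class in this PR wraps a `DatasetType` instance; here we
    use a plain name string to keep the sketch self-contained.
    """

    def __init__(self, datasetType, scalar=None):
        self._datasetType = datasetType
        self._scalar = scalar

    @property
    def datasetType(self):
        """The described dataset type (here just its name string)."""
        return self._datasetType

    @property
    def scalar(self):
        """`True` if the task expects a single dataset per quantum."""
        return self._scalar


# Because get*DatasetTypes would return descriptors rather than bare
# dataset types, subclasses can override those methods and still hand
# back the same structured information.
desc = DatasetTypeDescriptor("deepCoadd_det", scalar=True)
```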

@TallJimbo (Member) left a comment:


Only a few minor comments. I suspect this will break some untested PipelineTask implementations in pipe_tasks, but the fix should be trivial, and I'm content for @natelust to fix that after he gets back from vacation; it may provide an opportunity to revisit our conventions for how to declare auxiliary DatasetTypes.

@@ -46,6 +46,65 @@ def __init__(self, key, numDataIds):
"received {} DataIds").format(key, numDataIds))


class DatasetTypeDescriptor:
"""Describe DatasetType and its option for PipelineTask.
@TallJimbo (Member):

option -> options

datasetType : `DatasetType`
scalar : `bool`, optional
`True` if this is a scalar dataset, `None` for dataset types that
do not need scalar option.
@TallJimbo (Member):

By "dataset types that do not need scalar option", do you mean InitInput and InitOutput datasets (which are always scalar, I think?). If so, it might be better to just let those say scalar == True and not make this optional.

@andy-slac (Contributor, Author):

Yes, it's InitInput and InitOutput; their configs don't have a scalar field, so I thought it would be more consistent to have None for them. I think it should be safe to use scalar=True for those; currently we don't examine the scalar value for those dataset types anywhere.
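The trade-off being discussed can be sketched as follows. This is a hypothetical helper, not the actual pipe_base code: it shows how a descriptor factory could default to `scalar=True` when the config carries no scalar field (the InitInput/InitOutput case), instead of propagating `None`.

```python
def make_descriptor(name, config):
    """Build a (name, scalar) descriptor tuple from a config mapping.

    Illustrative only: when the config has no 'scalar' key (as for
    InitInput/InitOutput dataset types), treating the dataset as
    scalar=True is safe because, per the discussion above, the scalar
    flag is never examined for those dataset types.
    """
    return (name, config.get("scalar", True))
```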


Parameters
----------
datasetConfig :
@TallJimbo (Member):

I'm not sure sphinx will be happy with the type being missing here. If you're not sure either, could you try building the sphinx docs to check?

@andy-slac (Contributor, Author):

Not sure what is going on, but Sphinx refuses to generate documentation for any of the class attributes/methods (though it makes docs for the class and constructor). I'm sure the missing type is not a problem here, but I will add `object` as a type.

@TallJimbo (Member):

Or `lsst.pex.config.Config`.

@andy-slac (Contributor, Author), Oct 26, 2018:

I think it's because of `__slots__`; I'll remove `__slots__`, they are not super-useful here. Sphinx does OK without the type; still, I'll leave `lsst.pex.config.Config` as the type.

fields are used as keys in returned dictionary. Subclasses can
override this behavior.
uses them for constructing `DatasetTypeDescriptor` instances. The
keys of these fields are used as keys in returned dictionary.
@TallJimbo (Member):

Pre-existing, but I'd say "names of these fields" rather than "keys of these fields", since the fields aren't really in a dict.
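The distinction matters because the config fields are class attributes, not dict entries, so it is their *names* that become the keys of the returned dictionary. A self-contained sketch of that pattern, using a plain stand-in class rather than `lsst.pex.config.Config`:

```python
class FakeConfig:
    """Stand-in for a task config with dataset-type fields."""
    inputCatalog = {"name": "src", "scalar": False}
    outputCatalog = {"name": "deepCoadd_det", "scalar": True}


def getDatasetTypes(config, suffix):
    """Collect config fields whose names end with ``suffix``.

    The returned dictionary is keyed by the *name* of each field,
    which is the convention the review comment describes.
    """
    result = {}
    for attrName in vars(type(config)):
        if attrName.endswith(suffix):
            result[attrName] = getattr(config, attrName)
    return result
```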

dataRefs = dataRefs[0]
dataIds = dataIds[0]
outputDataRefs[key] = dataRefs
outputDataIds[key] = dataIds
@TallJimbo (Member):

This block for outputs and the one above for inputs are awfully similar. Could you move most of this into a separate method? It could even be a local function.
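The suggested refactoring might look like the following sketch. Names here are illustrative, not the actual pipe_base code: the near-identical blocks that collapse single-element lists for scalar dataset types are folded into one local helper.

```python
def collectDataRefs(inputRefs, outputRefs, scalarKeys):
    """Unpack input and output DataRefs with one shared local helper."""

    def unpack(refsByKey):
        # Collapse single-element lists for scalar dataset types so the
        # task sees one object instead of a one-item list; the same
        # logic now serves both the input and output branches.
        out = {}
        for key, refs in refsByKey.items():
            out[key] = refs[0] if key in scalarKeys else refs
        return out

    return unpack(inputRefs), unpack(outputRefs)
```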

@andy-slac (Contributor, Author):

I think pipe_tasks as currently in the stack should be OK. DetectCoaddSourcesTask overrides getOutputDatasetTypes and getInitOutputDatasetTypes, but it calls the base class methods after updating the config. Of course, there may be other commits not in the last weekly build yet.

@andy-slac andy-slac merged commit b461e1a into master Oct 26, 2018
@timj timj deleted the tickets/DM-16275 branch April 13, 2022 22:19