
DM-16275: Improve PipelineTask sub-classing support #66

Merged

merged 1 commit into master from tickets/DM-16275 on Oct 26, 2018

Conversation

andy-slac (Contributor)

Changed the return type of the get*DatasetTypes methods so that they can be
used by the PipelineTask class itself (and can be meaningfully redefined in
subclasses). This adds a new class, `DatasetTypeDescriptor`, which holds a
dataset type instance and its corresponding configuration options. Unit tests
were updated and extended to test the new behavior.
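To make the intent concrete, here is a minimal sketch of what a descriptor pairing a dataset type with its options might look like. This is an illustration only: the names and constructor are assumptions, not the actual `lsst.pipe.base.DatasetTypeDescriptor` API added in this PR.

```python
class DatasetTypeDescriptor:
    """Illustrative stand-in: pair a dataset type with its options.

    The real class in this PR wraps a `DatasetType` instance; here we
    use a plain name string to keep the sketch self-contained.
    """

    def __init__(self, datasetType, scalar=None):
        self._datasetType = datasetType
        self._scalar = scalar

    @property
    def datasetType(self):
        """The described dataset type (here just its name string)."""
        return self._datasetType

    @property
    def scalar(self):
        """`True` if the task expects a single dataset per quantum."""
        return self._scalar


# Because get*DatasetTypes would return descriptors rather than bare
# dataset types, subclasses can override those methods and still hand
# back the same structured information.
desc = DatasetTypeDescriptor("deepCoadd_det", scalar=True)
```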

@TallJimbo (Member) left a comment:


Only a few minor comments. I suspect this will break some untested PipelineTask implementations in pipe_tasks, but the fix should be trivial, and I'm content for @natelust to fix that after he gets back from vacation; it may provide an opportunity to revisit our conventions for how to declare auxiliary DatasetTypes.

@@ -46,6 +46,65 @@ def __init__(self, key, numDataIds):
"received {} DataIds").format(key, numDataIds))


class DatasetTypeDescriptor:
"""Describe DatasetType and its option for PipelineTask.
@TallJimbo (Member):

option -> options

datasetType : `DatasetType`
scalar : `bool`, optional
`True` if this is a scalar dataset, `None` for dataset types that
do not need scalar option.
@TallJimbo (Member):

By "dataset types that do not need scalar option", do you mean InitInput and InitOutput datasets (which are always scalar, I think?). If so, it might be better to just let those say scalar == True and not make this optional.

@andy-slac (Contributor, Author):

Yes, it's InitInput and InitOutput; their configs don't have a scalar field, so I thought it would be more consistent to have None for them. I think it should be safe to use scalar=True for those; currently we don't examine the scalar value for those dataset types anywhere.
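The trade-off being discussed can be sketched as follows. This is a hypothetical helper, not the actual pipe_base code: it shows how a descriptor factory could default to `scalar=True` when the config carries no scalar field (the InitInput/InitOutput case), instead of propagating `None`.

```python
def make_descriptor(name, config):
    """Build a (name, scalar) descriptor tuple from a config mapping.

    Illustrative only: when the config has no 'scalar' key (as for
    InitInput/InitOutput dataset types), treating the dataset as
    scalar=True is safe because, per the discussion above, the scalar
    flag is never examined for those dataset types.
    """
    return (name, config.get("scalar", True))
```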


Parameters
----------
datasetConfig :
@TallJimbo (Member):

I'm not sure sphinx will be happy with the type being missing here. If you're not sure either, could you try building the sphinx docs to check?

@andy-slac (Contributor, Author):

Not sure what is going on, but Sphinx refuses to generate documentation for any of the class attributes/methods (though it makes docs for the class and constructor). I'm sure the missing type is not a problem here, but I will add `object` as a type.

@TallJimbo (Member):

Or `lsst.pex.config.Config`.

@andy-slac (Contributor, Author), Oct 26, 2018:

I think it's because of `__slots__`; I'll remove `__slots__`, they are not super-useful here. Sphinx does OK without the type; still, I'll leave `lsst.pex.config.Config` as the type.

fields are used as keys in returned dictionary. Subclasses can
override this behavior.
uses them for constructing `DatasetTypeDescriptor` instances. The
keys of these fields are used as keys in returned dictionary.
@TallJimbo (Member):

Pre-existing, but I'd say "names of these fields" rather than "keys of these fields", since the fields aren't really in a dict.
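The distinction matters because the config fields are class attributes, not dict entries, so it is their *names* that become the keys of the returned dictionary. A self-contained sketch of that pattern, using a plain stand-in class rather than `lsst.pex.config.Config`:

```python
class FakeConfig:
    """Stand-in for a task config with dataset-type fields."""
    inputCatalog = {"name": "src", "scalar": False}
    outputCatalog = {"name": "deepCoadd_det", "scalar": True}


def getDatasetTypes(config, suffix):
    """Collect config fields whose names end with ``suffix``.

    The returned dictionary is keyed by the *name* of each field,
    which is the convention the review comment describes.
    """
    result = {}
    for attrName in vars(type(config)):
        if attrName.endswith(suffix):
            result[attrName] = getattr(config, attrName)
    return result
```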

dataRefs = dataRefs[0]
dataIds = dataIds[0]
outputDataRefs[key] = dataRefs
outputDataIds[key] = dataIds
@TallJimbo (Member):

This block for outputs and the one above for inputs are awfully similar. Could you move most of this into a separate method? It could even be a local function.
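The suggested refactoring might look like the following sketch. Names here are illustrative, not the actual pipe_base code: the near-identical blocks that collapse single-element lists for scalar dataset types are folded into one local helper.

```python
def collectDataRefs(inputRefs, outputRefs, scalarKeys):
    """Unpack input and output DataRefs with one shared local helper."""

    def unpack(refsByKey):
        # Collapse single-element lists for scalar dataset types so the
        # task sees one object instead of a one-item list; the same
        # logic now serves both the input and output branches.
        out = {}
        for key, refs in refsByKey.items():
            out[key] = refs[0] if key in scalarKeys else refs
        return out

    return unpack(inputRefs), unpack(outputRefs)
```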

@andy-slac (Contributor, Author):

I think pipe_tasks as currently in the stack should be OK. DetectCoaddSourcesTask overrides getOutputDatasetTypes and getInitOutputDatasetTypes, but it calls the base class methods after updating the config. Of course, there may be other commits not in the last weekly build yet.

@andy-slac andy-slac merged commit b461e1a into master Oct 26, 2018
@timj timj deleted the tickets/DM-16275 branch April 13, 2022 22:19