DM-22162: Add metadata writing to PipelineTask execution logic (pipe_base) #110

andy-slac · 2019-12-06T23:16:40Z

Add support for a special dataset type that stores task metadata. The dataset type is added to quantum outputs like a regular dataset so that other tasks can specify it as an input. It does not appear in the standard connections config; instead it is added dynamically based on the task config parameter saveMetadata (bool).
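
A rough sketch of the idea described above, not the actual pipe_base code: the metadata dataset type is appended to a task's outputs only when saveMetadata is on. The classes `TaskConfigSketch` and `TaskDefSketch`, the helper `quantum_outputs`, and the `<label>_metadata` naming scheme are all assumptions made for illustration; only the saveMetadata flag and its True default come from this PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskConfigSketch:
    """Stand-in for a task config; only saveMetadata comes from this PR."""
    saveMetadata: bool = True


@dataclass
class TaskDefSketch:
    """Hypothetical task definition: a label, its config, declared outputs."""
    label: str
    config: TaskConfigSketch
    outputs: List[str] = field(default_factory=list)

    @property
    def metadataDatasetName(self):
        # Assumed naming scheme, chosen only for illustration.
        if self.config.saveMetadata:
            return f"{self.label}_metadata"
        return None


def quantum_outputs(task_def):
    """Return declared outputs plus the metadata dataset type, if enabled."""
    names = list(task_def.outputs)
    if task_def.metadataDatasetName is not None:
        names.append(task_def.metadataDatasetName)
    return names
```

Because the extra name is computed from the config rather than declared in connections, switching saveMetadata off removes it from the outputs without touching the connections themselves.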

natelust

Overall straightforward and well implemented.

natelust · 2019-12-09T19:53:24Z

tests/test_config.py

+        self.assertTrue(config.saveMetadata)
+        config.saveMetadata = False
+        self.assertFalse(config.saveMetadata)
+


So while I am all for unit tests, this test does not actually seem to test anything about the code in pipe_base, but rather that pex_config works the way pex_config should, which presumably the pex_config tests already cover. If you wanted to test something about saveMetadata in a unit test, maybe just check that the descriptor is present in the PipelineTaskConfig class, and/or possibly its default value if you think changing it could cause future problems.

I think of this test as checking the default value of a parameter and that it can be changed to a different value. I can probably remove it altogether; this test was initially written for a different parameter (metadataDataset) with a more complicated set of allowed values, and it's not too useful for a simple boolean option.
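
For reference, the kind of test suggested above could look roughly like this. It is a sketch only: `FieldSketch` and `PipelineTaskConfigSketch` are hypothetical stand-ins so the example runs without pex_config, and a real test would use the actual PipelineTaskConfig class instead.

```python
import unittest


class FieldSketch:
    """Minimal stand-in for a config field descriptor (not pex_config)."""

    def __init__(self, default):
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access returns the descriptor itself
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class PipelineTaskConfigSketch:
    """Hypothetical stand-in for PipelineTaskConfig."""
    saveMetadata = FieldSketch(default=True)


class SaveMetadataFieldTestCase(unittest.TestCase):
    def testFieldPresentWithDefault(self):
        # The descriptor is declared on the class itself.
        self.assertIn("saveMetadata", vars(PipelineTaskConfigSketch))
        self.assertIsInstance(PipelineTaskConfigSketch.saveMetadata, FieldSketch)
        # Guard the default in case changing it would break downstream users.
        self.assertTrue(PipelineTaskConfigSketch().saveMetadata)
```

This checks what pipe_base itself declares (the field and its default) rather than re-testing that assignment works.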

natelust · 2019-12-09T20:38:05Z

python/lsst/pipe/base/pipeline.py

@@ -212,6 +222,13 @@ def addTask(self, task: Union[PipelineTask, str], label: str):
        else:
            raise ValueError("task must be either a child class of PipelineTask or a string containing"
                             " a fully qualified name to one")
+        if not label:


Actually, thinking about this now, I don't think it is possible with the new pipeline object to have a task that does not have a label specified.

Hmm, looking at the code in ctrl_mpexec for a task with no label, I didn't remove the ? in the regex, and/or change make pipeline to use the task name if there is no label. This will lead to weird/broken behavior. If you don't want to change it on this ticket, that's fine. I can make a ticket and fix that behavior.

I think we don't want to force users to always provide a label on the command line, so if the label is missing then it should come from _DefaultName. To know _DefaultName I need to import the task class, and this is where I thought was the most natural place for it. I do not want to import anything when the command line is parsed. Another potential place for that is the CmdLineFwk class, but I think that if I do it here it will be more generic. Of course, if you say that the Pipeline.addTask method has to receive a non-empty label, then I'd simply add a check here and move doImport to CmdLineFwk instead.

@natelust, let me know if you want me to move that doImport to CmdLineFwk before I merge both branches. It should be easy for me to do, certainly faster than opening another ticket.

I guess this is fine here. I really don't like the doImport in either place, as it is done again later. One thing we talked about doing in the future is not having a _DefaultName at all, and using the name of the task in places where _DefaultName would have been used. How would you feel about just using the string name of the class for the label here? I think @TallJimbo might have had an opinion as well.

I'm OK with using the class name instead of _DefaultName, but I know that _DefaultName has a long history and this should probably be discussed with a wider audience. Just tell me what to do, I'll do it.

Let's stick with _DefaultName for now. I'd like to replace that with the unqualified Task name or something derived from it eventually (and then take advantage of that to e.g. avoid the doImport here), but until we've done that more globally, using the unqualified Task name here just exacerbates the problem of having too many names for a Task.

OK, I'll merge it as it is now.
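
For readers following the thread, the label fallback discussed above can be sketched as follows. This is a simplified illustration, not the merged addTask code: `do_import_sketch` and `resolve_label` are hypothetical helpers, and `demo_tasks` / `IsrTaskSketch` exist only so the example runs on its own.

```python
import importlib
import sys
import types


def do_import_sketch(qualified_name):
    """Import an object from a fully qualified dotted name."""
    module_name, _, attr = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def resolve_label(task, label=None):
    """Return the explicit label, or fall back to the task's _DefaultName."""
    if label:
        return label
    # Only import when the label is missing, so parsing stays import-free.
    task_class = do_import_sketch(task) if isinstance(task, str) else task
    return task_class._DefaultName


# Register a throwaway module so the string form can be demonstrated.
_demo = types.ModuleType("demo_tasks")


class IsrTaskSketch:
    _DefaultName = "isr"


_demo.IsrTaskSketch = IsrTaskSketch
sys.modules["demo_tasks"] = _demo
```

With this shape, `resolve_label("demo_tasks.IsrTaskSketch")` triggers the import and uses _DefaultName, while passing an explicit label skips the import entirely, which is the trade-off debated above.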

Add support for metadata dataset in QuantumGraph

6887494

natelust approved these changes Dec 9, 2019

natelust reviewed Dec 9, 2019

Switch to on/off config for metadata, fixed dataset name

8587f3b

andy-slac force-pushed the tickets/DM-22162 branch from 73022bf to 8587f3b Compare December 9, 2019 21:44

andy-slac merged commit 204a9c6 into master Dec 10, 2019

andy-slac deleted the tickets/DM-22162 branch July 10, 2022 02:46

natelust Dec 9, 2019

andy-slac Dec 9, 2019

natelust Dec 9, 2019

natelust Dec 9, 2019

andy-slac Dec 9, 2019

andy-slac Dec 9, 2019

natelust Dec 10, 2019

andy-slac Dec 10, 2019

TallJimbo Dec 10, 2019

andy-slac Dec 10, 2019
