DM-22162: Add metadata writing to PipelineTask execution logic (pipe_base) #110
Overall straightforward and well implemented.
tests/test_config.py
Outdated
self.assertTrue(config.saveMetadata)
config.saveMetadata = False
self.assertFalse(config.saveMetadata)
So while I am all for unit tests, this test does not actually seem to test anything about the code in pipe_base, but rather that pex_config works the way pex_config should, which presumably pex_config's own tests cover. If you want to test something about saveMetadata in a unit test, maybe just check that the descriptor is present in the PipelineTaskConfig class, and/or possibly its default value if you think changing it could cause future problems.
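A minimal sketch of the kind of check suggested above, written against a stand-in class rather than the real pex_config machinery. BoolField and PipelineTaskConfigSketch are hypothetical names used only for illustration, and the default of True is inferred from the assertTrue in the test under review.

```python
import unittest


class BoolField:
    """Hypothetical stand-in for a pex_config-style boolean field descriptor."""

    def __init__(self, doc, default):
        self.doc = doc
        self.default = default

    def __set_name__(self, owner, name):
        # Store per-instance values under a private attribute name.
        self._attr = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access returns the descriptor itself.
            return self
        return getattr(instance, self._attr, self.default)

    def __set__(self, instance, value):
        if not isinstance(value, bool):
            raise TypeError("saveMetadata must be a bool")
        setattr(instance, self._attr, value)


class PipelineTaskConfigSketch:
    """Hypothetical stand-in for PipelineTaskConfig (assumption, not the real class)."""

    saveMetadata = BoolField(doc="Flag to enable/disable metadata saving", default=True)


class SaveMetadataFieldTestCase(unittest.TestCase):
    def testFieldPresent(self):
        # The descriptor is declared on the config class itself.
        self.assertIn("saveMetadata", vars(PipelineTaskConfigSketch))
        self.assertIsInstance(PipelineTaskConfigSketch.saveMetadata, BoolField)

    def testDefault(self):
        # Changing the default would silently change pipeline outputs.
        self.assertIs(PipelineTaskConfigSketch.saveMetadata.default, True)
        self.assertTrue(PipelineTaskConfigSketch().saveMetadata)
```

Unlike the original test, this one checks properties of the config class definition and does not re-test that assignment to a field works.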
I think of this test as checking the default value of a parameter and that it can be changed to a different value. I can probably remove it altogether; the test was initially written for a different parameter (metadataDataset) with a more complicated set of allowed values, and it's not too useful for a simple boolean option.
@@ -212,6 +222,13 @@ def addTask(self, task: Union[PipelineTask, str], label: str):
        else:
            raise ValueError("task must be either a child class of PipelineTask or a string containing"
                             " a fully qualified name to one")
        if not label:
Actually, thinking about this now, I don't think it is possible with the new pipeline object to have a task that does not have a label specified.
Hmm, looking at the code in ctrl_mpexec for a task with no label, I didn't remove the ? in the regex, and/or change make pipeline to use the task name if there is no label. This will lead to weird/broken behavior. If you don't want to change it on this ticket, that's fine. I can make a ticket and fix that behavior.
I think we don't want to force users to always provide a label on the command line, so if the label is missing it should come from _DefaultName. To know _DefaultName I need to import the task class, and this seemed to me the most natural place for it. I do not want to import anything when the command line is parsed; another potential place is the CmdLineFwk class, but I think doing it here is more generic. Of course, if you say that the Pipeline.addTask method has to receive a non-empty label, then I'd simply add a check here and move doImport to CmdLineFwk instead.
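The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: import_by_name is a hypothetical importlib-based stand-in for doImport, and resolve_label and ExampleTask are made-up names.

```python
import importlib


def import_by_name(qualified_name):
    """Import an object given its fully qualified name (stand-in for doImport)."""
    module_name, _, attr = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"{qualified_name!r} is not a fully qualified name")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def resolve_label(task, label=None):
    """Return the label for a task, falling back to the class's _DefaultName.

    `task` is either a task class or a string with its fully qualified name.
    The class is imported only when the label is missing, so an explicit
    label never triggers an import.
    """
    if label:
        return label
    if isinstance(task, str):
        task = import_by_name(task)
    default = getattr(task, "_DefaultName", None)
    if not default:
        raise ValueError("no label given and task class has no _DefaultName")
    return default


class ExampleTask:
    """Hypothetical task class used only to exercise the sketch."""

    _DefaultName = "example"


print(resolve_label(ExampleTask))            # falls back to _DefaultName
print(resolve_label(ExampleTask, "custom"))  # explicit label wins
```

Keeping the import inside the fallback branch reflects the stated wish not to import anything while the command line is parsed; requiring a non-empty label in addTask would instead move this logic to the caller.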
@natelust, let me know if you want me to move that doImport to CmdLineFwk before I merge both branches; it should be easy for me to do, certainly faster than opening another ticket.
I guess this is fine here. I really don't like the doImport in either place, as it is done again later. One thing we talked about doing in the future is not having a _DefaultName at all, and using the name of the task in places where _DefaultName would have been used. How would you feel about just using the string name of the class for the label here? I think @TallJimbo might have had an opinion as well.
I'm OK with using the class name instead of _DefaultName, but I know that _DefaultName has a long history and this should probably be discussed with a wider audience. Just tell me what to do, I'll do it.
Let's stick with _DefaultName for now. I'd like to replace that with the unqualified Task name or something derived from it eventually (and then take advantage of that to e.g. avoid the doImport here), but until we've done that more globally, using the unqualified Task name here just exacerbates the problem of having too many names for a Task.
OK, I'll merge it as it is now.
73022bf to 8587f3b
Add support for a special dataset type to store task metadata. The dataset type is added to quantum outputs like a regular dataset so that other tasks can specify it as an input. The dataset type does not appear in the standard connections config; instead it is added dynamically based on the task config parameter saveMetadata (bool).
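As a rough illustration of the behavior described in this commit message, the sketch below adds a metadata dataset type to a task's outputs only when saveMetadata is set. TaskConfigSketch and output_dataset_types are hypothetical names, and the "<label>_metadata" naming is an assumption made for the example, not something stated in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class TaskConfigSketch:
    """Hypothetical stand-in for a task config with a saveMetadata flag."""

    saveMetadata: bool = True
    # Declared output connections: connection name -> dataset type name.
    connections: dict = field(default_factory=dict)


def output_dataset_types(label, config):
    """List a task's output dataset type names.

    The metadata dataset type is not part of the declared connections; it
    is appended dynamically when config.saveMetadata is True.
    """
    outputs = list(config.connections.values())
    if config.saveMetadata:
        # Assumed naming scheme for the example only.
        outputs.append(f"{label}_metadata")
    return outputs


config = TaskConfigSketch(connections={"outputCatalog": "src"})
print(output_dataset_types("taskA", config))

config.saveMetadata = False
print(output_dataset_types("taskA", config))
```

Because the metadata dataset type is listed alongside the regular outputs, a downstream task could name it as an input in the same way as any other dataset type.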