New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-27140 Add named subsets to pipelines #154
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from a preliminary perspective; don't have much to say structurally. I haven't done a close read for e.g. docstrings etc. yet.
python/lsst/pipe/base/pipeline.py
Outdated
for label in labelSpecifier.labels: | ||
if label in pipeline._pipelineIR.named_subsets: | ||
toRemove.add(label) | ||
toAdd.update(pipeline._pipelineIR.named_subsets[label].subset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you thought about whether this should be recursive (i.e. can subsets nest)? I don't see a compelling reason why it should be, but it's probably something that should be a conscious choice rather than an implementation artifact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but lots of little style nitpicks.
python/lsst/pipe/base/pipeline.py
Outdated
# is needed because a user may only configure the tasks they intend | ||
# to run, which may cause some contracts to fail if they will later | ||
# be dropped | ||
pipeline = copy.copy(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure a one-level deep shallow copy is the right level of copying here?
(I always find copy.copy
to be scary, and I might go so far as to call it an antipattern, because whether it is or is not the right amount of copying to do is really an implementation detail in general.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had deep copy then switched to copy because I knew the implementation details (there are only a few top level containers of things that nothing really modifies) but it probably is better to always deep copy to avoid any future issues that may arise, it certainly is worth the small amount of memory overhead.
python/lsst/pipe/base/pipeline.py
Outdated
closed = True | ||
return pipeline | ||
labelSet.add(label) | ||
if labelSpecifier.end and label == labelSpecifier.end: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the first test here could be labelSpecifier.end is not None
, it probably should be.
python/lsst/pipe/base/pipelineIR.py
Outdated
"""A set of task labels contained in this subset. | ||
""" | ||
description: Optional[str] | ||
"""A description of what this sub set of tasks is intended to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"sub set" -> "subset"
python/lsst/pipe/base/pipelineIR.py
Outdated
@staticmethod | ||
def from_yaml(label: str, value: Union[List[str], dict]) -> LabeledSubset: | ||
"""This static method generates `LabeledSubset` objects given a properly | ||
formatted object that as been created by a yaml loader. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring should start with the imperative voice.
python/lsst/pipe/base/pipelineIR.py
Outdated
---------- | ||
label : `str` | ||
The label that will be used to identify this labeled subset. | ||
value : `list` of str or `dict` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add backticks around str
.
python/lsst/pipe/base/pipelineIR.py
Outdated
subset = value | ||
description = None | ||
else: | ||
raise ValueError(f"There was a problem parsing the labeled subset {label}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a very helpful error message. Can you at least report what syntax was expected?
python/lsst/pipe/base/pipelineIR.py
Outdated
|
||
Parameters | ||
---------- | ||
loaded_yaml: `Mapping` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I think this needs to be MutableMapping
(and you should probably state below that it's modified in-place).
raise ValueError(f"There was a problem parsing the labeled subset {label}") | ||
return LabeledSubset(label, set(subset), description) | ||
|
||
def to_primitives(self) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the "to" method to_primitives
, but the "from" method from_yaml
? It looks like "primitives" and "yaml" mean the same thing, given that from_yaml
operates on a dict
or list
rather than a file or string.
python/lsst/pipe/base/pipelineIR.py
Outdated
# existing labeled subsets, and with existing task labels. | ||
overlapping_subsets = accumulate_labeled_subsets.keys() & tmp_IR.labeled_subsets.keys() | ||
task_subset_overlap = (accumulate_labeled_subsets.keys() | tmp_IR.labeled_subsets.keys()) \ | ||
& accumulate_tasks.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An extra parentheses for line continuation is preferred over backslash.
|
||
# create a copy of the object to iterate over | ||
labeled_subsets = copy.copy(pipeline.labeled_subsets) | ||
# remove any labeled subsets that no longer have a complete set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like the right behavior, but it's sufficiently non-obvious that it should probably be mentioned in the docstring as well.
1413b97
to
7ae232a
Compare
Add a directive to pipeline documents that allows a subset of labels in the pipeline to be declared with a name. This modification allows portions of a pipeline to be run with using this named subset instead of specifying a list of labels.
There were a few modules issues with documentation generation, and some documentation was not being generated at all. This commit makes sure all the appropriate documentation builds.
7ae232a
to
fcbc789
Compare
Add a directive to pipeline documents that allows a subset of
labels in the pipeline to be declared with a name. This
modification allows portions of a pipeline to be run with using
this named subset instead of specifying a list of labels.