New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

DM-27140 Add named subsets to pipelines #154

Merged

natelust merged 2 commits into master from tickets/DM-27140

Oct 21, 2020

Contributor

natelust commented Oct 12, 2020

Add a directive to pipeline documents that allows a subset of
labels in the pipeline to be declared with a name. This
modification allows portions of a pipeline to be run with using
this named subset instead of specifying a list of labels.

natelust requested a review from TallJimbo

October 12, 2020 22:30

TallJimbo reviewed

View reviewed changes

Member

TallJimbo left a comment

Looks good from a preliminary perspective; don't have much to say structurally. I haven't done a close read for e.g. docstrings etc. yet.

python/lsst/pipe/base/pipeline.py Outdated

+                          for label in labelSpecifier.labels:
+                              if label in pipeline._pipelineIR.named_subsets:
+                                  toRemove.add(label)
+                                  toAdd.update(pipeline._pipelineIR.named_subsets[label].subset)

Member

TallJimbo Oct 13, 2020

Have you thought about whether this should be recursive (i.e. can subsets nest)? I don't see a compelling reason why it should be, but it's probably something that should be a conscious choice rather than an implementation artifact.

TallJimbo approved these changes

View reviewed changes

Member

TallJimbo left a comment

Looks good, but lots of little style nitpicks.

python/lsst/pipe/base/pipeline.py Outdated

+                          # is needed because a user may only configure the tasks they intend
+                          # to run, which may cause some contracts to fail if they will later
+                          # be dropped
+                          pipeline = copy.copy(self)

Member

TallJimbo Oct 19, 2020

Are you sure a one-level deep shallow copy is the right level of copying here?

(I always find copy.copy to be scary, and I might go so far as to call it an antipattern, because whether it is or is not the right amount of copying to do is really an implementation detail in general.)

Contributor Author

natelust Oct 20, 2020

Yeah, I had deep copy then switched to copy because I knew the implementation details (there are only a few top level containers of things that nothing really modifies) but it probably is better to always deep copy to avoid any future issues that may arise, it certainly is worth the small amount of memory overhead.

python/lsst/pipe/base/pipeline.py Outdated

-                                      closed = True
-                      return pipeline
+                              labelSet.add(label)
+                              if labelSpecifier.end and label == labelSpecifier.end:

Member

TallJimbo Oct 19, 2020

If the first test here could be labelSpecifier.end is not None, it probably should be.

python/lsst/pipe/base/pipelineIR.py Outdated

+                  """A set of task labels contained in this subset.
+                  """
+                  description: Optional[str]
+                  """A description of what this sub set of tasks is intended to do

Member

TallJimbo Oct 19, 2020

"sub set" -> "subset"

python/lsst/pipe/base/pipelineIR.py Outdated

+                  @staticmethod
+                  def from_yaml(label: str, value: Union[List[str], dict]) -> LabeledSubset:
+                      """This static method generates `LabeledSubset` objects given a properly
+                      formatted object that as been created by a yaml loader.

Member

TallJimbo Oct 19, 2020

Docstring should start with the imperative voice.

python/lsst/pipe/base/pipelineIR.py Outdated

+                      ----------
+                      label : `str`
+                          The label that will be used to identify this labeled subset.
+                      value : `list` of str or `dict`

Member

TallJimbo Oct 19, 2020

Add backticks around str.

python/lsst/pipe/base/pipelineIR.py Outdated

+                          subset = value
+                          description = None
+                      else:
+                          raise ValueError(f"There was a problem parsing the labeled subset {label}")

Member

TallJimbo Oct 19, 2020

This is not a very helpful error message. Can you at least report what syntax was expected?

python/lsst/pipe/base/pipelineIR.py Outdated

+                      Parameters
+                      ----------
+                      loaded_yaml: `Mapping`

Member

TallJimbo Oct 19, 2020

As above, I think this needs to be MutableMapping (and you should probably state below that it's modified in-place).

python/lsst/pipe/base/pipelineIR.py

+                          raise ValueError(f"There was a problem parsing the labeled subset {label}")
+                      return LabeledSubset(label, set(subset), description)
+                  def to_primitives(self) -> dict:

Member

TallJimbo Oct 19, 2020

Why is the "to" method to_primitives, but the "from" method from_yaml? It looks like "primitives" and "yaml" mean the same thing, given that from_yaml operates on a dict or list rather than a file or string.

python/lsst/pipe/base/pipelineIR.py Outdated

+                          # existing labeled subsets, and with existing task labels.
+                          overlapping_subsets = accumulate_labeled_subsets.keys() & tmp_IR.labeled_subsets.keys()
+                          task_subset_overlap = (accumulate_labeled_subsets.keys() | tmp_IR.labeled_subsets.keys()) \
+                              & accumulate_tasks.keys()

Member

TallJimbo Oct 19, 2020

An extra parentheses for line continuation is preferred over backslash.

python/lsst/pipe/base/pipelineIR.py

+                      # create a copy of the object to iterate over
+                      labeled_subsets = copy.copy(pipeline.labeled_subsets)
+                      # remove any labeled subsets that no longer have a complete set

Member

TallJimbo Oct 19, 2020

This sounds like the right behavior, but it's sufficiently non-obvious that it should probably be mentioned in the docstring as well.

natelust force-pushed the tickets/DM-27140 branch from 1413b97 to 7ae232a Compare

October 21, 2020 16:08

natelust added 2 commits

October 21, 2020 11:19


          Add named subsets to pipelines

82f59a7

Add a directive to pipeline documents that allows a subset of
labels in the pipeline to be declared with a name. This
modification allows portions of a pipeline to be run with using
this named subset instead of specifying a list of labels.


          Fix documentation of pipe base

fcbc789

There were a few modules issues with documentation generation,
and some documentation was not being generated at all. This
commit makes sure all the appropriate documentation builds.

natelust force-pushed the tickets/DM-27140 branch from 7ae232a to fcbc789 Compare

October 21, 2020 16:19

natelust merged commit e1cbb79 into master

timj deleted the tickets/DM-27140 branch

April 13, 2022 22:20

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment