New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-21421 Introduce new Pipeline representation #106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly OK but check the comments. There may be an issue with config overrides ordering which I do not completely understand, but it may bite us at some point.
python/lsst/pipe/base/pipelineIR.py
Outdated
"""An optional message to be shown to the user if a contract fails | ||
""" | ||
|
||
def to_primatives(self) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
primitives? (here and everywhere)
|
||
Parameters | ||
---------- | ||
python_snippit: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snippet?
self._overrides.append((OverrideTypes.Python, python_snippit)) | ||
|
||
def addInstrumentOverride(self, instrument: str, task_name: str): | ||
"""Apply any overrides that an intrument has for a task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing s in instrument
python/lsst/pipe/base/pipeline.py
Outdated
name: {name} | ||
tasks: | ||
""" | ||
self._pipelineIR = pipelineIR.PipelineIR(yaml.safe_load(task_string)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you just pass a dict here, going through yaml parsing seem to be an overkill.
@@ -84,54 +94,226 @@ def __str__(self): | |||
return rep | |||
|
|||
|
|||
class Pipeline(list): | |||
"""Pipeline is a sequence of `TaskDef` objects. | |||
class Pipeline: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring would be useful.
python/lsst/pipe/base/pipelineIR.py
Outdated
|
||
@classmethod | ||
def from_file(cls, filename: str): | ||
"""Create a `PipelineIR` object from the docuemnt specified by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docuemnt
if self.instrument is not None: | ||
accumulate['instrument'] = self.instrument | ||
accumulate['tasks'] = {l: t.to_primatives() for l, t in self.tasks.items()} | ||
if len(self.contracts) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.contracts:
is few characters shorter 🙂
tests/test_pipelineIR.py
Outdated
# exist | ||
pipeline_str = """ | ||
name: Test Pipeline | ||
inherits: dummy_pipeline.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No one ever created dummy_pipeline.yaml
in the current directory? Maybe just add slash, I think /dummy_pipeline.yaml
has fewer chances to be created.
tests/test_pipelineIR.py
Outdated
modA: test.modA | ||
modB: | ||
class: test.modB | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This YAML is indented (has leading white spaces in each line). It probably does not matter but I would use textwrap.dedent()
on that to be 100% sure.
tests/test_pipelineIR.py
Outdated
name: Test Pipeline | ||
inherits: | ||
- "{PIPE_BASE_DIR}/tests/testPipeline1.yaml" | ||
- "{PIPE_BASE_DIR}/tests/testPipeline2.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I would prefer shell-style envvar expansion ($PIPE_BASE_DIR
or ${PIPE_BASE_DIR}
), would it be hard to implement? I know that current style is very easy to do in Python with format()
but it feels a bit too much Python-oriented.
One more after-thought - in the new design the pipeline name is a sort of identifying attribute (e.g. pipeline name is used when comparing two pipelines) but in general if feels more like an accidental feature. I'm not sure why pipeline is important or why should it exist at all. If it's for documentation/comment purposes then it does not need to be identifying. |
To address your two points, things are now changed up such that you can only ever merge two configIR objects if they are key value only and have no repeated keys. This supports doing a few overrides and serializing out the new pipeline, but as you say the rest is hard to reason about what the correct behavior is without knowing the contents, so they are now left as is. The name field was always intended to to be a friendlier identifier anyway, so I changed it up to a description tag, so that if someone has a pipeline in memory they can examine what was written from within python and not just with yaml comments. |
This commits introduces the ability for Pipelines to be defined using yaml documents. The data structures used to facilitate these documents are now used to implement the in memory Python representation of Pipelines.
5061c41
to
6ba2a86
Compare
This commits introduces the ability for Pipelines to be defined
using yaml documents. The data structures used to facilitate these
documents are now used to implement the in memory Python representation
of Pipelines.