Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-21421 Introduce new Pipeline representation #106

Merged
merged 1 commit into from Nov 9, 2019
Merged

Conversation

natelust
Copy link
Contributor

@natelust natelust commented Nov 6, 2019

This commits introduces the ability for Pipelines to be defined
using yaml documents. The data structures used to facilitate these
documents are now used to implement the in memory Python representation
of Pipelines.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly OK but check the comments. There may be an issue with config overrides ordering which I do not completely understand, but it may bite us at some point.

"""An optional message to be shown to the user if a contract fails
"""

def to_primatives(self) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

primitives? (here and everywhere)


Parameters
----------
python_snippit: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snippet?

self._overrides.append((OverrideTypes.Python, python_snippit))

def addInstrumentOverride(self, instrument: str, task_name: str):
"""Apply any overrides that an intrument has for a task
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing s in instrument

name: {name}
tasks:
"""
self._pipelineIR = pipelineIR.PipelineIR(yaml.safe_load(task_string))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just pass a dict here, going through yaml parsing seem to be an overkill.

@@ -84,54 +94,226 @@ def __str__(self):
return rep


class Pipeline(list):
"""Pipeline is a sequence of `TaskDef` objects.
class Pipeline:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring would be useful.


@classmethod
def from_file(cls, filename: str):
"""Create a `PipelineIR` object from the docuemnt specified by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docuemnt

if self.instrument is not None:
accumulate['instrument'] = self.instrument
accumulate['tasks'] = {l: t.to_primatives() for l, t in self.tasks.items()}
if len(self.contracts) > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self.contracts: is few characters shorter 🙂

# exist
pipeline_str = """
name: Test Pipeline
inherits: dummy_pipeline.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No one ever created dummy_pipeline.yaml in the current directory? Maybe just add slash, I think /dummy_pipeline.yaml has fewer chances to be created.

modA: test.modA
modB:
class: test.modB
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This YAML is indented (has leading white spaces in each line). It probably does not matter but I would use textwrap.dedent() on that to be 100% sure.

name: Test Pipeline
inherits:
- "{PIPE_BASE_DIR}/tests/testPipeline1.yaml"
- "{PIPE_BASE_DIR}/tests/testPipeline2.yaml"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would prefer shell-style envvar expansion ($PIPE_BASE_DIR or ${PIPE_BASE_DIR}), would it be hard to implement? I know that current style is very easy to do in Python with format() but it feels a bit too much Python-oriented.

@andy-slac
Copy link
Contributor

One more after-thought - in the new design the pipeline name is a sort of identifying attribute (e.g. pipeline name is used when comparing two pipelines) but in general if feels more like an accidental feature. I'm not sure why pipeline is important or why should it exist at all. If it's for documentation/comment purposes then it does not need to be identifying.

@natelust
Copy link
Contributor Author

natelust commented Nov 8, 2019

To address your two points, things are now changed up such that you can only ever merge two configIR objects if they are key value only and have no repeated keys. This supports doing a few overrides and serializing out the new pipeline, but as you say the rest is hard to reason about what the correct behavior is without knowing the contents, so they are now left as is.

The name field was always intended to to be a friendlier identifier anyway, so I changed it up to a description tag, so that if someone has a pipeline in memory they can examine what was written from within python and not just with yaml comments.

This commits introduces the ability for Pipelines to be defined
using yaml documents. The data structures used to facilitate these
documents are now used to implement the in memory Python representation
of Pipelines.
@natelust natelust merged commit 6ba2a86 into master Nov 9, 2019
@timj timj deleted the tickets/DM-21421 branch April 13, 2022 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants