
DM-24797: Store per-run information (configs, software versions) in butler repo #51

Merged
merged 3 commits into from May 14, 2020

Conversation

andy-slac
Collaborator

PreExecInit now saves the task configuration for every task in a pipeline;
the dataset type names are fixed as `{taskLabel}_config`. It also saves
package versions in a dataset with the fixed type name `packages`. These
dataset types are registered together with all other dataset types when
the `--register-dataset-types` option is specified.
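A minimal sketch of the naming scheme the description lays out, one `{taskLabel}_config` dataset per task plus a shared `packages` dataset. The helper function is hypothetical, for illustration only; it is not the actual PreExecInit implementation.

```python
# Hypothetical helper (not the real PreExecInit code): build the list of
# per-run dataset type names the PR describes.
def init_output_dataset_types(task_labels):
    """Return the dataset type names saved for a run."""
    names = [f"{label}_config" for label in task_labels]  # one per task
    names.append("packages")  # fixed name for the package-versions dataset
    return names

print(init_output_dataset_types(["isr", "calibrate"]))
# ['isr_config', 'calibrate_config', 'packages']
```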

Member

@TallJimbo TallJimbo left a comment


Looks good; only minor comments.

Exception
Raised if ``skipExisting`` is `False` and the datasets already
exist. The contents of a butler collection may be changed if
this exception is raised.
Member

We could make this exception-safe by wrapping it in `with self.butler.transaction():`, I think.
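A toy illustration of the rollback-on-exception behavior the suggestion relies on. `ToyButler` is an invented stand-in; the real `lsst.daf.butler` transaction machinery is considerably more involved.

```python
from contextlib import contextmanager

# Illustrative only: a toy stand-in showing transaction semantics, where any
# exception raised inside the block undoes the writes made within it.
class ToyButler:
    def __init__(self):
        self.datasets = []

    @contextmanager
    def transaction(self):
        saved = list(self.datasets)  # snapshot state at transaction start
        try:
            yield
        except Exception:
            self.datasets = saved    # roll back on any error
            raise

butler = ToyButler()
try:
    with butler.transaction():
        butler.datasets.append("isr_config")
        raise RuntimeError("dataset already exists")
except RuntimeError:
    pass

print(butler.datasets)  # the partial write was rolled back: []
```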

Collaborator Author


Indeed, let me try that.

Collaborator Author


That works fine with ci_hsc_gen3, will add that.

if oldPackages is not None:
# Note that because we can only detect python modules that have been imported, the stored
# list of products may be more or less complete than what we have now. What's important is
# that the products that are in common have the same version.
Member


I think the set of imported packages will be sufficiently complete after we've loaded all PipelineTask config instances, and probably not complete before that. Do you happen to know if the sequencing of operations is such that this will be called after that happens?

Collaborator Author

@andy-slac andy-slac May 14, 2020


I thought about that and figured it should work because the complete graph is in memory (with one caveat), which means the config instances are loaded too, and loading them should import all relevant code.
The caveat is that BPS executes this not for the whole graph but one task at a time, so only one task's configuration is loaded rather than the whole graph. But this should still work because we update versions incrementally as long as they don't conflict. And I think BPS should probably do --init-only on the whole graph in the future.
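The incremental update described here, and the code comment above about packages in common needing the same version, can be sketched as a simple merge with a conflict check. `update_versions` is a hypothetical helper, not the actual packages API.

```python
# Hypothetical sketch: merge newly detected package versions into the stored
# set, failing only when a package present in both has a different version.
def update_versions(stored, current):
    for name, version in current.items():
        if name in stored and stored[name] != version:
            raise RuntimeError(
                f"Version conflict for {name}: {stored[name]} vs {version}"
            )
        stored[name] = version  # packages seen for the first time are added
    return stored

stored = {"numpy": "1.18.1"}
update_versions(stored, {"numpy": "1.18.1", "astropy": "4.0"})
print(stored)  # {'numpy': '1.18.1', 'astropy': '4.0'}
```

Because the stored list grows as tasks run, each BPS invocation only needs its own task's imports to be consistent with what was recorded before.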

@andy-slac andy-slac merged commit 8b60ee7 into master May 14, 2020
@andy-slac andy-slac deleted the tickets/DM-24797 branch September 11, 2020 21:52