New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-24797: Store per-run information (configs, software versions) in butler repo #51
Conversation
PreExecInit now saves task configuratios for every task in a pipeline, the names of dataset types are fixed as `{taskLabel}_config`. Dataset types are registered together with all other dataset types.
b25ce7a
to
e7c3758
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good; only minor comments.
Exception | ||
Raised if ``skipExisting`` is `False` and datasets already | ||
exists. Content of a butler collection may be changed if | ||
exception is raised. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this exception safe by wrapping it in with self.butler.transaction()
, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, let me try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works fine with ci_hsc_gen3, will add that.
if oldPackages is not None: | ||
# Note that because we can only detect python modules that have been imported, the stored | ||
# list of products may be more or less complete than what we have now. What's important is | ||
# that the products that are in common have the same version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the set of imported packages will be sufficiently complete after we've loaded all PipelineTask
config instances, and probably not complete before that. Do you happen to know if the sequencing of operations is such that this will be called after that happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though about that and figured that it should work because the complete graph is in memory (with some caveat) and it means that config instances are loaded too which should import all relevant code.
The caveat is that BPS is executing this not for the whole graph but one task at a time so there is only one task configuration loaded and not whole graph. But this still should work because we update versions incrementally if they don't conflict. And I think BPS should probably do --init-only on the whole graph in the future.
PreExecInit now saves task configurations for every task in a pipeline,
the names of dataset types are fixed as
{taskLabel}_config
. It also savespackage versions in a dataset with fixed type name "packages". Dataset
types are registered together with all other dataset types when
--register-dataset-types option is specified.