DM-24797: Store per-run information (configs, software versions) in butler repo #51

andy-slac · 2020-05-13T22:23:36Z

PreExecInit now saves the task configuration for every task in a pipeline;
the dataset type names are fixed as {taskLabel}_config. It also saves
package versions in a dataset with the fixed type name "packages". These
dataset types are registered together with all other dataset types when
the --register-dataset-types option is specified.

PreExecInit now saves task configurations for every task in a pipeline; the names of the dataset types are fixed as `{taskLabel}_config`. Dataset types are registered together with all other dataset types.
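As a rough illustration of the naming rule described above, the sketch below derives the dataset type names from a list of task labels. The function name and the example labels are hypothetical and are not taken from this PR; only the `{taskLabel}_config` pattern and the fixed name "packages" come from the description.

```python
def init_dataset_type_names(task_labels):
    """Return the dataset type names implied by the naming rule above.

    Hypothetical helper for illustration only; it is not part of ctrl_mpexec.
    """
    # One config dataset type per task label: "{taskLabel}_config".
    names = [f"{label}_config" for label in task_labels]
    # A single dataset type with the fixed name "packages" holds the versions.
    names.append("packages")
    return names


# Example labels are made up for this sketch.
example_names = init_dataset_type_names(["isr", "characterizeImage"])
```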

TallJimbo

Looks good; only minor comments.

TallJimbo · 2020-05-14T16:19:29Z

python/lsst/ctrl/mpexec/preExecInit.py

+        Exception
+            Raised if ``skipExisting`` is `False` and datasets already
+            exists. Content of a butler collection may be changed if
+            exception is raised.


We could make this exception safe by wrapping it in `with self.butler.transaction()`, I think.

Indeed, let me try that.

That works fine with ci_hsc_gen3, will add that.
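A minimal sketch of the idea discussed above, assuming a transaction context manager that undoes earlier writes when an exception escapes the block. `ToyButler` is a hypothetical in-memory stand-in written for this example, not the real butler class, and `save_configs` is not the code that was merged; only the `transaction()` name comes from the review comment.

```python
from contextlib import contextmanager


class ToyButler:
    """Hypothetical in-memory stand-in for a butler (illustration only)."""

    def __init__(self):
        self.datasets = {}

    @contextmanager
    def transaction(self):
        # Remember the state on entry and restore it if the block raises.
        snapshot = dict(self.datasets)
        try:
            yield
        except BaseException:
            self.datasets = snapshot
            raise

    def put(self, obj, name):
        # Refuse to overwrite, mimicking the "datasets already exist" error.
        if name in self.datasets:
            raise RuntimeError(f"dataset {name!r} already exists")
        self.datasets[name] = obj


def save_configs(butler, configs):
    """Write every config inside one transaction, so it is all or nothing."""
    with butler.transaction():
        for label, config in configs.items():
            butler.put(config, f"{label}_config")


butler = ToyButler()
butler.put({"old": True}, "calibrate_config")
try:
    # "isr_config" is written first, then "calibrate_config" collides.
    save_configs(butler, {"isr": {"a": 1}, "calibrate": {"b": 2}})
except RuntimeError:
    pass
```

In this toy version the first write is undone when the second one fails, so the collection is left as it was before the call.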

TallJimbo · 2020-05-14T16:23:49Z

python/lsst/ctrl/mpexec/preExecInit.py

+        if oldPackages is not None:
+            # Note that because we can only detect python modules that have been imported, the stored
+            # list of products may be more or less complete than what we have now.  What's important is
+            # that the products that are in common have the same version.


I think the set of imported packages will be sufficiently complete after we've loaded all PipelineTask config instances, and probably not complete before that. Do you happen to know if the sequencing of operations is such that this will be called after that happens?

I thought about that and figured it should work: the complete graph is in memory (with one caveat), which means the config instances are loaded too, and that should import all relevant code.
The caveat is that BPS executes this one task at a time rather than for the whole graph, so only one task configuration is loaded. This should still work, because we update the stored versions incrementally as long as they don't conflict. I also think BPS should probably run --init-only on the whole graph in the future.
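The incremental update described above could look roughly like the sketch below. Plain dictionaries mapping package name to version stand in for the stored and current package lists, and `merge_versions` is a hypothetical helper, not the function used in this PR. Packages present on only one side are accepted; packages present on both sides must agree.

```python
def merge_versions(stored, current):
    """Merge two name-to-version mappings, failing on any disagreement.

    Hypothetical helper for illustration only.
    """
    # Only packages known to both sides can conflict.
    conflicts = {
        name: (stored[name], current[name])
        for name in stored.keys() & current.keys()
        if stored[name] != current[name]
    }
    if conflicts:
        raise ValueError(f"conflicting package versions: {conflicts}")
    # No conflicts, so the union is safe: extend the stored list.
    return {**stored, **current}


# Version strings below are made up for this sketch.
merged = merge_versions(
    {"numpy": "1.18", "afw": "19.0"},
    {"afw": "19.0", "pipe_base": "19.0"},
)
```

Run once per task, as BPS does, each call can only add packages or confirm existing versions, which is why the stored list may grow over time without ever changing an already recorded version.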

andy-slac added 2 commits May 13, 2020 17:25

Implement saving of task configurations (DM-24797)

6f5274b

PreExecInit now saves task configurations for every task in a pipeline; the names of the dataset types are fixed as `{taskLabel}_config`. Dataset types are registered together with all other dataset types.

Implement saving of package versions.

e7c3758

andy-slac force-pushed the tickets/DM-24797 branch from b25ce7a to e7c3758 Compare May 13, 2020 22:25

TallJimbo approved these changes May 14, 2020


Use transaction around config saving to rollback on exceptions.

1d63fef

andy-slac merged commit 8b60ee7 into master May 14, 2020

andy-slac deleted the tickets/DM-24797 branch September 11, 2020 21:52
