DM-38779: Use graph DatasetRef in execution butler #326
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #326 +/- ##
==========================================
- Coverage 82.21% 82.18% -0.04%
==========================================
Files 60 60
Lines 6713 6718 +5
Branches 1370 1374 +4
==========================================
+ Hits 5519 5521 +2
- Misses 919 921 +2
- Partials 275 276 +1
1018702 to 8d93e1c
@andy-slac / @TallJimbo I think we need to work out what to do with this PR and the related one in lsst/ctrl_mpexec#234. I think the change in execution butler to use the graph DatasetRef is fairly uncontroversial, so maybe we do this for now and leave the second half of the PR. The controversial part is that I now force
@@ -339,7 +340,7 @@ def _setupNewButler(
 def _import(
     yamlBuffer: io.StringIO,
     newButler: Butler,
-    inserts: DataSetTypeMap,
+    inserts: DataSetTypeRefMap,
     run: Optional[str],
We no longer use this parameter, and buildExecutionButler also needs to be documented as ignoring the run parameter. We now require that everything matches the graph.
Would it be a reasonable option for now to execute
Where? We can't do it inside execution butler creation because then we would get an ID mismatch if you try to run the pipeline with that graph and that execution butler. I think we have to assume that
I was not thinking about execution butler, but rather about individual users trying to re-run their favorite graph with different output runs.
That "controversial" change sounds great to me; I think it's a big step towards having SingleQuantumExecutor and ButlerQuantumContext only use the LimitedButler interface without any special-casing for full Butler. But I bet we have to fix datasetExists to get all the way there, at least for SQE.
f16f1f1 to 64eed8f
Looks good
0815fb1 to 58f1812
This attempts to ensure we have provenance agreement between the graph and the execution butler, since the graph now always has resolved refs.
A quantum graph always uses resolved refs, and for provenance we now always want to use that ref. Execution butler now records datasets that match those in the graph, so we can also now disable the use of the special "put of predefined dataset" flag.

There are two caveats here:

* The code now assumes that the butler already knows about the dataset. Butler.put will not try to insert into registry.
* Can runQuantum be called with refs that butler knows nothing about?

There is the question of whether we can make this change before unresolved refs are removed rather than merely deprecated. In theory Butler.put() could be modified such that if the dataset does not exist in registry (we do check for this) we fall into the other code path and try to insert into registry first.
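
To make the two put behaviours discussed above concrete, here is a minimal toy sketch. It does not use the real lsst.daf.butler API: `ToyRef`, `ToyButler`, and the `insert_if_missing` flag are hypothetical names invented for illustration. The default path assumes the registry already knows the resolved ref; the optional fallback inserts it into the registry first.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a resolved dataset ref (not the real DatasetRef)."""

    dataset_type: str
    data_id: str
    run: str
    id: uuid.UUID


@dataclass
class ToyButler:
    """Hypothetical stand-in for a butler: a registry of known refs plus a datastore."""

    registry: dict = field(default_factory=dict)   # dataset ID -> ToyRef
    datastore: dict = field(default_factory=dict)  # dataset ID -> stored object

    def put(self, obj, ref, insert_if_missing=False):
        known = self.registry.get(ref.id)
        if known is None:
            if not insert_if_missing:
                # Default: assume the dataset was pre-registered, so an
                # unknown ref is an error rather than a new registry entry.
                raise LookupError(f"dataset {ref.id} is not known to the registry")
            # Fallback path: insert into the registry first, keeping the ID.
            self.registry[ref.id] = ref
        elif known != ref:
            # Same ID but different type/data ID/run: provenance would disagree.
            raise ValueError(f"ref {ref.id} conflicts with the registered dataset")
        self.datastore[ref.id] = obj
        return ref


# Usage: the registry is pre-populated with the graph's refs, then put() reuses them.
graph_ref = ToyRef("calexp", "visit=1", "u/someone/run", uuid.uuid4())
butler = ToyButler(registry={graph_ref.id: graph_ref})
stored = butler.put({"pixels": []}, graph_ref)
```

In this sketch the dataset ID written to the datastore is always the one carried by the ref, which is the property the provenance argument relies on.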
30d62f6 to 82e228e
Unresolved refs are no longer allowed and the quantum should always include a resolved ref now.
This attempts to ensure we have provenance agreement between the graph and the execution butler.
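
As a rough illustration of what "provenance agreement" could mean in practice, the sketch below compares the dataset IDs a graph expects with the IDs a butler has recorded. The function name and the plain sets of UUIDs are assumptions for illustration only; this is not how the PR implements the check.

```python
import uuid


def provenance_mismatches(graph_ids, butler_ids):
    """Return the dataset IDs on which the graph and the butler disagree.

    Hypothetical helper: both arguments are plain iterables of UUIDs.
    """
    graph_ids = set(graph_ids)
    butler_ids = set(butler_ids)
    return {
        # IDs the graph expects but the butler has never recorded.
        "missing_from_butler": graph_ids - butler_ids,
        # IDs the butler recorded that the graph knows nothing about.
        "unknown_to_graph": butler_ids - graph_ids,
    }


# Usage: a butler populated directly from the graph's refs shows no mismatch.
ids = [uuid.uuid4() for _ in range(3)]
report = provenance_mismatches(ids, ids)
in_agreement = not any(report.values())
```

If the execution butler minted its own IDs instead of reusing the graph's, every dataset would show up in both categories of such a report.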
Checklist
doc/changes