Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38779: Use graph DatasetRef in execution butler #326

Merged
merged 6 commits into from May 4, 2023
Merged

Conversation

timj
Copy link
Member

@timj timj commented Apr 29, 2023

This attempts to ensure we have provenance agreement between the graph and the execution butler.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Apr 29, 2023

Codecov Report

Patch coverage: 81.81% and project coverage change: -0.04 ⚠️

Comparison is base (cecac69) 82.21% compared to head (27a8167) 82.18%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #326      +/-   ##
==========================================
- Coverage   82.21%   82.18%   -0.04%     
==========================================
  Files          60       60              
  Lines        6713     6718       +5     
  Branches     1370     1374       +4     
==========================================
+ Hits         5519     5521       +2     
- Misses        919      921       +2     
- Partials      275      276       +1     
Impacted Files Coverage Δ
python/lsst/pipe/base/testUtils.py 90.66% <71.42%> (-1.86%) ⬇️
python/lsst/pipe/base/tests/simpleQGraph.py 70.55% <75.00%> (-0.08%) ⬇️
python/lsst/pipe/base/executionButlerBuilder.py 65.77% <85.71%> (-1.13%) ⬇️
python/lsst/pipe/base/butlerQuantumContext.py 81.13% <100.00%> (-0.85%) ⬇️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@timj timj force-pushed the tickets/DM-38779 branch 4 times, most recently from 1018702 to 8d93e1c Compare May 1, 2023 21:14
@timj
Copy link
Member Author

timj commented May 1, 2023

@andy-slac / @TallJimbo I think we need to work out what to do with this PR and the related one in lsst/ctrl_mpexec#234.

I think the change in execution butler to use the graph DatasetRef is fairly uncontroversial so maybe we do this for now and leave the second half of the PR.

The controversial part is that I now force ButlerQuantumContext.put to use the graph dataset ref (effectively becoming a putDirect). (I also make SingleQuantumExecutor do this as well). This allows us to remove the flag in execution butler that requires we check registry for a predefined dataset. What concerns me is that we might run into problems with people calling runQuantum repeatedly and tweaking the butler run. I don't know if anyone does that except maybe in test code. pipelines_check and ctrl_mpexec tests pass.

@@ -339,7 +340,7 @@ def _setupNewButler(
def _import(
yamlBuffer: io.StringIO,
newButler: Butler,
inserts: DataSetTypeMap,
inserts: DataSetTypeRefMap,
run: Optional[str],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer use this parameter and buildExecutionButler also needs to be documented to ignore the run parameter. We now require that it's all matching the graph.

@andy-slac
Copy link
Contributor

Would it be reasonable option for now to execute graph.updateRun() if people provide an explicit output run that is different from graph run (and give plenty of warnings maybe)?

@timj
Copy link
Member Author

timj commented May 1, 2023

Would it be reasonable option for now to execute graph.updateRun()

Where? We can't do it inside execution butler creation because then we would get a ID mismatch if you try to run the pipeline with that graph and that execution butler. I think we have to assume that buildExecutionButler is effectively an internal API that is only called by makeGraph rather than try to deal with people using the API directly and wanting the run to be inconsistent. I think the best thing is for me to check that run matched the graph run and raise if they don't match.

@timj timj closed this May 1, 2023
@timj timj reopened this May 1, 2023
@andy-slac
Copy link
Contributor

I was not thinking about execution butler, but rather about individual users trying to re-run their favorite graph with different output runs.

@TallJimbo
Copy link
Member

That "controversial" change sounds great to me; I think it's a big step towards having SingleQuantumExecutor and ButlerQuantumContext only use the LimitedButler interface without any special-casing for full Butler.

But I bet we have to fix datasetExists to get all the way there, at least for SQE.

@timj timj force-pushed the tickets/DM-38779 branch 2 times, most recently from f16f1f1 to 64eed8f Compare May 3, 2023 15:03
@timj timj requested a review from andy-slac May 3, 2023 21:59
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

python/lsst/pipe/base/_dataset_handle.py Outdated Show resolved Hide resolved
python/lsst/pipe/base/executionButlerBuilder.py Outdated Show resolved Hide resolved
@timj timj force-pushed the tickets/DM-38779 branch 2 times, most recently from 0815fb1 to 58f1812 Compare May 4, 2023 19:10
timj added 2 commits May 4, 2023 12:13
This attempts to ensure we have provenance agreement between
the graph and the execution butler, since the graph now always
has resolved refs.
A quantum graph always uses resolved refs and for provenance
we now always want to use that ref. Execution butler now records
datasets that match those in the graph so we can also now disable
the use of the special "put of predefined dataset" flag.

There are two caveats here:

* The code now assumes that the butler already knows about the
  dataset. Butler.put will not try to insert into registry.
* Can runQuantum be called with refs that butler knows nothing
  about.

There is the question of whether we can make this change until
unresolved refs are removed rather than deprecated.

In theory Butler.put() could be modified such that if the
dataset does not exist in registry (we do check for this)
we fall into the other code patch and try to add insert into
registry first.
@timj timj force-pushed the tickets/DM-38779 branch 2 times, most recently from 30d62f6 to 82e228e Compare May 4, 2023 19:20
@timj timj merged commit 1cb0dfd into main May 4, 2023
8 of 10 checks passed
@timj timj deleted the tickets/DM-38779 branch May 4, 2023 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants