DM-40392: Re-implement QuantumGraph.updateRun method #369

Merged

andy-slac merged 2 commits into main from tickets/DM-40392

Aug 19, 2023

Contributor

andy-slac commented Aug 17, 2023 •

edited by timj

This commit fixes a bug in updateRun, which did not update the dataset
IDs of the references after changing their run collection.

Depends on lsst/daf_butler#882

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov bot commented Aug 17, 2023 •

edited

Codecov Report

Patch coverage: 96.22% and project coverage change: +0.02% 🎉

Comparison is base (33ad370) 83.44% compared to head (bfe6f61) 83.46%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #369      +/-   ##
==========================================
+ Coverage   83.44%   83.46%   +0.02%     
==========================================
  Files          77       77              
  Lines        9173     9212      +39     
  Branches     1768     1782      +14     
==========================================
+ Hits         7654     7689      +35     
- Misses       1231     1233       +2     
- Partials      288      290       +2

| Files Changed | Coverage Δ | |
|---|---|---|
| python/lsst/pipe/base/graph/quantumNode.py | `83.60% <50.00%> (-2.36%)` | ⬇️ |
| python/lsst/pipe/base/graph/graph.py | `84.74% <100.00%> (+0.41%)` | ⬆️ |
| python/lsst/pipe/base/tests/util.py | `100.00% <100.00%> (ø)` | |
| tests/test_quantumGraph.py | `97.36% <100.00%> (-0.56%)` | ⬇️ |

andy-slac force-pushed the tickets/DM-40392 branch 3 times, most recently from 3d77341 to 1db8a18 Compare

August 18, 2023 16:24

timj mentioned this pull request

DM-40392: Update unit test after fixing QuantumGraph.updateRun lsst/ctrl_mpexec#261

Merged

2 tasks

timj approved these changes

python/lsst/pipe/base/graph/graph.py

@@ -1229,32 +1236,44 @@ def updateRun(self, run: str, *, metadata_key: str | None = None, update_graph_i
        update_graph_id : `bool`, optional
            If `True` then also update graph ID with a new unique value.

Member

timj Aug 18, 2023

I've been meaning to ask in what scenario you want to update all the dataset refs but not update the graph ID...

Contributor Author

andy-slac Aug 18, 2023

I have no idea. For now there is an option for `pipetask update-graph-run`, and I'm not sure whether it is used.

python/lsst/pipe/base/graph/graph.py Outdated

-                      def _update_refs_in_place(refs: list[DatasetRef], run: str) -> None:
-                          """Update list of `~lsst.daf.butler.DatasetRef` with new run and
-                          dataset IDs.
+                      def _update_output_ref(ref: DatasetRef, run: str) -> DatasetRef:

Member

timj Aug 18, 2023

Is it more efficient in Python to have this take `Iterable[DatasetRef]` and return a list (like it did before) rather than have the function called N times? It's always called in a list comprehension. A quick benchmark suggests it takes about half the time if you don't call the function repeatedly, even when calling list.append.

Contributor Author

andy-slac Aug 18, 2023

I'll change that, I thought there was a context when it was called on a single ref, but it's indeed only lists of refs here.
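The trade-off discussed in this thread can be checked with a small `timeit` comparison. The sketch below is only an illustration: `update_one` and `update_many` are hypothetical stand-ins that add an offset to integers, not the real helpers in `graph.py`, and the batched version uses a list comprehension rather than `list.append`. Actual timings depend on the interpreter and the work done per element.

```python
import timeit
from collections.abc import Iterable


def update_one(value: int, offset: int) -> int:
    """Hypothetical per-item helper: one Python function call per element."""
    return value + offset


def update_many(values: Iterable[int], offset: int) -> list[int]:
    """Hypothetical batch helper: one function call, loop runs inside it."""
    return [value + offset for value in values]


values = list(range(10_000))

# Both styles must produce the same result.
per_item = [update_one(v, 1) for v in values]
batched = update_many(values, 1)

# Time each style; the per-item style pays the call overhead N times.
t_per_item = timeit.timeit(lambda: [update_one(v, 1) for v in values], number=50)
t_batched = timeit.timeit(lambda: update_many(values, 1), number=50)
print(f"per-item calls: {t_per_item:.4f} s, batched call: {t_batched:.4f} s")
```

The gap shrinks as the per-element work grows, so the benefit is largest for cheap operations applied to long lists.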

python/lsst/pipe/base/graph/graph.py

+                      def _update_input_ref(ref: DatasetRef, run: str) -> DatasetRef:
+                          """Update `~lsst.daf.butler.DatasetRef` with new run and dataset
+                          ID.
+                          """

Member

timj Aug 18, 2023

Maybe add a comment explaining that it only returns an updated ref if the ref is listed as an output elsewhere in the graph?

python/lsst/pipe/base/graph/graph.py Outdated

+                          ID.
+                          """
+                          if dataset_id := dataset_id_map.get(ref.id):
+                              ref = ref.replace(run=run, id=dataset_id)

Member

timj Aug 18, 2023

Shouldn't this ref be identical to the other ref in the other part of the graph? Why do we need to create a new ref? They are immutable. Can't dataset_id_map point to the ref itself or is there a memory concern and we are trying to minimize that so we don't have to carry around all the outputs twice?

Contributor Author

andy-slac Aug 18, 2023

I think the reason was that refs may have different storage classes so they are not exactly identical.

Member

TallJimbo Aug 18, 2023

Yes, different storage classes, and also sometimes some of them are components.
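Why an ID map is used instead of reusing the output ref directly can be sketched as follows. This is a simplified model under stated assumptions: `FakeRef` is a hypothetical frozen dataclass standing in for `DatasetRef`, `dataclasses.replace` stands in for `ref.replace(run=..., id=...)`, new IDs are random UUIDs purely for illustration, and the two helper functions are hypothetical names, not the code merged in this PR.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical, simplified stand-in for an immutable dataset ref."""

    id: uuid.UUID
    run: str
    storage_class: str
    component: Optional[str] = None


def update_output_refs(
    refs: list[FakeRef], run: str, dataset_id_map: dict[uuid.UUID, uuid.UUID]
) -> list[FakeRef]:
    """Give each output a new ID in the new run and record old -> new."""
    updated = []
    for ref in refs:
        new_id = uuid.uuid4()  # assumption: real code derives IDs differently
        dataset_id_map[ref.id] = new_id
        updated.append(replace(ref, run=run, id=new_id))
    return updated


def update_input_refs(
    refs: list[FakeRef], run: str, dataset_id_map: dict[uuid.UUID, uuid.UUID]
) -> list[FakeRef]:
    """Update only inputs that are produced as outputs elsewhere in the graph."""
    updated = []
    for ref in refs:
        if (new_id := dataset_id_map.get(ref.id)) is not None:
            # Keep this ref's own storage class and component; change run and ID.
            ref = replace(ref, run=run, id=new_id)
        updated.append(ref)
    return updated


old_id = uuid.uuid4()
output = FakeRef(old_id, "run1", "ArrowAstropy")
# The consuming task reads the same dataset with a different storage class.
input_same = FakeRef(old_id, "run1", "DataFrame")
# This input is not produced by any task in the graph, so it stays untouched.
input_external = FakeRef(uuid.uuid4(), "raw/all", "Exposure")

id_map: dict[uuid.UUID, uuid.UUID] = {}
new_outputs = update_output_refs([output], "run2", id_map)
new_inputs = update_input_refs([input_same, input_external], "run2", id_map)
```

Mapping old IDs to new IDs, rather than to the updated output ref itself, lets each input keep its own storage class or component while still pointing at the dataset in the new run.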

python/lsst/pipe/base/graph/graph.py Outdated

-                      for refs in self._initOutputRefs.values():
-                          _update_refs_in_place(refs, run)
+                      # Loop through all outputs and update their dataset refs.

Member

timj Aug 18, 2023

This loop isn't updating the dataset refs, is it?

Contributor Author

andy-slac Aug 18, 2023

I'll rephrase that.

andy-slac force-pushed the tickets/DM-40392 branch from 1db8a18 to 30a45de Compare

August 18, 2023 17:40

andy-slac added 2 commits

August 18, 2023 13:33


          Re-implement QuantumGraph.updateRun method (DM-40392)

ed06b6a

This commit fixes bug in updateRun which did not update dataset
IDs of the references after changing their run collection.


          Add news fragment

bfe6f61

andy-slac force-pushed the tickets/DM-40392 branch 2 times, most recently from 967862d to bfe6f61 Compare

August 19, 2023 02:37

andy-slac merged commit 870df52 into main

14 checks passed

andy-slac deleted the tickets/DM-40392 branch

August 19, 2023 02:40
