Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-40392: Re-implement QuantumGraph.updateRun method #369

Merged
merged 2 commits into from Aug 19, 2023

Conversation

andy-slac
Copy link
Contributor

@andy-slac andy-slac commented Aug 17, 2023

This commit fixes bug in updateRun which did not update dataset
IDs of the references after changing their run collection.

Depends on lsst/daf_butler#882

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Patch coverage: 96.22% and project coverage change: +0.02% 🎉

Comparison is base (33ad370) 83.44% compared to head (bfe6f61) 83.46%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #369      +/-   ##
==========================================
+ Coverage   83.44%   83.46%   +0.02%     
==========================================
  Files          77       77              
  Lines        9173     9212      +39     
  Branches     1768     1782      +14     
==========================================
+ Hits         7654     7689      +35     
- Misses       1231     1233       +2     
- Partials      288      290       +2     
Files Changed Coverage Δ
python/lsst/pipe/base/graph/quantumNode.py 83.60% <50.00%> (-2.36%) ⬇️
python/lsst/pipe/base/graph/graph.py 84.74% <100.00%> (+0.41%) ⬆️
python/lsst/pipe/base/tests/util.py 100.00% <100.00%> (ø)
tests/test_quantumGraph.py 97.36% <100.00%> (-0.56%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -1229,32 +1236,44 @@ def updateRun(self, run: str, *, metadata_key: str | None = None, update_graph_i
update_graph_id : `bool`, optional
If `True` then also update graph ID with a new unique value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been meaning to ask in what scenario you want to update all the dataset refs but not update the graph ID...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, for now there is an option for pipetask update-graph-run and I'm not sure if this is used or not.

def _update_refs_in_place(refs: list[DatasetRef], run: str) -> None:
"""Update list of `~lsst.daf.butler.DatasetRef` with new run and
dataset IDs.
def _update_output_ref(ref: DatasetRef, run: str) -> DatasetRef:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more efficient in python to have this take Iterable(DatasetRef) and return a list (like it did before) rather than have the function called N times? It's always called in a list comprehension (a quick benchmark seems to show me that it takes half the time if you don't call the function repeatedly even if calling list.append).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change that, I thought there was a context when it was called on a single ref, but it's indeed only lists of refs here.

def _update_input_ref(ref: DatasetRef, run: str) -> DatasetRef:
"""Update `~lsst.daf.butler.DatasetRef` with new run and dataset
ID.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment explaining that it only returns an updated ref if the ref is listed as an output elsewhere in the graph?

ID.
"""
if dataset_id := dataset_id_map.get(ref.id):
ref = ref.replace(run=run, id=dataset_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this ref be identical to the other ref in the other part of the graph? Why do we need to create a new ref? They are immutable. Can't dataset_id_map point to the ref itself or is there a memory concern and we are trying to minimize that so we don't have to carry around all the outputs twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason was that refs may have different storage classes so they are not exactly identical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, different storage classes, and also sometimes some of them are components.


for refs in self._initOutputRefs.values():
_update_refs_in_place(refs, run)
# Loop through all outputs and update their dataset refs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop isn't updating the dataset refs is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rephrase that.

This commit fixes bug in updateRun which did not update dataset
IDs of the references after changing their run collection.
@andy-slac andy-slac force-pushed the tickets/DM-40392 branch 2 times, most recently from 967862d to bfe6f61 Compare August 19, 2023 02:37
@andy-slac andy-slac merged commit 870df52 into main Aug 19, 2023
14 checks passed
@andy-slac andy-slac deleted the tickets/DM-40392 branch August 19, 2023 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants