Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-40392: Add methods to update run collection name in a quantum #882

Merged
merged 1 commit into from Aug 19, 2023

Conversation

andy-slac
Copy link
Contributor

@andy-slac andy-slac commented Aug 17, 2023

QuantumGraph.updateRun() method needs a better implementation and these new methods in DatasetRef and Quantum classes are for supporting that implementation.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@timj
Copy link
Member

timj commented Aug 17, 2023

The ruff problem comes from ruff 0.285 with astral-sh/ruff#6465

@@ -706,11 +706,48 @@ def overrideStorageClass(self, storageClass: str | StorageClass) -> DatasetRef:
A new dataset reference that is the same as the current one but
with a different storage class in the `DatasetType`.
"""
return self.update(storage_class=storageClass)

def update(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace() to match the name used in dataclasses? update doesn't immediately suggest that a new version is being returned. I called the equivalent ResourcePath.replace for that reason.

@@ -190,7 +190,7 @@ def direct(
# This method requires tuples as values of the mapping, but JSON
# readers will read things in as lists. Be kind and transparently
# transform to tuples
_recItems = {k: v if type(v) != list else tuple(v) for k, v in record.items()} # type: ignore
_recItems = {k: v if not isinstance(v, list) else tuple(v) for k, v in record.items()} # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we deliberately used type() here for speed since we know it can only be a list coming from JSON and we need this to be fast. We don't want to worry about subclasses of list here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest ruff was complaining about it, should I just suppress this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why don't we always use tuple(v)? tuple(v) will return v if it's already a tuple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That might be a better answer. Let the tuple() constructor deal with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, not all items are supposed to be tuples there, sorry for the noise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we only want lists to migrate. I think adding a noqa will work but you'll also have to adjust pyproject.toml to stop ruff complaining about unused noqa -- the rubin-env ruff used in Jenkins will not trigger an error and will instead get upset by the unused noqa.

@codecov
Copy link

codecov bot commented Aug 18, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (60b7293) 87.70% compared to head (cda19fe) 87.71%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #882   +/-   ##
=======================================
  Coverage   87.70%   87.71%           
=======================================
  Files         274      274           
  Lines       36195    36219   +24     
  Branches     7575     7578    +3     
=======================================
+ Hits        31746    31770   +24     
  Misses       3272     3272           
  Partials     1177     1177           
Files Changed Coverage Δ
python/lsst/daf/butler/core/datasets/ref.py 80.86% <100.00%> (+0.77%) ⬆️
tests/test_datasets.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add trivial test for update_output_run and update_input_run.

If these two methods have to be called in exactly the right way and solely from QuantumGraph.updateRun maybe they should be private methods in the sense that middleware sometimes knows it can call private methods in different packages but we don't want them public? Else it begs the question why isn't there a quantum.update_run that combines these two methods?

Returns
-------
modified : `DatasetRef`
A new dataset reference with updated attributes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all 3 parameters are None do you want to return self instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could but I do not want an additional test for all None, I don't think it's going to used with no parameters.

refs: Iterable[DatasetRef], run: str, dataset_id_map: Mapping[DatasetId, DatasetId]
) -> Iterator[DatasetRef]:
for ref in refs:
if dataset_id := dataset_id_map.get(ref.id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment that this is for intermediates only and if it's not already in dataset_id_map the ref won't be modified?

@andy-slac
Copy link
Contributor Author

I agree that those two new methods in Quantum should be considered private. Maybe I need to move both of them to Quantumgraph as that is the only place that needs them, let me try that.

@andy-slac andy-slac force-pushed the tickets/DM-40392 branch 2 times, most recently from 89362da to a900423 Compare August 18, 2023 20:37
@andy-slac
Copy link
Contributor Author

I moved that private code to QuantumGraph.updateRun, does not look too terrible.

QuantumGraph.updateRun() method needs a better implementation and
these new methods in DatasetRef and Quantum classes are for supporting
that implementation.
@andy-slac andy-slac merged commit c732061 into main Aug 19, 2023
16 checks passed
@andy-slac andy-slac deleted the tickets/DM-40392 branch August 19, 2023 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants