DM-39198: fix storage class handling in datastore, as used by QuantumBackedButler #838

Merged
merged 3 commits into from May 17, 2023

TallJimbo
Member

@TallJimbo TallJimbo commented May 16, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov

codecov bot commented May 16, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (5b85eaa) 87.73% compared to head (a4daa98) 87.73%.

❗ Current head a4daa98 differs from pull request most recent head 412971c. Consider uploading reports for the commit 412971c to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #838   +/-   ##
=======================================
  Coverage   87.73%   87.73%           
=======================================
  Files         268      268           
  Lines       35352    35354    +2     
  Branches     7440     7442    +2     
=======================================
+ Hits        31016    31018    +2     
  Misses       3169     3169           
  Partials     1167     1167           
Impacted Files Coverage Δ
python/lsst/daf/butler/datastores/fileDatastore.py 82.35% <100.00%> (+0.03%) ⬆️


Member

@timj timj left a comment

Took me a while to work out that this is needed because the graph has no datastore records and so has to determine the file name every time.

doc/changes/DM-39198.bugfix.md (outdated, resolved)

# Make a mapping from refs with the internal storage class to the given
# refs that may have a different one. We'll use the internal refs
# throughout this method and convert back at the very end.
internal_ref_to_input_ref = {self._cast_storage_class(ref): ref for ref in refs}
Member

This is needed because the graph doesn't include datastore records, so QBB execution runs in "trustGetRequest" mode? In that mode we have to determine the output file name ourselves, and therefore need the correct storage class to find the correct template/file extension?
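
For readers following this thread, here is a minimal sketch of the pattern in the snippet above: cast each incoming ref to the storage class the file was written with, do the work with the internal refs, and report results keyed by the refs the caller passed in. Everything in it is a hypothetical stand-in (`ToyRef`, the lookup tables, `cast_storage_class`, `predicted_file_name`, `exists_many`, and the storage class names and extensions); it is not the daf_butler implementation, only an illustration of why the wrong storage class can lead to the wrong predicted file name.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a resolved dataset ref."""

    dataset_id: str
    storage_class: str


# Assumption: the storage class each dataset was actually written with.
INTERNAL_STORAGE_CLASS = {"id-1": "TableOnDisk", "id-2": "ImageOnDisk"}

# Assumption: the file extension implied by each storage class.
EXTENSION = {
    "TableOnDisk": ".parq",
    "ImageOnDisk": ".fits",
    "TableInMemory": ".pickle",
}


def cast_storage_class(ref):
    """Return a copy of ``ref`` that carries the internal storage class."""
    return replace(ref, storage_class=INTERNAL_STORAGE_CLASS[ref.dataset_id])


def predicted_file_name(ref):
    """Guess the file name from the ref alone (no datastore records)."""
    return ref.dataset_id + EXTENSION[ref.storage_class]


def exists_many(refs, files_on_disk):
    """Check existence with internal refs, answer with the caller's refs."""
    internal_to_input = {cast_storage_class(ref): ref for ref in refs}
    return {
        input_ref: predicted_file_name(internal_ref) in files_on_disk
        for internal_ref, input_ref in internal_to_input.items()
    }


refs = [ToyRef("id-1", "TableInMemory"), ToyRef("id-2", "ImageOnDisk")]
files_on_disk = {"id-1.parq", "id-2.fits"}
result = exists_many(refs, files_on_disk)
```

In this toy setup, predicting the file name directly from the first input ref would give `id-1.pickle`, which is not on disk, while going through the internal ref finds `id-1.parq`.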

Member Author

In the particular QBB context, yes, and I think in execution butler too (which is weird, but maybe historical and irrelevant). But I can't claim I've made sense of it all; it seems like sometimes we're expecting butler to guarantee that certain refs passed to Datastore already use the registry storage class, and in other cases we're expecting Datastore to fix them itself via the callback passed to set_retrieve_dataset_type_method. What I've tried to do here is just be super defensive so the Datastore can handle what's thrown at it, and I bet that means we've got a few code paths where we ensure we've got the registry storage class on the same ref multiple times.

I really think the right fix is to make Butler.datastore private (which means giving ctrl_mpexec an alternative to calling butler.datastore.mexists) and then rework the Datastore API so it always gets registry storage classes in one argument and overrides in another argument, with the butler (either full or QBB) making sure those methods are called correctly. It'd also help a lot to modify Datastore APIs that really only need the dataset ID or ID + component (not the full ref) to take just that.
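
A rough sketch of the shape that proposal could take, under loud assumptions: `ToyRef`, `ToyDatastore`, `CONVERTERS`, the `read_storage_class` parameter, and the storage class names are all hypothetical and do not exist in daf_butler. The ref always carries the registry storage class, any override arrives as a separate argument, and the existence check takes only the dataset ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical ref that always carries the registry storage class."""

    dataset_id: str
    registry_storage_class: str


# Assumption: converters keyed by (written storage class, requested one).
CONVERTERS = {
    ("ListOfInt", "ListOfInt"): lambda payload: payload,
    ("ListOfInt", "TupleOfInt"): tuple,
}


class ToyDatastore:
    """Hypothetical datastore with the override split out of the ref."""

    def __init__(self):
        self._store = {}

    def put(self, ref, payload):
        # Always write using the registry storage class.
        self._store[ref.dataset_id] = (ref.registry_storage_class, payload)

    def exists(self, dataset_id):
        # Only the dataset ID is needed, so only the ID is accepted.
        return dataset_id in self._store

    def get(self, ref, read_storage_class=None):
        written_as, payload = self._store[ref.dataset_id]
        if written_as != ref.registry_storage_class:
            raise ValueError("ref does not carry the registry storage class")
        # The override is a separate argument, never smuggled in via the ref.
        target = read_storage_class or written_as
        return CONVERTERS[(written_as, target)](payload)


datastore = ToyDatastore()
ref = ToyRef("id-1", "ListOfInt")
datastore.put(ref, [1, 2])
as_written = datastore.get(ref)
as_tuple = datastore.get(ref, read_storage_class="TupleOfInt")
```

With this split the datastore never has to guess whether a ref has already been normalized, which is the ambiguity the defensive casting in this PR works around.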

Member

It didn't use to matter, because this was always getting refs that came from Registry: graphs used unresolved refs and butler.put used unresolved refs. Now that everything is resolved and there is no guarantee Registry even knows about it, things are a bit more exciting.

@TallJimbo TallJimbo merged commit 98bc225 into main May 17, 2023
13 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-39198 branch May 17, 2023 03:20