DM-39198: fix storage class handling in datastore, as used by QuantumBackedButler #838

TallJimbo · 2023-05-16T18:04:50Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2023-05-16T18:31:31Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (5b85eaa) 87.73% compared to head (a4daa98) 87.73%.

❗ Current head a4daa98 differs from pull request most recent head 412971c. Consider uploading reports for the commit 412971c to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #838   +/-   ##
=======================================
  Coverage   87.73%   87.73%           
=======================================
  Files         268      268           
  Lines       35352    35354    +2     
  Branches     7440     7442    +2     
=======================================
+ Hits        31016    31018    +2     
  Misses       3169     3169           
  Partials     1167     1167

Impacted Files	Coverage Δ
python/lsst/daf/butler/datastores/fileDatastore.py	`82.35% <100.00%> (+0.03%)`	⬆️


timj

Took me a while to work out that this is needed because the graph has no datastore records, so the datastore has to determine the file name every time.

doc/changes/DM-39198.bugfix.md

timj · 2023-05-16T19:53:52Z

python/lsst/daf/butler/datastores/fileDatastore.py

+        # Make a mapping from refs with the internal storage class to the given
+        # refs that may have a different one.  We'll use the internal refs
+        # throughout this method and convert back at the very end.
+        internal_ref_to_input_ref = {self._cast_storage_class(ref): ref for ref in refs}
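
For readers following along, here is a minimal sketch of the pattern that comment describes: cast each incoming ref to the internal storage class, do the work with the internal refs, and key the result by the refs the caller passed in. Everything in it is a hypothetical stand-in (`Ref`, `INTERNAL_STORAGE_CLASS`, `EXTENSION`, `predicted_path`, the storage class names, and the toy `mexists`), not the real `FileDatastore` code, and it assumes the predicted file name depends on the storage class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a resolved dataset ref."""

    dataset_id: int
    dataset_type: str
    storage_class: str


# Hypothetical: the storage class each dataset type is stored with.
INTERNAL_STORAGE_CLASS = {"catalog": "TableA"}

# Hypothetical: the file extension implied by each storage class.
EXTENSION = {"TableA": ".parq", "TableB": ".fits"}


def cast_storage_class(ref: Ref) -> Ref:
    """Return a copy of ``ref`` that uses the internal storage class."""
    return replace(ref, storage_class=INTERNAL_STORAGE_CLASS[ref.dataset_type])


def predicted_path(ref: Ref) -> str:
    """Predict a file name; note that it depends on the storage class."""
    return f"{ref.dataset_type}/{ref.dataset_id}{EXTENSION[ref.storage_class]}"


def mexists(refs: list[Ref], files_on_disk: set[str]) -> dict[Ref, bool]:
    """Check existence using internal refs, reporting against the input refs."""
    internal_ref_to_input_ref = {cast_storage_class(ref): ref for ref in refs}
    internal_results = {
        internal: predicted_path(internal) in files_on_disk
        for internal in internal_ref_to_input_ref
    }
    # Convert back at the very end so callers see the refs they passed in.
    return {
        internal_ref_to_input_ref[internal]: found
        for internal, found in internal_results.items()
    }
```

In this toy version, a ref that arrives with `TableB` would predict a `.fits` path and miss a file written as `.parq`; casting first avoids that. One caveat of the sketch: two input refs that cast to the same internal ref would collapse into a single dictionary entry.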


This is needed because the graph doesn't include datastore records, so in QBB execution we are in "trustGetRequest" mode: we have to determine the output file name ourselves, and therefore need the correct storage class to find the correct template and file extension?

In the particular QBB context, yes, and I think in execution butler too (which is weird, but maybe historical and irrelevant). But I can't claim I've made sense of it all; it seems like sometimes we're expecting butler to guarantee that certain refs passed to Datastore already use the registry storage class, and in other cases we're expecting Datastore to fix them itself via the callback passed to set_retrieve_dataset_type_method. What I've tried to do here is just be super defensive so the Datastore can handle what's thrown at it, and I bet that means we've got a few code paths where we ensure we've got the registry storage class on the same ref multiple times.

I really think the right fix is to make Butler.datastore private (which means giving ctrl_mpexec an alternative to calling butler.datastore.mexists) and then rework the Datastore API so it always gets registry storage classes in one argument and overrides in another argument, with the butler (either full or QBB) making sure those methods are called correctly. It'd also help a lot to modify Datastore APIs that really only need the dataset ID or ID + component (not the full ref) to take just that.
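
To make the shape of that proposal concrete, here is a rough sketch under stated assumptions. It is not an existing or planned API: `SketchButler`, `SketchDatastore`, `CONVERTERS`, the `storage_class_override` parameter, and the use of `list`/`tuple` as storage class names are all invented for illustration. The idea shown is that the datastore is private to the butler, always receives a ref with the registry storage class plus an optional override, and takes only the dataset ID where that is all it needs.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a resolved dataset ref."""

    dataset_id: int
    dataset_type: str
    storage_class: str


# Hypothetical converters between storage classes, keyed by (from, to).
CONVERTERS = {("list", "tuple"): tuple}


class SketchDatastore:
    """Toy datastore that only ever sees registry storage classes."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, obj: Any, registry_ref: Ref) -> None:
        self._objects[registry_ref.dataset_id] = obj

    def exists(self, dataset_id: int) -> bool:
        # Only the ID is needed here, so only the ID is accepted.
        return dataset_id in self._objects

    def get(self, registry_ref: Ref, storage_class_override: Optional[str] = None) -> Any:
        obj = self._objects[registry_ref.dataset_id]
        if storage_class_override in (None, registry_ref.storage_class):
            return obj
        return CONVERTERS[(registry_ref.storage_class, storage_class_override)](obj)


class SketchButler:
    """Toy butler that owns a private datastore and normalizes refs for it."""

    def __init__(self, registry_storage_classes: dict[str, str]) -> None:
        self._registry_storage_classes = registry_storage_classes
        self._datastore = SketchDatastore()  # private: callers never touch it

    def _split(self, ref: Ref) -> tuple[Ref, Optional[str]]:
        """Split a caller's ref into (registry ref, optional override)."""
        registry_sc = self._registry_storage_classes[ref.dataset_type]
        override = ref.storage_class if ref.storage_class != registry_sc else None
        return replace(ref, storage_class=registry_sc), override

    def put(self, obj: Any, ref: Ref) -> None:
        # Simplification: assumes obj is already in the registry storage class.
        registry_ref, _ = self._split(ref)
        self._datastore.put(obj, registry_ref)

    def get(self, ref: Ref) -> Any:
        registry_ref, override = self._split(ref)
        return self._datastore.get(registry_ref, override)

    def exists(self, ref: Ref) -> bool:
        return self._datastore.exists(ref.dataset_id)
```

With this split, the normalization happens in exactly one place (`_split`), so the datastore would not need to be defensive about which storage class a ref carries.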

It didn't use to matter because this was always getting refs that came from Registry: graphs used unresolved refs and butler.put used unresolved refs. Now that everything is resolved and there is no guarantee Registry even knows about it, things are a bit more exciting.

TallJimbo added 2 commits May 16, 2023 13:49

Handle storage class conversions in QuantumBackedButler.put.

548367b

Handle storage class conversions in FileDatastore.mexists.

02e9f46

timj approved these changes May 16, 2023

Add changelog entry.

412971c

TallJimbo force-pushed the tickets/DM-39198 branch from a4daa98 to 412971c Compare May 17, 2023 02:26

TallJimbo merged commit 98bc225 into main May 17, 2023
13 checks passed

TallJimbo deleted the tickets/DM-39198 branch May 17, 2023 03:20
