Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38210: Make butler.get act like getDirect and deprecate getDirect #797

Merged
merged 8 commits into from Apr 3, 2023

Conversation

timj
Copy link
Member

@timj timj commented Mar 3, 2023

We now want a resolved DatasetRef to work with butler.get without doing registry lookup. Also change the LimitedButler to add a .get(). .getDirect and deferred form are now deprecated.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@timj
Copy link
Member Author

timj commented Mar 3, 2023

@TallJimbo can you take a quick look at this to see if it matches your expectations? Obviously a lot of the checks for .id is not None will go away when the other half of the deprecation is done but it seems like getting this deprecation for getDirect out the way early would be a good idea. I need to fix the other lsst_distrib packages.

@codecov
Copy link

codecov bot commented Mar 3, 2023

Codecov Report

Patch coverage: 83.33% and project coverage change: +0.04 🎉

Comparison is base (77f0486) 87.64% compared to head (30e075b) 87.69%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #797      +/-   ##
==========================================
+ Coverage   87.64%   87.69%   +0.04%     
==========================================
  Files         267      267              
  Lines       34706    34700       -6     
  Branches     7294     7299       +5     
==========================================
+ Hits        30417    30429      +12     
+ Misses       3136     3117      -19     
- Partials     1153     1154       +1     
Impacted Files Coverage Δ
python/lsst/daf/butler/script/removeCollections.py 100.00% <ø> (ø)
python/lsst/daf/butler/script/removeRuns.py 100.00% <ø> (ø)
python/lsst/daf/butler/_butler.py 78.53% <61.90%> (+3.00%) ⬆️
python/lsst/daf/butler/_limited_butler.py 79.59% <68.75%> (-11.59%) ⬇️
python/lsst/daf/butler/_quantum_backed.py 89.07% <80.00%> (-0.71%) ⬇️
python/lsst/daf/butler/_deferredDatasetHandle.py 90.00% <100.00%> (ø)
tests/test_butler.py 97.62% <100.00%> (ø)
tests/test_logFormatter.py 100.00% <100.00%> (ø)
tests/test_quantumBackedButler.py 99.06% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in terms of behavior, but I'd like to handle the signatures in LimitedButler slightly differently.

datasetExistsDirect should probably also go away, and the getURI[s] are probably worth a look to see if they need modifications (my recollection is that they already have behavior consistent with the new get behavior, but I'm not sure).

python/lsst/daf/butler/_limited_butler.py Outdated Show resolved Hide resolved
@timj
Copy link
Member Author

timj commented Mar 3, 2023

the getURI[s] are probably worth a look

getURI uses _findDatasetRef so will now work directly with a resolved ref.

datasetExistsDirect should probably also go away

This begs the question of whether we are "fixing" datasetExists such that it only returns a boolean rather than raising if registry doesn't know about the dataset. In DM-32940 you suggest we change that to return an enum that can evaluate as a boolean. Then datasetExistsDirect could go away. We have known to registry, known to datastore, actually exists on "disk" as options but only the last one is True.

I assume putDirect should disappear as well.

@TallJimbo
Copy link
Member

👍 to removing putDirect.

What to do about datasetExists is trickier. I tried an enum on the DMTN-249 prototyping and while I think it might have worked fine for full Butler alone, it didn't make sense for LimitedButler where there's no concept of registry existence. And if you have something that checks for datastore existence only in LimitedButler, that doesn't make sense in full Butler, where Datastore-only existence checks are designed to almost never be necessary.

So I think our best bet is to give the LimitedButler datastore-only method a somewhat obscure, very precise name, and reserve exists or datastoreExists as a full-Butler method that returns an enum, but we might need to experiment on that a bit, or gather feedback from users.

@timj timj force-pushed the tickets/DM-38210 branch 2 times, most recently from 01a7aaa to 18c5c46 Compare March 24, 2023 23:37
@timj timj force-pushed the tickets/DM-38210 branch 2 times, most recently from 38e9196 to aa7e0f1 Compare March 29, 2023 22:12
@timj timj marked this pull request as ready for review March 30, 2023 17:51
@timj timj force-pushed the tickets/DM-38210 branch 2 times, most recently from d5f484e to f97bad1 Compare March 30, 2023 19:55
returned by this method. By default the returned type matches
the dataset type definition for this dataset. Specifying a
read `StorageClass` can force a different type to be returned.
This type must be compatible with the original type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not have a storageClass argument here, since the caller can override the registry storage class via ref.datasetType.storageClass already and I think that makes it more confusing than convenient (especially since right now LimitedButler mostly exists just for task execution and that doesn't use this parameter at all). I think that argument is even more applicable to Datastore.get, since that really isn't in need of convenience functionality since only butler should call it, but that may be out of scope for this ticket.

To be clear, I do think it's worth having a storageClass parameter to full Butler.get, since there it's very convenient when not passing a DatasetRef, and then since it'll present anyway it might as well further override what's in the ref when a DatasetRef is passed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main argument for keeping it is that it's forwarded directly to datastore and datastore will use it and treat it as the priority. It seems odd to declare that the base class can't support the parameter even when it can.

python/lsst/daf/butler/_butler.py Outdated Show resolved Hide resolved
We now want a resolved DatasetRef to work with butler.get
without doing registry lookup. Also change the LimitedButler
to add a .get(). .getDirect and deferred form are now deprecated.
Now that registry is not checked by default with
_findDatasetRef.
Butler.get is allowed to define more flexible interface that is
consistent with the base class, but the base class no longer
has to support registry queries.
This has long been replaced with removeRuns and removeCollection
and we should have deleted it in DM-37534.
Co-authored-by: Jim Bosch <jbosch@astro.princeton.edu>
@timj timj merged commit 476b978 into main Apr 3, 2023
12 of 13 checks passed
@timj timj deleted the tickets/DM-38210 branch April 3, 2023 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants