Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-41761: Add Butler.query property #914

Merged
merged 4 commits into from Dec 7, 2023
Merged

DM-41761: Add Butler.query property #914

merged 4 commits into from Dec 7, 2023

Conversation

andy-slac
Copy link
Contributor

This new property defines an interface for making complex queries
on repository database. This initial commit includes only three
convenience methods that are a close match to the query methods that
already exist in Registry. The implementation of that interface
for DirectButler just forwards calls to SqlRegistry.

RemoteButler does not implement those methods yet, we will need
to define representation for queries and results that client exchanges
server. This should become clearer as we add more methods to Query.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link

codecov bot commented Nov 30, 2023

Codecov Report

Attention: 23 lines in your changes are missing coverage. Please review.

Comparison is base (3243903) 87.24% compared to head (0e06fc3) 87.50%.

Files Patch % Lines
python/lsst/daf/butler/direct_query_results.py 90.00% 10 Missing and 1 partial ⚠️
python/lsst/daf/butler/tests/butler_query.py 99.10% 1 Missing and 4 partials ⚠️
python/lsst/daf/butler/direct_butler.py 91.66% 2 Missing and 2 partials ⚠️
tests/test_butler_query.py 97.29% 2 Missing ⚠️
python/lsst/daf/butler/_exceptions.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #914      +/-   ##
==========================================
+ Coverage   87.24%   87.50%   +0.25%     
==========================================
  Files         286      292       +6     
  Lines       37168    38067     +899     
  Branches     7833     8062     +229     
==========================================
+ Hits        32428    33310     +882     
- Misses       3539     3553      +14     
- Partials     1201     1204       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only about halfway through and will have more comments tomorrow, but I figured it might be useful to publish some big-picture thoughts now.

python/lsst/daf/butler/_query.py Show resolved Hide resolved
python/lsst/daf/butler/_query.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/_query.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/_query.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/_query_results.py Show resolved Hide resolved
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some potential (minor, probably preexisting) issues with which test lines are actually raising our expected exceptions, but otherwise I've got nothing much to say for the second of half of this review.

I've hit "Accepted" because I think everything I'd like to consider changing can be deferred to another ticket if we're marking the Butler.query result types as experimental somehow. But I would like to avoid committing long-term to supporting some aspects of the old result-type interface via Butler.query.

python/lsst/daf/butler/_query_results.py Show resolved Hide resolved
python/lsst/daf/butler/tests/butler_query.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/tests/butler_query.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/tests/butler_query.py Outdated Show resolved Hide resolved
@andy-slac andy-slac force-pushed the tickets/DM-41761 branch 3 times, most recently from 0a1c9b9 to f8954bf Compare December 4, 2023 22:32
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to make Butler.query private (i.e. Butler._query) to avoid committing to the Query class APIs (and the APIs of its helper classes); I just think we probably want to keep all of these APIs private until we're moderately far along writing the RemoteButler versions, as I could see unexpected difficulties there shaking things up.

I think the Butler convenience methods are probably closest to being ready for public consumption - the fact that those do not return objects capable of follow-up queries or lazy iteration makes me pretty confident RemoteButler won't throw a wrench in their works, but it's still conceivable we'd want to make small adjustments in the future to make them more similar to corresponding advanced-interface methods that do need to be adjusted due to RemoteButler considerations.

But while I want to be very cautious about rolling this out because I think we might want to change thigns, I don't actually see anything here that differs enough from how I'd imagined doing it for me to have recommendations to change more now, so I think we put up that last "privacy wall" and proceed.

"""The components of the parent dataset type included in these results
(`~collections.abc.Sequence` [ `str` or `None` ]).
"""
raise NotImplementedError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop this from the new API. If its equivalent in the old API wasn't deprecated, I think it should have been, as RFC-879 included deprecating all support for components in query APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registry.queryDatasets's component parameter is documented as deprecated, I guess that makes that attribute deprecated too. I'll drop it in the new API and I can explicitly document it as deprecated in old API.

@@ -1357,3 +1363,275 @@ def registry(self) -> Registry:
will be replaced by equivalent `Butler` methods.
"""
raise NotImplementedError()

@abstractmethod
def query(self) -> AbstractContextManager[Query]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this private for now, too? While I'm not super worried about this method changing in backwards-compatible ways (it might get additional arguments, but they could be default), I do think we will want to tinker with the Query class APIs further as we actually implement more kinds of chaining, and it's easier to put the wall here than on every single method there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I should also remove news fragment as we are hiding new API from users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it in with the underscores and when I make the release notes I can decide what to do with it.

@andy-slac andy-slac force-pushed the tickets/DM-41761 branch 3 times, most recently from 034ae8d to f90d27d Compare December 6, 2023 19:29
This new method defines an interface for making complex queries
on repository database. This initial commit includes only three
convenience methods that are a close match  to the query methods that
already exist in Registry. The implementation of that interface
for DirectButler just forwards calls to SqlRegistry.

RemoteButler does not implement those methods yet, we will need
to define representation for queries and results that client exchanges
server. This should become clearer as we add more methods to `Query`.
As we do not have non-UUID manager anymore, there is no point
keeping two dataset.yaml files.
Many tests were just copied from `RegistryTests` to `ButlerQueryTests`
replacing calls to registry methods with matching calls to `Butler.query`
methods. Tests are implemented for DirectButler only, for both SQLite
and Postgres backends.
@andy-slac andy-slac merged commit d41daf1 into main Dec 7, 2023
17 checks passed
@andy-slac andy-slac deleted the tickets/DM-41761 branch December 7, 2023 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants