Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-37609: Resurrect prototype client/server and add tests #774

Merged
merged 51 commits into from Jan 25, 2023
Merged

Conversation

timj
Copy link
Member

@timj timj commented Jan 18, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Jan 18, 2023

Codecov Report

Base: 84.89% // Head: 85.57% // Increases project coverage by +0.68% 🎉

Coverage data is based on head (31dac07) compared to base (0c6899d).
Patch coverage: 87.01% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #774      +/-   ##
==========================================
+ Coverage   84.89%   85.57%   +0.68%     
==========================================
  Files         267      269       +2     
  Lines       35052    35305     +253     
  Branches     5990     6015      +25     
==========================================
+ Hits        29758    30214     +456     
+ Misses       4009     3772     -237     
- Partials     1285     1319      +34     
Impacted Files Coverage Δ
python/lsst/daf/butler/server.py 82.97% <82.97%> (ø)
python/lsst/daf/butler/registries/remote.py 69.43% <85.45%> (+69.43%) ⬆️
tests/test_server.py 93.82% <93.82%> (ø)
python/lsst/daf/butler/core/config.py 92.11% <100.00%> (+0.20%) ⬆️
python/lsst/daf/butler/core/repoRelocation.py 95.23% <100.00%> (+0.79%) ⬆️
python/lsst/daf/butler/core/serverModels.py 78.57% <100.00%> (+78.57%) ⬆️
python/lsst/daf/butler/registries/sql.py 84.05% <0.00%> (+0.19%) ⬆️
python/lsst/daf/butler/registry/wildcards.py 54.58% <0.00%> (+0.48%) ⬆️
...hon/lsst/daf/butler/core/dimensions/_coordinate.py 86.01% <0.00%> (+0.59%) ⬆️
... and 5 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

The expression is now a path component rather than query param.
* dimension universe
* return butler config
* query datasets types now properly handles expressions
This significantly improves performance.
This means that the service runs at /butler
Use the same method we use when copying other mappings.
Deepcopy causes problems if we try to attach objects to a config
after reading the config (for example in client/server testing).
In some edge cases a field that is normally a string will
be replaced by an object that does not support BUTLER_ROOT.
Catch the type error.
This works around the problem of hardcoding the butler repo location
(which needs to be fixed in the future) for testing.
Testing needs to be expanded but this demonstrates that testing
can be done.
@timj timj marked this pull request as ready for review January 23, 2023 22:12
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, few comments. I'm not very familiar yet with httpx/fastapi, will need to learn more. My biggest worry now is server-side state and caching of the registry data in memory, it's clear that we need to think about consistency, probably disable all caching in registry. A related issue is potential caching of responses by proxy servers (if we ever get to that), some of our GET requests are not idempotent, so we may need to disable caching for those too.

python/lsst/daf/butler/core/repoRelocation.py Show resolved Hide resolved
pass
# How do we know which server should be refreshed?
# Should there be caches in the client?
response = self._client.put(self._get_url("v1/registry/refresh"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it feels a bit odd that server-side refresh should depend on one of the clients asking for it explicitly. 🤔 There could be multiple server instances too, and which one of them receives this request is not known to this client. Maybe the client should only refresh its own caches, and assume that server-side state is always current (I guess that probably means we'd have to disable caching in the Registry on server side).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my comment in the code there was meant to be me worrying a lot about refresh but in this test code I really need refresh to run in the server because I inject the collections after the initial cached butler is created. You make a later comment on that.

python/lsst/daf/butler/registries/remote.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/registries/remote.py Outdated Show resolved Hide resolved

BUTLER_ROOT = "ci_hsc_gen3/DATA"

log = logging.getLogger("excalibur")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚔️ ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two years ago I was having fun with branding and I was fitting in with the SQuaRE naming scheme involving King Arthur names.

python/lsst/daf/butler/server.py Outdated Show resolved Hide resolved
# refreshed? How do we know the server we are refreshing is used later?
# For testing at the moment it is important if a test adds a dataset type
# directly in the server since the test client will not see it.
butler.registry.refresh()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it refresh both read-only and writable butlers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but I'm having an existential crisis over the refreshing in general. As you say in the overall comment, caching the dataset types and collections in the long-lived server butlers is going to cause a lot of problems since we never really know when something has changed in the database backend that would need a refresh. Globally disabling the cache wold be safest but might be a performance disaster when 10,000 people connect. cc/ @TallJimbo

python/lsst/daf/butler/server.py Show resolved Hide resolved
python/lsst/daf/butler/tests/utils.py Outdated Show resolved Hide resolved
tests/test_server.py Outdated Show resolved Hide resolved
This is not going to be a production method but is there
as a demonstration. To make it work we had to implement
refresh in the server but the long term usefulness of this
needs thought. What happens if a dataset type or collection
is added to the server butler? How do the cached butler objects
in the server know they all need to update?
It's likely not necessary since this is a simple GET that
returns a dict, but playing it safe won't harm anything.
They will all fail anyhow if FastAPI is not installed.
@timj timj merged commit 64184da into main Jan 25, 2023
@timj timj deleted the tickets/DM-37609 branch January 25, 2023 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants