New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-37609: Resurrect prototype client/server and add tests #774
Conversation
Codecov ReportBase: 84.89% // Head: 85.57% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #774 +/- ##
==========================================
+ Coverage 84.89% 85.57% +0.68%
==========================================
Files 267 269 +2
Lines 35052 35305 +253
Branches 5990 6015 +25
==========================================
+ Hits 29758 30214 +456
+ Misses 4009 3772 -237
- Partials 1285 1319 +34
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
The expression is now a path component rather than query param.
* dimension universe * return butler config * query datasets types now properly handles expressions
This significantly improves performance.
This means that the service runs at /butler
Use the same method we use when copying other mappings. Deepcopy causes problems if we try to attach objects to a config after reading the config (for example in client/server testing).
In some edge cases a field that is normally a string will be replaced by an object that does not support BUTLER_ROOT. Catch the type error.
This works around the problem of hardcoding the butler repo location (which needs to be fixed in the future) for testing.
Testing needs to be expanded but this demonstrates that testing can be done.
98b18dd
to
6da6ff6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, few comments. I'm not very familiar yet with httpx/fastapi, will need to learn more. My biggest worry now is server-side state and caching of the registry data in memory, it's clear that we need to think about consistency, probably disable all caching in registry. A related issue is potential caching of responses by proxy servers (if we ever get to that), some of our GET requests are not idempotent, so we may need to disable caching for those too.
pass | ||
# How do we know which server should be refreshed? | ||
# Should there be caches in the client? | ||
response = self._client.put(self._get_url("v1/registry/refresh")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it feels a bit odd that server-side refresh should depend on one of the clients asking for it explicitly. 🤔 There could be multiple server instances too, and which one of them receives this request is not known to this client. Maybe the client should only refresh its own caches, and assume that server-side state is always current (I guess that probably means we'd have to disable caching in the Registry on server side).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my comment in the code there was meant to be me worrying a lot about refresh but in this test code I really need refresh to run in the server because I inject the collections after the initial cached butler is created. You make a later comment on that.
|
||
BUTLER_ROOT = "ci_hsc_gen3/DATA" | ||
|
||
log = logging.getLogger("excalibur") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚔️ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two years ago I was having fun with branding and I was fitting in with the SQuaRE naming scheme involving King Arthur names.
# refreshed? How do we know the server we are refreshing is used later? | ||
# For testing at the moment it is important if a test adds a dataset type | ||
# directly in the server since the test client will not see it. | ||
butler.registry.refresh() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it refresh both read-only and writable butlers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, but I'm having an existential crisis over the refreshing in general. As you say in the overall comment, caching the dataset types and collections in the long-lived server butlers is going to cause a lot of problems since we never really know when something has changed in the database backend that would need a refresh. Globally disabling the cache wold be safest but might be a performance disaster when 10,000 people connect. cc/ @TallJimbo
This is not going to be a production method but is there as a demonstration. To make it work we had to implement refresh in the server but the long term usefulness of this needs thought. What happens if a dataset type or collection is added to the server butler? How do the cached butler objects in the server know they all need to update?
It's likely not necessary since this is a simple GET that returns a dict, but playing it safe won't harm anything.
They will all fail anyhow if FastAPI is not installed.
dfe001a
to
f5ed3d4
Compare
Checklist
doc/changes