New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-42934: Run existing unit tests against RemoteButler #961
Conversation
5ac533b
to
cbb532b
Compare
Pull Butler server test setup out to a separate function, to allow it to be re-used in tests outside of test_server.py.
DirectButler throws a FileNotFoundError for missing datasets in get(), not the LookupError that RemoteButler was throwing.
DirectButler throws if get() is given both a DatasetRef and DataId -- RemoteButler now does so as well for consistency.
cbb532b
to
6736bdb
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #961 +/- ##
==========================================
+ Coverage 88.39% 88.41% +0.02%
==========================================
Files 307 309 +2
Lines 39508 39690 +182
Branches 8316 8331 +15
==========================================
+ Hits 34922 35091 +169
- Misses 3373 3385 +12
- Partials 1213 1214 +1 ☔ View full report in Codecov by Sentry. |
ec32860
to
19bdddc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I really like the hybrid butler approach -- makes it very clear what works with remote server.
_remote_butler: RemoteButler | ||
_direct_butler: Butler | ||
|
||
def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> "HybridButler": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> "HybridButler": | |
def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> Self: |
Does typing.Self
work here? (maybe also need from __future__ import annotations
to remove the quotes?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self doesn't for some reason... it causes mypy to throw "expected Self but got HybridButler".
from __future__ import annotations
works though, thanks -- I could not for the life of me figure out why this exact same pattern worked in DirectButler and RemoteButler but not in this file.
storageClass: str | StorageClass | None = None, | ||
**kwargs: Any, | ||
) -> DeferredDatasetHandle: | ||
return self._direct_butler.getDeferred( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getDeferred
in theory is simple to implement in remote butler since it is a deferred get and get works already. The minor complication is that the direct butler implementation does a check to make sure the get is likely to work (calling datastore at the moment but it should really call self.exists()
and self.stored()
) before it returns the deferred handle (this is so that you don't spend hours doing some processing only to find that the get fails). This might be something to add fairly soon (not on this ticket) along with exists
and stored
.
run: str | None = None, | ||
inferDefaults: bool = True, | ||
**kwargs: Any, | ||
) -> Butler: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Self
better here?
tests/test_butler.py
Outdated
) -> tuple[Butler, DatasetType]: | ||
"""Create a Butler for the test repository and insert some test data | ||
into it. | ||
""" | ||
butler = Butler.from_config(self.tmpConfigFile, run=run) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this call self.create_empty_butler()
here? Or is there some restriction on this having to be a direct butler? If it can't call it, maybe add a comment to that effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense and will let me get rid of the other method override below. I didn't discover the need for create_empty_butler
until later in the process so I didn't think about it.
tests/test_butler.py
Outdated
butler = Butler.from_config(self.tmpConfigFile, run=run) | ||
assert isinstance(butler, DirectButler), "Expect DirectButler in configuration" | ||
|
||
collections = set(butler.registry.queryCollections()) | ||
self.assertEqual(collections, {run}) | ||
|
||
return self._setup_butler_data(butler, storageClass, datasetTypeName) | ||
|
||
def create_empty_butler(self, run: str | None = None, writeable: bool | None = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def create_empty_butler(self, run: str | None = None, writeable: bool | None = None): | |
def create_empty_butler(self, run: str | None = None, writeable: bool | None = None) -> Butler: |
self.assertIsInstance(uri, ResourcePath) | ||
self.assertIn("424", str(uri), f"Checking visit is in URI {uri}") | ||
self.assertEqual(uri.fragment, "predicted", f"Checking for fragment in {uri}") | ||
if self.predictionSupported: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a temporary restriction on remote butler or a permanent one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not currently implemented. I don't really understand the use cases for that feature so I'm not sure whether it makes sense to implement in the future or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a tricky one. People wanted to know what the path would be if they were to store the dataset. In client/server this becomes a bit odder since you have to decide what it comes back as since returning a signed URL to something that doesn't exist seems odd.
We have discussed in the past whether getURI should have the option of returning the server-side URI (s3://bucket/path_in_store/file.fits) rather than the client-side signed URL -- presumably this is assumed to be a bad idea because the bucket name shouldn't leak to an untrusted user? Maybe for predicted URIs we could return something like butler://datastore_name/path_in_store/file.fits#predicted
so that people could see what the final filename template would resolve to. I'm not sure how many users care about this at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could become a DirectButler
-only interface? It certainly won't be the last one; we might even want to go ahead and move the entire write interface from Butler
to DirectButler
at some point.
tests/test_butler.py
Outdated
def testStringification(self) -> None: | ||
# RemoteButler does not have datastore/registry strings like | ||
# DirectButler does. | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a bespoke test here that checks that RemoteButler stringifies into something reasonable? Maybe with the URL of the server?
Add a "HybridButler" class that uses RemoteButler when possible, and DirectButler for methods not yet implemented by RemoteButler. Use HybridButler to run parts of the main Butler test suite against RemoteButler.
When running unit tests in Docker, the tests and the main library code are not in the same folder. Explicitly specify the test directory when creating the server so it is able to locate its configuration file.
Add a way to optionally skip "predict mode" tests, so we can run the rest of the composite tests against Butler server.
19bdddc
to
2e9104e
Compare
Added a new Butler subclass HybridButler that delegates methods to RemoteButler where implemented, and uses DirectButler for the rest. Extracted the test server setup from test_server.py into a reusable function. Modified test_butler.py to run tests against Butler server via HybridButler.
This revealed a few minor incompatibilities between RemoteButler and DirectButler, so updated RemoteButler to conform with the DirectButler behavior.
Checklist
doc/changes