DM-42934: Run existing unit tests against RemoteButler #961

dhirving · 2024-02-16T16:23:15Z

Added a new Butler subclass HybridButler that delegates methods to RemoteButler where implemented, and uses DirectButler for the rest. Extracted the test server setup from test_server.py into a reusable function. Modified test_butler.py to run tests against Butler server via HybridButler.

This revealed a few minor incompatibilities between RemoteButler and DirectButler, so updated RemoteButler to conform with the DirectButler behavior.

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

Pull Butler server test setup out to a separate function, to allow it to be re-used in tests outside of test_server.py.

DirectButler throws a FileNotFoundError for missing datasets in get(), not the LookupError that RemoteButler was throwing.

DirectButler throws if get() is given both a DatasetRef and DataId -- RemoteButler now does so as well for consistency.

codecov · 2024-02-16T17:34:18Z

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (3e35ddf) 88.39% compared to head (2e9104e) 88.41%.

Files	Patch %	Lines
python/lsst/daf/butler/tests/hybrid_butler.py	85.56%	14 Missing ⚠️
tests/test_butler.py	95.77%	2 Missing and 1 partial ⚠️
python/lsst/daf/butler/tests/server.py	96.49%	2 Missing ⚠️
tests/test_server.py	95.23%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #961      +/-   ##
==========================================
+ Coverage   88.39%   88.41%   +0.02%     
==========================================
  Files         307      309       +2     
  Lines       39508    39690     +182     
  Branches     8316     8331      +15     
==========================================
+ Hits        34922    35091     +169     
- Misses       3373     3385      +12     
- Partials     1213     1214       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

timj

Looks great. I really like the hybrid butler approach -- makes it very clear what works with remote server.

timj · 2024-02-16T21:00:26Z

python/lsst/daf/butler/tests/hybrid_butler.py

+    _remote_butler: RemoteButler
+    _direct_butler: Butler
+
+    def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> "HybridButler":


Suggested change

def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> "HybridButler":

def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> Self:

Does typing.Self work here? (maybe also need from __future__ import annotations to remove the quotes?)

Self doesn't for some reason... it causes mypy to throw "expected Self but got HybridButler".

from __future__ import annotations works though, thanks -- I could not for the life of me figure out why this exact same pattern worked in DirectButler and RemoteButler but not in this file.

timj · 2024-02-16T21:07:58Z

python/lsst/daf/butler/tests/hybrid_butler.py

+        storageClass: str | StorageClass | None = None,
+        **kwargs: Any,
+    ) -> DeferredDatasetHandle:
+        return self._direct_butler.getDeferred(


getDeferred in theory is simple to implement in remote butler since it is a deferred get and get works already. The minor complication is that the direct butler implementation does a check to make sure the get is likely to work (calling datastore at the moment but it should really call self.exists() and self.stored()) before it returns the deferred handle (this is so that you don't spend hours doing some processing only to find that the get fails). This might be something to add fairly soon (not on this ticket) along with exists and stored.

timj · 2024-02-16T21:10:51Z

python/lsst/daf/butler/tests/hybrid_butler.py

+        run: str | None = None,
+        inferDefaults: bool = True,
+        **kwargs: Any,
+    ) -> Butler:


Is Self better here?

timj · 2024-02-16T21:21:37Z

tests/test_butler.py

+    ) -> tuple[Butler, DatasetType]:
+        """Create a Butler for the test repository and insert some test data
+        into it.
+        """
        butler = Butler.from_config(self.tmpConfigFile, run=run)


Can this call self.create_empty_butler() here? Or is there some restriction on this having to be a direct butler? If it can't call it, maybe add a comment to that effect?

Yeah that makes sense and will let me get rid of the other method override below. I didn't discover the need for create_empty_butler until later in the process so I didn't think about it.

timj · 2024-02-16T21:22:35Z

tests/test_butler.py

        butler = Butler.from_config(self.tmpConfigFile, run=run)
        assert isinstance(butler, DirectButler), "Expect DirectButler in configuration"

        collections = set(butler.registry.queryCollections())
        self.assertEqual(collections, {run})

+        return self._setup_butler_data(butler, storageClass, datasetTypeName)
+
+    def create_empty_butler(self, run: str | None = None, writeable: bool | None = None):


Suggested change

def create_empty_butler(self, run: str | None = None, writeable: bool | None = None):

def create_empty_butler(self, run: str | None = None, writeable: bool | None = None) -> Butler:

timj · 2024-02-16T21:24:10Z

tests/test_butler.py

-        self.assertIsInstance(uri, ResourcePath)
-        self.assertIn("424", str(uri), f"Checking visit is in URI {uri}")
-        self.assertEqual(uri.fragment, "predicted", f"Checking for fragment in {uri}")
+        if self.predictionSupported:


Is this a temporary restriction on remote butler or a permanent one?

It's not currently implemented. I don't really understand the use cases for that feature so I'm not sure whether it makes sense to implement in the future or not.

Yes, it's a tricky one. People wanted to know what the path would be if they were to store the dataset. In client/server this becomes a bit odder since you have to decide what it comes back as since returning a signed URL to something that doesn't exist seems odd.

We have discussed in the past whether getURI should have the option of returning the server-side URI (s3://bucket/path_in_store/file.fits) rather than the client-side signed URL -- presumably this is assumed to be a bad idea because the bucket name shouldn't leak to an untrusted user? Maybe for predicted URIs we could return something like butler://datastore_name/path_in_store/file.fits#predicted so that people could see what the final filename template would resolve to. I'm not sure how many users care about this at this point.

Maybe this could become a DirectButler-only interface? It certainly won't be the last one; we might even want to go ahead and move the entire write interface from Butler to DirectButler at some point.

timj · 2024-02-16T21:27:01Z

tests/test_butler.py

+    def testStringification(self) -> None:
+        # RemoteButler does not have datastore/registry strings like
+        # DirectButler does.
+        pass


Can we have a bespoke test here that checks that RemoteButler stringifies into something reasonable? Maybe with the URL of the server?

Add a "HybridButler" class that uses RemoteButler when possible, and DirectButler for methods not yet implemented by RemoteButler. Use HybridButler to run parts of the main Butler test suite against RemoteButler.

When running unit tests in Docker, the tests and the main library code are not in the same folder. Explicitly specify the test directory when creating the server so it is able to locate its configuration file.

Add a way to optionally skip "predict mode" tests, so we can run the rest of the composite tests against Butler server.

Implement RemoteButler._clone

e197f91

dhirving force-pushed the tickets/DM-42934 branch 2 times, most recently from 5ac533b to cbb532b Compare February 16, 2024 16:49

dhirving added 3 commits February 16, 2024 10:23

Factor out test server setup

89c6bd3

Pull Butler server test setup out to a separate function, to allow it to be re-used in tests outside of test_server.py.

Make get() error consistent with DirectButler

f55b21b

DirectButler throws a FileNotFoundError for missing datasets in get(), not the LookupError that RemoteButler was throwing.

Add check for consistency with DirectButler

e1645d0

DirectButler throws if get() is given both a DatasetRef and DataId -- RemoteButler now does so as well for consistency.

dhirving force-pushed the tickets/DM-42934 branch from cbb532b to 6736bdb Compare February 16, 2024 17:23

dhirving force-pushed the tickets/DM-42934 branch from ec32860 to 19bdddc Compare February 16, 2024 18:22

dhirving marked this pull request as ready for review February 16, 2024 20:54

timj approved these changes Feb 16, 2024

View reviewed changes

dhirving added 4 commits February 19, 2024 09:41

Run main Butler tests against RemoteButler

4c539e6

Add a "HybridButler" class that uses RemoteButler when possible, and DirectButler for methods not yet implemented by RemoteButler. Use HybridButler to run parts of the main Butler test suite against RemoteButler.

Fix Docker unit tests

9c8c4f7

When running unit tests in Docker, the tests and the main library code are not in the same folder. Explicitly specify the test directory when creating the server so it is able to locate its configuration file.

Add back composite tests for Butler server

c7aad0b

Add a way to optionally skip "predict mode" tests, so we can run the rest of the composite tests against Butler server.

Implement stringification for RemoteButler

2e9104e

dhirving force-pushed the tickets/DM-42934 branch from 19bdddc to 2e9104e Compare February 19, 2024 16:42

dhirving merged commit 81146d7 into main Feb 19, 2024
18 checks passed

dhirving deleted the tickets/DM-42934 branch February 19, 2024 19:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-42934: Run existing unit tests against RemoteButler #961

DM-42934: Run existing unit tests against RemoteButler #961

dhirving commented Feb 16, 2024 •

edited

codecov bot commented Feb 16, 2024 •

edited

timj left a comment

timj Feb 16, 2024

dhirving Feb 16, 2024

timj Feb 16, 2024

timj Feb 16, 2024

timj Feb 16, 2024

dhirving Feb 16, 2024

timj Feb 16, 2024

timj Feb 16, 2024

dhirving Feb 19, 2024

timj Feb 19, 2024

TallJimbo Feb 20, 2024

timj Feb 16, 2024

	def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> "HybridButler":
	def __new__(cls, remote_butler: RemoteButler, direct_butler: Butler) -> Self:

	def create_empty_butler(self, run: str \| None = None, writeable: bool \| None = None):
	def create_empty_butler(self, run: str \| None = None, writeable: bool \| None = None) -> Butler:

DM-42934: Run existing unit tests against RemoteButler #961

DM-42934: Run existing unit tests against RemoteButler #961

Conversation

dhirving commented Feb 16, 2024 • edited

Checklist

codecov bot commented Feb 16, 2024 • edited

Codecov Report

timj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhirving commented Feb 16, 2024 •

edited

codecov bot commented Feb 16, 2024 •

edited