DM-27154: Usability improvement suggestions for butler collection commands #431

n8pease · 2020-11-10T19:19:30Z

No description provided.

timj

Mainly comments on test infrastructure changes but also a comment on the new exception class.

timj · 2020-11-10T19:58:59Z

python/lsst/daf/butler/tests/utils.py

+
+        Returns
+        -------
+        `str`


I'm expecting this to return a ButlerURI.

Also this is the wrong doc syntax for a return value. I'm not completely convinced that this deserves a method of its own.

timj · 2020-11-10T20:02:56Z

python/lsst/daf/butler/tests/utils.py

+            butler = self.butler
+        butler.put(metric, ref)
+
+    def addDatasetType(self, dimensions, datasetTypeName, storageClass):


I don't understand why this is here. It's identical to lsst.daf.butler.tests.addDatasetType isn't it?

timj · 2020-11-10T20:06:30Z

python/lsst/daf/butler/tests/utils.py

+            butler = Butler(self.root, run=run)
+        else:
+            butler = self.butler
+        butler.put(metric, ref)


Probably should return the DatasetRef returned by this put.

I think this whole routine can be something like:

    def addDataset(...):
        metric = self._makeExampleMetrics()
        return self.butler.put(metric,
                               datasetType if datasetType is not None else self.datasetType,
                               dataId=dataId, run=run)
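A minimal sketch of that shape, runnable without an LSST stack. `FakeButler` and `RepoHelper` are hypothetical stand-ins introduced only for illustration, and the tuple returned by `put` merely stands in for the DatasetRef that the real call returns; the real `Butler.put` signature may differ.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeButler:
    """Hypothetical stand-in for Butler; it only records what was put."""

    def __init__(self) -> None:
        self.stored: List[Tuple[Any, str, Dict[str, Any], Optional[str]]] = []

    def put(self, obj, datasetType, dataId=None, run=None):
        # Return a token standing in for the DatasetRef of the stored dataset.
        self.stored.append((obj, datasetType, dict(dataId or {}), run))
        return ("ref", datasetType, run)


class RepoHelper:
    """Hypothetical test helper holding a butler and a default dataset type."""

    def __init__(self, butler, datasetType):
        self.butler = butler
        self.datasetType = datasetType

    def _makeExampleMetrics(self):
        return {"summary": 1}

    def addDataset(self, dataId, run=None, datasetType=None):
        metric = self._makeExampleMetrics()
        # No intermediate ref is built; the result of put is returned as-is.
        return self.butler.put(
            metric,
            datasetType if datasetType is not None else self.datasetType,
            dataId=dataId,
            run=run,
        )
```

Returning the result of `put` lets a test keep the reference and use it in later assertions.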

timj · 2020-11-10T20:07:06Z

python/lsst/daf/butler/tests/utils.py

+            datasetType = self.datasetType
+        ref = DatasetRef(datasetType, dataId, id=None)
+        metric = self._makeExampleMetrics()
+        if run:


This isn't needed. You can say self.butler.put(metric, ref, run=run) below.

n8pease · 2020-11-13

It fails when I call this and the run doesn't exist yet.

timj · 2020-11-13

It would be clearer, then, if you had a line here that created the run collection, rather than relying without comment on Butler to create it behind the scenes.

n8pease · 2020-11-13

Maybe it's weird because I'm thinking about it wrong, though...

timj · 2020-11-13 (edited)

Without a comment here it all seems superfluous. Whereas:

    if run:
        self.butler.registry.registerCollection(run, type=CollectionType.RUN)

is explicit about what you need. @TallJimbo do you prefer the "create a new butler for each run" paradigm? Maybe that's better for the general user. If that is the case, I'd like a comment to explain why you have to create a new butler.

n8pease · 2020-11-13

I see. That makes sense, and it works.

TallJimbo · 2020-11-13

Outside of tests, users are never going to create new runs or even call put themselves (it'll all be operations like ingest or pipetask doing that). So I think we can go with whatever maximizes test code readability. I don't have a super strong preference; the "new butler for each run" pattern is maybe a bit more familiar to those coming from Gen2, but I agree that it's less explicit.
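The explicit-registration variant discussed in this thread could look roughly like the sketch below. Everything here is a stand-in so that it runs without an LSST stack: `FakeRegistry`, `FakeButler`, the local `CollectionType` enum, and the `LookupError` are assumptions, chosen only to mimic the reported behaviour that a put into a run that does not exist yet fails.

```python
import enum


class CollectionType(enum.Enum):
    """Stand-in for the real CollectionType; only RUN is needed here."""

    RUN = enum.auto()


class FakeRegistry:
    def __init__(self):
        self.collections = {}

    def registerCollection(self, name, type):
        # Registering an existing name is harmless in this sketch.
        self.collections.setdefault(name, type)


class FakeButler:
    def __init__(self):
        self.registry = FakeRegistry()
        self.stored = []

    def put(self, obj, datasetType, dataId=None, run=None):
        # Mimic the failure from the thread: an unknown run is an error.
        if run not in self.registry.collections:
            raise LookupError(f"run {run!r} is not registered")
        self.stored.append((obj, datasetType, run))
        return (datasetType, run)


def addDataset(butler, metric, datasetType, dataId, run):
    if run:
        # Be explicit: create the RUN collection before writing into it.
        butler.registry.registerCollection(run, type=CollectionType.RUN)
    return butler.put(metric, datasetType, dataId=dataId, run=run)
```

Compared with constructing a new butler per run, the registration line states in the test code itself why the run exists.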

timj · 2020-11-10T20:07:41Z

python/lsst/daf/butler/tests/utils.py

+        """
+        if not datasetType:
+            datasetType = self.datasetType
+        ref = DatasetRef(datasetType, dataId, id=None)


This is not necessary. You do not need to create a ref for the put.

timj · 2020-11-11T20:51:44Z

tests/test_cliCmdQueryCollections.py

            butlerCfg = Butler.makeRepo("here")
            # the purpose of this call is to create some collections
-            _ = Butler(butlerCfg, run=run, tags=[tag], collections=[tag])
+            _ = Butler(butlerCfg, run=run, tags=[tag], collections=[tag], writeable=True)


Don't need the _ =

timj · 2020-11-11T20:52:54Z

tests/test_cliCmdQueryCollections.py

+    def testChained(self):
+        with self.runner.isolated_filesystem():
+
+            # Create a butler and add some chained collections


...and replace the datastore with a mock

timj · 2020-11-11T21:40:28Z

tests/test_cliCmdQueryCollections.py

+
+            # Create a butler and add some chained collections
+            butlerCfg = Butler.makeRepo("here")
+            with unittest.mock.patch.object(Datastore, "fromConfig", spec=Datastore.fromConfig):


I find the with a bit confusing here since it implies that something is freed or reversed when the block exits. This seems to be how it works though.

n8pease · 2020-11-14

I didn't totally understand it (it's cargo-culted from something @TallJimbo wrote), so I did more research. The deal is that Datastore.fromConfig gets patched: it is replaced with a MagicMock. The butler initializer calls Datastore.fromConfig, which is now a MagicMock instance, and calling it returns a new MagicMock instance, so the butler's self.datastore is a MagicMock. Init finishes, and export and get are monkey-patched onto the butler.datastore MagicMock. Then the with block exits, which does restore Datastore.fromConfig to its normal function, but butler.datastore is not changed (nothing would change it), and we don't want it to be: keeping the mock is the reason for doing all of the above.

However, all that seems kind of complicated, and it seems that patching Datastore.fromConfig is not necessary: the butler init works just fine with it not patched. We do run into problems when we call butler1.import_(filename=os.path.join(TESTDIR, "data", "registry", "base.yaml")) without patching, but before making that call we can simply replace the datastore functions that need replacing, without any context manager. Doing this seems to be enough:

    butler1.datastore.export = self._mock_export
    butler1.datastore.get = self._mock_get
    butler1.datastore.ingest = MagicMock()

(Note that instead of datastore being a MagicMock with export and get monkey-patched on, I've replaced those two with our alternative implementations, and made ingest a mock because it gets called in the course of calling butler.import_ but does not seem to need to do anything.)

@TallJimbo if there's a reason to use the patch context manager, please let me know. In the meantime I'll change it to the simpler version I just described, and you and @timj can 👍 or 👎 that implementation if y'all want.

(Hmm, it seems to need the context manager in test_simpleButler. Working on understanding why...)

Aha. The butlers were getting created slightly differently: the one that worked was being initialized with the config returned by Butler.makeRepo, and the one that did not was building a config itself and needed config["root"] to be set.

TallJimbo · 2020-11-14

This was the first time I ever really used unittest.mock (other than small changes to tests others had written), so if you find a simpler way to accomplish what my code did, assume I just wasn't aware of that alternative.
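The two approaches compared in this thread can be reproduced with the standard library alone. In the sketch below, `Datastore` and `Butler` are tiny stand-ins, not the real classes; only `unittest.mock.patch.object` and `MagicMock` are real APIs.

```python
from unittest.mock import MagicMock, patch


class Datastore:
    """Stand-in for the real Datastore."""

    @staticmethod
    def fromConfig(config):
        return Datastore()

    def export(self, refs):
        return "real export"


class Butler:
    """Stand-in butler that builds its datastore in __init__."""

    def __init__(self, config):
        self.datastore = Datastore.fromConfig(config)


# Approach 1: patch the factory only while the butler is constructed.
with patch.object(Datastore, "fromConfig", spec=Datastore.fromConfig):
    patched = Butler("cfg")
# The patch is undone here, but `patched` keeps the mock it was given.
restored = Butler("cfg")

# Approach 2: build normally, then swap only the methods that need faking.
simple = Butler("cfg")
simple.datastore.export = lambda refs: "fake export"
simple.datastore.ingest = MagicMock()
```

After this runs, `patched.datastore` is a `MagicMock`, `restored.datastore` is an ordinary `Datastore`, and `simple.datastore` is a real `Datastore` whose `export` and `ingest` have been replaced on that one instance.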

timj · 2020-11-11T21:41:45Z

tests/test_cliCmdQueryCollections.py

+                              formatter="lsst.daf.butler.formatters.json.JsonFormatter")
+
+    @staticmethod
+    def _mock_get(ref: DatasetRef, parameters: Optional[Mapping[str, Any]] = None


Rather than copying this from test_simpleButler.py, please put the mocks in a DatastoreMock base class in the tests hierarchy.
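One possible shape for such a shared helper is sketched below. The class name follows the comment above, but the `apply` method, the `_mock_export` signature, and the return values are assumptions made for illustration; only the `ref` and `parameters` arguments of `_mock_get` come from the diff.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


class DatastoreMock:
    """Hypothetical shared helper that fakes datastore calls in tests."""

    @staticmethod
    def _mock_export(refs, *, directory=None, transfer=None):
        # Pretend the export produced nothing to transfer.
        return iter(())

    @staticmethod
    def _mock_get(ref, parameters=None):
        # Echo the request back instead of reading a file.
        return (ref, parameters)

    @classmethod
    def apply(cls, butler):
        # Replace only the datastore methods the tests touch.
        butler.datastore.export = cls._mock_export
        butler.datastore.get = cls._mock_get
        butler.datastore.ingest = MagicMock()


# Usage with a bare namespace standing in for a butler.
butler = SimpleNamespace(datastore=SimpleNamespace())
DatastoreMock.apply(butler)
```

Test cases could then either inherit from the class or call `DatastoreMock.apply(butler)` in their setup, so the mock bodies live in one place.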

timj · 2020-11-11T21:48:30Z

python/lsst/daf/butler/_butler.py

        collectionType = self.registry.getCollectionType(name)
+        if purge and not unstore:
+            raise PruneCollectionsArgsError(PruneCollectionsArgsError.Reason.PURGE_WITHOUT_UNSTORE,


I'm fine with a new subclass for specificity but couldn't we also make three different exceptions? You could have a new base class of PruneCollectionsArgsError and then have three subclasses of that such as PurgeWithoutUnstoreCollectionsError etc. They could have the error message burned in and you wouldn't need the enum here and you wouldn't need to check the reason later on -- you could simply catch the specific exception and replace it with a new error message suitable for click users.
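A sketch of the hierarchy suggested here, under stated assumptions: the message strings, `CliUsageError`, `checkPruneArgs`, `cliPrune`, and the `--purge`/`--unstore` flag names are all hypothetical. Only one of the three subclasses is shown; the other two would follow the same pattern.

```python
class PruneCollectionsArgsError(TypeError):
    """Base class for invalid argument combinations."""


class PurgeWithoutUnstoreCollectionsError(PruneCollectionsArgsError):
    """purge=True was given without unstore=True."""

    def __init__(self):
        # The message is fixed here, so callers need no enum or reason check.
        super().__init__("Cannot pass purge=True without unstore=True.")


class CliUsageError(Exception):
    """Hypothetical stand-in for the error type the click layer would raise."""


def checkPruneArgs(purge, unstore):
    if purge and not unstore:
        raise PurgeWithoutUnstoreCollectionsError()


def cliPrune(purge, unstore):
    try:
        checkPruneArgs(purge, unstore)
    except PurgeWithoutUnstoreCollectionsError as err:
        # Replace the Python-oriented message with one phrased for CLI users.
        raise CliUsageError("--purge requires --unstore") from err
```

Because the base class still derives from TypeError, existing callers that catch TypeError keep working, while the CLI can catch each specific subclass.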

Adds an exception type (inheriting from TypeError) for the ways arguments to Butler.pruneCollections can fail. This allows the CLI script function to catch the exception, know the reason for the failure, and format a message that reports the error in a way that makes sense on the command line. If this seems like an acceptable solution to reviewers, this pattern may be used elsewhere to improve CLI error reporting as needed.
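The enum-based pattern described in this commit message might be sketched as follows. Only the names `PruneCollectionsArgsError`, `Reason`, and `PURGE_WITHOUT_UNSTORE` appear in the diff; the constructor signature, the message texts, the `--purge`/`--unstore` flag names, and the `pruneCollections` and `runFromCli` helpers are assumptions.

```python
import enum


class PruneCollectionsArgsError(TypeError):
    """Sketch of an argument error that carries a machine-readable reason."""

    class Reason(enum.Enum):
        # Only this member is visible in the diff; any others are omitted.
        PURGE_WITHOUT_UNSTORE = enum.auto()

    def __init__(self, reason, message):
        super().__init__(message)
        self.reason = reason


# Hypothetical CLI wording, keyed by failure reason.
CLI_MESSAGES = {
    PruneCollectionsArgsError.Reason.PURGE_WITHOUT_UNSTORE:
        "--purge may only be used together with --unstore.",
}


def pruneCollections(purge=False, unstore=False):
    """Stand-in for the argument check in Butler.pruneCollections."""
    if purge and not unstore:
        raise PruneCollectionsArgsError(
            PruneCollectionsArgsError.Reason.PURGE_WITHOUT_UNSTORE,
            "Cannot pass purge=True without unstore=True.",
        )


def runFromCli(**kwargs):
    """Return the text a CLI wrapper might print, or None on success."""
    try:
        pruneCollections(**kwargs)
    except PruneCollectionsArgsError as err:
        return CLI_MESSAGES.get(err.reason, str(err))
    return None
```

The Python API keeps its own wording, and the CLI layer picks a message per reason without parsing exception text.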


n8pease force-pushed the tickets/DM-27154 branch 2 times, most recently from 28170eb to 2efb461 Compare November 10, 2020 19:22

timj approved these changes Nov 11, 2020


n8pease added 8 commits November 13, 2020 08:54

alphabatize shared arguments

ff087d2

make collection in prune-collection an argument

e74bca9

add --chains option to query-collections

f6cab84

Adds various ways of formatting the collections.

move readTables to tests.utils

ad52f00

add execution test for prune-collecion

4a4808a

make a test repo manager

f714b55

Puts duplicated test repo code for the various CLI tests in a shared location & adds some API for modifying the repo.

make collection chain a string early

13475bb

Numpy recently added a deprecation warning for "ragged" arrays; in this case it was ragged because of a type mismatch; CollectionSearch vs str. Making the CollectionSearch a string in the list comprehension prevents us from encountering the warning.

n8pease force-pushed the tickets/DM-27154 branch from 2efb461 to 8d2c58b Compare November 16, 2020 19:58

deduplciate mock code

5e9389d

removes use of @patch, and just replaces mocked functions.

n8pease force-pushed the tickets/DM-27154 branch from 8d2c58b to 5e9389d Compare November 16, 2020 20:11

n8pease merged commit 6353d3e into master Nov 17, 2020

timj deleted the tickets/DM-27154 branch February 16, 2024 17:17
