New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-27154: Usability improvement suggestions for butler collection commands #431
Conversation
28170eb
to
2efb461
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly comments on test infrastructure changes but also a comment on the new exception class.
|
||
Returns | ||
------- | ||
`str` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm expecting this to return a ButlerURI.
Also this is the wrong doc syntax for a return value. I'm not completely convinced that this deserves a method of its own.
butler = self.butler | ||
butler.put(metric, ref) | ||
|
||
def addDatasetType(self, dimensions, datasetTypeName, storageClass): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this is here. It's identical to lsst.daf.butler.tests.addDatasetType
isn't it?
butler = Butler(self.root, run=run) | ||
else: | ||
butler = self.butler | ||
butler.put(metric, ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should return the DatasetRef returned by this put
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this whole routine can be something like:
def addDataset(...):
metric = self._makeExampleMetrics()
return self.butler.put(metric, datasetType if datasetType is not None else self.datasetType,
dataId=dataId, run=run)
datasetType = self.datasetType | ||
ref = DatasetRef(datasetType, dataId, id=None) | ||
metric = self._makeExampleMetrics() | ||
if run: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed. You can say self.butler.put(metric, ref, run=run)
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fails when I call this and the run doesn't exist yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be clearer then if you had a line here that created the run collection rather than without commentary relying on Butler to create it behind the scenes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's weird if/because I'm thinking about it wrong though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a comment here it all seems superfluous. Whereas:
if run:
self.butler.registry.registerCollection(run, type=CollectionType.RUN)
is explicit about what you need. @TallJimbo do you prefer the "create a new butler for each run" paradigm? Maybe that's better for the general user. If that is the case I'd like a comment to explain why you are having to create a new butler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That makes sense, works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside of tests, users are never going to create new runs or even call put
themselves (it'll all be operations like ingest or pipetask
doing that). So I think we can go with whatever maximizes test code readability. I don't have a super strong preference; the "new butler for each run" pattern is maybe a bit more familiar to those coming from Gen2, but I agree that it's less explicit.
""" | ||
if not datasetType: | ||
datasetType = self.datasetType | ||
ref = DatasetRef(datasetType, dataId, id=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary. You do not need to create a ref for the put.
tests/test_cliCmdQueryCollections.py
Outdated
butlerCfg = Butler.makeRepo("here") | ||
# the purpose of this call is to create some collections | ||
_ = Butler(butlerCfg, run=run, tags=[tag], collections=[tag]) | ||
_ = Butler(butlerCfg, run=run, tags=[tag], collections=[tag], writeable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the _ =
tests/test_cliCmdQueryCollections.py
Outdated
def testChained(self): | ||
with self.runner.isolated_filesystem(): | ||
|
||
# Create a butler and add some chained collections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and replace the datastore with a mock
tests/test_cliCmdQueryCollections.py
Outdated
|
||
# Create a butler and add some chained collections | ||
butlerCfg = Butler.makeRepo("here") | ||
with unittest.mock.patch.object(Datastore, "fromConfig", spec=Datastore.fromConfig): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the with
a bit confusing here since it implies that something is freed or reversed when the block exits. This seems to be how it works though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't totally understand it (it's cargo culted from something @TallJimbo wrote), so I did more research. The deal is that Datastore.fromConfig
gets patched: replaced with a function that returns a magicmock
. The butler initializer calls Datastore.fromConfig
, which is a MagicMock
instance and calling it returns a new MagicMock
instance, so now the butler's self.datastore
is a magicmock. Init finishes, and export
and get
are monkey patched onto the butler.datastore
magic mock. Then the with
block exits, which does restore Datastore.fromConfig
to its normal function, but butler.datastore
is of course not changed because how would it and anyway we don't want it to be - it's why we did all the above anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, all that seems kind of complicated and it seems like it's not necessary to patch Datastore.fromConfig
- the butler init works just fine with it not patched. We do run into problems when we call butler1.import_(filename=os.path.join(TESTDIR, "data", "registry", "base.yaml"))
without patching, but before making that call we can simply replace the datastore
functions that need replacing, without any context manager. Doing this seems to be enough:
butler1.datastore.export = self._mock_export
butler1.datastore.get = self._mock_get
butler1.datastore.ingest = MagicMock()
(note that instead of datastore
be a MagicMock with export
and get
monkey patched on, I've replaced those two with our alternative impls, and made ingest
a mock because it will get called in the course of calling butler.ingest_
, but does not seem to need to do anything.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TallJimbo if there's a reason to use the patch context manager, please let me know? In the meantime I'll change it to the simpler version I just described and you and @timj can 👍 or 👎 that impl if y'all want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(hmm, it seems to need the context manager in test_simpleButler. working on understanding why...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha. The butlers were getting created slightly differently, the one that worked was being init'd with the config returned by Butler.makeRepo
, and the one that did not was building a config itself, and needed config["root"]
to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the first time I ever really used unittest.mock
(other than small changes to tests others had written), so if you find a simpler way to accomplish what my code did, assume I just wasn't aware of that alternative.
tests/test_cliCmdQueryCollections.py
Outdated
formatter="lsst.daf.butler.formatters.json.JsonFormatter") | ||
|
||
@staticmethod | ||
def _mock_get(ref: DatasetRef, parameters: Optional[Mapping[str, Any]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than copying this from test_simpleButler.py, please put the mocks in a DatastoreMock base class in the tests hierarchy.
python/lsst/daf/butler/_butler.py
Outdated
collectionType = self.registry.getCollectionType(name) | ||
if purge and not unstore: | ||
raise PruneCollectionsArgsError(PruneCollectionsArgsError.Reason.PURGE_WITHOUT_UNSTORE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with a new subclass for specificity but couldn't we also make three different exceptions? You could have a new base class of PruneCollectionsArgsError and then have three subclasses of that such as PurgeWithoutUnstoreCollectionsError etc. They could have the error message burned in and you wouldn't need the enum here and you wouldn't need to check the reason later on -- you could simply catch the specific exception and replace it with a new error message suitable for click users.
Adds an execption type (inherits from TypeError) for the ways arguments to Butler.pruneCollections can fail. This allows the CLI script function to catch the exception, know the reason for the failure, and format a message to report the error in a way that will make sense on the command line. If this seems like an acceptible solution to reviewers this pattern may be used elsewhere to improve CLI error reporting as needed.
Adds various ways of formatting the collections.
Puts duplicated test repo code for the various CLI tests in a shared location & adds some API for modifying the repo.
Numpy recently added a deprecation warning for "ragged" arrays; in this case it was ragged because of a type mismatch; CollectionSearch vs str. Making the CollectionSearch a string in the list comprehension prevents us from encountering the warning.
2efb461
to
8d2c58b
Compare
removes use of @patch, and just replaces mocked functions.
8d2c58b
to
5e9389d
Compare
No description provided.