New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-21246: butler cleanup and support for deferred run/collection passing #224
Conversation
This rescues __init__ from having to save its args in a tuple and will let us force keyword arguments in the future.
Also switch to `from typing import X`, because all of the `typing.X` was getting gnarly.
I'm not totally thrilled with this change, but I think that means we need to clean up the organization of core, probably by moving more stuff out of it. Butler (which is outside core) shouldn't be depending on the internal organization of modules inside core.
eb93216
to
90e59d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nice cleanups here. I don't think we are testing construction of a Butler from another butler though.
python/lsst/daf/butler/_butler.py
Outdated
self.registry = butler.registry | ||
self.datastore = butler.datastore | ||
self._storageClasses = butler.storageClasses | ||
self._composites = butler.composites |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the attribute has changed to an internal attribute of _composites
how do we copy them from another butler as composites
? Similarly for the lines above and below. Either I'm missing something or our test coverage is poor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is poor test coverage in daf_butler
, though code exercising it in obs_base is tested in ci_hsc
(both of them, I think), and those did fail - Jenkins just hadn't gotten that far yet when I put this into review.
Instead of adding tests to daf_butler
, though, I may see if I can just remove the butler
argument and instead change the obs_base
code to stop using it - this ticket was originally created because I wanted to make a butler before I knew the run/collection in that code, and I added the butler
argument so I could quickly make new butlers with different runs/collections as a workaround. I figure less code to test is even better than more tests. But if that doesn't work, I'll add a test to daf_butler
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem crazy to seed a new butler from another butler and then change the collection. A test seems trivial (create the butler then create another from it).
@@ -470,21 +607,41 @@ def getDeferred(self, datasetRefOrType: typing.Union[DatasetRef, DatasetType, st | |||
A `dict` of `Dimension` link name, value pairs that label the | |||
`DatasetRef` within a Collection. When `None`, a `DatasetRef` | |||
should be provided as the first argument. | |||
collection : `str`, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collection is documented twice.
# create a table, it means someone is trying to create a read-only | ||
# butler client for an empty repo. That should be okay, as long | ||
# as they then try to get any datasets before some other client | ||
# creates the table. Chances are they'rejust validating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment rejust
57ddb97
to
98bd497
Compare
The registry attribute definitely needs to be public. The datastore might for particular datastore implementations (though maybe not any of the ones we have now). Other attributes are definitely implementation details - I was actually thinking about removing config, since we don't *need* it after construction, and while that looks inconvenient now, we should leave the possibility open. This means we have a bit of test code using a private attribute (._config), but I think a bit of fragility there is better than removing those tests.
It might be good to have more checks on registry.isWriteable() throughout Datastore implementations (which they can use as an indication of whether they should also be writeable), but I think this is the only one necessary to get the behavior we want.
This comes with a small user-visible change in behavior: datasetExists now raises LookupError instead of FileNotFoundError when the dataset does not exist in the Registry. That makes it consistent with get and getURI.
We now resolve the DatasetRef by looking it up in the Registry before passing it to the proxy object, which means that proxy doesn't need to hold nearly as much. This also removes the closure-based hiding of the butler, as we actually do have valid use cases for getting the butler from these objects.
We apparently have PipelineTask code that used it, which was only marginally safe before (because some data ID keys could have been in kwargs, but wouldn't have been in PipelineTasks).
579dfe0
to
0634d41
Compare
No description provided.