Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-21246: butler cleanup and support for deferred run/collection passing #224

Merged
merged 17 commits into from Jan 30, 2020

Conversation

TallJimbo
Copy link
Member

No description provided.

This rescues __init__ from having to save its args in a tuple and will
let us force keyword arguments in the future.
Also switch to `from typing import X`, because all of the
`typing.X` was getting gnarly.
I'm not totally thrilled with this change, but I think that means we
need to clean up the organization of core, probably by moving more
stuff out of it.  Butler (which is outside core) shouldn't be
depending on the internal organization of modules inside core.
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nice cleanups here. I don't think we are testing construction of a Butler from another butler though.

self.registry = butler.registry
self.datastore = butler.datastore
self._storageClasses = butler.storageClasses
self._composites = butler.composites
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the attribute has changed to an internal attribute of _composites how do we copy them from another butler as composites? Similarly for the lines above and below. Either I'm missing something or our test coverage is poor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is poor test coverage in daf_butler, though code exercising it in obs_base is tested in ci_hsc (both of them, I think), and those did fail - Jenkins just hadn't gotten that far yet when I put this into review.

Instead of adding tests to daf_butler, though, I may see if I can just remove the butler argument and instead change the obs_base code to stop using it - this ticket was originally created because I wanted to make a butler before I knew the run/collection in that code, and I added the butler argument so I could quickly make new butlers with different runs/collections as a workaround. I figure less code to test is even better than more tests. But if that doesn't work, I'll add a test to daf_butler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem crazy to seed a new butler from another butler and then change the collection. A test seems trivial (create the butler then create another from it).

@@ -470,21 +607,41 @@ def getDeferred(self, datasetRefOrType: typing.Union[DatasetRef, DatasetType, st
A `dict` of `Dimension` link name, value pairs that label the
`DatasetRef` within a Collection. When `None`, a `DatasetRef`
should be provided as the first argument.
collection : `str`, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collection is documented twice.

# create a table, it means someone is trying to create a read-only
# butler client for an empty repo. That should be okay, as long
# as they then try to get any datasets before some other client
# creates the table. Chances are they'rejust validating
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment rejust

@TallJimbo TallJimbo force-pushed the tickets/DM-21246 branch 2 times, most recently from 57ddb97 to 98bd497 Compare January 29, 2020 18:47
The registry attribute definitely needs to be public.  The datastore
might for particular datastore implementations (though maybe not any of
the ones we have now).  Other attributes are definitely implementation
details - I was actually thinking about removing config, since we don't
*need* it after construction, and while that looks inconvenient now,
we should leave the possibility open.

This means we have a bit of test code using a private attribute
(._config), but I think a bit of fragility there is better than
removing those tests.
It might be good to have more checks on registry.isWriteable()
throughout Datastore implementations (which they can use as an
indication of whether they should also be writeable), but I think this
is the only one necessary to get the behavior we want.
This comes with a small user-visible change in behavior: datasetExists
now raises LookupError instead of FileNotFoundError when the dataset
does not exist in the Registry.  That makes it consistent with get
and getURI.
We now resolve the DatasetRef by looking it up in the Registry before
passing it to the proxy object, which means that proxy doesn't need to
hold nearly as much.

This also removes the closure-based hiding of the butler, as we
actually do have valid use cases for getting the butler from these
objects.
We apparently have PipelineTask code that used it, which was only
marginally safe before (because some data ID keys could have been in
kwargs, but wouldn't have been in PipelineTasks).
@TallJimbo TallJimbo merged commit 72083d5 into master Jan 30, 2020
@TallJimbo TallJimbo deleted the tickets/DM-21246 branch January 30, 2020 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants