DM-24851: Add getURIs method #288
ref = ref.resolved(id=0, run=self.run)
return self.datastore.getURIs(ref, predict)

def getURI(self, datasetRefOrType: Union[DatasetRef, DatasetType, str],
I see what this is doing, but I was surprised to see this single-URI form still here. It might be worth adding a comment pointing to getURIs as the more future-proof function.
Yes. People shouldn't really use it. I left it around mostly because it seems more like the gen2 version. I'll ask on Slack.
A couple of comments and questions. Otherwise good to merge.
components : `dict`
    URIs to any components associated with the dataset artifact.
    Can be empty if there are no components.
"""
Does it raise if the dataset is not found in the datastore, or does it return None, {}? The question is more about the docstring than the code.
It raises if the dataset is missing and predict=False, i.e. you should never get None, {} returned. You either get something real, something that might be real, or an exception.
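To make that contract concrete, here is a minimal sketch using a toy stand-in. `ToyDatastore`, its `put` method, the choice of `FileNotFoundError`, and the URI strings are all assumptions for illustration, not the real lsst.daf.butler API; only the three outcomes (real, predicted, exception) come from the comment above.

```python
from typing import Dict, Optional, Tuple


class ToyDatastore:
    """Hypothetical stand-in for a datastore; not the real butler class."""

    def __init__(self) -> None:
        # dataset name -> (primary URI or None, component URIs)
        self._records: Dict[str, Tuple[Optional[str], Dict[str, str]]] = {}

    def put(self, name: str, primary: Optional[str],
            components: Optional[Dict[str, str]] = None) -> None:
        self._records[name] = (primary, dict(components or {}))

    def getURIs(self, name: str,
                predict: bool = False) -> Tuple[Optional[str], Dict[str, str]]:
        if name in self._records:
            # Something real.
            return self._records[name]
        if predict:
            # Something that might be real; the path here is made up.
            return f"file:///predicted/{name}", {}
        # Missing and predict=False: never return (None, {}), always raise.
        # The exception type is an assumption of this sketch.
        raise FileNotFoundError(f"Dataset {name} not known to datastore")
```

With this toy, `getURIs("missing")` raises, while `getURIs("missing", predict=True)` returns a guessed primary URI and an empty components dict.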
return "mem://{}".format(name)
uri, _ = self.getURIs(ref, predict)
if uri is None:
    raise RuntimeError(f"Unexpectedly got no URI for in-memory datastore for {ref}")
Just checking that the changing of the type of exception raised was intentional.
This is really an assertion (but I never use assert). I could change it to AssertionError. It's really a "this is impossible" check, since getURIs here will never do disassembly (so components is empty) and None should never be returned if getURIs returns.
# Should *not* be disassembled
datasets = list(butler.registry.queryDatasets(..., collections="ingest"))
self.assertEqual(len(datasets), 1)
uri, components = butler.getURIs(datasets[0])
I probably missed it, but is there a test of getURIs for a dataset type that isn't composite? If I understood these tests correctly, they test a composite (with components) that was written as a single file, and the later one tests a composite written as multiple files. getURIs for a dataset type that isn't composite should give the same results as the single-file case, but without such a test there is no guarantee that the code doesn't accidentally require components.
Good point. We don't actually run any of the tests here with a non-composite so adding such a thing is probably more than I want to do now.
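For reference, a rough sketch of what such a test might assert. `toy_get_uris` is a hypothetical stand-in written only for this example (it is not the butler API and does not exercise the real code path); the point is the shape of the assertions: a non-composite dataset should yield one primary URI and no components, whether or not disassembly is enabled.

```python
import unittest
from typing import Dict, Optional, Tuple


def toy_get_uris(name: str, components: Tuple[str, ...] = (),
                 disassemble: bool = False
                 ) -> Tuple[Optional[str], Dict[str, str]]:
    """Hypothetical stand-in for getURIs; paths are made up."""
    if disassemble and components:
        # Composite written as multiple files: no single primary URI.
        return None, {c: f"file:///repo/{name}_{c}.fits" for c in components}
    # Single file, including every non-composite dataset type.
    return f"file:///repo/{name}.fits", {}


class NonCompositeURITestCase(unittest.TestCase):

    def testNonComposite(self):
        # A dataset type with no components must never depend on them.
        for disassemble in (False, True):
            uri, components = toy_get_uris("metric", disassemble=disassemble)
            self.assertIsNotNone(uri)
            self.assertEqual(components, {})
```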
python/lsst/daf/butler/_butler.py
Outdated
primary, _ = self.getURIs(datasetRefOrType, dataId=dataId, predict=predict,
                          collections=collections, run=run, **kwds)
if primary is None:
    raise RuntimeError(f"Found dataset but no single URI retrieved for it {datasetRefOrType}")
I'm not sure what this error message (which is repeated a few times in the code) means. Does it mean the dataset was found in the registry but either 0 or more than 1 URI was retrieved for it (related to my previous question about what getURIs does if the dataset is in the registry but has 0 datastore entries)? Or does it mean the dataset was found in the datastore (and registry) but more than 1 URI was found for it?
This is specifically trapping the case where someone is using getURI and the dataset does exist but is disassembled, so getURI is at that point useless to you. The "if None" check here also convinces mypy that I'm never going to return Optional[ButlerURI] but always ButlerURI.
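The narrowing pattern described above can be sketched in isolation. The names `get_uris` and `get_uri` and the plain `str` URIs are assumptions for this example (the real code uses ButlerURI and the butler methods); what it shows is that raising on `None` both traps the disassembled case and lets a type checker treat the return value as non-optional.

```python
from typing import Dict, Optional, Tuple


def get_uris(disassembled: bool) -> Tuple[Optional[str], Dict[str, str]]:
    """Hypothetical stand-in: primary is None for a disassembled dataset."""
    if disassembled:
        return None, {"image": "file:///repo/x_image.fits",
                      "mask": "file:///repo/x_mask.fits"}
    return "file:///repo/x.fits", {}


def get_uri(disassembled: bool) -> str:
    primary, _ = get_uris(disassembled)
    if primary is None:
        # Dataset exists but has no single URI; the single-URI form
        # cannot answer, so fail loudly rather than return None.
        raise RuntimeError("Found dataset but no single URI retrieved for it")
    # Past the check, mypy narrows Optional[str] to str.
    return primary
```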
Currently getUri is implemented using getURIs and still returns a string.
The name change matches getURI, and since we are breaking the return value (no longer str) it is safer to explicitly break any downstream code.
Some of the annotations have to be a bit loose because we are allowing "objects that have a .name method".
This supports composite disassembly in datastore and will also be used for virtual composite datasets.