DM-24851: Add getURIs method #288

Merged
timj merged 8 commits into master from tickets/DM-24851 on May 20, 2020

@timj (Member) commented May 20, 2020

This supports composite disassembly in datastore and will also be used for virtual composite datasets.

ref = ref.resolved(id=0, run=self.run)
return self.datastore.getURIs(ref, predict)

def getURI(self, datasetRefOrType: Union[DatasetRef, DatasetType, str],

Contributor

I see what this is doing, but I was surprised to see the single-URI form still here. It might be worth adding a comment pointing to getURIs as the more future-proof function.

Member Author

Yes. People shouldn't really use it. I left it around mostly because it seems more like the gen2 version. I'll ask on Slack.

@MichelleGower (Contributor) left a comment

A couple comments and questions. Otherwise good to merge.

components : `dict`
    URIs to any components associated with the dataset artifact.
    Can be empty if there are no components.
"""

Contributor

Does it raise if the dataset is not found in the Datastore, or does it return None, {}? The question is more about the docstring than the code.

Member Author

It raises if the dataset is missing and predict=False, i.e., you should never get None, {} returned. You either get something real, something that might be real, or an exception.
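
The three outcomes described above (a real URI, a predicted URI, or an exception) can be sketched with a toy class. This is only an illustration under stated assumptions: `ToyDatastore`, `put`, `get_uris`, the `FileNotFoundError` choice, and the `#predicted` fragment are all hypothetical names and conventions chosen for this sketch, not the daf_butler implementation.

```python
from typing import Dict, Optional, Tuple

# Return type: (primary URI or None, mapping of component name -> URI).
URIPair = Tuple[Optional[str], Dict[str, str]]


class ToyDatastore:
    """Hypothetical stand-in for a datastore; NOT the daf_butler API."""

    def __init__(self, root: str) -> None:
        self.root = root
        self._records: Dict[str, URIPair] = {}

    def put(self, name: str, components: Tuple[str, ...] = ()) -> None:
        """Record a dataset; it is disassembled if component names are given."""
        if components:
            uris = {c: f"{self.root}/{name}.{c}" for c in components}
            self._records[name] = (None, uris)
        else:
            self._records[name] = (f"{self.root}/{name}", {})

    def get_uris(self, name: str, predict: bool = False) -> URIPair:
        """Return URIs for a known dataset, a prediction, or raise."""
        if name in self._records:
            return self._records[name]
        if not predict:
            # Missing dataset and no prediction requested: always raise.
            raise FileNotFoundError(f"Dataset {name!r} is not in the datastore")
        # Assumption: a predicted location is flagged with a URI fragment.
        return (f"{self.root}/{name}#predicted", {})
```

In this sketch a caller never sees the pair `(None, {})`: a stored dataset yields either a primary URI or a non-empty component mapping, and a missing one yields a prediction or an exception.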

return "mem://{}".format(name)
uri, _ = self.getURIs(ref, predict)
if uri is None:
    raise RuntimeError(f"Unexpectedly got no URI for in-memory datastore for {ref}")

Contributor

Just checking that changing the type of exception raised was intentional.

Member Author

This is really an assertion (but I never use assert). I could change it to AssertionError. It's really a "this is impossible" check, since getURIs will never do disassembly here (so components is always empty) and None should never be returned if getURIs returns.

# Should *not* be disassembled
datasets = list(butler.registry.queryDatasets(..., collections="ingest"))
self.assertEqual(len(datasets), 1)
uri, components = butler.getURIs(datasets[0])

Contributor

I probably missed it, but is there a test of getURIs for a dataset type that isn't composite? If I understood these tests correctly, they test a composite (with components) that was written as a single file, and the later one tests a composite written as multiple files. getURIs for a dataset type that isn't composite should give the same results as the single-file case, but that doesn't guarantee that the code doesn't accidentally require components.

Member Author

Good point. We don't actually run any of the tests here with a non-composite so adding such a thing is probably more than I want to do now.

primary, _ = self.getURIs(datasetRefOrType, dataId=dataId, predict=predict,
                          collections=collections, run=run, **kwds)
if primary is None:
    raise RuntimeError(f"Found dataset but no single URI retrieved for it {datasetRefOrType}")

Contributor

I'm not sure what this error message (which is repeated a few times in the code) means. Does it mean the dataset was found in the registry but either 0 or more than 1 URIs were retrieved for it (related to my previous question about what getURIs does if the dataset is in the registry but has 0 datastore entries)? Or does it mean the dataset was found in the datastore (and registry), but more than 1 URI was found for it?

Member Author

This is specifically trapping the case where someone is using getURI and the dataset does exist but it's disassembled, so getURI is at that point useless to you. The if None here also convinces mypy that I'm never going to return Optional[ButlerURI] but always ButlerURI.
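
The pattern described above, where a single-URI wrapper rejects a disassembled dataset and the `None` check narrows the return type, might look roughly like the following. It is a sketch under assumptions: `get_single_uri` and the `get_uris` callable are hypothetical names for illustration, and plain `str` stands in for `ButlerURI`.

```python
from typing import Callable, Dict, Optional, Tuple

# (primary URI or None, mapping of component name -> URI)
URIPair = Tuple[Optional[str], Dict[str, str]]


def get_single_uri(get_uris: Callable[[str], URIPair], name: str) -> str:
    """Return the one URI for ``name`` or raise if it was disassembled."""
    primary, _components = get_uris(name)
    if primary is None:
        # The dataset exists, but only as per-component artifacts, so
        # there is no single URI to hand back.
        raise RuntimeError(f"Found dataset but no single URI retrieved for it {name}")
    # After the None check a type checker treats ``primary`` as ``str``,
    # so the declared return type does not need to be Optional.
    return primary
```

Without the explicit check, the function would have to be annotated as returning `Optional[str]`, pushing the `None` handling onto every caller.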

timj added 2 commits May 20, 2020 16:05
Some of the annotations have to be a bit loose because
we are allowing "objects that have a .name method"
@timj merged commit a2bedf3 into master May 20, 2020
@timj deleted the tickets/DM-24851 branch May 20, 2020 23:36