New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-25000: Make registry query methods work with composition again. #289
Conversation
We stopped using these code paths outside of tests on DM-24288, and removing them is a huge simplification for Registry. In detail: - DatasetRef.components has been removed, along with the flatten and allRefs methods. This means DatasetRefs now only have two states (resolved and unresolved), instead of it being unclear whether an empty dict means there are no components or that we don't know about the components. DatasetRef.makeComponentRef has been added to make it easy to make a ref for a component dataset from the parent ref. - All "recursive" kwargs have been removed, along with some occasionally-complex logic needed to implement them. - The dataset_composition table has been removed. It is worth noting that I found a couple of spots where we were still calling insertDatasets with recursive=True (ingest and import from yaml). So I think we were still generating unnecessary database rows sometimes, including in contexts where a lot of datasets were involved. I don't think those rows were ever used after insertion, which is why this didn't break anything.
We now just implement __repr__ and let __str__ delegate to that, and aim to make it informative and non-confusing because we've never been able to make the string eval'able as in the ideal case. Note that the biggest problem with the old repr (which is what is used by __str__ in built-in containers of data IDs, regardless of what we'd prefer) is that it printed the full DimensionGraph, which often has implied dimensions and hence more "keys" than the mapping interface actually provided access to. That it impossible to infer what the dict-like form actually was, and it's the dict-like form that people actually want to know.
We now have an option to control whether components are included in the results. We expect users to consider them noise if the parent is also included in the results, so the default is now to not include them unless the parent is not in the results.
cee2c98
to
f7999e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok.
childType = registry.getDatasetType("permabias.wcs") | ||
parentRefResolved = registry.findDataset(parentType, collections=collection, | ||
instrument="Cam1", detector=1) | ||
self.assertIsNotNone(parentRefResolved) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could be more positive in this test and use:
self.assertIsInstance(parentRefResolved, DatasetRef)
?
parentRefUnresolved = parentRefResolved.unresolved() | ||
# Search for a single dataset with findDataset. | ||
childRef1 = registry.findDataset("permabias.wcs", collections=collection, | ||
dataId=parentRefUnresolved.dataId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the dataId of parentRefUnresolved the same as the dataId for parentRefResolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch; the unresolved variable's existence is a relic of a previous incarnation of the test, and I've removed it.
97367b7
to
11c8da9
Compare
No description provided.