DM-26683: Make exporting dimension data friendlier #375
Without generics, decoration hid all type information about the wrapped type.
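To show what "hid all type information" means, here is a minimal sketch. It assumes a class decorator along the lines of a hypothetical `immutable` that forbids reassigning attributes; the actual decorator touched by this PR isn't shown here. Annotating it with a `TypeVar` tells a type checker that `immutable(Point)` is still `Type[Point]`, where an unannotated `def immutable(cls)` would leave the result untyped.

```python
from typing import Any, Type, TypeVar

_T = TypeVar("_T")


def immutable(cls: Type[_T]) -> Type[_T]:
    """Hypothetical decorator: allow each attribute to be set only once.

    The generic signature preserves the decorated class's type for
    static type checkers.
    """

    def _frozenSetattr(self: Any, name: str, value: Any) -> None:
        # First assignment (normally in __init__) is allowed; later ones fail.
        if hasattr(self, name):
            raise AttributeError(f"{cls.__name__}.{name} is immutable")
        object.__setattr__(self, name, value)

    setattr(cls, "__setattr__", _frozenSetattr)
    return cls


@immutable
class Point:
    def __init__(self, x: int) -> None:
        self.x = x
```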
This was a relic from when DimensionUniverse inherited from DimensionGraph and DimensionGraph defined __str__.
This should not change functionality at all; it's just moving code around. A separate package gives the code a bit more room for expansion and more clearly separates interfaces from implementations. Moving it out of 'core' fixes the existing circular dependency issues: the import/export code needs to depend on both Registry and Datastore, and hence shouldn't go in core. The registry tests depend on it, though, so there is one new circular dependency in registry.tests; because tests conceptually depend on everything, I'm not bothered by this (maybe we should have named that 'registry_tests' instead of making it a registry subpackage, but I'm not going to).
This should address the core complaint of DM-26683.
Strings are now accepted as well as DimensionElement instances, and passing elements without tables (like 'htmN') is now correctly ignored (because the user shouldn't care which elements have tables).
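The normalization described above can be sketched as follows. This is a hypothetical stand-in, not the daf_butler code: `ElementStub`, its `hasTable` field, the dict-based universe and `resolveElements` are all illustrative assumptions. Names are looked up in the universe, and elements without tables are silently skipped rather than raising.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass(frozen=True)
class ElementStub:
    """Hypothetical stand-in for a dimension element."""

    name: str
    hasTable: bool


def resolveElements(
    universe: Dict[str, ElementStub],
    elements: Iterable[Union[str, ElementStub]],
) -> List[ElementStub]:
    """Accept names or element objects; drop elements that have no table."""
    result = []
    for element in elements:
        if isinstance(element, str):
            # Unknown names raise KeyError rather than being ignored.
            element = universe[element]
        if not element.hasTable:
            # e.g. 'htm7': users shouldn't need to know it has no table.
            continue
        result.append(element)
    return result


universe = {
    "skymap": ElementStub("skymap", True),
    "htm7": ElementStub("htm7", False),
}
```

With this universe, `resolveElements(universe, ["skymap", "htm7"])` returns only the `skymap` element.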
I eventually realized that there were very few changes here but a lot of reorganization. I think this looks fine.
tests/test_butler.py
@@ -931,6 +931,9 @@ def runImportExportTest(self, storageClass):
        # Test that the repo actually has at least one dataset.
        datasets = list(exportButler.registry.queryDatasets(..., collections=...))
        self.assertGreater(len(datasets), 0)
        # Add a DimensionRecord that's unused by those datasets.
        skymapRecord = {"name": "example_skymap", "hash": (50).to_bytes(8, byteorder="little")}
        exportButler.registry.insertDimensionData("skymap", skymapRecord)
        # Export those datasets. We used TemporaryDirectory because there
        # doesn't seem to be a way to get the filename (as opposed to the file
        # object) from any of tempfile's temporary-file context managers.
tempfile.NamedTemporaryFile? Also, what does transfer-on-exist mean below?
I was vaguely hoping you knew what either of these meant when I saw them. git blame has proved my guilt on both. I'm just going to remove them, because I don't think they're helpful: nobody knows what transfer-on-exist means anymore, and I don't see an obvious way to replace temporary directory usage with temporary file usage here (or why that would be better).
    def _finish(self) -> None:
        """Delegate to the backend to finish the export process.

        For use by `Butler.export` only.
        """
        for element in self._registry.dimensions.sorted(self._records.keys()):
Does this sorting help us out at all with DM-26324?
It's a start, but we'd also need to sort the iterable provided on the line below, and that's harder, because we don't have a way to sort data IDs or dimension records (something I realized after my Jira post about trying it out).
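What full determinism would need can be sketched under one loudly stated assumption: that each record's data ID reduces to a tuple of mutually comparable values. That ordering is exactly what the comment says is missing today. `UNIVERSE_ORDER`, `sortedExport` and the dict-of-dicts layout are hypothetical. Elements are ordered by their position in the universe (the part `dimensions.sorted` already provides), then records within each element by data ID tuple.

```python
from typing import Dict, List, Tuple

# Hypothetical canonical element order, standing in for the universe's.
UNIVERSE_ORDER = ["instrument", "detector", "skymap"]


def sortedExport(
    records: Dict[str, Dict[Tuple, dict]]
) -> List[Tuple[str, List[dict]]]:
    """Order elements by universe position, then records by data ID tuple.

    Assumes every data ID tuple within an element has comparable values.
    """
    rank = {name: i for i, name in enumerate(UNIVERSE_ORDER)}
    result = []
    for element in sorted(records, key=rank.__getitem__):
        byDataId = records[element]
        result.append((element, [byDataId[key] for key in sorted(byDataId)]))
    return result


records = {
    "skymap": {("b",): {"name": "b"}, ("a",): {"name": "a"}},
    "instrument": {("HSC",): {"name": "HSC"}},
}
```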
    def _finish(self) -> None:
        """Delegate to the backend to finish the export process.

        For use by `Butler.export` only.
        """
        for element in self._registry.dimensions.sorted(self._records.keys()):
            self._backend.saveDimensionData(element, *self._records[element].values())
        for (datasetType, run), records in self._datasets.items():
Could this benefit from a sort?
I'll add one. As in the previous case, we'll need to add sorting in the line below as well to make the order fully deterministic, but we might as well start.
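One way such a sort could look is sketched below. It assumes dataset types can be ordered by their `name` and runs are plain strings; `DatasetTypeStub` and `orderedDatasetGroups` are hypothetical and not the code that was actually added. As the reply notes, this only orders the groups. The refs within each group keep their original order.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class DatasetTypeStub:
    """Hypothetical stand-in; only .name is used for ordering."""

    name: str


def orderedDatasetGroups(
    datasets: Dict[Tuple[DatasetTypeStub, str], List[str]]
) -> Iterator[Tuple[DatasetTypeStub, str, List[str]]]:
    # Sort (datasetType, run) keys by dataset type name, then run name.
    for (datasetType, run), refs in sorted(
        datasets.items(), key=lambda item: (item[0][0].name, item[0][1])
    ):
        yield datasetType, run, refs


datasets = {
    (DatasetTypeStub("raw"), "run2"): ["r2"],
    (DatasetTypeStub("calexp"), "run1"): ["c1"],
    (DatasetTypeStub("raw"), "run1"): ["r1"],
}
```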
I wrote these a year ago and don't understand them anymore.