DM-26683: Make exporting dimension data friendlier #375

Merged: TallJimbo merged 10 commits into master from tickets/DM-26683 on Sep 17, 2020

TallJimbo
Member

No description provided.

Without generics, decoration hid all type information about the
wrapped type.

This was a relic from when DimensionUniverse inherited from
DimensionGraph and DimensionGraph defined __str__.

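To illustrate the first note above about generics and decoration: a minimal sketch, assuming a class decorator that returns the class it was given. The decorator `tagged` and the class `Point` are hypothetical names, not taken from this PR. Annotating the decorator with a `TypeVar` lets a static type checker keep the wrapped type, where an unannotated decorator would leave it untyped.

```python
from typing import Type, TypeVar

T = TypeVar("T")


def tagged(cls: Type[T]) -> Type[T]:
    """Hypothetical class decorator that returns the same class it received.

    Because input and output share the TypeVar ``T``, a type checker can
    carry the decorated class's type through the decoration.
    """
    setattr(cls, "tagged", True)
    return cls


@tagged
class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


# Runtime behavior is unchanged by the annotations; only static analysis
# benefits from them.
p = Point(1, 2)
```
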
This should not change functionality at all; it's just moving code
around. A separate package gives the code a bit more room for
expansion and more clearly separates between interfaces and
implementations.

Moving it out of 'core' fixes the existing circular dependency issues:
the import/export code needs to depend on both Registry and Datastore,
and hence shouldn't go in core.  But the registry tests depend on it,
so there is one new circular dependency in registry.tests, but because
tests conceptually depend on everything I'm not bothered by this
(maybe we should have named that 'registry_tests' instead of making it
a registry subpackage, but I'm not going to).

This should address the core complaint of DM-26683.
Strings are now accepted as well as DimensionElement instances, and
passing elements without tables (like 'htmN') is now correctly ignored
(because the user shouldn't care which elements have tables).
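
A rough sketch of the behavior described in the last note, under stated assumptions: `Element`, `UNIVERSE`, `has_table`, and `elements_to_export` are hypothetical stand-ins written for illustration and are not the names used in daf_butler. It shows the idea of accepting either names or element objects and silently skipping elements that have no table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a dimension element."""

    name: str
    has_table: bool


# Hypothetical lookup from element name to element object.
UNIVERSE: Dict[str, Element] = {
    "visit": Element("visit", True),
    "skymap": Element("skymap", True),
    "htm7": Element("htm7", False),  # an element without a table
}


def elements_to_export(requested: Iterable[Union[str, Element]]) -> List[Element]:
    """Resolve names to elements and drop those without tables."""
    result: List[Element] = []
    for item in requested:
        element = UNIVERSE[item] if isinstance(item, str) else item
        if element.has_table:
            result.append(element)
    return result


selected = elements_to_export(["visit", UNIVERSE["skymap"], "htm7"])
```

Here the caller mixes a string, an element object, and a table-less element, and does not need to know which of them are backed by tables.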
Member

@timj left a comment

I eventually realized that there were very few changes here but a lot of reorganization. I think this looks fine.

@@ -931,6 +931,9 @@ def runImportExportTest(self, storageClass):
        # Test that the repo actually has at least one dataset.
        datasets = list(exportButler.registry.queryDatasets(..., collections=...))
        self.assertGreater(len(datasets), 0)
        # Add a DimensionRecord that's unused by those datasets.
        skymapRecord = {"name": "example_skymap", "hash": (50).to_bytes(8, byteorder="little")}
        exportButler.registry.insertDimensionData("skymap", skymapRecord)
        # Export those datasets. We used TemporaryDirectory because there
        # doesn't seem to be a way to get the filename (as opposed to the file
        # object) from any of tempfile's temporary-file context managers.

Member

tempfile.NamedTemporaryFile ? Also, what does transfer-on-exist mean below?

Member Author

I was vaguely hoping you knew what either of these meant when I saw them. git blame has proved my guilt on both. I'm just going to remove them, because I don't think they're helpful - nobody knows what transfer-on-exist means anymore, and I don't see an obvious way to replace temporary directory usage with temporary file usage here (or why that would be better).
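
For reference, a small sketch of the two standard-library options discussed in this thread. It is not the test code from this PR; the file name `export.yaml` is a hypothetical placeholder. `tempfile.NamedTemporaryFile` exposes its path through the `name` attribute, while `tempfile.TemporaryDirectory` yields a directory in which the caller builds the file name. Note that the Python documentation warns that reopening a `NamedTemporaryFile` by name while it is still open is platform-dependent.

```python
import os
import tempfile

# Option 1: NamedTemporaryFile exposes the file's path as `.name`.
with tempfile.NamedTemporaryFile(suffix=".yaml") as f:
    named_path = f.name
    existed_inside = os.path.exists(named_path)

# Option 2: TemporaryDirectory yields a directory path; the file name
# inside it is chosen by the caller (hypothetical name here).
with tempfile.TemporaryDirectory() as tmpdir:
    export_path = os.path.join(tmpdir, "export.yaml")
    with open(export_path, "w") as stream:
        stream.write("placeholder\n")
    written = os.path.exists(export_path)

# Both context managers clean up on exit.
cleaned_up = not os.path.exists(named_path) and not os.path.exists(export_path)
```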


    def _finish(self) -> None:
        """Delegate to the backend to finish the export process.

        For use by `Butler.export` only.
        """
        for element in self._registry.dimensions.sorted(self._records.keys()):

Member

Does this sorting help us out at all with DM-26324 ?

Member Author

It's a start, but we'd also need to sort the iterable provided on the line below, and that's harder, because we don't have a way to sort data IDs or dimension records (something I realized after my Jira post about trying it out).
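
One generic way to sort objects that define no ordering of their own is to pass an explicit key function to `sorted`. The sketch below assumes a hypothetical `Record` class with `instrument` and `detector` fields; it is not how daf_butler represents dimension records, and choosing a suitable key for real records is the hard part this comment refers to.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record type: hashable, but with no ordering defined."""

    instrument: str
    detector: int


def sort_key(record: Record) -> Tuple[str, int]:
    # Build a tuple of plain, comparable values to sort on.
    return (record.instrument, record.detector)


# A set has no defined iteration order, so sorting makes output repeatable.
records = {Record("HSC", 10), Record("DECam", 3), Record("HSC", 2)}
ordered: List[Record] = sorted(records, key=sort_key)
```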


    def _finish(self) -> None:
        """Delegate to the backend to finish the export process.

        For use by `Butler.export` only.
        """
        for element in self._registry.dimensions.sorted(self._records.keys()):
            self._backend.saveDimensionData(element, *self._records[element].values())
        for (datasetType, run), records in self._datasets.items():

Member

Could this benefit from a sort?

Member Author

I'll add one. As in the previous case, we'll need to add sorting in the line below as well to make the order fully deterministic, but we might as well start.
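
A sketch of what sorting the outer loop could look like, assuming for illustration that both parts of the key are plain strings. In the real code the first part is a dataset type object, so an actual implementation would need to sort on something like its name. The dictionary contents and the helper `ordered_groups` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the exporter's mapping:
# (dataset type name, run name) -> list of records.
datasets: Dict[Tuple[str, str], List[str]] = {
    ("raw", "run2"): ["c"],
    ("calexp", "run1"): ["a"],
    ("raw", "run1"): ["b"],
}


def ordered_groups(
    groups: Dict[Tuple[str, str], List[str]]
) -> List[Tuple[Tuple[str, str], List[str]]]:
    """Return the items sorted by their (dataset type, run) key."""
    return sorted(groups.items(), key=lambda item: item[0])


visited = [key for key, _records in ordered_groups(datasets)]
```

This only fixes the order of the groups; the records inside each group would still need their own sort for the output to be fully deterministic, as noted above.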

@TallJimbo TallJimbo merged commit 6ea99c5 into master Sep 17, 2020
@TallJimbo TallJimbo deleted the tickets/DM-26683 branch September 17, 2020 04:28
2 participants