DM-33148: avoid conflicts due to existing dimension records in import #681
These SQLite-missing-functionality workarounds date back to when we expected to use (id, origin) compound primary keys to get uniqueness across different data repositories. We're going with UUIDs instead, so we don't need them anymore.
7a934a5 to 4b9a395
Support for this syntax is now upstream in SQLAlchemy.
4b9a395 to 7505ecb
Codecov Report
@@           Coverage Diff           @@
##             main     #681   +/-   ##
=======================================
  Coverage   84.29%   84.30%
=======================================
  Files         243      243
  Lines       31089    31057    -32
  Branches     5235     5228     -7
=======================================
- Hits        26208    26183    -25
+ Misses       3714     3712     -2
+ Partials     1167     1162     -5
Looks good to me. Only a couple of minor comments.
@@ -372,7 +372,7 @@ def load(
         for element, dimensionRecords in self.dimensions.items():
             if skip_dimensions and element in skip_dimensions:
                 continue
-            self.registry.insertDimensionData(element, *dimensionRecords)
+            self.registry.insertDimensionData(element, *dimensionRecords, skip_existing=True)
Can there be a comment here explaining why we use skip_existing? As I understand it, records imported from another registry's export are assumed to be trustworthy and not manually altered, and we skip rather than sync for speed, because sync would be too slow.
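To illustrate the trade-off being discussed, here is a minimal sketch of "skip existing" insert semantics using the standard-library sqlite3 module. The table name, columns, and the insert_records helper are hypothetical and are not the registry's actual implementation; the sketch only shows that existing rows are left untouched rather than compared against or updated from the incoming ones.

```python
import sqlite3


def insert_records(conn, records, skip_existing=False):
    """Insert (id, name) rows into a hypothetical dimension_record table.

    With skip_existing=True, rows whose primary key is already present are
    silently left as they are; the incoming values are not compared to them.
    """
    verb = "INSERT OR IGNORE" if skip_existing else "INSERT"
    with conn:  # commit on success, roll back on error
        conn.executemany(
            f"{verb} INTO dimension_record (id, name) VALUES (?, ?)", records
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dimension_record (id INTEGER PRIMARY KEY, name TEXT)")

insert_records(conn, [(1, "a"), (2, "b")])
# Row 2 already exists, so it is skipped even though its name differs.
insert_records(conn, [(2, "changed"), (3, "c")], skip_existing=True)

rows = conn.execute("SELECT id, name FROM dimension_record ORDER BY id").fetchall()
```

Because the conflicting row is never inspected, this is cheap, but it is only safe when the incoming records can be trusted to match what is already stored.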
)
butler2.import_(filename=file.name, skip_dimensions=dimensions)
# Import it again
butler2.import_(filename=file.name)
I wonder why the code coverage tool does not think this line ran...
Good catch! It's right; I tried sabotaging it and it seems to always be skipped.
At the top of the base test case in this file, we have:
datasetsIdType = int
and then this test method has:
if self.datasetsIdType is not uuid.UUID:
    self.skipTest("This test can only work for UUIDs")
but at the bottom we have two derived test cases that both set
datasetsIdType = uuid.UUID
so it's not clear to me what's going on. I'll see if I can figure it out, but ideas are most welcome.
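For reference, the class-attribute pattern described above can be sketched as follows. The class names and the mixin layout are assumptions made for illustration, not the actual structure of the test file; only datasetsIdType, the skip message, and the test method name come from this thread.

```python
import unittest
import uuid


class BaseCase:
    """Hypothetical mixin holding the shared test; the default ID type is int."""

    datasetsIdType = int

    def testImportTwice(self):
        if self.datasetsIdType is not uuid.UUID:
            self.skipTest("This test can only work for UUIDs")
        self.assertIsInstance(uuid.uuid4(), self.datasetsIdType)


class IntCase(BaseCase, unittest.TestCase):
    """Inherits the int default, so the test is skipped."""


class UUIDCase(BaseCase, unittest.TestCase):
    """Overrides the ID type, so the test body actually runs."""

    datasetsIdType = uuid.UUID


def run(case):
    """Run every test in one TestCase class and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


int_result = run(IntCase)
uuid_result = run(UUIDCase)
```

With this layout the derived class that sets datasetsIdType = uuid.UUID should not hit the skip, which is why the skip logic alone does not explain the missing coverage.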
Ha, nope, nothing so complicated as that. I had accidentally renamed the test method to tesImportTwice.
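A small sketch of why the typo went unnoticed: the default unittest loader only collects methods whose names start with "test", so a misnamed method is dropped silently rather than reported as a failure. The class and the testImportOnce method here are hypothetical.

```python
import unittest


class ImportCase(unittest.TestCase):
    def testImportOnce(self):
        self.assertTrue(True)

    def tesImportTwice(self):  # typo: missing "t", so it is never collected
        self.fail("this would fail if it ever ran")


# Ask the default loader which methods it considers tests.
names = list(unittest.defaultTestLoader.getTestCaseNames(ImportCase))
```

Here names contains only testImportOnce, so the suite stays green and only a coverage report reveals that the other method body never executed.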
😄 On the plus side code coverage caught it.
7505ecb to 086c1bd
Checklist
doc/changes