Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-33148: avoid conflicts due to existing dimension records in import #681

Merged
merged 5 commits into from May 2, 2022

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented May 2, 2022

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

These SQLite-missing-functionality workarounds date back to when we
expected to use (id, origin) compound primary keys to get uniqueness
across different data repositories.  We're going with UUIDs instead, so
we don't need them anymore.
@codecov
Copy link

codecov bot commented May 2, 2022

Codecov Report

Merging #681 (086c1bd) into main (b7ebdee) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #681   +/-   ##
=======================================
  Coverage   84.29%   84.30%           
=======================================
  Files         243      243           
  Lines       31089    31057   -32     
  Branches     5235     5228    -7     
=======================================
- Hits        26208    26183   -25     
+ Misses       3714     3712    -2     
+ Partials     1167     1162    -5     
Impacted Files Coverage Δ
python/lsst/daf/butler/registries/remote.py 0.00% <ø> (ø)
python/lsst/daf/butler/registry/_registry.py 72.52% <ø> (ø)
python/lsst/daf/butler/registries/sql.py 81.25% <100.00%> (ø)
...n/lsst/daf/butler/registry/databases/postgresql.py 79.18% <100.00%> (+0.32%) ⬆️
...ython/lsst/daf/butler/registry/databases/sqlite.py 84.39% <100.00%> (-0.66%) ⬇️
...hon/lsst/daf/butler/registry/dimensions/caching.py 94.73% <100.00%> (+0.19%) ⬆️
...on/lsst/daf/butler/registry/dimensions/governor.py 93.82% <100.00%> (+0.32%) ⬆️
...ython/lsst/daf/butler/registry/dimensions/query.py 74.24% <100.00%> (ø)
...thon/lsst/daf/butler/registry/dimensions/skypix.py 76.66% <100.00%> (ø)
...ython/lsst/daf/butler/registry/dimensions/table.py 85.40% <100.00%> (+0.99%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7ebdee...086c1bd. Read the comment docs.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Only a couple of minor comments.

@@ -372,7 +372,7 @@ def load(
for element, dimensionRecords in self.dimensions.items():
if skip_dimensions and element in skip_dimensions:
continue
self.registry.insertDimensionData(element, *dimensionRecords)
self.registry.insertDimensionData(element, *dimensionRecords, skip_existing=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be a comment here to say that we are using skip_existing because we are assuming that records that are being imported from an export from another registry are assumed to be trustworthy and not to have been manually altered and we are doing this for speed reasons because sync would be too slow?

)
butler2.import_(filename=file.name, skip_dimensions=dimensions)
# Import it again
butler2.import_(filename=file.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the code coverage tool does not think this line ran...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It's right; I tried sabotaging it and it seems to always be skipped.

At the top of the base test case in this file, we have:

datasetsIdType = int

and then this test method has:

if self.datasetsIdType is not uuid.UUID:
    self.skipTest("This test can only work for UUIDs")

but at the bottom we have two derived test cases that both set

datasetsIdType = uuid.UUID

so it's not clear to me what's going on. I'll see if I can figure it out, but ideas are most welcome.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, nope, nothing so complicated as that. I had accidentally renamed the test method to tesImportTwice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 On the plus side code coverage caught it.

@TallJimbo TallJimbo merged commit 479bf8a into main May 2, 2022
@TallJimbo TallJimbo deleted the tickets/DM-33148 branch May 2, 2022 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants