DM-33148: avoid conflicts due to existing dimension records in import #681

TallJimbo · 2022-05-02T14:46:37Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

These SQLite-missing-functionality workarounds date back to when we expected to use (id, origin) compound primary keys to get uniqueness across different data repositories. We're going with UUIDs instead, so we don't need them anymore.

Support for this syntax is now upstream in SQLAlchemy.

codecov · 2022-05-02T15:04:30Z

Codecov Report

Merging #681 (086c1bd) into main (b7ebdee) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #681   +/-   ##
=======================================
  Coverage   84.29%   84.30%           
=======================================
  Files         243      243           
  Lines       31089    31057   -32     
  Branches     5235     5228    -7     
=======================================
- Hits        26208    26183   -25     
+ Misses       3714     3712    -2     
+ Partials     1167     1162    -5

Impacted Files	Coverage Δ
python/lsst/daf/butler/registries/remote.py	`0.00% <ø> (ø)`
python/lsst/daf/butler/registry/_registry.py	`72.52% <ø> (ø)`
python/lsst/daf/butler/registries/sql.py	`81.25% <100.00%> (ø)`
...n/lsst/daf/butler/registry/databases/postgresql.py	`79.18% <100.00%> (+0.32%)`	⬆️
...ython/lsst/daf/butler/registry/databases/sqlite.py	`84.39% <100.00%> (-0.66%)`	⬇️
...hon/lsst/daf/butler/registry/dimensions/caching.py	`94.73% <100.00%> (+0.19%)`	⬆️
...on/lsst/daf/butler/registry/dimensions/governor.py	`93.82% <100.00%> (+0.32%)`	⬆️
...ython/lsst/daf/butler/registry/dimensions/query.py	`74.24% <100.00%> (ø)`
...thon/lsst/daf/butler/registry/dimensions/skypix.py	`76.66% <100.00%> (ø)`
...ython/lsst/daf/butler/registry/dimensions/table.py	`85.40% <100.00%> (+0.99%)`	⬆️
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

timj

Looks good to me. Only a couple of minor comments.

timj · 2022-05-02T16:27:39Z

python/lsst/daf/butler/transfers/_yaml.py

@@ -372,7 +372,7 @@ def load(
        for element, dimensionRecords in self.dimensions.items():
            if skip_dimensions and element in skip_dimensions:
                continue
-            self.registry.insertDimensionData(element, *dimensionRecords)
+            self.registry.insertDimensionData(element, *dimensionRecords, skip_existing=True)


Can there be a comment here explaining why we use skip_existing? The reasoning, as I understand it: records imported from another registry's export are assumed to be trustworthy and not manually altered, and we skip rather than sync for speed, because sync would be too slow.
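
For readers unfamiliar with the skip-on-conflict behaviour being discussed, here is a minimal sketch using only the standard-library `sqlite3` module. It is not the butler implementation: the `instrument` table, its columns, and the helper `insert_skip_existing` are hypothetical stand-ins, and the `ON CONFLICT ... DO NOTHING` clause assumes SQLite 3.24 or newer. It shows that a conflicting row is skipped without being compared or updated, which is why the approach is only safe when the incoming records are trusted.

```python
import sqlite3


def insert_skip_existing(conn, rows):
    """Insert (name, full_name) rows, skipping any whose primary key exists.

    Hypothetical helper for illustration; only primary-key conflicts on
    ``name`` are ignored, other constraint violations would still raise.
    """
    conn.executemany(
        "INSERT INTO instrument (name, full_name) VALUES (?, ?) "
        "ON CONFLICT (name) DO NOTHING",
        rows,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instrument (name TEXT PRIMARY KEY, full_name TEXT)")

# First import: the record is new, so it is inserted.
insert_skip_existing(conn, [("Cam1", "Camera One")])

# Second import: "Cam1" already exists and is skipped without any comparison,
# even though its payload differs; "Cam2" is new and is inserted.
insert_skip_existing(conn, [("Cam1", "Altered"), ("Cam2", "Camera Two")])

records = conn.execute(
    "SELECT name, full_name FROM instrument ORDER BY name"
).fetchall()
print(records)
```

Note that the altered `Cam1` payload is silently dropped rather than reported; a sync-style operation would instead have to read and compare each existing row, which is the cost the review comment refers to.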

timj · 2022-05-02T16:28:24Z

tests/test_simpleButler.py

-            )
-            butler2.import_(filename=file.name, skip_dimensions=dimensions)
+            # Import it again
+            butler2.import_(filename=file.name)


I wonder why the code coverage tool does not think this line ran...

Good catch! It's right; I tried sabotaging it and it seems to always be skipped.

At the top of the base test case in this file, we have:

datasetsIdType = int

and then this test method has:

if self.datasetsIdType is not uuid.UUID:
    self.skipTest("This test can only work for UUIDs")

but at the bottom we have two derived test cases that both set

datasetsIdType = uuid.UUID

so it's not clear to me what's going on. I'll see if I can figure it out, but ideas are most welcome.

Ha, nope, nothing so complicated as that. I had accidentally renamed the test method to tesImportTwice.
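
The failure mode here is easy to reproduce in isolation. The sketch below uses a hypothetical `ExampleTestCase` (not the real test class in this file) and relies on the default `unittest` behaviour of collecting only methods whose names start with `test`, so a misspelled method is dropped without any error or warning.

```python
import io
import unittest


class ExampleTestCase(unittest.TestCase):
    """Hypothetical test case with one correctly named and one misnamed test."""

    def testImportOnce(self):
        self.assertTrue(True)

    def tesImportTwice(self):
        # Misspelled prefix: the loader never collects this method,
        # so this failure is never triggered.
        self.fail("never reached")


loader = unittest.TestLoader()

# Only names starting with the default "test" prefix are collected.
collected = list(loader.getTestCaseNames(ExampleTestCase))

# Running the suite succeeds because the misnamed method is never executed.
suite = loader.loadTestsFromTestCase(ExampleTestCase)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(collected, result.testsRun, result.wasSuccessful())
```

Because the run still reports success, a line-level coverage report is one of the few signals that the method body was never executed.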

😄 On the plus side code coverage caught it.

TallJimbo force-pushed the tickets/DM-33148 branch from 7a934a5 to 4b9a395 Compare May 2, 2022 14:50

TallJimbo added 3 commits May 2, 2022 10:51

Remove custom sqlalchemy SQLite UPSERT extensions.

83d1d91

Support for this syntax is now upstream in SQLAlchemy.

Allow Database.ensure to ignore only primary key constraints.

4cb703f

Allow insertDimensionData to ignore existing records.

e32e1d8

TallJimbo force-pushed the tickets/DM-33148 branch from 4b9a395 to 7505ecb Compare May 2, 2022 14:51

timj approved these changes May 2, 2022

Ignore conflicts with existing dimension records in import.

086c1bd

TallJimbo force-pushed the tickets/DM-33148 branch from 7505ecb to 086c1bd Compare May 2, 2022 19:22

TallJimbo merged commit 479bf8a into main May 2, 2022

TallJimbo deleted the tickets/DM-33148 branch May 2, 2022 19:47
