DM-31287: Add protection to datasets manager import method #583
Conversation
Looks good! I hurt my brain thinking about two potential concerns, and my conclusion is that no changes are needed for either:

1. Could we get away with fewer queries in _validateImport? I think so, maybe, but the result would be a lot harder to read and more fragile, depending on the details of various foreign keys, so it's not worth doing since we don't seem to have a performance problem here.

2. Is this subject to race conditions with READ COMMITTED isolation? Only in the sense that concurrent writes involving identical datasets would cause constraint violations in the actual INSERT (sketched below), instead of being ignored as they would be if not concurrent. Since that should be rare and doesn't break any invariants, I think we're good.

Does that make sense to you?
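For concreteness, here is a minimal sketch of the race being described, using a hypothetical single-table schema rather than the real daf_butler one (`import_dataset`, the table, and its columns are all illustrative stand-ins):

```python
import uuid

import sqlalchemy

# Hypothetical single-table stand-in for the dataset table; the real
# daf_butler schema is considerably more involved.
metadata = sqlalchemy.MetaData()
dataset = sqlalchemy.Table(
    "dataset",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.String(36), primary_key=True),
    sqlalchemy.Column("dataset_type", sqlalchemy.String(64), nullable=False),
)


def import_dataset(engine, dataset_id: uuid.UUID, dataset_type: str) -> None:
    """Insert one dataset row, skipping it if an identical row exists."""
    with engine.begin() as connection:
        # Under READ COMMITTED this check cannot see a row inserted by a
        # concurrent transaction that has not yet committed...
        existing = connection.execute(
            sqlalchemy.select(dataset.c.dataset_type).where(
                dataset.c.id == str(dataset_id)
            )
        ).scalar_one_or_none()
        if existing is not None:
            if existing != dataset_type:
                raise ValueError("conflicting definition for existing dataset")
            return  # identical row already present; nothing to insert
        # ...so if two importers race on the same UUID, the loser's INSERT
        # raises a unique-constraint violation (sqlalchemy.exc.IntegrityError)
        # instead of being skipped.  No invariant is broken; the operation
        # simply fails where a retry would succeed.
        connection.execute(
            dataset.insert().values(id=str(dataset_id), dataset_type=dataset_type)
        )
```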
Codecov Report
@@            Coverage Diff             @@
##           master     #583      +/-   ##
==========================================
+ Coverage   83.47%   83.55%   +0.07%
==========================================
  Files         241      241
  Lines       30136    30253    +117
  Branches     4497     4512     +15
==========================================
+ Hits        25156    25277    +121
+ Misses       3786     3784      -2
+ Partials     1194     1192      -2

Continue to review full report at Codecov.
Dataset storage manager now has an additional protection against inconsistent dataset definitions when importing datasets that use UUID for dataset ID.
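A rough sketch of what such a protection can look like (`validate_import`, `DatasetRecord`, and the fields compared are illustrative stand-ins, not the actual _validateImport implementation): before inserting, look up the rows already stored under the incoming UUIDs and require that their definitions match exactly.

```python
import uuid
from dataclasses import dataclass


# Hypothetical flattened record; the real code works with DatasetRef
# objects and several related tables.
@dataclass(frozen=True)
class DatasetRecord:
    dataset_id: uuid.UUID
    dataset_type: str
    run: str


def validate_import(
    incoming: list[DatasetRecord],
    existing: dict[uuid.UUID, DatasetRecord],
) -> list[DatasetRecord]:
    """Return the records that still need inserting; raise if an incoming
    record disagrees with the row already stored under its UUID.
    """
    to_insert = []
    for record in incoming:
        stored = existing.get(record.dataset_id)
        if stored is None:
            to_insert.append(record)
        elif stored != record:
            raise ValueError(
                f"dataset {record.dataset_id} already exists "
                "with a different definition"
            )
        # An identical, already-stored record is silently skipped.
    return to_insert
```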
Concurrent inserts are indeed a potential issue. I do not think we can solve it with locking (short of a whole-table lock, which we don't want); I believe Postgres does not know how to lock non-existent rows. Getting an error on the INSERT in this situation and retrying the whole thing should be acceptable; the only question is that it may need human intervention until we teach the code to recognize this sort of failure.

(I rebased this against current master and restarted Jenkins; will merge when it finishes.)
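Until then, a retry wrapper along these lines is one plausible shape for that automation (`import_with_retry` is a hypothetical helper; deciding which IntegrityError is actually retry-safe is the part that still needs care):

```python
import time

from sqlalchemy.exc import IntegrityError


def import_with_retry(do_import, attempts: int = 3, delay: float = 1.0):
    """Call ``do_import`` and retry on a unique-constraint violation.

    By the time we retry, the concurrently inserted row has been
    committed, so the pre-insert validation sees it and either skips it
    (identical definition) or raises a genuine conflict error.
    """
    for attempt in range(attempts):
        try:
            return do_import()
        except IntegrityError:
            if attempt == attempts - 1:
                raise  # persistent failure: a real conflict, not a race
            time.sleep(delay)
```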
Checklist
- doc/changes