New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-37704: Make all DatasetRefs resolved. #841
Conversation
I started removal of unresolved references, but I found that `Butler.put` is somewhat broken with resolved refs (it accepts the same resolved ref twice without error). I need to fix the method and I want to extend unit test to cover that case. Looking at the unit tests I feel that they also need an update. Checking with mypy should help me with all following updates, and I think I fixed one bug (maybe more) in the test itself. Also updated Butler constructor to support ResourcePath, and removed duplicate configuration parsing.
The purpose of this commit is to reproduce an issue with Butler.put and resolved refs. This breaks unit test - for InMemoryDatastore Butler.put does not raise for duplicate put, for other cases it raises sqlalchemy IntegrityError instead of expected ConflictingDefinitionError.
The attribute is not used any more. This also removes code branches that depended on it and the test code for it. `put()` is still broken for resolved refs, will fix it next.
Butler.put now raises an exception for duplicate attempt to write the same resolved ref. There is still a difference in put() behavior between resolved and unresolved (DatasetType+DataId) cases when writing a dataset that has registry records but no artifacts. This is now reflected in the unit test, and I am not sure what the correct behavior would be in this case.
DatasetRefs are now required to have defined dataset ID and run. This also removes methods that made unresolved refs or resolved unresolved refs. Many changes everywhere to avoid checking for unresolved refs.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #841 +/- ##
==========================================
+ Coverage 87.73% 87.90% +0.16%
==========================================
Files 268 268
Lines 35354 35213 -141
Branches 7442 7391 -51
==========================================
- Hits 31019 30953 -66
+ Misses 3169 3119 -50
+ Partials 1166 1141 -25
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I have some minor comments. Main ones are:
- Should
_importDatasets
check that the UUIDs it is returning match those it's given? We seem to be testing that outside of the import and that seems unreliable. - Maybe check that a DeferredDatasetHandle really can get the ref from the butler it's given?
I have added a check: cfa6ae3
Is not the most common case for deferred get is when the ref comes from a quantum graph and supposed to exist? There is also a potential race here, checking that ref exists now does not mean it will exist later? |
Deprecation warning is now issued in the CLI method in `commands.py`. Unit tests updated to not use `reuse_ids`.
d3031a7
to
75ed3cb
Compare
DatasetRefs are now required to have defined dataset ID and run. This also
removes methods that made unresolved refs or resolved unresolved refs.
Many changes everywhere to avoid checking for unresolved refs.
Butler.put()
method is updated to handle duplicate put attempts with the samedataset ID. The behavior is still slightly different between resolved ref and
unresolved (DatasetType, DataId) in case dataset only exists in Registry but not in
datastore. Unit test reflects this difference in behavior.
Checklist
doc/changes