Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-37704: Make all DatasetRefs resolved. #841

Merged
merged 10 commits into from May 26, 2023
Merged

DM-37704: Make all DatasetRefs resolved. #841

merged 10 commits into from May 26, 2023

Conversation

andy-slac
Copy link
Contributor

DatasetRefs are now required to have defined dataset ID and run. This also
removes methods that made unresolved refs or resolved unresolved refs.
Many changes everywhere to avoid checking for unresolved refs.

Butler.put() method is updated to handle duplicate put attempts with the same
dataset ID. The behavior is still slightly different between resolved ref and
unresolved (DatasetType, DataId) in case dataset only exists in Registry but not in
datastore. Unit test reflects this difference in behavior.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

I started removal of unresolved references, but I found that `Butler.put`
is somewhat broken with resolved refs (it accepts the same resolved ref
twice without error). I need to fix the method and I want to extend unit
test to cover that case. Looking at the unit tests I feel that they also
need an update. Checking with mypy should help me with all following updates,
and I think I fixed one bug (maybe more) in the test itself.

Also updated Butler constructor to support ResourcePath, and removed duplicate
configuration parsing.
The purpose of this commit is to reproduce an issue with Butler.put and
resolved refs. This breaks unit test - for InMemoryDatastore Butler.put
does not raise for duplicate put, for other cases it raises sqlalchemy
IntegrityError instead of expected ConflictingDefinitionError.
The attribute is not used any more. This also removes code branches
that depended on it and the test code for it. `put()` is still broken
for resolved refs, will fix it next.
Butler.put now raises an exception for duplicate attempt to write the same
resolved ref. There is still a difference in put() behavior between resolved
and unresolved (DatasetType+DataId) cases when writing a dataset that has
registry records but no artifacts. This is now reflected in the unit test,
and I am not sure what the correct behavior would be in this case.
DatasetRefs are now required to have defined dataset ID and run. This also
removes methods that made unresolved refs or resolved unresolved refs.
Many changes everywhere to avoid checking for unresolved refs.
@codecov
Copy link

codecov bot commented May 23, 2023

Codecov Report

Patch coverage: 93.95% and project coverage change: +0.16 🎉

Comparison is base (98bc225) 87.73% compared to head (75ed3cb) 87.90%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #841      +/-   ##
==========================================
+ Coverage   87.73%   87.90%   +0.16%     
==========================================
  Files         268      268              
  Lines       35354    35213     -141     
  Branches     7442     7391      -51     
==========================================
- Hits        31019    30953      -66     
+ Misses       3169     3119      -50     
+ Partials     1166     1141      -25     
Impacted Files Coverage Δ
python/lsst/daf/butler/core/datastore.py 94.73% <ø> (+4.32%) ⬆️
...on/lsst/daf/butler/datastores/inMemoryDatastore.py 91.66% <ø> (+3.90%) ⬆️
python/lsst/daf/butler/registry/_registry.py 94.55% <ø> (ø)
...hon/lsst/daf/butler/registry/interfaces/_bridge.py 89.06% <ø> (-0.34%) ⬇️
...n/lsst/daf/butler/registry/interfaces/_datasets.py 100.00% <ø> (ø)
python/lsst/daf/butler/script/butlerImport.py 77.77% <ø> (ø)
python/lsst/daf/butler/script/queryDatasets.py 89.39% <ø> (+1.89%) ⬆️
python/lsst/daf/butler/tests/_testRepo.py 92.36% <ø> (-0.06%) ⬇️
python/lsst/daf/butler/transfers/_interfaces.py 100.00% <ø> (ø)
tests/test_quantumBackedButler.py 99.06% <ø> (ø)
... and 23 more

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I have some minor comments. Main ones are:

  • Should _importDatasets check that the UUIDs it is returning match those it's given? We seem to be testing that outside of the import and that seems unreliable.
  • Maybe check that a DeferredDatasetHandle really can get the ref from the butler it's given?

python/lsst/daf/butler/core/datasets/ref.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/_butler.py Show resolved Hide resolved
python/lsst/daf/butler/_butler.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/_butler.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/_butler.py Show resolved Hide resolved
tests/test_butler.py Show resolved Hide resolved
tests/test_butler.py Show resolved Hide resolved
tests/test_butler.py Show resolved Hide resolved
tests/test_butler.py Outdated Show resolved Hide resolved
tests/test_butler.py Outdated Show resolved Hide resolved
@andy-slac
Copy link
Contributor Author

Should _importDatasets check that the UUIDs it is returning match those it's given? We seem to be testing that outside of the import and that seems unreliable.

I have added a check: cfa6ae3

Maybe check that a DeferredDatasetHandle really can get the ref from the butler it's given?

Is not the most common case for deferred get is when the ref comes from a quantum graph and supposed to exist? There is also a potential race here, checking that ref exists now does not mean it will exist later?

Deprecation warning is now issued in the CLI method in `commands.py`.
Unit tests updated to not use `reuse_ids`.
@andy-slac
Copy link
Contributor Author

I think I resolved all comments, two new commits were added: 408ec68 and 75ed3cb (plus cfa6ae3 I mentioned already).

@andy-slac andy-slac merged commit dca1f0b into main May 26, 2023
13 checks passed
@andy-slac andy-slac deleted the tickets/DM-37704 branch May 26, 2023 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants