DM-37704: Make all DatasetRefs resolved. #841

andy-slac · 2023-05-23T17:47:52Z

DatasetRefs are now required to have a defined dataset ID and run. This also
removes the methods that created unresolved refs or resolved unresolved refs.
Many changes throughout the code remove checks for unresolved refs.
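
As a rough illustration of what "resolved" means here, the sketch below models a ref whose dataset ID and run are both mandatory. `ToyDatasetRef` and its field names are hypothetical stand-ins chosen for this example; this is not the actual `DatasetRef` class or its constructor signature.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDatasetRef:
    """Hypothetical stand-in for a resolved ref; not the real DatasetRef."""

    dataset_type: str
    data_id: tuple[tuple[str, object], ...]
    run: str
    id: uuid.UUID

    def __post_init__(self) -> None:
        # A resolved ref always carries both a run and a dataset ID,
        # so there is no "unresolved" state left to check for later.
        if not self.run:
            raise ValueError("a resolved ref requires a non-empty run name")
        if not isinstance(self.id, uuid.UUID):
            raise TypeError("a resolved ref requires a UUID dataset ID")
```

With both fields validated at construction, downstream code in this toy model never needs an `if ref.id is None` branch.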

The Butler.put() method is updated to handle duplicate put attempts with the same
dataset ID. The behavior still differs slightly between a resolved ref and the
unresolved (DatasetType, DataId) form when the dataset exists only in the Registry
but not in the datastore. The unit test reflects this difference in behavior.

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

I started removing unresolved references, but found that `Butler.put` is somewhat broken with resolved refs (it accepts the same resolved ref twice without error). I need to fix the method, and I want to extend the unit test to cover that case. Looking at the unit tests, I feel that they also need an update. Checking with mypy should help me with all the following updates, and I think I fixed one bug (maybe more) in the test itself. Also updated the Butler constructor to support ResourcePath and removed duplicate configuration parsing.

The purpose of this commit is to reproduce an issue with Butler.put and resolved refs. This breaks the unit test: with InMemoryDatastore, Butler.put does not raise on a duplicate put, and in other cases it raises sqlalchemy IntegrityError instead of the expected ConflictingDefinitionError.
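
The mismatch between a low-level database error and the expected domain error can be sketched with the standard library alone. The example below uses `sqlite3.IntegrityError` as a stand-in for the sqlalchemy exception, and the table layout, `insert_dataset`, `make_toy_registry`, and the local `ConflictingDefinitionError` class are all assumptions made for illustration, not code from this repository.

```python
import sqlite3
import uuid


class ConflictingDefinitionError(Exception):
    """Local stand-in for the error type named in the text."""


def make_toy_registry() -> sqlite3.Connection:
    """Create an in-memory table keyed by dataset ID (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, run TEXT NOT NULL)")
    return conn


def insert_dataset(conn: sqlite3.Connection, dataset_id: uuid.UUID, run: str) -> None:
    """Insert one row, translating a duplicate-key failure into a domain error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dataset (id, run) VALUES (?, ?)",
                (str(dataset_id), run),
            )
    except sqlite3.IntegrityError as exc:
        # Without this translation the caller would see the raw database error.
        raise ConflictingDefinitionError(
            f"dataset {dataset_id} already exists"
        ) from exc
```

Calling `insert_dataset` twice with the same ID raises `ConflictingDefinitionError` on the second call and leaves a single row in the table.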

The `Butler._allow_put_of_predefined_dataset` attribute is not used any more. This also removes code branches that depended on it and the test code for it. `put()` is still broken for resolved refs, will fix it next.

Butler.put now raises an exception on a duplicate attempt to write the same resolved ref. There is still a difference in put() behavior between the resolved and unresolved (DatasetType+DataId) cases when writing a dataset that has registry records but no artifacts. This is now reflected in the unit test, and I am not sure what the correct behavior would be in this case.

codecov · 2023-05-23T18:09:05Z

Codecov Report

Patch coverage: 93.95% and project coverage change: +0.16 🎉

Comparison is base (98bc225) 87.73% compared to head (75ed3cb) 87.90%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #841      +/-   ##
==========================================
+ Coverage   87.73%   87.90%   +0.16%     
==========================================
  Files         268      268              
  Lines       35354    35213     -141     
  Branches     7442     7391      -51     
==========================================
- Hits        31019    30953      -66     
+ Misses       3169     3119      -50     
+ Partials     1166     1141      -25

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/lsst/daf/butler/core/datastore.py | `94.73% <ø> (+4.32%)` | ⬆️ |
| ...on/lsst/daf/butler/datastores/inMemoryDatastore.py | `91.66% <ø> (+3.90%)` | ⬆️ |
| python/lsst/daf/butler/registry/_registry.py | `94.55% <ø> (ø)` | |
| ...hon/lsst/daf/butler/registry/interfaces/_bridge.py | `89.06% <ø> (-0.34%)` | ⬇️ |
| ...n/lsst/daf/butler/registry/interfaces/_datasets.py | `100.00% <ø> (ø)` | |
| python/lsst/daf/butler/script/butlerImport.py | `77.77% <ø> (ø)` | |
| python/lsst/daf/butler/script/queryDatasets.py | `89.39% <ø> (+1.89%)` | ⬆️ |
| python/lsst/daf/butler/tests/_testRepo.py | `92.36% <ø> (-0.06%)` | ⬇️ |
| python/lsst/daf/butler/transfers/_interfaces.py | `100.00% <ø> (ø)` | |
| tests/test_quantumBackedButler.py | `99.06% <ø> (ø)` | |

... and 23 more

... and 2 files with indirect coverage changes

timj

Looks great. I have some minor comments. Main ones are:

- Should _importDatasets check that the UUIDs it is returning match those it's given? We seem to be testing that outside of the import and that seems unreliable.
- Maybe check that a DeferredDatasetHandle really can get the ref from the butler it's given?

python/lsst/daf/butler/core/datasets/ref.py

python/lsst/daf/butler/_butler.py

tests/test_butler.py

andy-slac · 2023-05-25T03:58:52Z

> Should _importDatasets check that the UUIDs it is returning match those it's given? We seem to be testing that outside of the import and that seems unreliable.

I have added a check: cfa6ae3
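
A consistency check of this kind could look roughly like the sketch below. It assumes the returned IDs are compared positionally against the requested ones; `check_ids_match` is a hypothetical helper written for this example and does not reproduce the code in cfa6ae3.

```python
import uuid
from typing import Iterable, List


def check_ids_match(
    expected: Iterable[uuid.UUID], returned: Iterable[uuid.UUID]
) -> List[uuid.UUID]:
    """Return the IDs unchanged if they match the expected ones, else raise.

    Assumption: both sequences are in the same order, so a positional
    comparison is sufficient.
    """
    expected_list = list(expected)
    returned_list = list(returned)
    if len(expected_list) != len(returned_list):
        raise ValueError(
            f"expected {len(expected_list)} dataset IDs, got {len(returned_list)}"
        )
    for want, got in zip(expected_list, returned_list):
        if want != got:
            raise ValueError(f"imported dataset ID {got} does not match {want}")
    return returned_list
```

Performing the comparison inside the import step means every caller gets the guarantee, rather than relying on each test or client to verify it afterwards.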

> Maybe check that a DeferredDatasetHandle really can get the ref from the butler it's given?

Isn't the most common case for a deferred get one where the ref comes from a quantum graph and is supposed to exist? There is also a potential race here: checking that the ref exists now does not mean it will exist later.
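
The race can be shown with a toy model. `ToyStore` and `deferred_get` below are hypothetical and stand in for neither the Butler nor `DeferredDatasetHandle`; the sketch only demonstrates that a successful existence check at handle-creation time says nothing about the later fetch.

```python
class ToyStore:
    """Hypothetical in-memory store; not the Butler or datastore API."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: object) -> None:
        self._data[key] = value

    def exists(self, key: str) -> bool:
        return key in self._data

    def remove(self, key: str) -> None:
        self._data.pop(key, None)

    def get(self, key: str) -> object:
        try:
            return self._data[key]
        except KeyError:
            raise LookupError(f"dataset {key!r} not found") from None


def deferred_get(store: ToyStore, key: str):
    """Return a callable that fetches the dataset only when invoked."""
    return lambda: store.get(key)
```

If `store.exists("a")` is true when the handle is created and another actor then calls `store.remove("a")`, invoking the handle still raises `LookupError`, so the caller has to handle the failure at fetch time either way.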

Deprecation warning is now issued in the CLI method in `commands.py`. Unit tests updated to not use `reuse_ids`.
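
A minimal sketch of that arrangement, assuming the deprecated flag is accepted at the command level and simply not forwarded to the script: `import_command_sketch` and `run_import_sketch` are hypothetical names, and the warning category is a choice made for this example rather than the one used in `commands.py`.

```python
import warnings


def run_import_sketch(directory: str) -> str:
    """Stand-in for the script function, which no longer takes reuse_ids."""
    return f"imported from {directory}"


def import_command_sketch(directory: str, reuse_ids: bool = False) -> str:
    """Stand-in for the CLI-level function that still accepts the old flag."""
    if reuse_ids:
        # Warn the user, then drop the flag instead of passing it along.
        warnings.warn(
            "reuse_ids is deprecated and ignored",
            FutureWarning,
            stacklevel=2,
        )
    return run_import_sketch(directory)
```

Keeping the warning at the outermost layer lets existing command lines continue to work while the underlying function signature is already cleaned up.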

andy-slac · 2023-05-25T19:15:48Z

I think I resolved all comments; two new commits were added: 408ec68 and 75ed3cb (plus cfa6ae3, which I mentioned already).

andy-slac added 7 commits May 19, 2023 13:26

Add exception message check to few assertRegex tests.

544050d

Remove Butler._allow_put_of_predefined_dataset attribute.

a08d7a5

The attribute is not used any more. This also removes code branches that depended on it and the test code for it. `put()` is still broken for resolved refs, will fix it next.

Make all DatasetRefs resolved.

6538f4e

DatasetRefs are now required to have defined dataset ID and run. This also removes methods that made unresolved refs or resolved unresolved refs. Many changes everywhere to avoid checking for unresolved refs.

Add news fragments

c2fde95

timj approved these changes May 24, 2023

Add dataset ID consistency check to Registry._importDatasets

cfa6ae3

timj mentioned this pull request May 25, 2023

DM-37704: Update rewrite_sqlite_registry script for changes in Butler lsst-dm/daf_butler_migrate#22

Merged

1 task

andy-slac added 2 commits May 25, 2023 11:06

Remove reuse_ids parameter from butlerImport script.

408ec68

Deprecation warning is now issued in the CLI method in `commands.py`. Unit tests updated to not use `reuse_ids`.

Few fixes from code review

75ed3cb

andy-slac force-pushed the tickets/DM-37704 branch from d3031a7 to 75ed3cb Compare May 25, 2023 19:14

andy-slac merged commit dca1f0b into main May 26, 2023
13 checks passed

andy-slac deleted the tickets/DM-37704 branch May 26, 2023 01:31

kfindeisen mentioned this pull request Jun 13, 2023

DM-39653: Prompt processing unit tests have bitrotted lsst-dm/prompt_processing#70

Merged
