Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-34247: simplify dataset subquery logic and fix edge-case bugs #670

Merged
merged 10 commits into from Apr 11, 2022

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Mar 31, 2022

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Mar 31, 2022

Codecov Report

Merging #670 (d168fa6) into main (6ac0126) will increase coverage by 0.02%.
The diff coverage is 91.00%.

@@            Coverage Diff             @@
##             main     #670      +/-   ##
==========================================
+ Coverage   84.29%   84.32%   +0.02%     
==========================================
  Files         242      242              
  Lines       31019    31055      +36     
  Branches     5219     5219              
==========================================
+ Hits        26149    26187      +38     
  Misses       3706     3706              
+ Partials     1164     1162       -2     
Impacted Files Coverage Δ
...n/lsst/daf/butler/registry/interfaces/_database.py 87.19% <62.50%> (-0.34%) ⬇️
...ython/lsst/daf/butler/registry/queries/_builder.py 89.30% <90.90%> (-0.65%) ⬇️
.../butler/registry/datasets/byDimensions/_storage.py 84.34% <91.30%> (+0.79%) ⬆️
python/lsst/daf/butler/registries/sql.py 81.25% <100.00%> (ø)
...n/lsst/daf/butler/registry/interfaces/_datasets.py 73.11% <100.00%> (+0.29%) ⬆️
python/lsst/daf/butler/registry/tests/_registry.py 99.15% <100.00%> (+0.17%) ⬆️
...ython/lsst/daf/butler/registry/queries/_results.py 84.69% <0.00%> (+0.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ac0126...d168fa6. Read the comment docs.

@TallJimbo TallJimbo marked this pull request as draft April 5, 2022 18:05
@TallJimbo TallJimbo force-pushed the tickets/DM-34247 branch 2 times, most recently from cdf20cd to 94de327 Compare April 6, 2022 18:12
@TallJimbo TallJimbo marked this pull request as ready for review April 6, 2022 19:20
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, a couple of minor comments.

Comment on lines 220 to 243
else:
# We can never find a non-calibration dataset in a
# CALIBRATION collection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading pre-existing code - I think this comment is not quite right, in this else branch dataset type can still be calibration (if collectionRecord.name is not in explicitCollections).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe this was an edge-case bug that is now fixed, and the comment is now correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nevermind; this wasn't fixed - I had started, and then decided to the refactor first, and then forgot to come back to it. Fixed on another commit.

ingestDate=SimpleQuery.Select if isResult else None,
# If this dataset type has no dimensions, we're in danger of
# generating an invalid subquery that has no columns in the
# SELECT clause. Any easy fix is to just select some arbitrary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any -> An?

@TallJimbo TallJimbo force-pushed the tickets/DM-34247 branch 2 times, most recently from 80709d5 to 35424d9 Compare April 7, 2022 03:55
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, minor comments.

# find-first search is just a regular result subquery.
if len(collections) == 1:
# find-first search is just a regular result subquery. Same is true
# if this is a doomed query with no collectoins to search.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: collectoins

@TallJimbo TallJimbo force-pushed the tickets/DM-34247 branch 2 times, most recently from c36db32 to db1303e Compare April 8, 2022 18:11
This used to be one massive method; now it's two medium-large methods
and two very simple ones, and it's much easier to follow.

This shouldn't change the query behavior at all, but it does collapse
some UNION ALL constructs by merging their IN clauses when they are
otherwise the same - the original code had an unnecessary distinction
between CALIBRATION and other collection types (as well as some
necessary distinctions, which this preserves).
...which I just merged without a changelog entry.
This prevents query construction from failing when someone queries for
a skypix-dimension dataset type in collections where none exist.  When
the datasets do exist, we rely on the dataset table to bring in the
skypix dimension, since those don't have tables of their own, and we
need to do the same when the query is doomed to avoid triggering an
exception about skypix dimensions not being directly joinable.
@@ -511,6 +514,7 @@ def transaction(
# `Connection.in_nested_transaction()` method.
savepoint = savepoint or connection.info.get(_IN_SAVEPOINT_TRANSACTION, False)
connection.info[_IN_SAVEPOINT_TRANSACTION] = savepoint
trans: Union[sqlalchemy.engine.Transaction, sqlalchemy.engine.NestedTransaction]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NestedTransaction should be a subclass of Transaction, do you need explicit NestedTransaction here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MyPy didn't seem to think they had a direct subclass relationship, but the docs claim they do, so given the uncertainty about whether the stubs package I used is official, I'll just replace this with Transaction until we're ready to start depending on sqlalchemy stubs.

else:
rejections.append(
f"Not searching for dataset {datasetType.name!r} in CALIBRATION collection "
f"{collectionRecord.name!r} because temporal calibration queries are't "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: are't, same below.

This fixes an edge-case bug involving mixed CALIBRATION/non-CALIBRATION
searches introduced in the previous refactoring commit.

This involves a small internal API change for DatasetRecordStorage -
it now returns a SQLAlchemy object directly instead of one of our
SimpleQuery instances, since it sometimes now returns a UNION.
These were revealed by installing the sqlalchemy-stubs package locally.
That's something we should consider doing more broadly (e.g. in CI), as
it revealed a lot more than just these, but it seems tangled up with
migrating more fully to SQLAlchemy 2.0 APIs (there is a different stubs
package for SQLAlchemy 2), and since that's not even fully released yet
doing much more seems premature.
@TallJimbo TallJimbo merged commit cbf5f61 into main Apr 11, 2022
@TallJimbo TallJimbo deleted the tickets/DM-34247 branch April 11, 2022 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants