DM-34247: simplify dataset subquery logic and fix edge-case bugs #670
Codecov Report
@@ Coverage Diff @@
## main #670 +/- ##
==========================================
+ Coverage 84.29% 84.32% +0.02%
==========================================
Files 242 242
Lines 31019 31055 +36
Branches 5219 5219
==========================================
+ Hits 26149 26187 +38
Misses 3706 3706
+ Partials 1164 1162 -2
198df4b
to
68e4855
Compare
cdf20cd
to
94de327
Compare
Looks great, a couple of minor comments.
else:
# We can never find a non-calibration dataset in a
# CALIBRATION collection.
Re-reading pre-existing code - I think this comment is not quite right: in this else branch the dataset type can still be calibration (if collectionRecord.name is not in explicitCollections).
Yes, I believe this was an edge-case bug that is now fixed, and the comment is now correct.
Oh, never mind; this wasn't fixed - I had started, then decided to do the refactor first, and then forgot to come back to it. Fixed in another commit.
ingestDate=SimpleQuery.Select if isResult else None,
# If this dataset type has no dimensions, we're in danger of
# generating an invalid subquery that has no columns in the
# SELECT clause. Any easy fix is to just select some arbitrary
Any -> An?
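The problem described in that code comment can be reproduced outside the butler. The sketch below uses the standard-library sqlite3 module rather than the project's query builder, and the table name, column names, and the `build_select` helper are all hypothetical. The quoted comment is cut off before it says what gets selected, so falling back to a constant column is an assumption made for illustration only.

```python
import sqlite3


def build_select(table: str, columns: list[str]) -> str:
    """Build a SELECT statement, falling back to a constant column.

    An empty column list would give "SELECT  FROM table", which is not
    valid SQL, so an arbitrary constant is selected instead (assumption:
    the real fix may select something else).
    """
    select_list = ", ".join(columns) if columns else "1 AS placeholder"
    return f"SELECT {select_list} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, run TEXT)")
conn.execute("INSERT INTO dataset (run) VALUES ('a'), ('b')")

# An empty SELECT list is rejected by the database.
try:
    conn.execute("SELECT FROM dataset")
    empty_select_failed = False
except sqlite3.OperationalError:
    empty_select_failed = True

# The fallback keeps the statement valid and still yields one row per dataset.
rows = conn.execute(build_select("dataset", [])).fetchall()
```

The fallback row count still matches the number of datasets, which is what matters when the subquery is joined into a larger query.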
80709d5
to
35424d9
Compare
Looks good, minor comments.
# find-first search is just a regular result subquery.
if len(collections) == 1:
# find-first search is just a regular result subquery. Same is true
# if this is a doomed query with no collectoins to search.
Typo: collectoins
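For readers unfamiliar with the term, the special case in that snippet can be illustrated in plain Python. This is a sketch of my reading of "find-first" (for each data ID, keep the dataset from the earliest collection in the search order that has one), not the SQL the PR generates; the `find_first` function and the sample data are hypothetical.

```python
def find_first(
    collections: list[str],
    datasets_by_collection: dict[str, dict[str, int]],
) -> dict[str, int]:
    """Return, per data ID, the dataset from the first collection that has it."""
    found: dict[str, int] = {}
    for name in collections:
        for data_id, dataset_id in datasets_by_collection.get(name, {}).items():
            # setdefault keeps the entry from the earliest collection.
            found.setdefault(data_id, dataset_id)
    return found


data = {"a": {"d1": 1}, "b": {"d1": 2, "d2": 3}}

both = find_first(["a", "b"], data)  # "a" shadows "b" for d1
single = find_first(["b"], data)     # nothing to shadow: a plain lookup
doomed = find_first([], data)        # no collections: always empty
```

With one collection or none there is nothing to rank, which is why those cases can fall back to a regular result subquery.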
c36db32
to
db1303e
Compare
This used to be one massive method; now it's two medium-large methods and two very simple ones, and it's much easier to follow. This shouldn't change the query behavior at all, but it does collapse some UNION ALL constructs by merging their IN clauses when they are otherwise the same - the original code had an unnecessary distinction between CALIBRATION and other collection types (as well as some necessary distinctions, which this preserves).
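The UNION ALL collapse mentioned in that commit message can be checked on a toy schema. The sketch below uses the standard-library sqlite3 module; the `tags` table and its columns are hypothetical stand-ins, not the butler's actual schema. It only shows that per-collection SELECTs which differ solely in the collection they match return the same rows as one SELECT with a merged IN clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (dataset_id INTEGER, collection_id INTEGER)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 20)],
)

# One SELECT per collection, glued together with UNION ALL.
union_sql = """
    SELECT dataset_id, collection_id FROM tags WHERE collection_id = 10
    UNION ALL
    SELECT dataset_id, collection_id FROM tags WHERE collection_id = 20
"""

# The same search as a single SELECT with a merged IN clause.
merged_sql = """
    SELECT dataset_id, collection_id FROM tags WHERE collection_id IN (10, 20)
"""

# Sort both results because neither query guarantees a row order.
union_rows = sorted(conn.execute(union_sql).fetchall())
merged_rows = sorted(conn.execute(merged_sql).fetchall())
```

The equivalence holds only when the SELECTs are otherwise identical, which matches the commit message's caveat that some CALIBRATION distinctions are necessary and are preserved.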
...which I just merged without a changelog entry.
This prevents query construction from failing when someone queries for a skypix-dimension dataset type in collections where none exist. When the datasets do exist, we rely on the dataset table to bring in the skypix dimension, since those don't have tables of their own, and we need to do the same when the query is doomed to avoid triggering an exception about skypix dimensions not being directly joinable.
db1303e
to
10b2f76
Compare
@@ -511,6 +514,7 @@ def transaction(
# `Connection.in_nested_transaction()` method.
savepoint = savepoint or connection.info.get(_IN_SAVEPOINT_TRANSACTION, False)
connection.info[_IN_SAVEPOINT_TRANSACTION] = savepoint
trans: Union[sqlalchemy.engine.Transaction, sqlalchemy.engine.NestedTransaction]
NestedTransaction should be a subclass of Transaction; do you need the explicit NestedTransaction here?
MyPy didn't seem to think they had a direct subclass relationship, but the docs claim they do, so given the uncertainty about whether the stubs package I used is official, I'll just replace this with Transaction until we're ready to start depending on sqlalchemy stubs.
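The typing point in this exchange can be shown with stand-in classes. The two classes below are hypothetical and are not SQLAlchemy's; the sketch assumes only what the reviewer states, namely that the nested type subclasses the base type. Under that assumption, annotating with the base class alone covers both branches.

```python
class Transaction:
    """Stand-in for a base transaction type (hypothetical, not SQLAlchemy's)."""


class NestedTransaction(Transaction):
    """Stand-in for a savepoint transaction that subclasses the base."""


def begin(savepoint: bool) -> Transaction:
    # The base-class annotation accepts either assignment, so a Union
    # with the subclass would add nothing for a type checker.
    trans: Transaction
    if savepoint:
        trans = NestedTransaction()
    else:
        trans = Transaction()
    return trans
```

If the type stubs in use do not declare the subclass relationship, a checker would reject the first assignment, which is consistent with the MyPy behavior reported above.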
else:
rejections.append(
f"Not searching for dataset {datasetType.name!r} in CALIBRATION collection "
f"{collectionRecord.name!r} because temporal calibration queries are't "
Typo: are't, same below.
This fixes an edge-case bug involving mixed CALIBRATION/non-CALIBRATION searches introduced in the previous refactoring commit. This involves a small internal API change for DatasetRecordStorage - it now returns a SQLAlchemy object directly instead of one of our SimpleQuery instances, since it sometimes now returns a UNION.
These were revealed by installing the sqlalchemy-stubs package locally. That's something we should consider doing more broadly (e.g. in CI), as it revealed a lot more than just these, but it seems tangled up with migrating more fully to SQLAlchemy 2.0 APIs (there is a different stubs package for SQLAlchemy 2), and since that's not even fully released yet, doing much more seems premature.
10b2f76
to
d168fa6
Compare
Checklist
doc/changes