Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-26590: Check query expressions for completeness and consistency #433

Merged
merged 5 commits into from Nov 17, 2020

Conversation

TallJimbo
Copy link
Member

No description provided.

@TallJimbo TallJimbo force-pushed the tickets/DM-26590 branch 3 times, most recently from d7ad481 to 68cd4a4 Compare November 13, 2020 14:15
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK, there is small bunch of comments mostly for docstrings. I did not manage to check every piece of normal form logic, I hope unit tests cover that extensively.

Couple of general comments. I wish you did not have to implement whole DNF/CNF logic yourself and we could you some third party package for that. The one that I know of is sympy but that is a huge dependency, not worth adding it to daf_butler, not sure if there is anything else out there.

After checking all visitor login I'm not sure how it handles IN operator, e.g. tract = 0 AND detector IN (1,2,3). I tried ci_hsc_gen3 by replacing patch=69 with patch IN (69,70) and it seems to have worked when parsing that expression but then it failed with unrelated error.

tests/test_normalFormExpression.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/registry/queries/expressions.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/registry/queries/expressions.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/registry/queries/expressions.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/registry/queries/expressions.py Outdated Show resolved Hide resolved
These are not used yet in the query system, but when we hook them up
they will let us detect and fix common errors (like missing governor
constraints) and determine which governor values are in play in a
query - that will in turn be important for knowing whether we have the
necessary spatial overlaps materialized to even execute a query.
Some things probably would have broken if anyone had tried to change
one at the wrong time before, and that'll become more true in the
future.

This also removes dataclass usage from QuerySummary and
QueryWhereExpression; it had outlived its usefulness (we already needed
custom __init__s) and using dataclass(frozen=True) to make these
immutable would have broken their __init__s.
Future functionality will require this, to track which expression
branches provide different bits of information, but it's better to
get this refactoring-only change done first.
This restructures the QueryWhereExpression into a pair of classes,
with the new QuereWhereClause holding information about what's in the
query and picking up some attributes that were previously part of
QuerySummary.  QuerySummary now has a QueryWhereClause instead of a
QueryWhereExpression, and transforming from the latter to the former
(via QueryWhereExpression.attach) is what performs all of the new
checking.

That checking is implemented in registry/queries/expressions.py, which
includes some small modifications to InspectionVisitor to make it
something the new CheckVisitor can delegate to.  CheckVisitor and its
Summary helper classes is where all of the new checks happen.

Because checking can produce some false positives, all of the Registry
query methods now have a 'check' kwarg that can be used to turn it
off.  At present I'm not planning to expose that in command-line
utilities - I'm hoping we can get by with always checking those
(especially the queries that come from QuantumGraph generation).

This implements DM-26950, but I'm reserving the right to follow this
with a commit that changes the default behavior to check=False, and
let DM-26590 to revert that commit later, if check=True causes too much
disruption downstream.
@TallJimbo
Copy link
Member Author

Sympy is the only thing I could find, too.

After checking all visitor logic I'm not sure how it handles IN operator, e.g. tract = 0 AND detector IN (1,2,3)

The IN operator is handled correctly for dependent dimensions, like tract and detector. There's one at least one (preexisting) test in daf_butler that uses that syntax (on tract and patch), and these changes correctly caused it to fail before I added a skymap constraint to the expression. I have not tried to make IN work as a way to provide governor dimension constraints, but explicitly called that out in the error message to tell users they need to rewrite their expressions with plain = for those.

@TallJimbo TallJimbo merged commit 0c9fe88 into master Nov 17, 2020
@TallJimbo TallJimbo deleted the tickets/DM-26590 branch November 17, 2020 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants