New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-26590: Check query expressions for completeness and consistency #433
Conversation
d7ad481
to
68cd4a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK, there is small bunch of comments mostly for docstrings. I did not manage to check every piece of normal form logic, I hope unit tests cover that extensively.
Couple of general comments. I wish you did not have to implement whole DNF/CNF logic yourself and we could you some third party package for that. The one that I know of is sympy
but that is a huge dependency, not worth adding it to daf_butler
, not sure if there is anything else out there.
After checking all visitor login I'm not sure how it handles IN
operator, e.g. tract = 0 AND detector IN (1,2,3)
. I tried ci_hsc_gen3
by replacing patch=69
with patch IN (69,70)
and it seems to have worked when parsing that expression but then it failed with unrelated error.
python/lsst/daf/butler/registry/queries/exprParser/normalForm.py
Outdated
Show resolved
Hide resolved
python/lsst/daf/butler/registry/queries/exprParser/normalForm.py
Outdated
Show resolved
Hide resolved
python/lsst/daf/butler/registry/queries/exprParser/normalForm.py
Outdated
Show resolved
Hide resolved
d2440ac
to
a22c3db
Compare
These are not used yet in the query system, but when we hook them up they will let us detect and fix common errors (like missing governor constraints) and determine which governor values are in play in a query - that will in turn be important for knowing whether we have the necessary spatial overlaps materialized to even execute a query.
Some things probably would have broken if anyone had tried to change one at the wrong time before, and that'll become more true in the future. This also removes dataclass usage from QuerySummary and QueryWhereExpression; it had outlived its usefulness (we already needed custom __init__s) and using dataclass(frozen=True) to make these immutable would have broken their __init__s.
Future functionality will require this, to track which expression branches provide different bits of information, but it's better to get this refactoring-only change done first.
This restructures the QueryWhereExpression into a pair of classes, with the new QuereWhereClause holding information about what's in the query and picking up some attributes that were previously part of QuerySummary. QuerySummary now has a QueryWhereClause instead of a QueryWhereExpression, and transforming from the latter to the former (via QueryWhereExpression.attach) is what performs all of the new checking. That checking is implemented in registry/queries/expressions.py, which includes some small modifications to InspectionVisitor to make it something the new CheckVisitor can delegate to. CheckVisitor and its Summary helper classes is where all of the new checks happen. Because checking can produce some false positives, all of the Registry query methods now have a 'check' kwarg that can be used to turn it off. At present I'm not planning to expose that in command-line utilities - I'm hoping we can get by with always checking those (especially the queries that come from QuantumGraph generation). This implements DM-26950, but I'm reserving the right to follow this with a commit that changes the default behavior to check=False, and let DM-26590 to revert that commit later, if check=True causes too much disruption downstream.
a22c3db
to
e574864
Compare
Sympy is the only thing I could find, too.
The IN operator is handled correctly for dependent dimensions, like tract and detector. There's one at least one (preexisting) test in daf_butler that uses that syntax (on tract and patch), and these changes correctly caused it to fail before I added a skymap constraint to the expression. I have not tried to make IN work as a way to provide governor dimension constraints, but explicitly called that out in the error message to tell users they need to rewrite their expressions with plain |
No description provided.