-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Extend the optimisations for equalities #50792
SQL: Extend the optimisations for equalities #50792
Conversation
This commit supplements the optimisations of equalities in conjunctions and disjunctions: * for conjunctions, the existing optimizations with ranges are extended with not-equalities and inequalities; these lead to a fast resolution, the conjunction either being evaluate to a FALSE, or the non-equality conditions being dropped as superfluous; * optimisations for disjunctions are added to be applied against ranges, inequalities and not-equalities; these lead to disjunction either becoming TRUE or the equality being dropped, either as superfluous or merged into a range/inequality.
Pinging @elastic/es-search (:Search/SQL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments but otherwise LGTM!
List<BinaryComparison> equals = new ArrayList<>(); | ||
// Only equalities, not-equalities and inequalities with a foldable .right are extracted separately; | ||
// the others go into the general 'exps'. | ||
List<BinaryComparison> frEquals = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does fr
stand for? Can you expand the name to the full word?
// a = 2 OR a != 5 -> TRUE; a = 2 OR a = 5 -> nop | ||
private Expression propagate(Or or) { | ||
List<Expression> exps = new ArrayList<>(); | ||
List<Equals> frEquals = new ArrayList<>(); // foldable right term Equals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that fr comes from foldable right - I would remove the prefix since there's no foldable left and the collections are used for compacting or selecting the expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, removed it. I didn't like it either, tbh, but was trying to avoid the confusion as to why not all references of a certain type are part of the respective container. But with the added comment (and code) it should be clear enough, indeed.
List<BinaryComparison> frInequalities = new ArrayList<>(); // foldable right term (=limit) BinaryComparision | ||
|
||
// split expressions by type | ||
for (Expression ex : Predicates.splitOr(or)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this handled already by the folding rule + BooleanSimplification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's not, no.
(But I'll happily be corrected.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at:
elasticsearch/x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java
Line 838 in 54c1a59
if (bc instanceof Or) { |
And at:
elasticsearch/x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java
Line 809 in 54c1a59
if (FALSE.equals(l) || FALSE.equals(r)) { |
That is each individual and/or is evaluated if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look at that, but the simplification added by the code isn't covered there. Maybe that's what the // TODO: eliminate conjunction/disjunction
was hinting to?
(BTW and related to the last suggestions below, the comments (a || b) && (a || c) => a && (b || c)
and (a && b) || (a && c) => a || (b & c)
will also need fixing (the code does the right thing, though) -- I'll address this too.
updated = true; | ||
} else if (lowerComp != null && upperComp != null) { | ||
if (0 < lowerComp && upperComp < 0) { // a = 2 OR 1 < a < 3 | ||
iterEq.remove(); // equality is superfluous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth reviewing the code that does removal through position (for (int i = ...) { ranges.remove(i); }
) since it's incorrect as after a removal it skips the next item as the counter is not decremented.
Moving that to use iterator removal is better (though more verbose).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the ranges
index-based looping doesn't do removals in the ranges
lists, only updates at current index (for cases like a = 2 OR 2 < a < ? -> 2 <= a < ?
); in this case, the removal is operated on the equals
list, but that's done through the iterator.
(But let me know please if I still miss something.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal since safe since it is followed by an add the iteration is not affected but inside CombineBinaryComparisons
there are some removals that are not compensated for
elasticsearch/x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java
Line 1138 in 54c1a59
bcs.remove(j); |
which means the next entry in the
bcs
list might skipped.Worth reviewing (and documenting).
But that's a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Sure, I'll address that in a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simplification work.
I have mostly cosmetic related comments, but also one that I believe needs a second look over.
List<BinaryComparison> equals = new ArrayList<>(); | ||
// Only equalities, not-equalities and inequalities with a foldable .right are extracted separately; | ||
// the others go into the general 'exps'. | ||
List<BinaryComparison> frEquals = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me few seconds to determine what fr
stands for. My personal preference is to name these variables as foldRightEq
or foldRightEquals
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it wasn't a great solution.. I've gone ahead and adopted Costin's solution to simply remove the prefix. I've considered the foldableRight
prefix, but thought that having it would slightly reduce the var distinguishability due to lenght (i.e. foldableRightEqual
vs foldableRightNotEqual
vs foldableRightInequalities
etc.); and fold
isn't super clear imo: foldable vs folded vs..?
But no strong feelings about it.
// evaluate all inequalities against the Equal | ||
for (Iterator<BinaryComparison> iter = frInequalities.iterator(); iter.hasNext(); ) { | ||
BinaryComparison bc = iter.next(); | ||
if (! eq.left().semanticEquals(bc.left())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we prefer the eq.left().semanticEquals(bc.left()) == false
format for better visibility.
Actually, I would have kept the same "style" as the previous conditional statements and not use continue
, but the positive outcome of semanticEquals
and include the following statements in a bigger if
clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we prefer the eq.left().semanticEquals(bc.left()) == false format for better visibility.
Ah, correct, I think it's even in a guideline somewhere the format with just a variable.
I would have kept the same "style" as the previous conditional statements and not use
continue
I find the more convoluted loops easier to read when flattened, but it's only a slight preference, so I've refactored it in this case.
for (Iterator<NotEquals> iter = frNotEquals.iterator(); iter.hasNext(); ) { | ||
NotEquals neq = iter.next(); | ||
if (eq.left().semanticEquals(neq.left())) { | ||
Integer comp = BinaryComparison.compare(eq.right().fold(), neq.right().fold()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of eq.right().fold()
you could have used eqValue
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! Thanks.
} | ||
} | ||
|
||
return changed ? Predicates.combineAnd(CollectionUtils.combine(exps, frEquals, frNotEquals, frInequalities, ranges)) : and; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return changed == true ? Predicates......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep this as is, inline with the style of the rest of the class (my change only affected the parameters passed to CollectionUtils.combine
).
|
||
// Equals OR NotEquals | ||
for (NotEquals neq : frNotEquals) { | ||
if (eq.left().semanticEquals(neq.left())) { // a = 2 OR a != ? -> TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this evaluated to true right away?
An OR
is translated in ES query DSL as a combination of two should
statements for a bool
query and saying field_x = 2
OR field_x != 4
translates into give me all documents that have field_x = 2
OR field_x != 4
which is in fact a collection of documents with field_x = 2
and documents with field_x != 4
. I don't think that statement should be evaluated straight to TRUE, but simplified to field_x != 4
.
Evaluating it to TRUE
means give me all documents.
Keeping the OR
as is means give me the docs with field_x != 4
.
These two ^ are not equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot, thanks!
The TRUE
is only valid if the disjunction terms refer the same value. I've fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test to double check this is the case.
// Equals OR Range | ||
for (int i = 0; i < ranges.size(); i ++) { // might modify list, so use index loop | ||
Range range = ranges.get(i); | ||
if (! eq.left().semanticEquals(range.value())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: == false
instead of !
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
// Equals OR Inequality | ||
for (int i = 0; i < frInequalities.size(); i ++) { | ||
BinaryComparison bc = frInequalities.get(i); | ||
if (! eq.left().semanticEquals(bc.left())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
== false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
continue; | ||
} else if (comp == 0 && bc instanceof GreaterThan) { // a = 2 OR a > 2 -> a >= 2 | ||
frInequalities.set(i, new GreaterThanOrEqual(bc.source(), bc.left(), bc.right())); | ||
} // else (0 < comp) : // a = 3 OR a > 2 -> a > 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented potential condition here - 0 < comp
- is not entirely true. A potential else
branch can also happen for bc instanceof GreaterThanOrEqual
. I don't think it's changing the logic, but just saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, better be thorough also in comments. Fixed in both affected places.
iterEq.remove(); // update range with equality instead or simply superfluous | ||
updated = true; | ||
} else if (bc instanceof LessThan || bc instanceof LessThanOrEqual) { | ||
if (0 < comp) { // a = 2 OR a < 1 -> nop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you, please, be consistent in the code format? The previous if
branch used (comp < 0)
and then else if
. In this one you are switching places between variable and constant and there is no else
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally prefer the "0x axis" spacial ordering for faster understanding - the same reason why ranges are mostly given like a < X < b
and less so the other way around -, but I've updated the ordering here.
} | ||
|
||
return changed ? Predicates.combineAnd(CollectionUtils.combine(exps, equals, ranges)) : and; | ||
return updated ? Predicates.combineOr(CollectionUtils.combine(exps, frEquals, frNotEquals, frInequalities, ranges)) : or; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated == true ? Predicates.....
* Fix the bug around wrongly optimizing 'a=2 OR a!=?', which only yields TRUE for same values in equality and inequality. * Var renamings, code style adjustments, comments corrections.
As a separate improvements, we should move away from returning return FALSE -> return Literal.of(bc, Boolean.FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
- fix a few code comments; - extend the Equals OR NotEquals optimitsation (a=2 OR a!=5 -> a!=5); - extend the Equals OR Range optimisation on limits equality (a=2 OR 2<=a<5 -> 2<=a<5); - in case an equality is being removed in a conjunction, the rest of possible optimisations to test is now skipped.
- s/rmEqual/removeEquals
…als-optimisations
I've addressed that in #50962 |
* Extend the optimizations for equalities This commit supplements the optimisations of equalities in conjunctions and disjunctions: * for conjunctions, the existing optimizations with ranges are extended with not-equalities and inequalities; these lead to a fast resolution, the conjunction either being evaluate to a FALSE, or the non-equality conditions being dropped as superfluous; * optimisations for disjunctions are added to be applied against ranges, inequalities and not-equalities; these lead to disjunction either becoming TRUE or the equality being dropped, either as superfluous or merged into a range/inequality. * Adress review notes * Fix the bug around wrongly optimizing 'a=2 OR a!=?', which only yields TRUE for same values in equality and inequality. * Var renamings, code style adjustments, comments corrections. * Address further review comments. Extend optim. - fix a few code comments; - extend the Equals OR NotEquals optimitsation (a=2 OR a!=5 -> a!=5); - extend the Equals OR Range optimisation on limits equality (a=2 OR 2<=a<5 -> 2<=a<5); - in case an equality is being removed in a conjunction, the rest of possible optimisations to test is now skipped. * rename one var for better legiblity - s/rmEqual/removeEquals (cherry picked from commit 62e7c6a)
* Extend the optimizations for equalities This commit supplements the optimisations of equalities in conjunctions and disjunctions: * for conjunctions, the existing optimizations with ranges are extended with not-equalities and inequalities; these lead to a fast resolution, the conjunction either being evaluate to a FALSE, or the non-equality conditions being dropped as superfluous; * optimisations for disjunctions are added to be applied against ranges, inequalities and not-equalities; these lead to disjunction either becoming TRUE or the equality being dropped, either as superfluous or merged into a range/inequality. * Adress review notes * Fix the bug around wrongly optimizing 'a=2 OR a!=?', which only yields TRUE for same values in equality and inequality. * Var renamings, code style adjustments, comments corrections. * Address further review comments. Extend optim. - fix a few code comments; - extend the Equals OR NotEquals optimitsation (a=2 OR a!=5 -> a!=5); - extend the Equals OR Range optimisation on limits equality (a=2 OR 2<=a<5 -> 2<=a<5); - in case an equality is being removed in a conjunction, the rest of possible optimisations to test is now skipped. * rename one var for better legiblity - s/rmEqual/removeEquals
This PR supplements the optimisations of equalities in conjunctions
and disjunctions:
with not-equalities and inequalities; these lead to a fast resolution,
the conjunction either being evaluate to a FALSE, or the non-equality
conditions being dropped as superfluous;
inequalities and not-equalities; these lead to disjunction either
becoming TRUE or the equality being dropped, either as superfluous or
merged into a range/inequality.
Adresses #49637