Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Extend the optimisations for equalities #50792

Merged
merged 5 commits into from
Jan 14, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
Expand Down Expand Up @@ -958,9 +959,11 @@ private Expression literalToTheRight(BinaryOperator<?, ?, ?, ?> be) {
}

/**
* Propagate Equals to eliminate conjuncted Ranges.
* When encountering a different Equals or non-containing {@link Range}, the conjunction becomes false.
* When encountering a containing {@link Range}, the range gets eliminated by the equality.
* Propagate Equals to eliminate conjuncted Ranges or BinaryComparisons.
* When encountering a different Equals, non-containing {@link Range} or {@link BinaryComparison}, the conjunction becomes false.
* When encountering a containing {@link Range}, {@link BinaryComparison} or {@link NotEquals}, these get eliminated by the equality.
*
* Since this rule can eliminate Ranges and BinaryComparisons, it should be applied before {@link CombineBinaryComparisons}.
*
* This rule doesn't perform any promotion of {@link BinaryComparison}s, that is handled by
* {@link CombineBinaryComparisons} on purpose as the resulting Range might be foldable
Expand All @@ -976,14 +979,20 @@ static class PropagateEquals extends OptimizerExpressionRule {
protected Expression rule(Expression e) {
if (e instanceof And) {
return propagate((And) e);
} else if (e instanceof Or) {
return propagate((Or) e);
}
return e;
}

// combine conjunction
private Expression propagate(And and) {
List<Range> ranges = new ArrayList<>();
List<BinaryComparison> equals = new ArrayList<>();
// Only equalities, not-equalities and inequalities with a foldable .right are extracted separately;
// the others go into the general 'exps'.
List<BinaryComparison> frEquals = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me few seconds to determine what fr stands for. My personal preference is to name these variables as foldRightEq or foldRightEquals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it wasn't a great solution.. I've gone ahead and adopted Costin's solution to simply remove the prefix. I've considered the foldableRight prefix, but thought that having it would slightly reduce the var distinguishability due to lenght (i.e. foldableRightEqual vs foldableRightNotEqual vs foldableRightInequalities etc.); and fold isn't super clear imo: foldable vs folded vs..?
But no strong feelings about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does fr stand for? Can you expand the name to the full word?

List<NotEquals> frNotEquals = new ArrayList<>();
List<BinaryComparison> frInequalities = new ArrayList<>();
List<Expression> exps = new ArrayList<>();

boolean changed = false;
Expand All @@ -995,36 +1004,43 @@ private Expression propagate(And and) {
BinaryComparison otherEq = (BinaryComparison) ex;
// equals on different values evaluate to FALSE
if (otherEq.right().foldable()) {
for (BinaryComparison eq : equals) {
// cannot evaluate equals so skip it
if (!eq.right().foldable()) {
continue;
}
for (BinaryComparison eq : frEquals) {
if (otherEq.left().semanticEquals(eq.left())) {
if (eq.right().foldable() && otherEq.right().foldable()) {
Integer comp = BinaryComparison.compare(eq.right().fold(), otherEq.right().fold());
if (comp != null) {
// var cannot be equal to two different values at the same time
if (comp != 0) {
return FALSE;
}
Integer comp = BinaryComparison.compare(eq.right().fold(), otherEq.right().fold());
if (comp != null) {
// var cannot be equal to two different values at the same time
if (comp != 0) {
return FALSE;
}
}
}
}
frEquals.add(otherEq);
} else {
exps.add(otherEq);
}
} else if (ex instanceof GreaterThan || ex instanceof GreaterThanOrEqual ||
ex instanceof LessThan || ex instanceof LessThanOrEqual) {
BinaryComparison bc = (BinaryComparison) ex;
if (bc.right().foldable()) {
frInequalities.add(bc);
} else {
exps.add(ex);
}
} else if (ex instanceof NotEquals) {
NotEquals otherNotEq = (NotEquals) ex;
if (otherNotEq.right().foldable()) {
frNotEquals.add(otherNotEq);
} else {
exps.add(ex);
}
equals.add(otherEq);
} else {
exps.add(ex);
}
}

// check
for (BinaryComparison eq : equals) {
// cannot evaluate equals so skip it
if (!eq.right().foldable()) {
continue;
}
for (BinaryComparison eq : frEquals) {
Object eqValue = eq.right().fold();

for (int i = 0; i < ranges.size(); i++) {
Expand Down Expand Up @@ -1060,9 +1076,171 @@ private Expression propagate(And and) {
changed = true;
}
}

// evaluate all NotEquals against the Equal
for (Iterator<NotEquals> iter = frNotEquals.iterator(); iter.hasNext(); ) {
NotEquals neq = iter.next();
if (eq.left().semanticEquals(neq.left())) {
Integer comp = BinaryComparison.compare(eq.right().fold(), neq.right().fold());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of eq.right().fold() you could have used eqValue, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Thanks.

if (comp != null) {
if (comp == 0) {
return FALSE; // clashing and conflicting: a = 1 AND a != 1
} else {
iter.remove(); // clashing and redundant: a = 1 AND a != 2
changed = true;
}
}
}
}

// evaluate all inequalities against the Equal
for (Iterator<BinaryComparison> iter = frInequalities.iterator(); iter.hasNext(); ) {
BinaryComparison bc = iter.next();
if (! eq.left().semanticEquals(bc.left())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we prefer the eq.left().semanticEquals(bc.left()) == false format for better visibility.
Actually, I would have kept the same "style" as the previous conditional statements and not use continue, but the positive outcome of semanticEquals and include the following statements in a bigger if clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we prefer the eq.left().semanticEquals(bc.left()) == false format for better visibility.

Ah, correct, I think it's even in a guideline somewhere the format with just a variable.

I would have kept the same "style" as the previous conditional statements and not use continue

I find the more convoluted loops easier to read when flattened, but it's only a slight preference, so I've refactored it in this case.

continue;
}

Integer compare = BinaryComparison.compare(eqValue, bc.right().fold());
if (compare == null) {
continue;
}

if (bc instanceof LessThan || bc instanceof LessThanOrEqual) { // a = 2 AND a </<= ?
if ((compare == 0 && bc instanceof LessThan) || // a = 2 AND a < 2
0 < compare) { // a = 2 AND a </<= 1
return FALSE;
}
} else if (bc instanceof GreaterThan || bc instanceof GreaterThanOrEqual) { // a = 2 AND a >/>= ?
if ((compare == 0 && bc instanceof GreaterThan) || // a = 2 AND a > 2
compare < 0) { // a = 2 AND a >/>= 3
return FALSE;
}
}

iter.remove();
changed = true;
}
}

return changed ? Predicates.combineAnd(CollectionUtils.combine(exps, frEquals, frNotEquals, frInequalities, ranges)) : and;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return changed == true ? Predicates......

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep this as is, inline with the style of the rest of the class (my change only affected the parameters passed to CollectionUtils.combine).

}

// combine disjunction:
// a = 2 OR a > 3 -> nop; a = 2 OR a > 1 -> a > 1
// a = 2 OR a < 3 -> a < 3; a = 2 OR a < 1 -> nop
// a = 2 OR 3 < a < 5 -> nop; a = 2 OR 1 < a < 3 -> 1 < a < 3; a = 2 OR 0 < a < 1 -> nop
// a = 2 OR a != 5 -> TRUE; a = 2 OR a = 5 -> nop
Copy link
Contributor

@astefan astefan Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a = 2 OR a != 5 -> TRUE I think it's not right. Please, see above more details about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks. Will update along the a = 2 OR a != 5 -> a != 5 change.

private Expression propagate(Or or) {
List<Expression> exps = new ArrayList<>();
List<Equals> frEquals = new ArrayList<>(); // foldable right term Equals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that fr comes from foldable right - I would remove the prefix since there's no foldable left and the collections are used for compacting or selecting the expressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, removed it. I didn't like it either, tbh, but was trying to avoid the confusion as to why not all references of a certain type are part of the respective container. But with the added comment (and code) it should be clear enough, indeed.

List<NotEquals> frNotEquals = new ArrayList<>(); // foldable right term NotEquals
List<Range> ranges = new ArrayList<>();
List<BinaryComparison> frInequalities = new ArrayList<>(); // foldable right term (=limit) BinaryComparision

// split expressions by type
for (Expression ex : Predicates.splitOr(or)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this handled already by the folding rule + BooleanSimplification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's not, no.
(But I'll happily be corrected.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at:


And at:

That is each individual and/or is evaluated if possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at that, but the simplification added by the code isn't covered there. Maybe that's what the // TODO: eliminate conjunction/disjunction was hinting to?

(BTW and related to the last suggestions below, the comments (a || b) && (a || c) => a && (b || c) and (a && b) || (a && c) => a || (b & c) will also need fixing (the code does the right thing, though) -- I'll address this too.

if (ex instanceof Equals) {
Equals eq = (Equals) ex;
if (eq.right().foldable()) {
frEquals.add(eq);
} else {
exps.add(ex);
}
} else if (ex instanceof NotEquals) {
NotEquals neq = (NotEquals) ex;
if (neq.right().foldable()) {
frNotEquals.add(neq);
} else {
exps.add(ex);
}
} else if (ex instanceof Range) {
ranges.add((Range) ex);
} else if (ex instanceof BinaryComparison) {
BinaryComparison bc = (BinaryComparison) ex;
if (bc.right().foldable()) {
frInequalities.add(bc);
} else {
exps.add(ex);
}
} else {
exps.add(ex);
}
}

boolean updated = false; // has the expression been modified?

// evaluate the impact of each Equal over the different types of Expressions
for (Iterator<Equals> iterEq = frEquals.iterator(); iterEq.hasNext(); ) {
Equals eq = iterEq.next();

// Equals OR NotEquals
for (NotEquals neq : frNotEquals) {
if (eq.left().semanticEquals(neq.left())) { // a = 2 OR a != ? -> TRUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this evaluated to true right away?

An OR is translated in ES query DSL as a combination of two should statements for a bool query and saying field_x = 2 OR field_x != 4 translates into give me all documents that have field_x = 2 OR field_x != 4 which is in fact a collection of documents with field_x = 2 and documents with field_x != 4. I don't think that statement should be evaluated straight to TRUE, but simplified to field_x != 4.

Evaluating it to TRUE means give me all documents.
Keeping the OR as is means give me the docs with field_x != 4.
These two ^ are not equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, thanks!
The TRUE is only valid if the disjunction terms refer the same value. I've fixed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to double check this is the case.

return TRUE;
}
}

Object eqValue = eq.right().fold();

// Equals OR Range
for (int i = 0; i < ranges.size(); i ++) { // might modify list, so use index loop
Range range = ranges.get(i);
if (! eq.left().semanticEquals(range.value())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: == false instead of !.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

continue;
}

Integer lowerComp = range.lower().foldable() ? BinaryComparison.compare(eqValue, range.lower().fold()) : null;
Integer upperComp = range.upper().foldable() ? BinaryComparison.compare(eqValue, range.upper().fold()) : null;

if (lowerComp != null && lowerComp == 0 && !range.includeLower()) { // a = 2 OR 2 < a < ? -> 2 <= a < ?
iterEq.remove(); // update range with lower equality instead
ranges.set(i, new Range(range.source(), range.value(), range.lower(), true, range.upper(), range.includeUpper()));
updated = true;
} else if (upperComp != null && upperComp == 0 && !range.includeUpper()) { // a = 2 OR ? < a < 2 -> ? < a <= 2
iterEq.remove(); // update range with upper equality instead
ranges.set(i, new Range(range.source(), range.value(), range.lower(), range.includeLower(), range.upper(), true));
updated = true;
} else if (lowerComp != null && upperComp != null) {
if (0 < lowerComp && upperComp < 0) { // a = 2 OR 1 < a < 3
iterEq.remove(); // equality is superfluous
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth reviewing the code that does removal through position (for (int i = ...) { ranges.remove(i); } ) since it's incorrect as after a removal it skips the next item as the counter is not decremented.
Moving that to use iterator removal is better (though more verbose).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the ranges index-based looping doesn't do removals in the ranges lists, only updates at current index (for cases like a = 2 OR 2 < a < ? -> 2 <= a < ?); in this case, the removal is operated on the equals list, but that's done through the iterator.
(But let me know please if I still miss something.)

Copy link
Member

@costin costin Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal since safe since it is followed by an add the iteration is not affected but inside CombineBinaryComparisons there are some removals that are not compensated for


which means the next entry in the bcs list might skipped.
Worth reviewing (and documenting).

But that's a separate issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Sure, I'll address that in a separate issue.

updated = true;
}
}
}

// Equals OR Inequality
for (int i = 0; i < frInequalities.size(); i ++) {
BinaryComparison bc = frInequalities.get(i);
if (! eq.left().semanticEquals(bc.left())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

continue;
}

Integer comp = BinaryComparison.compare(eqValue, bc.right().fold());
if (comp == null) {
continue;
}
if (bc instanceof GreaterThan || bc instanceof GreaterThanOrEqual) {
if (comp < 0) { // a = 1 OR a > 2 -> nop
continue;
} else if (comp == 0 && bc instanceof GreaterThan) { // a = 2 OR a > 2 -> a >= 2
frInequalities.set(i, new GreaterThanOrEqual(bc.source(), bc.left(), bc.right()));
} // else (0 < comp) : // a = 3 OR a > 2 -> a > 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented potential condition here - 0 < comp - is not entirely true. A potential else branch can also happen for bc instanceof GreaterThanOrEqual. I don't think it's changing the logic, but just saying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, better be thorough also in comments. Fixed in both affected places.

iterEq.remove(); // update range with equality instead or simply superfluous
updated = true;
} else if (bc instanceof LessThan || bc instanceof LessThanOrEqual) {
if (0 < comp) { // a = 2 OR a < 1 -> nop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you, please, be consistent in the code format? The previous if branch used (comp < 0) and then else if. In this one you are switching places between variable and constant and there is no else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer the "0x axis" spacial ordering for faster understanding - the same reason why ranges are mostly given like a < X < b and less so the other way around -, but I've updated the ordering here.

continue;
}
if (comp == 0 && bc instanceof LessThan) { // a = 2 OR a < 2 -> a <= 2
frInequalities.set(i, new LessThanOrEqual(bc.source(), bc.left(), bc.right()));
} // else (comp < 0) { // a = 2 OR a < 3 -> a < 3
iterEq.remove(); // update range with equality instead or simply superfluous
updated = true;
}
}
}

return changed ? Predicates.combineAnd(CollectionUtils.combine(exps, equals, ranges)) : and;
return updated ? Predicates.combineOr(CollectionUtils.combine(exps, frEquals, frNotEquals, frInequalities, ranges)) : or;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated == true ? Predicates.....

}
}

Expand Down
Loading