Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Extend the optimisations for equalities #50792

Merged
merged 5 commits into from
Jan 14, 2020

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Jan 9, 2020

This PR supplements the optimisations of equalities in conjunctions
and disjunctions:

  • for conjunctions, the existing optimizations with ranges are extended
    with not-equalities and inequalities; these lead to a fast resolution,
    the conjunction either being evaluate to a FALSE, or the non-equality
    conditions being dropped as superfluous;
  • optimisations for disjunctions are added to be applied against ranges,
    inequalities and not-equalities; these lead to disjunction either
    becoming TRUE or the equality being dropped, either as superfluous or
    merged into a range/inequality.

Adresses #49637

This commit supplements the optimisations of equalities in conjunctions
and disjunctions:
* for conjunctions, the existing optimizations with ranges are extended
with not-equalities and inequalities; these lead to a fast resolution,
the conjunction either being evaluate to a FALSE, or the non-equality
conditions being dropped as superfluous;
* optimisations for disjunctions are added to be applied against ranges,
inequalities and not-equalities; these lead to disjunction either
becoming TRUE or the equality being dropped, either as superfluous or
merged into a range/inequality.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/SQL)

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments but otherwise LGTM!

List<BinaryComparison> equals = new ArrayList<>();
// Only equalities, not-equalities and inequalities with a foldable .right are extracted separately;
// the others go into the general 'exps'.
List<BinaryComparison> frEquals = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does fr stand for? Can you expand the name to the full word?

// a = 2 OR a != 5 -> TRUE; a = 2 OR a = 5 -> nop
private Expression propagate(Or or) {
List<Expression> exps = new ArrayList<>();
List<Equals> frEquals = new ArrayList<>(); // foldable right term Equals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that fr comes from foldable right - I would remove the prefix since there's no foldable left and the collections are used for compacting or selecting the expressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, removed it. I didn't like it either, tbh, but was trying to avoid the confusion as to why not all references of a certain type are part of the respective container. But with the added comment (and code) it should be clear enough, indeed.

List<BinaryComparison> frInequalities = new ArrayList<>(); // foldable right term (=limit) BinaryComparision

// split expressions by type
for (Expression ex : Predicates.splitOr(or)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this handled already by the folding rule + BooleanSimplification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's not, no.
(But I'll happily be corrected.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at:


And at:

That is each individual and/or is evaluated if possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at that, but the simplification added by the code isn't covered there. Maybe that's what the // TODO: eliminate conjunction/disjunction was hinting to?

(BTW and related to the last suggestions below, the comments (a || b) && (a || c) => a && (b || c) and (a && b) || (a && c) => a || (b & c) will also need fixing (the code does the right thing, though) -- I'll address this too.

updated = true;
} else if (lowerComp != null && upperComp != null) {
if (0 < lowerComp && upperComp < 0) { // a = 2 OR 1 < a < 3
iterEq.remove(); // equality is superfluous
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth reviewing the code that does removal through position (for (int i = ...) { ranges.remove(i); } ) since it's incorrect as after a removal it skips the next item as the counter is not decremented.
Moving that to use iterator removal is better (though more verbose).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the ranges index-based looping doesn't do removals in the ranges lists, only updates at current index (for cases like a = 2 OR 2 < a < ? -> 2 <= a < ?); in this case, the removal is operated on the equals list, but that's done through the iterator.
(But let me know please if I still miss something.)

Copy link
Member

@costin costin Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal since safe since it is followed by an add the iteration is not affected but inside CombineBinaryComparisons there are some removals that are not compensated for


which means the next entry in the bcs list might skipped.
Worth reviewing (and documenting).

But that's a separate issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Sure, I'll address that in a separate issue.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification work.
I have mostly cosmetic related comments, but also one that I believe needs a second look over.

List<BinaryComparison> equals = new ArrayList<>();
// Only equalities, not-equalities and inequalities with a foldable .right are extracted separately;
// the others go into the general 'exps'.
List<BinaryComparison> frEquals = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me few seconds to determine what fr stands for. My personal preference is to name these variables as foldRightEq or foldRightEquals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it wasn't a great solution.. I've gone ahead and adopted Costin's solution to simply remove the prefix. I've considered the foldableRight prefix, but thought that having it would slightly reduce the var distinguishability due to lenght (i.e. foldableRightEqual vs foldableRightNotEqual vs foldableRightInequalities etc.); and fold isn't super clear imo: foldable vs folded vs..?
But no strong feelings about it.

// evaluate all inequalities against the Equal
for (Iterator<BinaryComparison> iter = frInequalities.iterator(); iter.hasNext(); ) {
BinaryComparison bc = iter.next();
if (! eq.left().semanticEquals(bc.left())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we prefer the eq.left().semanticEquals(bc.left()) == false format for better visibility.
Actually, I would have kept the same "style" as the previous conditional statements and not use continue, but the positive outcome of semanticEquals and include the following statements in a bigger if clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we prefer the eq.left().semanticEquals(bc.left()) == false format for better visibility.

Ah, correct, I think it's even in a guideline somewhere the format with just a variable.

I would have kept the same "style" as the previous conditional statements and not use continue

I find the more convoluted loops easier to read when flattened, but it's only a slight preference, so I've refactored it in this case.

for (Iterator<NotEquals> iter = frNotEquals.iterator(); iter.hasNext(); ) {
NotEquals neq = iter.next();
if (eq.left().semanticEquals(neq.left())) {
Integer comp = BinaryComparison.compare(eq.right().fold(), neq.right().fold());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of eq.right().fold() you could have used eqValue, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Thanks.

}
}

return changed ? Predicates.combineAnd(CollectionUtils.combine(exps, frEquals, frNotEquals, frInequalities, ranges)) : and;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return changed == true ? Predicates......

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep this as is, inline with the style of the rest of the class (my change only affected the parameters passed to CollectionUtils.combine).


// Equals OR NotEquals
for (NotEquals neq : frNotEquals) {
if (eq.left().semanticEquals(neq.left())) { // a = 2 OR a != ? -> TRUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this evaluated to true right away?

An OR is translated in ES query DSL as a combination of two should statements for a bool query and saying field_x = 2 OR field_x != 4 translates into give me all documents that have field_x = 2 OR field_x != 4 which is in fact a collection of documents with field_x = 2 and documents with field_x != 4. I don't think that statement should be evaluated straight to TRUE, but simplified to field_x != 4.

Evaluating it to TRUE means give me all documents.
Keeping the OR as is means give me the docs with field_x != 4.
These two ^ are not equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, thanks!
The TRUE is only valid if the disjunction terms refer the same value. I've fixed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to double check this is the case.

// Equals OR Range
for (int i = 0; i < ranges.size(); i ++) { // might modify list, so use index loop
Range range = ranges.get(i);
if (! eq.left().semanticEquals(range.value())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: == false instead of !.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

// Equals OR Inequality
for (int i = 0; i < frInequalities.size(); i ++) {
BinaryComparison bc = frInequalities.get(i);
if (! eq.left().semanticEquals(bc.left())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

continue;
} else if (comp == 0 && bc instanceof GreaterThan) { // a = 2 OR a > 2 -> a >= 2
frInequalities.set(i, new GreaterThanOrEqual(bc.source(), bc.left(), bc.right()));
} // else (0 < comp) : // a = 3 OR a > 2 -> a > 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented potential condition here - 0 < comp - is not entirely true. A potential else branch can also happen for bc instanceof GreaterThanOrEqual. I don't think it's changing the logic, but just saying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, better be thorough also in comments. Fixed in both affected places.

iterEq.remove(); // update range with equality instead or simply superfluous
updated = true;
} else if (bc instanceof LessThan || bc instanceof LessThanOrEqual) {
if (0 < comp) { // a = 2 OR a < 1 -> nop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you, please, be consistent in the code format? The previous if branch used (comp < 0) and then else if. In this one you are switching places between variable and constant and there is no else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer the "0x axis" spacial ordering for faster understanding - the same reason why ranges are mostly given like a < X < b and less so the other way around -, but I've updated the ordering here.

}

return changed ? Predicates.combineAnd(CollectionUtils.combine(exps, equals, ranges)) : and;
return updated ? Predicates.combineOr(CollectionUtils.combine(exps, frEquals, frNotEquals, frInequalities, ranges)) : or;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated == true ? Predicates.....

* Fix the bug around wrongly optimizing 'a=2 OR a!=?', which only yields
TRUE for same values in equality and inequality.
* Var renamings, code style adjustments, comments corrections.
@costin
Copy link
Member

costin commented Jan 10, 2020

As a separate improvements, we should move away from returning Literal.TRUE/FALSE since these constants don't preserve the Source, which typically is not needed but why drop it.
Instead the Source of the folded expression should be kept around, e.g.:

return FALSE -> return Literal.of(bc, Boolean.FALSE);

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

- fix a few code comments;
- extend the Equals OR NotEquals optimitsation (a=2 OR a!=5 -> a!=5);
- extend the Equals OR Range optimisation on limits equality (a=2 OR
  2<=a<5 -> 2<=a<5);
- in case an equality is being removed in a conjunction, the rest of
  possible optimisations to test is now skipped.
@bpintea bpintea requested a review from astefan January 13, 2020 15:43
@costin
Copy link
Member

costin commented Jan 14, 2020

As a separate improvements, we should move away from returning Literal.TRUE/FALSE since these constants don't preserve the Source, which typically is not needed but why drop it.
Instead the Source of the folded expression should be kept around, e.g.:

return FALSE -> return Literal.of(bc, Boolean.FALSE);

I've addressed that in #50962

@bpintea bpintea merged commit 62e7c6a into elastic:master Jan 14, 2020
@bpintea bpintea deleted the enh/propagate-equals-optimisations branch January 14, 2020 16:59
@bpintea bpintea added v7.7.0 and removed v7.6.0 labels Jan 15, 2020
bpintea added a commit that referenced this pull request Jan 16, 2020
* Extend the optimizations for equalities

This commit supplements the optimisations of equalities in conjunctions
and disjunctions:
* for conjunctions, the existing optimizations with ranges are extended
with not-equalities and inequalities; these lead to a fast resolution,
the conjunction either being evaluate to a FALSE, or the non-equality
conditions being dropped as superfluous;
* optimisations for disjunctions are added to be applied against ranges,
inequalities and not-equalities; these lead to disjunction either
becoming TRUE or the equality being dropped, either as superfluous or
merged into a range/inequality.

* Adress review notes

* Fix the bug around wrongly optimizing 'a=2 OR a!=?', which only yields
TRUE for same values in equality and inequality.
* Var renamings, code style adjustments, comments corrections.

* Address further review comments. Extend optim.

- fix a few code comments;
- extend the Equals OR NotEquals optimitsation (a=2 OR a!=5 -> a!=5);
- extend the Equals OR Range optimisation on limits equality (a=2 OR
  2<=a<5 -> 2<=a<5);
- in case an equality is being removed in a conjunction, the rest of
  possible optimisations to test is now skipped.

* rename one var for better legiblity

- s/rmEqual/removeEquals

(cherry picked from commit 62e7c6a)
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
* Extend the optimizations for equalities

This commit supplements the optimisations of equalities in conjunctions
and disjunctions:
* for conjunctions, the existing optimizations with ranges are extended
with not-equalities and inequalities; these lead to a fast resolution,
the conjunction either being evaluate to a FALSE, or the non-equality
conditions being dropped as superfluous;
* optimisations for disjunctions are added to be applied against ranges,
inequalities and not-equalities; these lead to disjunction either
becoming TRUE or the equality being dropped, either as superfluous or
merged into a range/inequality.

* Adress review notes

* Fix the bug around wrongly optimizing 'a=2 OR a!=?', which only yields
TRUE for same values in equality and inequality.
* Var renamings, code style adjustments, comments corrections.

* Address further review comments. Extend optim.

- fix a few code comments;
- extend the Equals OR NotEquals optimitsation (a=2 OR a!=5 -> a!=5);
- extend the Equals OR Range optimisation on limits equality (a=2 OR
  2<=a<5 -> 2<=a<5);
- in case an equality is being removed in a conjunction, the rest of
  possible optimisations to test is now skipped.

* rename one var for better legiblity

- s/rmEqual/removeEquals
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants