SQL: Optimisation fixes for conjunction merges #50703

Merged
bpintea merged 2 commits into elastic:master from fix/conjunctive-comp-in-range
Jan 8, 2020

Conversation

bpintea
Contributor

@bpintea bpintea commented Jan 7, 2020

This PR fixes the following issues around the way comparisons are
merged with ranges in conjunctions:

  • the decision to include the equality of the lower limit is corrected;
  • the selection of the upper limit is corrected to use the upper bound
    of the range;
  • the list of terms in the conjunction is sorted to have the ranges at
    the bottom; this allows subsequent binary comparisons to find compatible
    ranges and potentially be merged away. The end result is that the
    optimisation takes place irrespective of the order of the conjunction
    terms in the statement.

Some comments are also corrected.

Addresses #49637
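
To illustrate the lower-bound fix described above, here is a minimal sketch of merging a `>` or `>=` comparison into an existing range. `SimpleRange` and `mergeLowerBound` are hypothetical names invented for this example and bounds are plain ints; the real optimizer works on `Range` and `BinaryComparison` expressions and may differ in detail.

```java
// Hypothetical, simplified stand-in for the optimizer's Range; not the Elasticsearch class.
final class SimpleRange {
    final int lower;
    final boolean includeLower;
    final int upper;
    final boolean includeUpper;

    SimpleRange(int lower, boolean includeLower, int upper, boolean includeUpper) {
        this.lower = lower;
        this.includeLower = includeLower;
        this.upper = upper;
        this.includeUpper = includeUpper;
    }

    // Merge "a > value" (inclusive == false) or "a >= value" (inclusive == true) into this range.
    // Returns a tightened range, or this range unchanged when the comparison adds no constraint.
    SimpleRange mergeLowerBound(int value, boolean inclusive) {
        int comp = Integer.compare(value, lower);
        if (comp > 0) {
            // 2 < a AND (1 < a < 3) -> 2 < a < 3: the comparison supplies the new lower bound
            return new SimpleRange(value, inclusive, upper, includeUpper);
        }
        if (comp == 0 && includeLower && !inclusive) {
            // 2 < a AND (2 <= a < 3) -> 2 < a < 3: same bound, but the equality must be dropped
            return new SimpleRange(value, false, upper, includeUpper);
        }
        return this;
    }

    @Override
    public String toString() {
        return lower + (includeLower ? " <= a " : " < a ") + (includeUpper ? "<= " : "< ") + upper;
    }
}
```

For example, `new SimpleRange(2, true, 3, false).mergeLowerBound(2, false)` yields a range that prints as `2 < a < 3`. The upper-bound case would be symmetric, comparing against the upper bound of the range.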

@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/SQL)

Member

@costin costin left a comment

LGTM.
Thanks for catching the bug while at it - can you please double check all cases are covered by the existing test cases?

@@ -1316,7 +1331,7 @@ private boolean findConjunctiveComparisonInRange(BinaryComparison main, List<Ran
                         ranges.remove(i);
                         ranges.add(i,
                                 new Range(other.source(), other.value(),
-                                        main.right(), lowerEq ? true : other.includeLower(),
+                                        main.right(), lowerEq ? false : main instanceof GreaterThanOrEqual,
Member

That's a big miss - were there no tests for this?

Contributor Author

It looks very much like copy-and-paste fallout.
There wasn't a test, no, so I've simply implemented the test listed as a comment in the source: 2 < a AND (2 <= a < 3) -> 2 < a < 3, as testCombineBinaryComparisonsAndRangeLower().

@@ -1325,19 +1340,19 @@ private boolean findConjunctiveComparisonInRange(BinaryComparison main, List<Ran
                     }
                 }
             } else if (main instanceof LessThan || main instanceof LessThanOrEqual) {
-                if (other.lower().foldable()) {
-                    Integer comp = BinaryComparison.compare(value, other.lower().fold());
+                if (other.upper().foldable()) {
Member

Likewise - a whole optimization being skipped...

@bpintea
Contributor Author

bpintea commented Jan 8, 2020

Thanks for catching the bug while at it - can you please double check all cases are covered by the existing test cases?

I hope the coverage is good now.
There are, however, some further tunings possible. For instance, something like (1 < a < 3) OR (2 < a < 4) -> (1 < a < 4) isn't optimised, and (1 < a < 2) AND (3 < a < 4) results in 3 < a < 2, which will eventually still yield the correct result, but isn't optimised into a FALSE right away. So I'd extend this optimisation, but as a follow-up PR.
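
The second case above can be sketched as an interval intersection followed by an emptiness check. This is only an illustration under assumptions: `Interval`, `intersect` and `isEmpty` are hypothetical names, bounds are doubles, and nothing here reflects how the optimizer actually folds such a range.

```java
// Hypothetical interval type used only to illustrate the idea; not optimizer code.
final class Interval {
    final double lower;
    final boolean includeLower;
    final double upper;
    final boolean includeUpper;

    Interval(double lower, boolean includeLower, double upper, boolean includeUpper) {
        this.lower = lower;
        this.includeLower = includeLower;
        this.upper = upper;
        this.includeUpper = includeUpper;
    }

    // Conjunction of two ranges: the larger lower bound and the smaller upper bound win.
    // A bound stays inclusive only if every range that sits exactly on it is inclusive.
    Interval intersect(Interval o) {
        double lo = Math.max(lower, o.lower);
        boolean incLo = (lower < lo || includeLower) && (o.lower < lo || o.includeLower);
        double up = Math.min(upper, o.upper);
        boolean incUp = (upper > up || includeUpper) && (o.upper > up || o.includeUpper);
        return new Interval(lo, incLo, up, incUp);
    }

    // A range such as 3 < a < 2 can never match, so it could be replaced by FALSE.
    boolean isEmpty() {
        return lower > upper || (lower == upper && !(includeLower && includeUpper));
    }
}
```

With these definitions, intersecting (1 < a < 2) with (3 < a < 4) gives the bounds 3 and 2, and `isEmpty()` returns true for the result.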

Contributor

@astefan astefan left a comment

Left one comment. Otherwise, LGTM.

-        for (Expression ex : Predicates.splitAnd(and)) {
+        List<Expression> andExps = Predicates.splitAnd(and);
+        // Ranges need to show up before BinaryComparisons in list, to allow the latter be optimized away into a Range, if possible
+        andExps.sort(new Comparator<Expression>() {
Contributor

Maybe extract this custom Comparator into its own variable inside CombineBinaryComparisons and re-use that, without creating it each time the combine method is called?

Contributor Author

Good point, thanks.
I've replaced it with a (non-capturing) lambda instead, which the JVM should then be able to optimise.
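
A minimal sketch of what such a comparator could look like, assuming the goal stated in the code comment (ranges ahead of binary comparisons, relative order otherwise preserved). `Expr`, `RangeExpr`, `ComparisonExpr` and `RANGES_FIRST` are hypothetical names for this example, not the types or code merged in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RangesFirst {
    // Hypothetical stand-ins for the optimizer's expression types.
    static class Expr {
        final String name;
        Expr(String name) { this.name = name; }
    }
    static final class RangeExpr extends Expr {
        RangeExpr(String name) { super(name); }
    }
    static final class ComparisonExpr extends Expr {
        ComparisonExpr(String name) { super(name); }
    }

    // Non-capturing lambda: it references no local or instance state, so one instance can be reused.
    static final Comparator<Expr> RANGES_FIRST =
        (l, r) -> Boolean.compare(r instanceof RangeExpr, l instanceof RangeExpr);

    // List.sort is stable, so ranges keep their relative order, as do the other terms.
    static List<Expr> sortTerms(List<Expr> terms) {
        List<Expr> sorted = new ArrayList<>(terms);
        sorted.sort(RANGES_FIRST);
        return sorted;
    }

    // Small usage example: returns the names of the sorted terms, separated by spaces.
    static String demo() {
        List<Expr> terms = Arrays.asList(
            new ComparisonExpr("c1"), new RangeExpr("r1"),
            new ComparisonExpr("c2"), new RangeExpr("r2"));
        StringBuilder out = new StringBuilder();
        for (Expr e : sortTerms(terms)) {
            out.append(e.name).append(' ');
        }
        return out.toString().trim();
    }
}
```

Here `RangesFirst.demo()` returns `r1 r2 c1 c2`. Holding the lambda in a constant also addresses the original review remark about not creating a comparator on every call.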

Replace anonymous comparator of split AND Expressions with a lambda.
@costin
Member

costin commented Jan 8, 2020

(1 < a < 3) OR (2 < a < 4) -> (1 < a < 4) that would be incorrect though - take 3.5 - it doesn't pass the initial condition but it does the latter.

(1 < a < 2) AND (3 < a < 4) results into 3 < a < 2
should be optimized on the second run. The idea is that each rule modifies one piece; subsequent optimizations are applied by going through the set of rules multiple times until none match.
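
The "run the rules until none match" idea can be sketched as a small fixed-point loop. This is a toy illustration only: `FixedPoint`, `optimize` and the two string-rewriting rules are hypothetical and stand in for real optimizer rules operating on expression trees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

final class FixedPoint {
    // Apply every rule in order, and repeat the whole batch until a pass changes nothing
    // (or the pass limit is hit, as a guard against rules that keep undoing each other).
    static <T> T optimize(T plan, List<UnaryOperator<T>> rules, int maxPasses) {
        T current = plan;
        for (int pass = 0; pass < maxPasses; pass++) {
            T before = current;
            for (UnaryOperator<T> rule : rules) {
                current = rule.apply(current);
            }
            if (current.equals(before)) {
                break;
            }
        }
        return current;
    }

    // Toy rules on strings: one folds a contradictory range, the other merges two ranges.
    static String demo() {
        UnaryOperator<String> foldEmptyRange = s -> s.equals("3 < a < 2") ? "FALSE" : s;
        UnaryOperator<String> mergeRanges =
            s -> s.equals("(1 < a < 2) AND (3 < a < 4)") ? "3 < a < 2" : s;
        // The folding rule runs before the merging rule, so it only matches on the second pass.
        return optimize("(1 < a < 2) AND (3 < a < 4)", Arrays.asList(foldEmptyRange, mergeRanges), 10);
    }
}
```

In this toy setup the first pass produces `3 < a < 2`, the second pass folds it to `FALSE`, and the third pass changes nothing, which ends the loop.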

@bpintea
Contributor Author

bpintea commented Jan 8, 2020

(1 < a < 3) OR (2 < a < 4) -> (1 < a < 4) that would be incorrect though - take 3.5 - it doesn't pass the initial condition but it does the latter.

For conjunction, that wouldn't work, true, but overlapping disjunctions can be optimised with a simple union (3.5 satisfies the 2nd range of the disjunction).
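
A minimal sketch of that union, assuming both ranges use exclusive bounds as in the example. `OpenRange`, `overlaps`, `union` and `contains` are hypothetical names for illustration, not existing optimizer code.

```java
// Hypothetical open interval (both bounds exclusive), used only to illustrate the union.
final class OpenRange {
    final double lower;
    final double upper;

    OpenRange(double lower, double upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // Two open ranges overlap when each one starts before the other one ends.
    boolean overlaps(OpenRange o) {
        return lower < o.upper && o.lower < upper;
    }

    // (1 < a < 3) OR (2 < a < 4) -> 1 < a < 4; only valid when the ranges overlap.
    OpenRange union(OpenRange o) {
        if (!overlaps(o)) {
            throw new IllegalArgumentException("ranges do not overlap, union would add values");
        }
        return new OpenRange(Math.min(lower, o.lower), Math.max(upper, o.upper));
    }

    boolean contains(double a) {
        return lower < a && a < upper;
    }
}
```

The value 3.5 is rejected by `new OpenRange(1, 3)` but accepted by `new OpenRange(2, 4)`, so it satisfies the disjunction and also the merged range 1 < a < 4. Disjoint ranges such as (1 < a < 2) and (3 < a < 4) are rejected by `union`, since merging them would admit values that match neither side.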

(1 < a < 2) AND (3 < a < 4) results into 3 < a < 2
should be optimized on the second run. The idea is that each rule modifies one piece; subsequent optimizations are applied by going through the set of rules multiple times until none match.

Sure, sure. I was only wondering if this couldn't be achieved in one go.

Anyways, future potential optimisations.

@costin
Member

costin commented Jan 8, 2020

For conjunction, that wouldn't work, true, but overlapping disjunctions can be optimised with a simple union (3.5 satisfies the 2nd range of the disjunction).

You're right, we should add that too. If you don't plan on working on it in the near future, please make an issue so we don't lose track of it.

Thanks.

@bpintea bpintea merged commit 9828cb1 into elastic:master Jan 8, 2020
@bpintea bpintea deleted the fix/conjunctive-comp-in-range branch January 8, 2020 21:36
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Jan 13, 2020
(cherry picked from commit 9828cb1)
bpintea added a commit that referenced this pull request Jan 13, 2020
(cherry picked from commit 9828cb1)
bpintea added a commit that referenced this pull request Jan 13, 2020
(cherry picked from commit 9828cb1)
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020