Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QL: Simplify arithmetic operations in binary comps #66022

Merged
merged 26 commits into from
Feb 1, 2021

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Dec 8, 2020

This PR adds an optimizer rule to simplify the arithmetic operations
in binary comparison expressions, which in turn will allow for further
expression compounding by the optimiser.

Only the negation and plus, minus, multiplication and division are
currently considered and only when two of the operands are a literal.

For instance ((a + 1) / 2 - 3) * 4 >= 14 becomes a >= 12.

Closes #65394.

This commit adds an optimizer rule to simplify the arithmetic operations
in binary comparison expressions, which in turn will allow for further
expression compounding by the optimiser.

Only the negation and plus, minus, multiplication and division are
currently considered and only when two of the operands are a literal.

For instance `((a + 1) / 2 - 3) * 4 >= 14` becomes `a >= 12`.
Revert unintentional change of a source file.
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Dec 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is changing one behavior that might not be so obvious at first: a query like SELECT salary FROM test_emp WHERE salary + 10 >= 'x' returns no documents in master, but throws Cannot compute [-] between [String] [Integer] in this PR.

Apart from the fact that the error message is slightly incomplete - missing and between [String] and [Integer] - it's a confusing error message for the user. The minus sign is not present in the query, but present in the error message.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see more tests including non-numeric values considering the binary operators inside SQL support intervals and dates.
The cast to double is problematic - what does 4 / 2 return vs 5 / 2? And why is the double casting having to be applied manually when we do not do that for / in the first place.
What happens in case of :

date - INTERVAL 1 MONTH > '2010-01-01T01:01:01` ? 

@costin costin requested a review from palesz December 8, 2020 14:58
Copy link
Contributor

@palesz palesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one major concern that I would like to discuss before moving ahead with the PR.

@bpintea bpintea marked this pull request as draft December 9, 2020 09:10
The optimisation is now refactor splitting the logic by add-sub and
mul-div operations.
Further restictions are put in place, to prevent the
optimiser from chaning the outcome of the filtering.
More tests have been added.
@bpintea
Copy link
Contributor Author

bpintea commented Dec 14, 2020

@elasticmachine update branch

elasticmachine and others added 3 commits December 14, 2020 01:17
Removed now unused import.
- update test condition to remove (non-optimisable) integer division.
@bpintea
Copy link
Contributor Author

bpintea commented Dec 14, 2020

This PR is changing one behavior that might not be so obvious at first: a query like SELECT salary FROM test_emp WHERE salary + 10 >= 'x' returns no documents in master, but throws Cannot compute [-] between [String] [Integer] in this PR.

Thanks, I was wrongly expecting that this check had already been in place (but #49636 is still open).

Apart from the fact that the error message is slightly incomplete - missing and between [String] and [Integer] - it's a confusing error message for the user. The minus sign is not present in the query, but present in the error message.

This hasn't been reported before, so I went ahead and fixed it part of this PR.

@bpintea bpintea marked this pull request as ready for review December 14, 2020 11:41
Copy link
Contributor

@palesz palesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Halfway through, few comments and suggestions so far.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like this PR is made of two distinct parts.

  1. the 'bubbling' of negations (which I would rename to SimplifyDoubleNegations).
  2. The actual simplification of arithmetic in comparisons.

2 depends on 1 however 1 has value by itself and it would be easier to reason about stand-alone.

- add isMulOrDiv and isAddOrSub methods;
- add a new factory if'ace for the arithmetic operations invertible in
  relation to a binary comparison;
- simplify NEG - NEG tree structures;
- more tests.
@bpintea bpintea marked this pull request as draft January 7, 2021 16:56
@bpintea
Copy link
Contributor Author

bpintea commented Jan 7, 2021

  1. the 'bubbling' of negations (which I would rename to SimplifyDoubleNegations).

I've converted the PR back to draft to extract the rule on negations. However, I assume the rule will mostly extract the negation up (i.e. bubbling them up: DIV/MUL - NEG - x => NEG - DIV/MUL - x), rather then reduce them (i.e. NEG - NEG - x => x), so not sure if "Simplify" would be correct.

public static final class SimplifyComparisonsArithmetics extends OptimizerExpressionRule {
BiFunction<DataType, DataType, Boolean> typesCompatible;

public SimplifyComparisonsArithmetics(BiFunction<DataType, DataType, Boolean> typesCompatible) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typesCompatible -> typeCheck

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I change it, just to 2x check: the passed arguments are DataTypes::areCompatible and SqlDataTypes::areCompatible.
The code using the var also looks like return typesCompatible.apply([type1], [type2]) == false;.
Not sure if the change would bring more clarity ("what's checked?"), so asking.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments

return literalNeg == null ? bc : bc.reverse().replaceChildren(List.of(((Neg) bc.left()).field(), literalNeg));
}

static Expression safeMaybeFold(Expression expression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly like this method and, imo, it doesn't help much understanding the code. I think the return null inside the catch block bothers me.
I would have kept the original (no separate method used) block of code, as I find it easier to read.

        private static Expression reduceNegation(BinaryComparison bc) {
            Literal bcLiteral = (Literal) bc.right();
            Expression expression = new Neg(bcLiteral.source(), bcLiteral);
            if (expression.foldable()) {
                try {
                    expression = new Literal(expression.source(), expression.fold(), expression.dataType());
                    return bc.reverse().replaceChildren(List.of(((Neg) bc.left()).field(), expression));
                } catch (ArithmeticException | DateTimeException e) {
                    // implicitly return the original BinaryComparison in case the folding results in scenarios where,
                    // for example, we have Long.MAX_VALUE+1 to fold. In this case, we skip the optimization.
                }
            }
            return bc;
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the return null inside the catch block bothers me.

The null isn't returned from the catch block, it's just assigned to the var to return.
But this still being a detail, would you mind expanding what's troubling / how it could be improved?

I would have kept the original (no separate method used)

The code is extracted as a method since it's reused. It should be acceptably concise too: "if the expression is foldable, attempt to do so. Return either folded outcome, original (if not foldable) or null (if foldable, but unsafely; i.e. exception wud be thrown)".
An alternative could be for the caller to save the input expression (as in your example), the function to return the original input, if folding is unsafe, and for caller to check returned value against the original input. But this would be more verbose (as in your example) and this pattern to be applied multiple times.

For now I've taken Costin's suggestion, but still open for change if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be left as is I guess. My initial suggestion seems to be just a personal preference. If there will be other changes to this PR, I'd add a comment similar to the one I made in the catch (ArithmeticException | DateTimeException e) { to explain the catch block "ignoring" the math failure.

if (bc.right() instanceof Literal) {
if (bc.left() instanceof ArithmeticOperation) {
return SimplifyOperation.simplify(bc, typesCompatible);
} else if (bc.left() instanceof Neg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only if (bc.left().... is enough. else if doesn't necessarily help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it.

(Particularly here, the if[-else] is practically a switch, on a type that's not switchable in Java, so the change makes sense.
Generally I'm in favour of the if-else format when:

  • a return is involved, whose later replacement by a non-jumping statement, could make the next if branch eligible; i.e. less error-prone.
  • is more compact.)

- rename methods for more clarify;
- regroup simplification checks;
- eliminate calling super class' methods;
- use == for static string comps;
- use Math.signum instead of inlining the calculation.
@bpintea
Copy link
Contributor Author

bpintea commented Jan 27, 2021

@elasticmachine update branch

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return literalNeg == null ? bc : bc.reverse().replaceChildren(List.of(((Neg) bc.left()).field(), literalNeg));
}

static Expression safeMaybeFold(Expression expression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be left as is I guess. My initial suggestion seems to be just a personal preference. If there will be other changes to this PR, I'd add a comment similar to the one I made in the catch (ArithmeticException | DateTimeException e) { to explain the catch block "ignoring" the math failure.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some stylist comments, no need for another review from me.

return (simplification == null || simplification.isUnsafe(typesCompatible)) ? comparison : simplification.apply();
}

private static boolean isMulOrDiv(String opSymbol) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this method necessary? I only see it used in one place and the check can check can be directly just life for ADD and SUB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being used twice. I'm on the fence tbh - it makes things easier to read in sign(), but looks unbalanced in the if-else explicit ADD/SUB check - but would leave it as is now.

static boolean isMulOrDiv(String opSymbol) {
return MUL.symbol().equals(opSymbol) || DIV.symbol().equals(opSymbol);
// - post adjustments
Expression opSpecificPostApply(BinaryComparison binaryComparison) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

postApply or postProcess. opSpecific is somewhat meaningless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for postProcess.

@@ -1221,7 +1245,7 @@ static Expression safeMaybeFold(Expression expression) {
}

// can it be quickly fast-tracked that the operation can't be reduced?
boolean cannotSimplify(BiFunction<DataType, DataType, Boolean> typesCompatible) {
boolean isUnsafe(BiFunction<DataType, DataType, Boolean> typesCompatible) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't this to be subclass because of the dedicated method, consider making it final.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks, applied! Applicable to both class' main methods.

return true;
}

return opSpecificIsUnsafe();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about check, doCheck without opSpecific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

- rename methods;
- add comments.
Both upstream and current branch updated line in the same way
(incrementing a count), so merging left it unchanged. However the net
result needs to reflect both modifications (i.e. inc by two).
@bpintea
Copy link
Contributor Author

bpintea commented Feb 1, 2021

@elasticmachine update branch

@bpintea bpintea merged commit f5c2982 into elastic:master Feb 1, 2021
@bpintea bpintea deleted the enh/optimize_arithm_in_bc branch February 1, 2021 19:09
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Feb 1, 2021
* Simplify arithmetic operations in binary comps

This commit adds an optimizer rule to simplify the arithmetic operations
in binary comparison expressions, which in turn will allow for further
expression compounding by the optimiser.

Only the negation and plus, minus, multiplication and division are
currently considered and only when two of the operands are a literal.

For instance `((a + 1) / 2 - 3) * 4 >= 14` becomes `a >= 12`.

(cherry picked from commit f5c2982)
bpintea added a commit that referenced this pull request Feb 2, 2021
… (#68331)

* QL: Simplify arithmetic operations in binary comps (#66022)

* Simplify arithmetic operations in binary comps

This commit adds an optimizer rule to simplify the arithmetic operations
in binary comparison expressions, which in turn will allow for further
expression compounding by the optimiser.

Only the negation and plus, minus, multiplication and division are
currently considered and only when two of the operands are a literal.

For instance `((a + 1) / 2 - 3) * 4 >= 14` becomes `a >= 12`.

(cherry picked from commit f5c2982)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QL: Move literals to the right in comparison expressions
7 participants