Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESQL] Migrate SimplifiyComparisonArithmetics optimization rule #108200

Conversation

not-napoleon
Copy link
Member

@not-napoleon not-napoleon commented May 2, 2024

This PR migrates the QL optimization rule SimplifyComparisonArithmetics, which folds constant arithmetic in binary comparisons when possible. I didn't find any tests for this optimization rule in the existing code base, so I wrote a few that cover the cases I could think of. I also slightly refactored the optimization to not rely on string comparing the symbols to determine the operation. That was necessary in the generic code, since the operations were in a different module, but by pulling this into the ESQL module, we have direct access to the relevant enum. This commit isolates that refactoring, which might be easier to review.

I migrated the tests from OptimizerRunTests into LogicalPlanOptimizerTests. As you can see, there were some significant difficulties in doing that. I've opened issues and applied the @AwaitsFix label for now. My recommendation is that we merge this first and fix those issues in a follow up, as this review is already quite lengthy and I'm hesitant to make more tricky changes in it.

In the course of migrating the tests for this, I ran into a lot of issues:
- Resolves #108388
- #108524
- #108525
- #108519

I was able to resolve the first one, by replacing the usage of Literal.of with directly calling new Literal(), as of() was setting the data type incorrectly.

@not-napoleon not-napoleon added >non-issue auto-backport-and-merge Automatically create backport pull requests and merge when ready :Analytics/ES|QL AKA ESQL v8.14.1 v8.15.0 labels May 2, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding the tests 👍 .

For reference, there is one test that's EQL specific from the original PR here org.elasticsearch.xpack.eql.optimizer.OptimizerTests.testReduceBinaryComparisons.

But, there are many more tests here https://github.com/elastic/elasticsearch/pull/66022/files#diff-e76f0734cca452532a386eba427f1c473a72f507740086ac04db5a370f3fbc6f

And many many more here https://github.com/elastic/elasticsearch/pull/66022/files#diff-66ac572ee225421b48743da1a178df51ef4568b9ee965acdf39125ca845efca7. The sql-spec files test ES SQL against in-memory H2 database, that's why there are no results for those queries (the results from ES and H2 are compared with eachother).

@not-napoleon
Copy link
Member Author

Thanks @astefan , I'll work on getting those tests migrated.

@elasticsearchmachine
Copy link
Collaborator

Hi @not-napoleon, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @not-napoleon, I've updated the changelog YAML for you.

if (expression.foldable()) {
try {
expression = new Literal(expression.source(), expression.fold(), expression.dataType());
} catch (ArithmeticException | DateTimeException e) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the source of the issue in #108519, as ES|QL will never throw from folding. It's not as simple as just checking if we got a non-null value in the literal here though, as generating the null literal will have put the warning on the stack already.

final Expression apply() {
// force float point folding for FlP field
Literal bcl = operation.dataType().isRational()
? new Literal(bcLiteral.source(), ((Number) bcLiteral.value()).doubleValue(), DataTypes.DOUBLE)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is changed from the original rule. The original version read ? Literal.of(bcLiteral, ((Number) bcLiteral.value()).doubleValue()), however Literal.of() takes the data type from the first parameter, not the second. This was causing us to create (e.g.) Literals with a value of 4.0 and a type of INTEGER, which in turn caused a class cast exception. See #108388

@not-napoleon
Copy link
Member Author

@astefan I finished migrating the tests from OptimizerRunTests (with AwaitsFix notations, as we discussed). I would like to merge this now, without including the SQL spec tests, and add them in a follow up PR. Migrating the spec tests will be a lot of work, as the tests as written assume the existence of a reference database implementation to validate against. Migrating them will require first re-writing them into ESQL, then working out the expected results for them. This PR already contains one real bugfix, and there's several follow up bug fixes that we can't really address until this merges, and I'd rather not delay landing those. What do you think?

@not-napoleon
Copy link
Member Author

After discussion on the general approach to this migration, I am closing this PR in favor of #108744, which migrates the SQL tests, and the first step of #106679 which will pull over the relevant optimization rule.

not-napoleon added a commit that referenced this pull request Jun 3, 2024
This pulls the SimplifyComparisonArithmetics optimization into ESQL proper, from core, and in doing so resolves two of the issues we uncovered with that optimization.

Rather than moving this into a sub-class of OptimzierRules, I've opted to create a new package optimizer.rules, and put this as a top level class there. I have another PR open that migrates a bunch of other rules into OptimizerRules, and that's in general an area of the code that's being touched a lot lately. Having top level classes in a package, rather than sub-classes in a single class file, will create fewer conflicts while working on things that are actually unrelated in this type of situation.

Although this is a bugfix, I did not mark it for backporting to 8.14. To my knowledge, no users have reported the associated bugs (we found them through migrating tests), and this fix depends on tests and refactorings that were not backported to 8.14. It would be possible to write an 8.14 version of this, but IMHO it's easier to do as a fresh ticket than as a backport and conflict resolution process. If we think that's important, I'm happy to do so, but at the moment I think fixing it for 8.15 is fine.

Note that this is a re-application of the changes in #108200 after we did other refactoring, which that was closed to enable.

Resolves #108388
Resolves #108743
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport-and-merge Automatically create backport pull requests and merge when ready >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.14.1 v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ESQL] Class cast exception mixing division and comparison
3 participants