Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Failing Group By queries due to different ExpressionIds #43072

Open
wants to merge 8 commits into
base: master
from

Conversation

@emasab
Copy link
Contributor

commented Jun 10, 2019

Fix an issue that arises from the use of ExpressionIds as keys in a lookup map
that helps the QueryTranslator to identify the grouping columns. The issue is
that the same expression in different parts of the query (SELECT clause and GROUP BY clause)
ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds
use the hashCode() of NamedExpression.

Fixes: #41159
Fixes: #40001
Fixes: #40240
Fixes: #33361
Partially fixes: #44950
Partially fixes: #36074

@emasab emasab changed the title bugfix for https://github.com/elastic/elasticsearch/issues/42041 SQL: Failing Group By queries due to different ExpressionIds Jun 10, 2019
@atorok atorok added the :Search/SQL label Jun 11, 2019
@elasticmachine

This comment has been minimized.

Copy link
Collaborator

commented Jun 11, 2019

@javanna javanna requested a review from matriv Jul 15, 2019
Copy link
Contributor

left a comment

Hello @emasab!

First of all apologies for the delay on picking up this PR and many thanks for your effort here.
The issue with the ExpressionIds is quite complicated and there are plans to solve it overall through a substantial refactoring. Your changes though, solve a substantial subset of the issues
so we decided to go ahead and proceed with this fix. I've changed the title and description of the PR and link to the issues that it fixes.

For some cases (without aliases or numeric refs) queries still fail, e.g.:
SELECT PI() * emp_no FROM test_emp GROUP BY PI() * emp_no ORDER BY PI() * emp_no LIMIT 10

Moreover I'd like to ask from you to add some integration tests for the issues that this fix addresses:
In agg.sql-spec file: https://github.com/emasab/elasticsearch/blob/bugfix/42041-failing-group-by-queries/x-pack/plugin/sql/qa/src/main/resources/agg.sql-spec#L52 the following:

groupByCastScalar
SELECT CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) FROM test_emp GROUP BY CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) ORDER BY CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) NULLS FIRST;
groupByCastScalarWithAlias
SELECT CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) as "cast" FROM test_emp GROUP BY "cast" ORDER BY "cast" NULLS FIRST;

and in agg.csv-spec file: https://github.com/emasab/elasticsearch/blob/bugfix/42041-failing-group-by-queries/x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec#L220
the following:

// Fails for H2
groupByCastScalarWithNumericRef
SELECT CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) FROM test_emp GROUP BY 1 ORDER BY 1 NULLS FIRST;

CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT):l
------------------------------------------------------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;

groupByConvertScalar
SELECT CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) FROM test_emp GROUP BY CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) ORDER BY CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) NULLS FIRST;


CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT):l
-----------------------------------------------------------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;


groupByConvertScalarWithAlias
SELECT CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) as "convert" FROM test_emp GROUP BY "convert" ORDER BY "convert" NULLS FIRST;

convert:l
---------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;

groupByConvertScalarWithNumericRef
SELECT CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) FROM test_emp GROUP BY 1 ORDER BY 1 NULLS FIRST;

CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT):l
-----------------------------------------------------------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;

// AwaitsFix https://github.com/elastic/elasticsearch/issues/36074
//groupByConstantScalar
//SELECT PI() * emp_no FROM test_emp GROUP BY PI() * emp_no ORDER BY PI() * emp_no LIMIT 10;

//PI() * emp_no:d
//---------------
//31419.0681285515
//31422.2097212051
//31425.3513138587
//31428.4929065123
//31431.6344991659
//31434.7760918195
//31437.9176844731
//31441.0592771266
//31444.2008697802
//31447.3424624338
//;

groupByConstantScalarWithAlias
SELECT PI() * emp_no AS "value" FROM test_emp GROUP BY value ORDER BY value LIMIT 10;

value:d
-------
31419.0681285515
31422.2097212051
31425.3513138587
31428.4929065123
31431.6344991659
31434.7760918195
31437.9176844731
31441.0592771266
31444.2008697802
31447.3424624338
;

groupByConstantScalarWithNumericRef
SELECT PI() * emp_no FROM test_emp GROUP BY 1 ORDER BY 1 LIMIT 10;

PI() * emp_no:d
---------------
31419.0681285515
31422.2097212051
31425.3513138587
31428.4929065123
31431.6344991659
31434.7760918195
31437.9176844731
31441.0592771266
31444.2008697802
31447.3424624338
;
@@ -779,6 +779,85 @@ public void testAllCountVariantsWithHavingGenerateCorrectAggregations() {
+ "\"gap_policy\":\"skip\"}}}}}"));
}

public void testGroupSelection() {

This comment has been minimized.

Copy link
@matriv

matriv Jul 29, 2019

Contributor

It would be great to split this up to more methods with more descriptive names and add more tests so that all 3 cases (no alias, alias, numeric reference) are covered, similarly to the integration tests in my comment: #43072 (review)
Of course some of the cases, as mentioned, should be skipped as they won't work even with this fix.

@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Jul 30, 2019

Hi @matriv , happy to hear from you. I understand, the checks to do are a lot before giving an ok. I'm going to include these integration tests, change the unit tests and see if there is a simple fix for the failing query that you've highlighted.

@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Aug 3, 2019

Fixed the groupByConstantScalar case too, mainly unifying the semantic of NamedExpression.equals to that of NamedExpression.hashCode . The ExpressionId is still there but is used only as a unique identifier for the object instance. Changed the semantic of Failure.equals because now two NamedExpression are equal if they have the same tree even if the position in the source is different so Failure takes into account the source position not the parsed node.

@jakelandis jakelandis added v7.3.2 and removed v7.3.1 labels Aug 22, 2019
@pcsanwald pcsanwald added v6.8.4 and removed v6.8.3 labels Aug 29, 2019
@colings86 colings86 added v7.5.0 and removed v7.4.0 labels Aug 30, 2019
@polyfractal polyfractal added v7.3.3 and removed v7.3.2 labels Sep 6, 2019
@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Sep 7, 2019

Sorry, the latest commit

4902015

even if more correct theorically because of the difference between hashCode and equals, solves the specified case and passes the unit tests, causes a lot of failures with the integration test org.elasticsearch.xpack.sql.qa.single_node.JdbcSqlSpecIT. The previous one

8720563

is not affected

@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2019

With the 9b7878b I've solved the integration tests failures while keeping the equals and hashCode methods consistent. The order by constant scalar is still excluded. With my previous commit there was an incomplete change to ResolveMissingRefs that made it work but I've reverted it because this other query failed:

SELECT last_name FROM "test_emp" ORDER BY birth_date::TIME, emp_no LIMIT 5

@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Sep 19, 2019

In the latest commit these is a fix for groupByConstantScalar case with ORDER BY, while keeping the equals consistent with the hashCode, and the integration tests successful

The result of the ordering is still not correct in the DESC case or when multiplied by -1 but the query seems to be translated correctly before it's processed by the Querier

@emasab emasab force-pushed the emasab:bugfix/42041-failing-group-by-queries branch from e904df4 to 4111876 Sep 20, 2019
@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Sep 20, 2019

rebased on current master
changed expected result according to the behavior enforced by testGeoShapeInWhereClause

@emasab emasab force-pushed the emasab:bugfix/42041-failing-group-by-queries branch from 4111876 to 4a57627 Sep 25, 2019
@emasab emasab force-pushed the emasab:bugfix/42041-failing-group-by-queries branch from 4a57627 to 2ccb219 Oct 9, 2019
@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Oct 9, 2019

In the latest commit there is a fix for the results ordering when there is a grouped ScalarFunctionAttribute in the ORDER BY clause, like:
SELECT PI() * emp_no FROM test_emp GROUP BY PI() * emp_no ORDER BY PI() * emp_no DESC LIMIT 10;

That should be the last change for this PR, unless other issues come out, you request some changes or rebases to master

emasab added 7 commits Aug 3, 2019
added test cases and refactored into multiple unit tests
unified semantic of equals to that of hashCode for NamedExpression
changed AttributeMapTests according to the new semantic
changed equals semantic of Failure
…ionId.

Minor changes to adapt to the change of NamedExpression.equals and hashCode.
Changed unit test because of the same reason.
rebased
@tomcallahan tomcallahan added v6.8.5 and removed v6.8.4 labels Oct 22, 2019
@emasab emasab force-pushed the emasab:bugfix/42041-failing-group-by-queries branch from 2ccb219 to 697713f Oct 22, 2019
@emasab

This comment has been minimized.

Copy link
Contributor Author

commented Oct 22, 2019

rebased

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.