SQL: fix COUNT DISTINCT filtering #37176
Pinging @elastic/es-search |
run the gradle build tests 2 |
Nice! I would ask to add some tests to `QueryTranslatorTests` too, on top of the integ tests.
Nice fix. Can you please check that we have duplicates in the data and do a test with `COUNT(column)` and `COUNT(DISTINCT column)` to check they return different (and correct) results? Potentially by using the gender field instead.
`COUNT(*)` vs `COUNT(column)` vs `COUNT(DISTINCT column)` should all return different results, especially since we have NULL data.
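To make the expected difference concrete, here is a minimal sketch of the three counts over data with duplicates and a NULL. It uses Python's built-in `sqlite3` as a stand-in for standard SQL semantics, not Elasticsearch SQL itself; the `emp` table, its columns, and the sample rows are hypothetical and not taken from the test data in this PR.

```python
import sqlite3

# Hypothetical sample data: duplicates in gender plus one NULL row.
SAMPLE_ROWS = [(1, "M"), (2, "M"), (3, "F"), (4, "F"), (5, None)]


def count_variants(rows):
    """Return (COUNT(*), COUNT(gender), COUNT(DISTINCT gender)) for the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (emp_no INTEGER, gender TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT COUNT(*), COUNT(gender), COUNT(DISTINCT gender) FROM emp"
    ).fetchone()
    conn.close()
    return result


# COUNT(*) counts every row, COUNT(gender) skips the NULL,
# COUNT(DISTINCT gender) additionally collapses duplicates.
print(count_variants(SAMPLE_ROWS))  # (5, 4, 2)
```

With this data each variant yields a different number, which is the property the requested test would need to assert.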
@costin while testing COUNT in a separate PR (still WIP), COUNT(column) and COUNT(DISTINCT column) are treated as the same agg. One first issue is that So, at this moment, a test with |
run the gradle build tests 2 |
@astefan +1 |
LGTM!
run the gradle build tests 1 |
* Use `_count` aggregation value only for not-DISTINCT COUNT function calls
* COUNT DISTINCT will use the _exact_ version of a field (the `keyword` sub-field for example), if there is one

(cherry picked from commit 3fad9d2)
closes #37087 |
This PR fixes the #37086 bug where aggregation filtering (HAVING) on COUNT(DISTINCT) behaved just like it does for the non-DISTINCT version. Also, COUNT(DISTINCT) will try to use an exact version of the field whenever possible (the common scenario is that of a `text` field), since it uses the `cardinality` aggregation, which will not work on `text` fields.
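The following sketch illustrates why the two HAVING filters must not be treated alike: the same threshold keeps different groups depending on whether the count is DISTINCT. It runs against Python's built-in `sqlite3` to show plain SQL semantics only; it does not reproduce the Elasticsearch query translation changed in this PR, and the `emp` table, `dept` column, and rows are hypothetical.

```python
import sqlite3

# Hypothetical data: dept "a" has two rows with the same gender,
# dept "b" has two rows with different genders, dept "c" has one row.
ROWS = [("a", "M"), ("a", "M"), ("b", "M"), ("b", "F"), ("c", "F")]


def groups_having(condition, rows):
    """Return the dept values whose group satisfies the HAVING condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (dept TEXT, gender TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    # The condition is interpolated only because this is a fixed, trusted demo string.
    query = f"SELECT dept FROM emp GROUP BY dept HAVING {condition} ORDER BY dept"
    depts = [row[0] for row in conn.execute(query)]
    conn.close()
    return depts


print(groups_having("COUNT(gender) > 1", ROWS))           # ['a', 'b']
print(groups_having("COUNT(DISTINCT gender) > 1", ROWS))  # ['b']
```

If the DISTINCT filter were evaluated against the plain per-group count, dept `a` would wrongly pass the second filter, which matches the kind of behavior the bug description reports.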