SQL: Fix bug regarding histograms usage in scripting #36866
@@ -36,6 +36,8 @@ The histogram function takes all matching values and divides them into buckets w
bucket_key = Math.floor(value / interval) * interval
----

NOTE: The histogram in SQL does *NOT* return empty buckets for missing intervals, unlike the traditional <<search-aggregations-bucket-histogram-aggregation, histogram>> and <<search-aggregations-bucket-datehistogram-aggregation, date histogram>>. Such behavior does not fit conceptually in SQL, which treats all missing values as `NULL`; as such, the histogram places all missing values in the `NULL` group.

`Histogram` can be applied on either numeric fields:

@@ -51,4 +53,26 @@ or date/time fields:
include-tagged::{sql-specs}/docs.csv-spec[histogramDate]
----

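As a rough illustration of the bucketing rule and the `NULL` note above, the following Java sketch (not Elasticsearch code; `HistogramBucketSketch` and `histogram` are made-up names) applies `bucket_key = Math.floor(value / interval) * interval` to a list of numbers, sends `null` values to a single `NULL` group, and only creates buckets for keys that actually occur, so intervals without values produce no bucket.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: mimics how the SQL HISTOGRAM
// groups values, using bucket_key = Math.floor(value / interval) * interval.
public class HistogramBucketSketch {

    static Map<Double, Long> histogram(List<Double> values, double interval) {
        // HashMap accepts a null key, which stands in for the NULL group
        Map<Double, Long> buckets = new HashMap<>();
        for (Double value : values) {
            Double key = (value == null) ? null : Math.floor(value / interval) * interval;
            // Only keys that actually occur are created: no empty buckets
            buckets.merge(key, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 3.5, 4.0, 27.0, null, null);
        Map<Double, Long> h = histogram(values, 5);
        System.out.println(h.get(0.0));         // 3 (values 1.0, 3.5 and 4.0)
        System.out.println(h.get(25.0));        // 1 (value 27.0)
        System.out.println(h.get(null));        // 2 (the NULL group)
        System.out.println(h.containsKey(5.0)); // false: no empty bucket for [5, 10)
    }
}
```

The same formula explains the year buckets in the csv-spec output further down: with an interval of 2, a year such as 1953 lands in bucket `Math.floor(1953 / 2.0) * 2`, i.e. 1952.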
Expressions inside the histogram are also supported as long as the
return type is numeric:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs.csv-spec[histogramNumericExpression]
----

Do note that histograms (and grouping functions in general) allow custom expressions but cannot have any functions applied to them in the `GROUP BY`. In other words, the following statement is *NOT* allowed:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs.csv-spec[expressionOnHistogramNotAllowed]
----

as it requires two groupings (one for the histogram and then another to apply the function on top of the histogram groups).
Review comment: Instead of "then another for apply the function", I think it's better "then another for applying the function" or "then another to apply the function".

Instead, one can rewrite the query to move the expression applied on the histogram _inside_ it:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs.csv-spec[histogramDateExpression]
----
Review comment: The query here is exactly the same as the one referenced in the previous section:
Review comment: Good catch.
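To make the "two groupings" argument more concrete, here is a hedged Java sketch (hypothetical names, plain `java.time`, not the Elasticsearch implementation) of the accepted form `HISTOGRAM(MONTH(birth_date), 2)`: the month is extracted per row *before* bucketing, so the whole expression is a single per-row grouping key and one grouping pass is enough.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: evaluates the equivalent of
//   SELECT HISTOGRAM(MONTH(d), interval) AS h, COUNT(*) ... GROUP BY h
// with a single grouping pass over the rows.
public class MonthHistogramSketch {

    // bucket_key = floor(value / interval) * interval, for integer values
    static int bucketKey(int value, int interval) {
        return Math.floorDiv(value, interval) * interval;
    }

    static Map<Integer, Long> histogramOfMonth(List<LocalDate> dates, int interval) {
        Map<Integer, Long> counts = new HashMap<>();
        for (LocalDate d : dates) {
            // MONTH(d) is computed per row first; missing dates go to the NULL group
            Integer key = (d == null) ? null : bucketKey(d.getMonthValue(), interval);
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(
                LocalDate.of(1953, 1, 15),  // month 1  -> bucket 0
                LocalDate.of(1958, 2, 20),  // month 2  -> bucket 2
                LocalDate.of(1960, 3, 2),   // month 3  -> bucket 2
                LocalDate.of(1955, 12, 1),  // month 12 -> bucket 12
                null);                      // NULL group
        System.out.println(histogramOfMonth(dates, 2));
    }
}
```

The rejected `MONTH(HISTOGRAM(...))` form would instead need the rows grouped by histogram bucket first and the bucket keys then regrouped by month, which is the second grouping described in the documentation text.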
@@ -262,9 +262,51 @@ SELECT HISTOGRAM(birth_date, INTERVAL 1 YEAR) AS h, COUNT(*) as c FROM test_emp
null           |10
;

histogramDateWithDateFunction-Ignore
SELECT YEAR(HISTOGRAM(birth_date, INTERVAL 1 YEAR)) AS h, COUNT(*) as c FROM test_emp GROUP BY h ORDER BY h DESC;
histogramDateWithMonthOnTop
schema::h:i|c:l
SELECT HISTOGRAM(MONTH(birth_date), 2) AS h, COUNT(*) as c FROM test_emp GROUP BY h ORDER BY h DESC;

       h       |       c
---------------+---------------
12             |7
10             |17
8              |16
6              |16
4              |18
2              |10
0              |6
null           |10
;

histogramDateWithYearOnTop
schema::h:i|c:l
SELECT HISTOGRAM(YEAR(birth_date), 2) AS h, COUNT(*) as c FROM test_emp GROUP BY h ORDER BY h DESC;

       h       |       c
---------------+---------------
1964           |5
1962           |13
1960           |16
1958           |16
1956           |9
1954           |12
1952           |19
null           |10
;

histogramNumericWithExpression
Review comment: Why do we need that, since we have this: https://github.com/elastic/elasticsearch/pull/36866/files#diff-271679983098ae41c89a482374a0e984R731
Review comment: It was a typo - the incorrect query should have had the MONTH out of the histogram - I've updated that now.
Review comment: no I mean this and the next one are similar, emp_no instead of salary, it's only the order by.
schema::h:i|c:l
SELECT HISTOGRAM(emp_no % 100, 10) AS h, COUNT(*) as c FROM test_emp GROUP BY h ORDER BY h DESC;

       h       |       c
---------------+---------------
90             |10
80             |10
70             |10
60             |10
50             |10
40             |10
30             |10
20             |10
10             |10
0              |10
;
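The expected output of `histogramNumericWithExpression` can be checked by hand. The Java sketch below assumes (this is not stated in the diff) that `test_emp` holds exactly the `emp_no` values 10001 to 10100; under that assumption `emp_no % 100` takes every value from 0 to 99 exactly once, so each interval-10 bucket receives 10 rows.

```java
import java.util.Map;
import java.util.TreeMap;

// Worked check, not part of the test suite. ASSUMPTION: test_emp contains
// exactly the emp_no values 10001..10100.
public class EmpNoHistogramCheck {

    static Map<Integer, Long> expected() {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int empNo = 10001; empNo <= 10100; empNo++) {
            int value = empNo % 100;                    // 1..99, then 0 for 10100
            int bucket = Math.floorDiv(value, 10) * 10; // bucket_key with interval 10
            counts.merge(bucket, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // TreeMap prints ascending keys; the csv-spec orders them DESC
        System.out.println(expected());
        // {0=10, 10=10, 20=10, 30=10, 40=10, 50=10, 60=10, 70=10, 80=10, 90=10}
    }
}
```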
@@ -18,6 +18,8 @@
import org.elasticsearch.xpack.sql.expression.function.FunctionAttribute;
import org.elasticsearch.xpack.sql.expression.function.Functions;
import org.elasticsearch.xpack.sql.expression.function.Score;
import org.elasticsearch.xpack.sql.expression.function.aggregate.AggregateFunctionAttribute;
import org.elasticsearch.xpack.sql.expression.function.grouping.GroupingFunctionAttribute;
import org.elasticsearch.xpack.sql.expression.function.scalar.ScalarFunction;
import org.elasticsearch.xpack.sql.expression.predicate.conditional.ConditionalFunction;
import org.elasticsearch.xpack.sql.expression.predicate.operator.comparison.In;
@@ -224,6 +226,7 @@ Collection<Failure> verify(LogicalPlan plan) {
validateConditional(p, localFailures);

checkFilterOnAggs(p, localFailures);
checkFilterOnGrouping(p, localFailures);

if (!groupingFailures.contains(p)) {
    checkGroupBy(p, localFailures, resolvedFunctions, groupingFailures);
@@ -419,7 +422,7 @@ private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Node<?> sourc
    return true;
}
// skip aggs (allowed to refer to non-group columns)
if (Functions.isAggregate(e)) {
if (Functions.isAggregate(e) || Functions.isGrouping(e)) {
    return true;
}

@@ -448,6 +451,21 @@ private static boolean checkGroupByAgg(LogicalPlan p, Set<Failure> localFailures
    }
}));

a.groupings().forEach(e -> {
    if (Functions.isGrouping(e) == false) {
        e.collectFirstChildren(c -> {
            if (Functions.isGrouping(c)) {
                localFailures.add(fail(c,
                        "Cannot combine [%s] grouping function inside GROUP BY, found [%s];"
                                + " consider moving the expression inside the histogram",
                        Expressions.name(c), Expressions.name(e)));
                return true;
            }
            return false;
        });
    }
});

if (!localFailures.isEmpty()) {
    return false;
}

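The new loop over `a.groupings()` can be modelled outside Elasticsearch as follows. This is a simplified sketch with hypothetical classes (`Expr`, `firstGroupingChild`, `verifyGroupings` are not the Elasticsearch SQL API): a grouping function such as `HISTOGRAM` may be the `GROUP BY` expression itself, but if it is nested inside another expression, the first nested grouping function is reported with the same message as in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (hypothetical classes, not the Elasticsearch SQL API)
// of the GROUP BY check added to checkGroupByAgg.
public class GroupByCheckSketch {

    // Minimal expression tree node
    static class Expr {
        final String name;
        final boolean grouping; // true for grouping functions such as HISTOGRAM
        final List<Expr> children;

        Expr(String name, boolean grouping, Expr... children) {
            this.name = name;
            this.grouping = grouping;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first search for the first nested grouping function, or null
    static Expr firstGroupingChild(Expr e) {
        for (Expr c : e.children) {
            if (c.grouping) {
                return c;
            }
            Expr nested = firstGroupingChild(c);
            if (nested != null) {
                return nested;
            }
        }
        return null;
    }

    static List<String> verifyGroupings(List<Expr> groupings) {
        List<String> failures = new ArrayList<>();
        for (Expr e : groupings) {
            // A grouping function used directly in GROUP BY is fine
            if (!e.grouping) {
                Expr g = firstGroupingChild(e);
                if (g != null) {
                    failures.add(String.format(
                            "Cannot combine [%s] grouping function inside GROUP BY, found [%s];"
                                    + " consider moving the expression inside the histogram",
                            g.name, e.name));
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Expr birthDate = new Expr("birth_date", false);
        // MONTH(HISTOGRAM(birth_date)): grouping function nested in a scalar, rejected
        Expr bad = new Expr("MONTH(HISTOGRAM(birth_date))", false,
                new Expr("HISTOGRAM(birth_date)", true, birthDate));
        // HISTOGRAM(MONTH(birth_date), 2): scalar nested in the grouping function, accepted
        Expr good = new Expr("HISTOGRAM(MONTH(birth_date),2)", true,
                new Expr("MONTH(birth_date)", false, birthDate));

        System.out.println(verifyGroupings(Arrays.asList(bad)));  // one failure
        System.out.println(verifyGroupings(Arrays.asList(good))); // []
    }
}
```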
@@ -547,19 +565,30 @@ private static void checkFilterOnAggs(LogicalPlan p, Set<Failure> localFailures)
    if (p instanceof Filter) {
        Filter filter = (Filter) p;
        if ((filter.child() instanceof Aggregate) == false) {
            filter.condition().forEachDown(f -> {
                if (Functions.isAggregate(f) || Functions.isGrouping(f)) {
                    String type = Functions.isAggregate(f) ? "aggregate" : "grouping";
                    localFailures.add(fail(f,
                            "Cannot use WHERE filtering on %s function [%s], use HAVING instead", type, Expressions.name(f)));
            filter.condition().forEachDown(e -> {
                if (Functions.isAggregate(e) || e instanceof AggregateFunctionAttribute) {
                    localFailures.add(
                            fail(e, "Cannot use WHERE filtering on aggregate function [%s], use HAVING instead", Expressions.name(e)));
                }

            }, Function.class);
            }, Expression.class);
        }
    }
}

private static void checkFilterOnGrouping(LogicalPlan p, Set<Failure> localFailures) {
    if (p instanceof Filter) {
        Filter filter = (Filter) p;
        filter.condition().forEachDown(e -> {
            if (Functions.isGrouping(e) || e instanceof GroupingFunctionAttribute) {
                localFailures
                        .add(fail(e, "Cannot filter on grouping function [%s], use the grouped field instead", Expressions.name(e)));
Review comment: Maybe it's too verbose, but I suggest to rephrase to
            }
        }, Expression.class);
    }
}

private static void checkForScoreInsideFunctions(LogicalPlan p, Set<Failure> localFailures) {
    // Make sure that SCORE is only used in "top level" functions
    p.forEachExpressions(e ->
@@ -647,4 +676,4 @@ private static boolean areTypesCompatible(DataType left, DataType right) {
(left.isNumeric() && right.isNumeric());
}
}
}
}
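The two filter checks differ in scope: `checkFilterOnAggs` only inspects a `Filter` whose child is not an `Aggregate` (a `WHERE`), while `checkFilterOnGrouping` inspects every `Filter`, so it also applies to `HAVING`. The Java sketch below models that difference on a tiny expression tree. All types are hypothetical stand-ins, not the Elasticsearch SQL API; in particular, the attribute references (`AggregateFunctionAttribute`, `GroupingFunctionAttribute`) are folded into the `AGGREGATE` and `GROUPING` kinds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the Elasticsearch SQL API: models the two filter
// checks from the diff on a tiny expression tree.
public class FilterCheckSketch {

    enum Kind { FIELD, LITERAL, OPERATOR, AGGREGATE, GROUPING }

    static class Expr {
        final String name;
        final Kind kind;
        final List<Expr> children;

        Expr(String name, Kind kind, Expr... children) {
            this.name = name;
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order walk over the condition, similar in spirit to forEachDown
    static void forEachDown(Expr e, Consumer<Expr> action) {
        action.accept(e);
        for (Expr child : e.children) {
            forEachDown(child, action);
        }
    }

    // childIsAggregate == false models WHERE; true models HAVING
    static List<String> verifyFilter(Expr condition, boolean childIsAggregate) {
        List<String> failures = new ArrayList<>();
        // Like checkFilterOnAggs: aggregates are only rejected in WHERE
        if (!childIsAggregate) {
            forEachDown(condition, e -> {
                if (e.kind == Kind.AGGREGATE) {
                    failures.add(String.format(
                            "Cannot use WHERE filtering on aggregate function [%s], use HAVING instead", e.name));
                }
            });
        }
        // Like checkFilterOnGrouping: grouping functions are rejected in any filter
        forEachDown(condition, e -> {
            if (e.kind == Kind.GROUPING) {
                failures.add(String.format(
                        "Cannot filter on grouping function [%s], use the grouped field instead", e.name));
            }
        });
        return failures;
    }

    public static void main(String[] args) {
        Expr salary = new Expr("salary", Kind.FIELD);
        Expr limit = new Expr("50000", Kind.LITERAL);
        Expr aggCondition = new Expr(">", Kind.OPERATOR,
                new Expr("MAX(salary)", Kind.AGGREGATE, salary), limit);
        Expr histCondition = new Expr(">", Kind.OPERATOR,
                new Expr("HISTOGRAM(salary, 5000)", Kind.GROUPING, salary), limit);

        System.out.println(verifyFilter(aggCondition, false)); // WHERE MAX(salary) > 50000: rejected
        System.out.println(verifyFilter(aggCondition, true));  // HAVING MAX(salary) > 50000: []
        System.out.println(verifyFilter(histCondition, true)); // HAVING HISTOGRAM(...) > 50000: rejected
    }
}
```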
@@ -0,0 +1,24 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */
package org.elasticsearch.xpack.sql.expression.gen.script;

import org.elasticsearch.xpack.sql.expression.function.grouping.GroupingFunctionAttribute;

class Grouping extends Param<GroupingFunctionAttribute> {

    Grouping(GroupingFunctionAttribute groupRef) {
        super(groupRef);
    }

    String groupName() {
        return value().functionId();
    }

    @Override
    public String prefix() {
        return "g";
    }
} |
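The new `Grouping` param only contributes a `prefix()` of `"g"` and a `groupName()` taken from the attribute's `functionId()`. How the script generator turns prefixes into parameter names is not shown in this diff, so the Java sketch below is purely illustrative: it assumes a "prefix + position" naming scheme to show how a per-kind prefix keeps grouping references distinguishable from other parameter kinds. `Var`, `asParams` and the id `"hist_1"` are hypothetical, and this `Grouping` holds a plain string instead of a `GroupingFunctionAttribute`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: mirrors the shape of Param/Grouping from the diff, but the
// naming scheme (prefix + position) is an ASSUMPTION made for illustration only.
public class ParamPrefixSketch {

    abstract static class Param<T> {
        private final T value;

        Param(T value) {
            this.value = value;
        }

        T value() {
            return value;
        }

        public abstract String prefix();
    }

    // Hypothetical plain-variable parameter
    static class Var extends Param<Object> {
        Var(Object value) {
            super(value);
        }

        @Override
        public String prefix() {
            return "v";
        }
    }

    // Simplified Grouping: holds a function id string instead of a GroupingFunctionAttribute
    static class Grouping extends Param<String> {
        Grouping(String functionId) {
            super(functionId);
        }

        String groupName() {
            return value();
        }

        @Override
        public String prefix() {
            return "g";
        }
    }

    // ASSUMED naming scheme: prefix followed by the parameter's position
    static Map<String, Object> asParams(List<Param<?>> params) {
        Map<String, Object> named = new LinkedHashMap<>();
        int index = 0;
        for (Param<?> p : params) {
            named.put(p.prefix() + index++, p.value());
        }
        return named;
    }

    public static void main(String[] args) {
        List<Param<?>> params = new ArrayList<>();
        params.add(new Var(10));
        params.add(new Grouping("hist_1"));
        System.out.println(asParams(params)); // {v0=10, g1=hist_1}
    }
}
```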
Review comment: White space after "applied to them in the GROUP BY."