SQL: Handle NaN returned for aggs in case of nulls #35164
    @@ -15,6 +15,7 @@ class org.elasticsearch.xpack.sql.expression.function.scalar.whitelist.InternalS
       boolean nullSafeFilter(Boolean)
       double nullSafeSortNumeric(Number)
       String nullSafeSortString(Object)
    +  Number nanSafeFilter(Number)
    
     #
     # Comparison

Can you put the elements in this group in alphabetical order, please? I think that was the original intention with the list of functions here.

This name is misleading: the above methods are used to wrap an expression used in filtering, while this method should be used against all aggs, regardless of their position or whether the expression is a filter or not.
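For context, a helper with the whitelisted signature `Number nanSafeFilter(Number)` presumably maps `NaN` to `null` before the value reaches any comparison. The sketch below is an assumption about that behavior, not the implementation from this PR. The class name `NanSafeSketch` is hypothetical.

```java
/**
 * Hypothetical sketch of a NaN-to-null helper matching the whitelisted
 * signature Number nanSafeFilter(Number). Not the code from the PR.
 */
public final class NanSafeSketch {

    private NanSafeSketch() {}

    /** Returns null when the aggregation value is NaN, otherwise returns the value unchanged. */
    public static Number nanSafeFilter(Number value) {
        if (value instanceof Double && ((Double) value).isNaN()) {
            return null;
        }
        if (value instanceof Float && ((Float) value).isNaN()) {
            return null;
        }
        // null and all non-NaN numbers pass through as-is
        return value;
    }

    public static void main(String[] args) {
        System.out.println(nanSafeFilter(Double.NaN)); // null
        System.out.println(nanSafeFilter(2.0));        // 2.0
        System.out.println(nanSafeFilter(null));       // null
        System.out.println(nanSafeFilter(7L));         // 7
    }
}
```

As the second review comment notes, a conversion like this would wrap every agg value, not only values used in filters. That is why the `Filter` suffix is questioned.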
Can you please try other aggs as well?
For example, use min and max to trigger a stats agg, kurtosis for a matrix stats agg, sum (which should always return zero), percentiles and percentile ranks.
If the bucket selector indeed gives us just the value, I think the proper fix should actually be inside ES...
@polyfractal this looks like another instance of #34903. What do you think?
Sorry, playing catch-up here. What's the underlying aggregation structure being generated? Aggs + bucket_selector? Which value(s) being aggregated are NaN/null?
In the example above, `HAVING maxl <= 2` becomes a Painless script for a bucket selector pointing to the `MAX` agg, with the content `max_agg <= 2`. This issue appears for any agg, though: when running against a null bucket, the aggs selected by the bucket selector return their default values instead of nulls (like in #34903). For Max the default is `NaN`, but this clearly differs from case to case. In scripting, however, we have no knowledge of the source, so we can't tell that the value comes from an agg: we only receive the value itself (`NaN` or whatever). I would argue this is an actual bug in ES.
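Receiving `NaN` instead of `null` matters because of how primitive comparisons behave. The plain-Java sketch below shows what a condition like `max_agg <= 2` does when it receives `NaN`. It assumes that Painless follows Java's IEEE 754 semantics for primitive `double` comparisons, since it compiles to JVM bytecode. The method names are hypothetical stand-ins, not the script ES SQL generates.

```java
/**
 * Hypothetical stand-ins for bucket_selector conditions, showing that
 * every ordered comparison involving NaN evaluates to false.
 */
public final class NaNComparisonDemo {

    // Mirrors a generated condition such as "max_agg <= 2".
    static boolean keepBucket(double maxAgg) {
        return maxAgg <= 2;
    }

    // Mirrors a negated condition such as "NOT (max_agg > 2)".
    static boolean keepBucketNegated(double maxAgg) {
        return !(maxAgg > 2);
    }

    public static void main(String[] args) {
        double emptyBucketMax = Double.NaN; // value seen for a null bucket, per the discussion
        System.out.println(keepBucket(emptyBucketMax));        // false
        System.out.println(keepBucketNegated(emptyBucketMax)); // true
        System.out.println(Double.NaN == Double.NaN);          // false
    }
}
```

Under SQL's three-valued logic, both conditions would reject a bucket whose `MAX` is `NULL`, because `NULL <= 2` and `NOT (NULL > 2)` are both unknown. With `NaN`, the two logically equivalent forms disagree, and the negated form keeps the bucket.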
@polyfractal ^^
@costin Ahh, yes I see. I agree it's related to #34903. Bucket selector is just resolving to the internal agg object and asking for its `value()`, which returns the primitive `double` internal state, unlike xcontent, which does the null conversion.

I think we can probably expose the `hasValue()` method (or whatever the mechanism ends up being) in the bucket_selector painless context so that it can be used from scripts. I don't think it's something we can do automatically, though, because of how the pipeline framework works right now. It's a similar issue to #27377 (not quite the same, but a similar root issue). I don't think we can fix that easily until the transport client goes away, or else we'll have a big breaking change to how agg objects work (`double` to `Double` or `Optional`).

So the script might need to check whether there is a value before actually using it. Something like:

Not ideal, but probably the path of least resistance right now. Or maybe introduce a new `valueOrNull()` method in the context or something.

I'll make a note on #34903 about pipelines.
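The two script-side options discussed above can be modeled in plain Java: guarding with `hasValue()` before reading `value()`, or a `valueOrNull()` helper. This is a hypothetical sketch. `SingleValue`, `of`, `keepBucket` and `keepBucketNullable` are invented for illustration. They are not Elasticsearch or Painless APIs, and the code is not the snippet from the original comment.

```java
/**
 * Hypothetical model of an aggregation that exposes a primitive double
 * value() plus a hasValue() flag, as discussed in the review.
 */
public final class BucketValueSketch {

    interface SingleValue {
        double value();      // primitive internal state, e.g. NaN for an empty bucket
        boolean hasValue();  // hypothetical flag exposed to the script context

        /** Hypothetical convenience method: null when there is no value. */
        default Double valueOrNull() {
            return hasValue() ? value() : null;
        }
    }

    static SingleValue of(double v, boolean present) {
        return new SingleValue() {
            public double value() { return v; }
            public boolean hasValue() { return present; }
        };
    }

    // Option 1: check for a value before comparing it.
    static boolean keepBucket(SingleValue maxAgg) {
        return maxAgg.hasValue() && maxAgg.value() <= 2;
    }

    // Option 2: a null result drops the bucket, matching SQL, where a comparison with NULL is not true.
    static boolean keepBucketNullable(SingleValue maxAgg) {
        Double v = maxAgg.valueOrNull();
        return v != null && v <= 2;
    }

    public static void main(String[] args) {
        SingleValue empty = of(Double.NaN, false);
        SingleValue two = of(2.0, true);
        System.out.println(keepBucket(empty));         // false
        System.out.println(keepBucket(two));           // true
        System.out.println(empty.valueOrNull());       // null
        System.out.println(keepBucketNullable(two));   // true
    }
}
```

Both options put the null check in the script rather than in the agg object. This fits the constraint above that changing `double` to `Double` or `Optional` would be a breaking change.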