-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Define sum of empty exponential histograms as null instead of 0.0 #138349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| required_capability: enrich_load | ||
| required_capability: fix_replace_missing_field_with_null_duplicate_name_id_in_layout | ||
| required_capability: dense_vector_agg_metric_double_if_fns | ||
| required_capability: exponential_histogram |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed anymore, because in EsqlSpecTestCase we already ensure that we only load the histogram-CSV data into an index if the capability is present.
Same for lookup-join.csv-spec
dnhatn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Part of #137988 Implement #138349 for the t-digest field type. This changes the behavior of the sum sub-field of t-digest when the digest is empty. Prior to this PR we treated it as 0, and this changes it to be null. This avoids a division by zero error in ESQL when trying to calculate the average of an empty histogram. We are adopting the same behavior for the exponential histogram field (see PR linked above), and this is important to keep the semantics of the two fields as close as possible.
Followup for #138177.
Prior to this PR, we defined the
SUMof an empty histogram to be0.0in ES|QL.As a result, this rightfully would yield a division-by-zero warning when trying to compute the average:
SUM/COUNT = 0.0/0.0.Empty histograms are an edge case: however, they will be more common when we support cumulative histograms.
Then, the "delta" of an unchanged counter histogram will be an empty histogram.
In this PR, we change the definition of
SUMto benullfor empty histograms so that the average computation isnull/0.0instead, which correctly returnsnullwithout warnings.To stay efficient, we also apply this change to the doc-values when we store exponential histograms. This prevents us from having to do any conversions on the
sumblock when loading, we can just load it as-is as the invariants between doc values and the block match.Because this changes the invariants of the block, it is inherently backwards-incompatible. This is not a problem because the field is hidden behind a feature flag, but we still run bwc tests with feature flags enabled.
For this reason I replaced the ES|QL capabilities with a single, new one, which essentially disables BWC tests for this feature up until now.