-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Implement first/last_over_time for exponential histograms #138639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
84b7227 to
6ed7422
Compare
594f6c9 to
9c323ab
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
dnhatn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments, but this looks good. Thanks Jonas!
.../compute/src/main/java/org/elasticsearch/compute/aggregation/ExponentialHistogramStates.java
Outdated
Show resolved
Hide resolved
.../compute/src/main/java/org/elasticsearch/compute/aggregation/ExponentialHistogramStates.java
Outdated
Show resolved
Hide resolved
| assert histogramValue != null; | ||
| ensureCapacity(groupId); | ||
| Releasables.close(histogramValues.get(groupId)); | ||
| histogramValues.set(groupId, ExponentialHistogram.builder(histogramValue, breaker).build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we copy every candidate we see. This is fine for last_over_time with tsdb, but for first_over_time we may copy and discard many values. Is it possible to make ExponentialHistogram ref-counted and delay copying until the end? If so, we can improve this in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for last_over_time with tsdb, but for first_over_time we may copy and discard many values
Just for me understanding, isn't it the other way around? In TSDB, we iterate over the values sorted by time.
So for last, we actually see the desired value last and therefore keep overriding the state all the time?
Is it possible to make ExponentialHistogram ref-counted and delay copying until the end? If so, we can improve this in a follow-up.
The exponential histograms we operate on here directly work on the byte[] owned by the block. So to keep a reference to the histogram, we'd need to keep the reference to the entire block. I assume that we want to avoid this, as it could hog a lot of memory?
If we want to avoid the above, I think we can't get away without copying. But we can at least avoid the allocations and the decoding/encoding of the histogram.
Similar to BreakingBytesRefBuilder, we could add a corresponding histogram builder, which directly copies the encoded histogram bytes and can be reused. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue:
#138809
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Jonas! That works. It is quite optional since it only affects first_over_time, which I think is not commonly used.
.../compute/src/main/java/org/elasticsearch/compute/aggregation/ExponentialHistogramStates.java
Outdated
Show resolved
Hide resolved
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/exponential_histogram.csv-spec
a36e755 to
9961590
Compare
Implements
first_over_timeandlast_over_timeforexponential_histograms.I decided to handroll the state for
(long, ExponentialHistogram)pairs and the aggregators for the functions above,as otherwise I think the templates would get more messy with special cases.
If we eventually encounter too much copied code, we can revisit that decision.