
@JonasKunz
Contributor

@JonasKunz JonasKunz commented Nov 26, 2025

Implements first_over_time and last_over_time for exponential histograms.
I decided to hand-roll the state for (long, ExponentialHistogram) pairs and the aggregators for these functions,
because I think the templates would otherwise get messier with special cases.

If we eventually end up with too much duplicated code, we can revisit that decision.
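A minimal sketch of what a hand-rolled per-group (timestamp, histogram) state for these two functions could look like. TimestampedHistogramState, collectFirst and collectLast are hypothetical names, not the classes added in this PR. The histogram is modeled as an opaque encoded byte[] rather than ExponentialHistogram, and circuit-breaker accounting is omitted.

```java
import java.util.Arrays;

// Hypothetical sketch: per-group (timestamp, histogram) state for first/last_over_time.
// The histogram is modeled as an opaque encoded byte[]; circuit-breaker accounting is omitted.
final class TimestampedHistogramState {
    private long[] timestamps = new long[0];
    private byte[][] values = new byte[0][];

    // Grows the backing arrays so that groupId is a valid index.
    private void ensureCapacity(int groupId) {
        if (groupId >= timestamps.length) {
            int newSize = Math.max(groupId + 1, timestamps.length * 2);
            timestamps = Arrays.copyOf(timestamps, newSize);
            values = Arrays.copyOf(values, newSize);
        }
    }

    // first_over_time: keep the candidate if it is strictly earlier than the stored one.
    void collectFirst(int groupId, long timestamp, byte[] encoded) {
        ensureCapacity(groupId);
        if (values[groupId] == null || timestamp < timestamps[groupId]) {
            set(groupId, timestamp, encoded);
        }
    }

    // last_over_time: keep the candidate if it is strictly later than the stored one.
    void collectLast(int groupId, long timestamp, byte[] encoded) {
        ensureCapacity(groupId);
        if (values[groupId] == null || timestamp > timestamps[groupId]) {
            set(groupId, timestamp, encoded);
        }
    }

    // Copies the candidate, since the caller's bytes may belong to a block released later.
    private void set(int groupId, long timestamp, byte[] encoded) {
        timestamps[groupId] = timestamp;
        values[groupId] = encoded.clone();
    }

    // Returns the stored value, or null if the group never received a value.
    byte[] value(int groupId) {
        return groupId < values.length ? values[groupId] : null;
    }

    long timestamp(int groupId) {
        return timestamps[groupId];
    }
}
```

Both functions share the same storage and differ only in the comparison, which is one reason a single hand-written state class can serve both aggregators.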

@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v9.3.0 labels Nov 26, 2025
@JonasKunz JonasKunz force-pushed the exp-histo-overtime-aggs branch from 84b7227 to 6ed7422 Compare November 26, 2025 08:55
@JonasKunz JonasKunz changed the title Exp histo overtime aggs ESQL: Implement first/last_over_time for exponential histograms Nov 26, 2025
@JonasKunz JonasKunz force-pushed the exp-histo-overtime-aggs branch from 594f6c9 to 9c323ab Compare November 27, 2025 08:40
@JonasKunz JonasKunz marked this pull request as ready for review November 27, 2025 08:57
@JonasKunz JonasKunz requested a review from dnhatn November 27, 2025 08:57
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 27, 2025
Member

@dnhatn dnhatn left a comment


I've left some comments, but this looks good. Thanks Jonas!

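```java
assert histogramValue != null;
ensureCapacity(groupId);
// Release the histogram previously stored for this group.
Releasables.close(histogramValues.get(groupId));
// Store a copy of the candidate, tracked by the circuit breaker.
histogramValues.set(groupId, ExponentialHistogram.builder(histogramValue, breaker).build());
```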
Member


Here, we copy every candidate we see. This is fine for last_over_time with tsdb, but for first_over_time we may copy and discard many values. Is it possible to make ExponentialHistogram ref-counted and delay copying until the end? If so, we can improve this in a follow-up.
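To make the cost concrete, a small sketch that counts how often an eager-copy state replaces (and therefore copies) its stored value, depending on the order in which timestamps arrive. EagerCopyCounter and its methods are hypothetical, and the sketch makes no assumption about which order a TSDB index actually produces; it simply evaluates both orders.

```java
// Hypothetical sketch: counts how many candidates an eager-copy state would copy,
// i.e. how often the stored value is replaced, for a given arrival order of timestamps.
final class EagerCopyCounter {
    // first_over_time replaces its value whenever a strictly earlier timestamp arrives.
    static int firstOverTimeCopies(long[] timestamps) {
        int copies = 0;
        long best = Long.MAX_VALUE;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0 || timestamps[i] < best) {
                best = timestamps[i];
                copies++;
            }
        }
        return copies;
    }

    // last_over_time replaces its value whenever a strictly later timestamp arrives.
    static int lastOverTimeCopies(long[] timestamps) {
        int copies = 0;
        long best = Long.MIN_VALUE;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0 || timestamps[i] > best) {
                best = timestamps[i];
                copies++;
            }
        }
        return copies;
    }
}
```

For ascending input {1, 2, 3, 4, 5}, first_over_time copies once and last_over_time copies five times; for descending input the counts swap. Whichever function sees its winning value last pays one copy per value, which is the case where deferring the copy would help.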

Contributor Author

@JonasKunz JonasKunz Dec 1, 2025


> This is fine for last_over_time with tsdb, but for first_over_time we may copy and discard many values

Just for my understanding: isn't it the other way around? In TSDB, we iterate over the values sorted by time.
So for last_over_time, we actually see the desired value last and therefore keep overwriting the state the whole time?

> Is it possible to make ExponentialHistogram ref-counted and delay copying until the end? If so, we can improve this in a follow-up.

The exponential histograms we operate on here work directly on the byte[] owned by the block. So to keep a reference to a histogram, we'd need to keep a reference to the entire block. I assume we want to avoid that, as it could hog a lot of memory?
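A minimal sketch of that trade-off, assuming each histogram is a view over a slice of a larger byte[] owned by the block. HistogramSlice and its methods are hypothetical, not the Elasticsearch API, and the retained-size figure ignores object headers.

```java
import java.util.Arrays;

// Hypothetical sketch: a histogram view that decodes from a slice of a block-owned byte[].
final class HistogramSlice {
    final byte[] blockBytes; // shared with every other value in the same block
    final int offset;
    final int length;

    HistogramSlice(byte[] blockBytes, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > blockBytes.length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.blockBytes = blockBytes;
        this.offset = offset;
        this.length = length;
    }

    // Holding a reference to this view keeps the entire block array reachable.
    long retainedBytesIfReferenced() {
        return blockBytes.length;
    }

    // Detaching the value copies only its own encoded bytes.
    byte[] copyOut() {
        return Arrays.copyOfRange(blockBytes, offset, offset + length);
    }
}
```

A 40-byte histogram inside a 1,000,000-byte block keeps all 1,000,000 bytes alive if the state holds the view, while copyOut() retains only 40.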

If we want to avoid retaining the whole block, I don't think we can get away without copying. But we can at least avoid the allocations and the decoding/re-encoding of the histogram.

Similar to BreakingBytesRefBuilder, we could add a corresponding histogram builder that directly copies the encoded histogram bytes and can be reused. WDYT?
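A minimal sketch of the reusable-builder idea, assuming the state only needs to hold the encoded bytes. ReusableHistogramBytes is a hypothetical name; unlike BreakingBytesRefBuilder, this sketch does no circuit-breaker accounting.

```java
import java.util.Arrays;

// Hypothetical sketch: a reusable buffer holding one encoded histogram.
// Replacing the value copies raw bytes (no decode/re-encode) and allocates only on growth.
final class ReusableHistogramBytes {
    private byte[] buffer = new byte[16];
    private int length = 0;
    private int allocations = 1; // counts the initial buffer

    // Replaces the current contents with src[offset, offset + len).
    void copyFrom(byte[] src, int offset, int len) {
        if (len > buffer.length) {
            // Grow geometrically to amortize future growth.
            buffer = new byte[Math.max(len, buffer.length * 2)];
            allocations++;
        }
        System.arraycopy(src, offset, buffer, 0, len);
        length = len;
    }

    int length() {
        return length;
    }

    int allocations() {
        return allocations;
    }

    // Returns a trimmed copy of the current contents.
    byte[] toArray() {
        return Arrays.copyOf(buffer, length);
    }
}
```

When successive candidates have similar encoded sizes, each replacement reuses the same buffer, so overwriting the stored value costs one System.arraycopy rather than an allocation plus a decode/encode round trip.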

Contributor Author


Created an issue:
#138809

Member


Thanks, Jonas! That works. It's largely optional, since it only affects first_over_time, which I don't think is commonly used.

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/exponential_histogram.csv-spec
@JonasKunz JonasKunz force-pushed the exp-histo-overtime-aggs branch from a36e755 to 9961590 Compare December 1, 2025 09:24
@JonasKunz JonasKunz merged commit f6ffd56 into elastic:main Dec 1, 2025
34 checks passed
@JonasKunz JonasKunz deleted the exp-histo-overtime-aggs branch December 1, 2025 12:18

Labels

:Analytics/ES|QL AKA ESQL external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0


3 participants