Skip to content

Conversation

@not-napoleon
Copy link
Member

Follow up to #137982 to support returning min, max, and sum in the synthetic source results. This also drops support for sending the count as a parameter (and thus doesn't include the total count in the synthetic source result), which matches the behavior of the exponential histogram field.

@not-napoleon not-napoleon added >non-issue :StorageEngine/Mapping The storage related side of mappings v9.3.0 labels Nov 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Contributor

@JonasKunz JonasKunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a nit and more of a question about min/max estimation, but that can be addressed in a follow-up if needed.

- match:
_source:
latency:
# Note that we're storing 0.1 as the min, even though it's count is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to include centroids in the min/max estimations in the estimation when their count is zero?
Good that you raised this here, I didn't notice it before.

I would assume that we should exclude it from the estimations, which would be consistent to the current min/max queryDSL aggregations on the histogram field I think?
We don't store the empty centroids, so min/max in queryDSL will return the smallest/highest populated centroid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that if the user sent in an explicit min that did not correspond to a centroid, we'd store that and it would have the same inconsistency with the QueryDSL implementation.

I also don't want to spend a lot of time on this. The empty bucket handling is an extreme edge case. Our expectation is that users will be sending in valid t-digests, which should never produce empty buckets. To get an empty bucket, the user has to not only not use the t-digest algorithm, but also do something which doesn't really make sense for most histogram types. Frankly, I'm not even sure we should treat these as valid inputs, but that's what the histogram field does and for now I'm trying to keep this as compatible as possible.

I don't think we need to decide this on this PR. Technically, this behavior was added in #137982, I just called it out in this test. We can discuss and change it later if we want to.


private static final ParseField COUNTS_FIELD = new ParseField(COUNTS_NAME);
private static final ParseField CENTROIDS_FIELD = new ParseField(CENTROIDS_NAME);
private static final ParseField TOTAL_COUNT_FIELD = new ParseField(TOTAL_COUNT_FIELD_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also remove TOTAL_COUNT_FIELD_NAME from TDigestFieldMapper, as it is unused now.

@not-napoleon not-napoleon merged commit 2b3613b into elastic:main Nov 17, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants