Improve synthetic source for tdigest field #138121

not-napoleon · 2025-11-14T20:13:27Z

Follow up to #137982 to support returning min, max, and sum in the synthetic source results. This also drops support for sending the count as a parameter (and thus doesn't include the total count in the synthetic source result), which matches the behavior of the exponential histogram field.
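A minimal sketch of what the synthetic source object described above could look like. It assumes the JSON keys are `centroids`, `counts`, `min`, `max`, and `sum`, and that the summary values have already been loaded from storage. The class and method names are hypothetical and are not the Elasticsearch `TDigestFieldMapper` API. The point it illustrates is that the total count is deliberately not emitted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: names and keys are assumptions, not Elasticsearch code.
public class SyntheticTDigestSource {

    /**
     * Composes the synthetic _source object for one tdigest value from the stored
     * centroids/counts plus the stored summary fields (min, max, sum).
     * No total "count" entry is written, mirroring the exponential histogram field.
     */
    static Map<String, Object> compose(List<Double> centroids, List<Long> counts,
                                       double min, double max, double sum) {
        Map<String, Object> source = new LinkedHashMap<>(); // keeps a stable key order
        source.put("centroids", centroids);
        source.put("counts", counts);
        source.put("min", min);
        source.put("max", max);
        source.put("sum", sum);
        return source;
    }

    public static void main(String[] args) {
        // sum = 0.5 * 3 + 2.0 * 1 = 3.5; the 0.1 centroid has a count of 0
        Map<String, Object> src = compose(List.of(0.1, 0.5, 2.0), List.of(0L, 3L, 1L), 0.1, 2.0, 3.5);
        System.out.println(src);
    }
}
```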

elasticsearchmachine · 2025-11-14T20:13:51Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

JonasKunz

LGTM, just a nit and more of a question about min/max estimation, but that can be addressed in a follow-up if needed.

JonasKunz · 2025-11-17T10:16:34Z

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/analytics/t_digest_fieldtype.yml

  - match:
      _source:
        latency:
+          # Note that we're storing 0.1 as the min, even though its count is 0.


Does it make sense to include centroids with a count of zero in the min/max estimations?
Good that you raised this here, I didn't notice it before.

I would assume that we should exclude them from the estimations, which I think would be consistent with the current min/max queryDSL aggregations on the histogram field?
We don't store the empty centroids, so min/max in queryDSL will return the smallest/highest populated centroid.

My thinking was that if the user sent in an explicit min that did not correspond to a centroid, we'd store that and it would have the same inconsistency with the QueryDSL implementation.

I also don't want to spend a lot of time on this. The empty bucket handling is an extreme edge case. Our expectation is that users will be sending in valid t-digests, which should never produce empty buckets. To get an empty bucket, the user would have to not only skip the t-digest algorithm but also do something that doesn't really make sense for most histogram types. Frankly, I'm not even sure we should treat these as valid inputs, but that's what the histogram field does, and for now I'm trying to keep this as compatible as possible.

I don't think we need to decide this on this PR. Technically, this behavior was added in #137982, I just called it out in this test. We can discuss and change it later if we want to.
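To make the two options in this thread concrete, here is a hypothetical sketch (not Elasticsearch code; `CentroidBounds`, `estimateMin`, and `estimateMax` are illustrative names) of estimating min/max from centroids either including or skipping zero-count centroids. With the test's data, including the empty centroid yields 0.1 as the min, while skipping it yields the smallest populated centroid, which is what the thread describes for the queryDSL aggregations.

```java
import java.util.OptionalDouble;

// Hypothetical illustration of the two estimation policies; not Elasticsearch code.
public class CentroidBounds {

    /** Smallest centroid; when skipEmpty is true, zero-count centroids are ignored. */
    static OptionalDouble estimateMin(double[] centroids, long[] counts, boolean skipEmpty) {
        OptionalDouble result = OptionalDouble.empty();
        for (int i = 0; i < centroids.length; i++) {
            if (skipEmpty && counts[i] == 0) {
                continue; // an empty centroid holds no observed values
            }
            if (!result.isPresent() || centroids[i] < result.getAsDouble()) {
                result = OptionalDouble.of(centroids[i]);
            }
        }
        return result; // empty if no centroid qualified
    }

    /** Largest centroid; when skipEmpty is true, zero-count centroids are ignored. */
    static OptionalDouble estimateMax(double[] centroids, long[] counts, boolean skipEmpty) {
        OptionalDouble result = OptionalDouble.empty();
        for (int i = 0; i < centroids.length; i++) {
            if (skipEmpty && counts[i] == 0) {
                continue;
            }
            if (!result.isPresent() || centroids[i] > result.getAsDouble()) {
                result = OptionalDouble.of(centroids[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] centroids = {0.1, 0.5, 2.0};
        long[] counts = {0, 3, 1};
        System.out.println(estimateMin(centroids, counts, false).getAsDouble()); // 0.1
        System.out.println(estimateMin(centroids, counts, true).getAsDouble());  // 0.5
    }
}
```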

JonasKunz · 2025-11-17T10:16:41Z

...k/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/mapper/TDigestParser.java


    private static final ParseField COUNTS_FIELD = new ParseField(COUNTS_NAME);
    private static final ParseField CENTROIDS_FIELD = new ParseField(CENTROIDS_NAME);
-    private static final ParseField TOTAL_COUNT_FIELD = new ParseField(TOTAL_COUNT_FIELD_NAME);


You can also remove TOTAL_COUNT_FIELD_NAME from TDigestFieldMapper, as it is unused now.

not-napoleon added 2 commits November 14, 2025 12:37

load summary fields during synthetic source composition

dd04313

don't accept total count

0e51e0e

not-napoleon requested review from JonasKunz and kkrik-es November 14, 2025 20:13

not-napoleon added the >non-issue, :StorageEngine/Mapping, and v9.3.0 labels Nov 14, 2025

elasticsearchmachine added the Team:StorageEngine label Nov 14, 2025

don't return NaNs in synthetic source for empty values

3016d0a
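A hypothetical sketch of the guard this commit describes: summary fields are only written when they hold a real number. It assumes an empty digest has NaN for min and max and a sum of 0; that assumption and the names `EmptyDigestSummary`, `putIfNotNaN`, and `summaryFields` are illustrative and not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names and the NaN convention for empty digests are assumptions.
public class EmptyDigestSummary {

    /** Adds a summary field only when its value is not NaN. */
    static void putIfNotNaN(Map<String, Object> source, String field, double value) {
        if (Double.isNaN(value) == false) {
            source.put(field, value);
        }
    }

    /** Builds the min/max/sum part of the synthetic source, skipping NaN values. */
    static Map<String, Object> summaryFields(double min, double max, double sum) {
        Map<String, Object> source = new LinkedHashMap<>();
        putIfNotNaN(source, "min", min);
        putIfNotNaN(source, "max", max);
        putIfNotNaN(source, "sum", sum);
        return source;
    }

    public static void main(String[] args) {
        // Empty digest (assumed NaN min/max, zero sum): only "sum" survives.
        System.out.println(summaryFields(Double.NaN, Double.NaN, 0.0)); // {sum=0.0}
        System.out.println(summaryFields(0.5, 2.0, 3.5)); // {min=0.5, max=2.0, sum=3.5}
    }
}
```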

JonasKunz approved these changes Nov 17, 2025


not-napoleon added 2 commits November 17, 2025 09:06

remove unused constant

a04e94f

Merge branch 'main' into improve-synthetic-source-for-tdigest-field

b329aaa

kkrik-es approved these changes Nov 17, 2025


not-napoleon merged commit 2b3613b into elastic:main Nov 17, 2025
34 checks passed

4 participants
