Don't create a new `DoubleHistogram` instance for empty buckets. #92547

martijnvg · 2022-12-23T13:58:26Z

Currently, the `percentiles` and `percentile_ranks` aggregations create a new `DoubleHistogram` instance every time they build an empty aggregation. This is very wasteful, especially when `numberOfSignificantValueDigits` is `5`, in which case each instance costs ~5MB.

This change uses `null` instead. At reduce time, `InternalHDRPercentileRanks` instances with a `null` state are skipped. If all `InternalHDRPercentileRanks` instances have a `null` state, reduce uses an empty `DoubleHistogram` instance.
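As a rough illustration of the reduce behaviour described above, here is a minimal, self-contained sketch. `SketchHistogram` and `ShardResult` are hypothetical stand-ins invented for this example; they are not the real `DoubleHistogram` or `InternalHDRPercentileRanks` classes, and the actual change may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an HDR histogram (NOT org.HdrHistogram.DoubleHistogram).
final class SketchHistogram {
    private final List<Double> values = new ArrayList<>();

    void record(double value) {
        values.add(value);
    }

    // Merge all values recorded in another histogram into this one.
    void add(SketchHistogram other) {
        values.addAll(other.values);
    }

    long totalCount() {
        return values.size();
    }
}

// Hypothetical stand-in for a per-shard result such as InternalHDRPercentileRanks.
final class ShardResult {
    // null means "empty bucket": no histogram was ever allocated for it.
    final SketchHistogram state;

    ShardResult(SketchHistogram state) {
        this.state = state;
    }

    // Empty results carry no histogram at all.
    static ShardResult empty() {
        return new ShardResult(null);
    }

    static ShardResult reduce(List<ShardResult> results) {
        SketchHistogram merged = null;
        for (ShardResult result : results) {
            if (result.state == null) {
                continue; // skip results that never recorded anything
            }
            if (merged == null) {
                merged = new SketchHistogram(); // allocate lazily, on first real state
            }
            merged.add(result.state);
        }
        if (merged == null) {
            // Every input was empty: fall back to a single empty histogram.
            merged = new SketchHistogram();
        }
        return new ShardResult(merged);
    }
}
```

With this shape, at most one histogram is allocated per reduce call, no matter how many empty results take part in it.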

Note that `buildEmptyAggregations()` could also return an empty `DoubleHistogram` instance. However, serializing an empty `DoubleHistogram` instance also adds significant overhead: a `ByteBuffer` is allocated based on `DoubleHistogram#getNeededByteBufferCapacity()`, which can cause a very large `ByteBuffer` instance to be created.
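To make the serialization point concrete, the sketch below shows one way a nullable state could be written with a presence flag, so an empty result costs a single byte and no buffer is sized from the histogram. The `NullableStateCodec` class and its wire layout are assumptions made up for this example; they are not Elasticsearch's actual stream format.

```java
import java.nio.ByteBuffer;

// Hypothetical codec: one presence byte, then a length-prefixed payload if present.
final class NullableStateCodec {

    // encodedState is an already-encoded histogram, or null for an empty result.
    static byte[] write(byte[] encodedState) {
        if (encodedState == null) {
            return new byte[] {0}; // absent: only the flag is written
        }
        ByteBuffer buffer = ByteBuffer.allocate(1 + Integer.BYTES + encodedState.length);
        buffer.put((byte) 1).putInt(encodedState.length).put(encodedState);
        return buffer.array();
    }

    // Returns null when the flag says no state was written.
    static byte[] read(byte[] wire) {
        ByteBuffer buffer = ByteBuffer.wrap(wire);
        if (buffer.get() == 0) {
            return null;
        }
        byte[] state = new byte[buffer.getInt()];
        buffer.get(state);
        return state;
    }
}
```

The reader has to handle the `null` case explicitly, which is the kind of path where a missing check shows up as a `NullPointerException`.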


elasticsearchmachine · 2022-12-23T13:59:51Z

Hi @martijnvg, I've created a changelog YAML for you.

elasticsearchmachine · 2022-12-23T15:52:51Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000

Makes sense to me.


martijnvg added the :Analytics/Aggregations Aggregations label Dec 23, 2022

elasticsearchmachine added the v8.7.0 label Dec 23, 2022

martijnvg added the >bug label Dec 23, 2022

martijnvg added 5 commits December 23, 2022 14:59

Update docs/changelog/92547.yaml

fe528a0

spotless

14a7d27

fixed npe

a5a763e

Merge remote-tracking branch 'es/main' into hdr_and_empty_buckets

6ac052c

fixed npe in mixed cluster

9f52453

martijnvg marked this pull request as ready for review December 23, 2022 15:52

martijnvg requested a review from nik9000 December 23, 2022 15:52

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 23, 2022

Merge remote-tracking branch 'es/main' into hdr_and_empty_buckets

be85eb6

nik9000 approved these changes Jan 10, 2023


martijnvg merged commit a1ea6ea into elastic:main Jan 11, 2023
