Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't create a new DoubleHistogram instance for empty buckets. #92547

Merged
merged 7 commits into from
Jan 11, 2023

Conversation

martijnvg
Copy link
Member

Currently, the percentile and percentile_ranks aggregations create a new DoubleHistogram instance each time when creating empty aggregation. This is very wasteful. Especially when numberOfSignificantValueDigits is 5. Then each instance costs ~5MB.

This change uses null instead. At reduce time InternalHDRPercentileRanks with a null state are skipped. In case all InternalHDRPercentileRanks are null then reduce uses an empty DoubleHistogram instance.

Note that buildEmptyAggregations() could also return an empty DoubleHistogram instance. However, serializing an empty DoubleHistogram instance also adds significant overhead. (ByteBuffer instance is created based on DoubleHistogram#getNeededByteBufferCapacity(), and that can cause a very large ByteBuffer instance to be created)

Currently, the `percentile` and `percentile_ranks` aggregations create a new `DoubleHistogram` instance each time when creating empty aggregation. This is very wasteful. Especially when `numberOfSignificantValueDigits` is `5`. Then each instance costs ~5MB.

This change uses `null` instead. At reduce time `InternalHDRPercentileRanks` with a `null` state are skipped.
In case all `InternalHDRPercentileRanks` are null then reduce uses an empty `DoubleHistogram` instance.

Note that `buildEmptyAggregations()` could also return an empty `DoubleHistogram` instance. However, serializing an empty `DoubleHistogram` instance also adds significant overhead. (`ByteBuffer` instance is created based on `DoubleHistogram#getNeededByteBufferCapacity()`, and that can cause a very large ByteBuffer instance to be created)
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@martijnvg martijnvg marked this pull request as ready for review December 23, 2022 15:52
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 23, 2022
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@martijnvg martijnvg merged commit a1ea6ea into elastic:main Jan 11, 2023
HiDAl pushed a commit to HiDAl/elasticsearch that referenced this pull request Jan 11, 2023
…stic#92547)

Currently, the `percentile` and `percentile_ranks` aggregations create a new `DoubleHistogram` instance each time when creating empty aggregation. This is very wasteful. Especially when `numberOfSignificantValueDigits` is `5`. Then each instance costs ~5MB.

This change uses `null` instead. At reduce time `InternalHDRPercentileRanks` with a `null` state are skipped.
In case all `InternalHDRPercentileRanks` are null then reduce uses an empty `DoubleHistogram` instance.

Note that `buildEmptyAggregations()` could also return an empty `DoubleHistogram` instance. However, serializing an empty `DoubleHistogram` instance also adds significant overhead. (`ByteBuffer` instance is created based on `DoubleHistogram#getNeededByteBufferCapacity()`, and that can cause a very large ByteBuffer instance to be created)
danielmitterdorfer pushed a commit to danielmitterdorfer/elasticsearch that referenced this pull request Jan 12, 2023
…stic#92547)

Currently, the `percentile` and `percentile_ranks` aggregations create a new `DoubleHistogram` instance each time when creating empty aggregation. This is very wasteful. Especially when `numberOfSignificantValueDigits` is `5`. Then each instance costs ~5MB.

This change uses `null` instead. At reduce time `InternalHDRPercentileRanks` with a `null` state are skipped.
In case all `InternalHDRPercentileRanks` are null then reduce uses an empty `DoubleHistogram` instance.

Note that `buildEmptyAggregations()` could also return an empty `DoubleHistogram` instance. However, serializing an empty `DoubleHistogram` instance also adds significant overhead. (`ByteBuffer` instance is created based on `DoubleHistogram#getNeededByteBufferCapacity()`, and that can cause a very large ByteBuffer instance to be created)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants