Find a way to Deduplicate Index Settings #78892
Labels
:Distributed/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
>enhancement
Team:Distributed
Meta label for distributed team
Comments
original-brownbear added the >enhancement and :Distributed/Cluster Coordination labels on Oct 10, 2021
Pinging @elastic/es-distributed (Team:Distributed)
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Nov 8, 2021
This is a somewhat crude solution to elastic#78892 that addresses 95%+ of duplicate setting entry memory consumption in large clusters. The remaining duplicate structures (lists of all the same strings) are comparatively cheap in their heap consumption. In heavy benchmarking for elastic#77466, no runtime impact of adding this extra step to setting creation was found, despite pushing setting creation harder than is expected in real-world usage. Part of the reason for the low relative impact is that populating a tree-map is quite expensive to begin with, so adding the string interning, which is fast via the CHM cache, doesn't add much overhead. On the other hand, the heap use impact for use cases that come with a large number of duplicate settings (many similar indices) is significant. As an example, 10k AuditBeat indices consume about 500M of heap for duplicate settings data structures without this change. This change brings the heap consumption from duplicate settings down to O(1M) on every node in the cluster.
Relates and addresses most of elastic#78892
Relates elastic#77466
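The interning step described in the commit message can be illustrated with a minimal sketch. This is not the actual Elasticsearch code: the class `InterningSettingsSketch` and its methods `intern` and `build` are hypothetical names, and settings are simplified to a plain string-to-string map. It only shows the general idea of routing every key and value through a shared `ConcurrentHashMap` while populating a `TreeMap`, so that equal strings from many similar indices end up as one shared instance.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the Elasticsearch implementation.
final class InterningSettingsSketch {
    // Shared cache mapping each distinct string to its canonical instance.
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering s if it is new.
    static String intern(String s) {
        String existing = CACHE.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    // Builds a sorted settings map whose keys and values are all interned.
    static Map<String, String> build(Map<String, String> raw) {
        TreeMap<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            result.put(intern(e.getKey()), intern(e.getValue()));
        }
        return result;
    }
}
```

Two maps built from equal but separately allocated strings then share the same key and value instances, so each duplicate index only costs its map structure. Note that a cache like this one never evicts entries, which is an assumption of the sketch and would need care in a long-running process.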
original-brownbear added a commit that referenced this issue on Nov 10, 2021
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Nov 10, 2021
original-brownbear added a commit that referenced this issue on Nov 10, 2021
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Jan 17, 2022
original-brownbear added a commit that referenced this issue on Jan 17, 2022
Relates #77466
Unlike the mapping metadata, which we store in compressed and serialized form on the master node, we store the settings for each index as a deserialized `Settings` object. This can be a significant source of heap usage if there is a large number of indices with non-trivial settings. An example is the audit-beats template, which contains a fairly long list of field names in `index.query.default_field`. In this example, handling 10k audit-beat indices takes almost 500MB of master heap just for storing the duplicate lists of field names in `Settings` instances.
I will look for an easy win here; it shouldn't be too hard to deduplicate these in some form when building index metadata.
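One possible shape for deduplicating the repeated field-name lists is sketched below. It is an assumption for illustration, not what was merged for this issue: the class `ListDeduplicatorSketch` and its `deduplicate` method are hypothetical, and it relies only on the fact that `java.util.List` defines `equals` and `hashCode` by content, so equal lists can be collapsed to one immutable canonical instance while index metadata is built.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the Elasticsearch implementation.
final class ListDeduplicatorSketch {
    // Maps each distinct list (compared by content) to its canonical instance.
    private final ConcurrentHashMap<List<String>, List<String>> canonical = new ConcurrentHashMap<>();

    // Returns a shared immutable list equal to values.
    // List.copyOf rejects null elements, so this sketch assumes none are present.
    List<String> deduplicate(List<String> values) {
        List<String> copy = List.copyOf(values);
        List<String> existing = canonical.putIfAbsent(copy, copy);
        return existing == null ? copy : existing;
    }
}
```

With such a helper, 10k indices created from the same template would hold references to a single list of field names instead of 10k separate copies. The trade-off is the cost of hashing the whole list on every lookup.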