Find a way to Deduplicate Index Settings #78892

original-brownbear · 2021-10-10T17:05:32Z

Unlike the mapping metadata, which we store in compressed+serialized form on the master node, we store the settings for each index as a deserialized Settings object. This can be a significant source of heap usage if there's a large number of indices with non-trivial settings. An example of this would be the audit-beats template, which contains a fairly long list of field names in index.query.default_field. In this example, handling 10k audit-beat indices takes almost 500MB of master heap just for storing the duplicate lists of field names in Settings instances.

I will look for an easy win here. It shouldn't be too hard to deduplicate these in some form when building index metadata.


elasticmachine · 2021-10-10T17:05:35Z

Pinging @elastic/es-distributed (Team:Distributed)

This is a somewhat crude solution to elastic#78892 that addresses 95%+ of the memory consumed by duplicate setting entries in large clusters. The remaining duplicate structures (lists of all the same strings) are comparatively cheap in their heap consumption. In heavy benchmarking for elastic#77466, adding this extra step to setting creation showed no runtime impact, despite pushing setting creation harder than is expected in real-world usage. Part of the reason for the low relative impact is that populating a tree-map is quite expensive to begin with, so adding the string interning, which is fast via the CHM cache, doesn't add much overhead. On the other hand, the heap use impact for use-cases that come with a large number of duplicate settings (many similar indices) is significant. As an example, 10k AuditBeat indices consume about 500M of heap for duplicate settings data structures without this change. This change brings the heap consumption from duplicate settings down to O(1M) on every node in the cluster. Relates to and addresses most of elastic#78892. Relates to elastic#77466.
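For readers unfamiliar with the technique, here is a minimal sketch of string interning through a ConcurrentHashMap cache while populating a tree-map. It is not the code from the linked change: the class `SettingInterner` and its methods are hypothetical names chosen for illustration, and the sketch assumes settings are plain non-null string key/value pairs.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not the actual Elasticsearch implementation.
final class SettingInterner {
    // Maps each distinct string to the one canonical instance we keep on heap.
    // Assumption: the cache is allowed to grow with the number of distinct strings.
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    private SettingInterner() {}

    // Returns the canonical instance for s, registering s if it is the first of its kind.
    static String intern(String s) {
        String existing = CACHE.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    // Builds a sorted settings map whose keys and values are all canonical instances,
    // so equal strings across many indices share a single object.
    static Map<String, String> internAll(Map<String, String> raw) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            result.put(intern(e.getKey()), intern(e.getValue()));
        }
        return result;
    }
}
```

With this approach, two indices created from the same template end up referencing the same key and value objects, so the per-index cost shrinks to the map structure itself. The cache in this sketch is never pruned, which is a trade-off a real implementation would need to consider.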


original-brownbear added the >enhancement and :Distributed/Cluster Coordination labels Oct 10, 2021

original-brownbear self-assigned this Oct 10, 2021

elasticmachine added the Team:Distributed label Oct 10, 2021

original-brownbear mentioned this issue Oct 10, 2021

Fix Large Shard Count Scalability Issues #77466

Open

97 tasks

original-brownbear mentioned this issue Nov 8, 2021

Implement setting deduplication via String interning #80493

Merged

original-brownbear mentioned this issue Nov 10, 2021

Implement Setting Deduplication via String Interning (#80493) #80590

Merged

original-brownbear mentioned this issue Jan 17, 2022

Implement Setting Deduplication via String Interning (#80493) #82659

Merged
