Split indices API can cause heap space issues when the underlying shard segments are too large #107011

Open
jbaiera opened this issue Apr 2, 2024 · 1 comment
Labels
>bug :Data Management/Indices APIs APIs to create and manage indices and templates Team:Data Management Meta label for data/management team

Comments

@jbaiera
Member

jbaiera commented Apr 2, 2024

When splitting an index whose shards have a considerable memory footprint, opening the hard-linked segment files for the new split shards can cause spikes in heap consumption. For a single-shard index with a shard memory footprint of 5 GB, splitting that index into 5 shards will cause the node hosting the shard to consume approximately 30 GB of heap:

  1 shard  (original index) x 5 GB heap
+ 5 shards (split index)    x 5 GB heap each
============
  6 shards (total) @ 30 GB total heap space needed

This is because, at the Lucene level, each split shard's reader must load the full 5 GB of heap needed to host the entire original shard, regardless of how many ways the index has been split. This is eventually remedied when the segment files are rewritten, but until that happens there is a risk of instability due to heap consumption. The original index also continues to contribute to heap consumption, because the split operation does not remove it.
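To make the arithmetic concrete, here is a minimal Python sketch of the estimate for the single-shard case described above. The function name and parameters are hypothetical and do not correspond to anything in Elasticsearch. It assumes every split shard holds the full footprint of the source shard until its segments are rewritten, and that the source index stays open.

```python
def projected_split_heap_gb(shard_heap_gb: float, target_shards: int) -> float:
    """Rough peak heap (in GB) on the node after splitting a single-shard index.

    Assumption: each target shard's reader loads the same footprint as the
    source shard, and the source shard remains open after the split.
    """
    source_heap = shard_heap_gb                  # original index, still open
    split_heap = shard_heap_gb * target_shards   # one full copy per split shard
    return source_heap + split_heap


# The example from this issue: a 5 GB shard split into 5 shards.
print(projected_split_heap_gb(5, 5))
```

This is only an upper-bound style estimate for the window before the segment files are rewritten. It says nothing about how quickly the footprint shrinks afterwards.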

From my searching, I have not found any instances where we check projected heap consumption before performing a split operation.
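As an illustration of what such a check might look like, here is a hedged Python sketch. Nothing here is an existing Elasticsearch API: the function name, its parameters, and the 20% headroom default are all assumptions made for the example. It only counts the additional heap the split shards would need, since the source shard is already loaded.

```python
GB = 1024 ** 3


def split_fits_in_heap(shard_heap_bytes: int,
                       target_shards: int,
                       free_heap_bytes: int,
                       headroom: float = 0.2) -> bool:
    """Hypothetical pre-flight check for a single-shard split.

    Returns True if the extra heap needed by the split shards fits in the
    currently free heap while leaving `headroom` (a fraction) unused.
    """
    # Assumption: each split shard needs the full source shard footprint.
    additional = shard_heap_bytes * target_shards
    budget = free_heap_bytes * (1.0 - headroom)
    return additional <= budget


# 5 GB shard split 5 ways needs 25 GB extra.
print(split_fits_in_heap(5 * GB, 5, free_heap_bytes=40 * GB))  # enough room
print(split_fits_in_heap(5 * GB, 5, free_heap_bytes=20 * GB))  # not enough
```

Where the inputs would come from (for example, how to measure a shard's heap footprint or the node's free heap) is left open here.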

Distantly related to #98107 because this issue pertains to open reader instances consuming excessive heap.

@jbaiera jbaiera added >bug :Data Management/Indices APIs APIs to create and manage indices and templates labels Apr 2, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Apr 2, 2024