Split indices API can cause heap space issues when the underlying shard segments are too large #107011
Labels
>bug
:Data Management/Indices APIs
Team:Data Management
When splitting an index whose shards have a large memory footprint, opening the hard-linked segment files of the newly created split shards can cause a spike in heap consumption. For a single-shard index with a 5 GB shard memory footprint, splitting that index into 5 shards causes the node hosting the shard to consume approximately 30 GB of heap.
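For context, here is a minimal sketch of the split operation that triggers this behavior (the index names, shard count, and the `localhost:9200` endpoint are placeholders for illustration):

```python
import requests

ES = "http://localhost:9200"  # assumed single-node cluster for illustration

# The source index must be made read-only before it can be split.
requests.put(
    f"{ES}/my-source-index/_settings",
    json={"settings": {"index.blocks.write": True}},
)

# Split the single-shard source into 5 target shards. Each target shard
# initially hard-links the source's segment files rather than copying them.
requests.post(
    f"{ES}/my-source-index/_split/my-target-index",
    json={"settings": {"index.number_of_shards": 5}},
)
```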
This happens because, at the Lucene level, each target shard's reader must load the full 5 GB needed to host the entire source shard, so that footprint is paid once per target shard. This is eventually remedied when the segment files are rewritten, but until that happens there is a risk of instability due to heap pressure. The original index also continues to contribute to heap consumption, because the split operation does not remove it.
From my searching, I have not found any instances where we check projected heap consumption before performing a split operation.
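To make the arithmetic concrete, a pre-flight estimate could look something like the sketch below. This is purely illustrative; `projected_split_heap_bytes` is a hypothetical helper, not existing Elasticsearch code:

```python
def projected_split_heap_bytes(source_shard_heap: int, target_shards: int) -> int:
    """Estimate heap needed while split targets still hard-link source segments.

    Each of the target shards' readers loads the full source footprint,
    and the source index itself remains open. Illustrative only.
    """
    return source_shard_heap * target_shards + source_shard_heap

# A 5 GB source shard split into 5 shards -> ~30 GB projected heap.
assert projected_split_heap_bytes(5 * 2**30, 5) == 30 * 2**30
```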
Distantly related to #98107, since both issues concern heap consumed by open reader instances.