Improve peer recovery performance in active indexing #58011
Labels
:Distributed/Recovery
>enhancement
Team:Distributed
Phase 2 of a peer recovery can take a long time if the index is already large when the recovery starts and we are actively indexing into it. The reason is that we have to replay the large number of operations that arrive while the segment files are being transferred. I see two things that we can improve:
Send and replay operations concurrently, as we already do for file chunks.
Before starting phase 2, check the total number of operations that we would have to replay. If the number of pending operations exceeds the file-based recovery threshold, it would be better to restart the recovery from scratch.
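To make the two ideas concrete, here is a minimal sketch in plain Java. It is not taken from the Elasticsearch code base: the class `Phase2Sketch`, both method names, and all parameters are hypothetical. It also assumes that the file-based recovery threshold is a ratio of pending operations to documents in the index, and that the target can apply batches of operations out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names exist in Elasticsearch.
final class Phase2Sketch {

    // Second idea: before phase 2, decide whether replaying the pending
    // operations is likely to cost more than copying the files again.
    // Assumption: the threshold is a ratio of operations to documents.
    static boolean shouldRestartRecovery(long pendingOps, long docsInIndex, double threshold) {
        return pendingOps > threshold * docsInIndex;
    }

    // First idea: send batches of operations with at most maxInFlight
    // requests outstanding, instead of one batch at a time.
    // Assumption: the receiver tolerates batches arriving out of order.
    static <T> void sendConcurrently(List<List<T>> batches, int maxInFlight, Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        try {
            for (List<T> batch : batches) {
                // Block until one of the in-flight requests has completed.
                permits.acquire();
                pending.add(CompletableFuture
                        .runAsync(() -> sender.accept(batch), pool)
                        .whenComplete((result, error) -> permits.release()));
            }
            // Wait for the remaining requests; a failed batch surfaces here.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending operations", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would evaluate `shouldRestartRecovery` once the file copy has finished, and only fall through to `sendConcurrently` when it returns `false`. The semaphore keeps the number of outstanding requests bounded, so a slow target is not flooded with operations.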