Improve peer recovery performance in active indexing #58011

dnhatn · 2020-06-11T19:48:14Z

Phase 2 of a peer recovery can take a long time if the index is already large when the recovery starts and we are actively indexing into it. The reason is that we have to replay the large number of operations that accumulate while we are transferring the segment files. I see two things that we can improve:

1. Send and replay operations concurrently, as we already do for file chunks.
2. Before starting phase 2, check the total number of operations that we will have to replay. If the number of pending operations exceeds the file-based recovery threshold, it would be better to restart the recovery.
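
To make the two ideas concrete, here is a rough, self-contained sketch. It is not the Elasticsearch implementation: `Phase2Sketch`, `shouldRestartRecovery`, `replayConcurrently`, and the 0.1 ratio are all hypothetical names and values chosen for illustration, and "replaying" a batch is simulated by counting its operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Elasticsearch.
class Phase2Sketch {

    // Idea 2: restart the recovery when the pending operations exceed an
    // assumed ratio of the documents already in the shard.
    static boolean shouldRestartRecovery(long pendingOps, long docsInShard, double thresholdRatio) {
        return pendingOps > docsInShard * thresholdRatio;
    }

    // Idea 1: split the operations into batches and keep at most maxInFlight
    // batches outstanding at any time. Returns the number of operations replayed.
    static int replayConcurrently(List<Integer> ops, int batchSize, int maxInFlight) {
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        Semaphore inFlight = new Semaphore(maxInFlight);
        AtomicInteger replayed = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            for (int start = 0; start < ops.size(); start += batchSize) {
                int end = Math.min(start + batchSize, ops.size());
                List<Integer> batch = ops.subList(start, end);
                // Block the producer until a slot is free, so batches do not pile up.
                inFlight.acquireUninterruptibly();
                futures.add(CompletableFuture.runAsync(() -> {
                    try {
                        // Stand-in for sending the batch to the recovery target.
                        replayed.addAndGet(batch.size());
                    } finally {
                        inFlight.release();
                    }
                }, pool));
            }
            // Wait for every outstanding batch before declaring phase 2 done.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return replayed.get();
    }

    public static void main(String[] args) {
        List<Integer> ops = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ops.add(i);
        }
        // 0.1 is an illustrative ratio, not a value taken from this issue.
        if (shouldRestartRecovery(ops.size(), 100_000, 0.1)) {
            System.out.println("too many pending operations, restart the recovery");
        } else {
            System.out.println("replayed " + replayConcurrently(ops, 64, 4) + " operations");
        }
    }
}
```

The semaphore keeps the number of outstanding batches bounded, which is the same back-pressure idea as sending file chunks concurrently. A real implementation would also have to handle failures and make sure the target applies operations safely when they arrive out of order, which this sketch ignores.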

elasticmachine · 2020-06-11T19:48:15Z

Pinging @elastic/es-distributed (:Distributed/Recovery)

dnhatn · 2020-06-30T17:12:32Z

We discussed this two weeks ago. The root cause of the related issue is that the setting cluster.routing.allocation.node_concurrent_recoveries was set too high. We agreed to introduce a limit for that setting. We also agreed to support concurrency in phase 2 of peer recoveries but disable it by default. I will close this issue and continue working on the PR.

dnhatn added the >enhancement, :Distributed/Recovery, and team-discuss labels Jun 11, 2020

elasticmachine added the Team:Distributed label Jun 11, 2020

dnhatn mentioned this issue Jun 12, 2020

Sending operations concurrently in peer recovery #58018

Merged

henningandersen mentioned this issue Jun 15, 2020

Account for remaining recovery in disk allocator #58029

Merged

dnhatn closed this as completed Jun 30, 2020

dnhatn removed the team-discuss label Jun 30, 2020
