Improve peer recovery performance in active indexing #58011

Closed
dnhatn opened this issue Jun 11, 2020 · 2 comments
Labels
  • :Distributed/Recovery (Anything around constructing a new shard, either from a local or a remote source.)
  • >enhancement
  • Team:Distributed (Meta label for distributed team)

Comments

@dnhatn
Member

dnhatn commented Jun 11, 2020

Phase 2 of a peer recovery can take a long time if the index is already large when the recovery starts and we are actively indexing into it. The reason is that we have to replay the large number of operations that arrive while we are transferring the segment files. I see two things that we can improve:

  • Send and replay operations concurrently, as we already do for file chunks.

  • Before starting phase 2, check the total number of operations that we will replay. If the number of pending operations exceeds the file-based recovery threshold, it would be better to restart the recovery.
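
A rough sketch of the two ideas above, not the actual recovery code. It assumes the file-based recovery threshold is a ratio of pending operations to documents in the index, and that the target tolerates batches arriving out of order. The names `should_restart_recovery`, `replay_concurrently`, `send_batch`, and `max_in_flight` are hypothetical and only illustrate the shape of the change.

```python
from concurrent.futures import ThreadPoolExecutor


def should_restart_recovery(pending_ops: int, doc_count: int, threshold_ratio: float) -> bool:
    """Return True when replaying pending_ops looks costlier than copying files again.

    Assumption: the threshold is a ratio of pending operations to documents.
    """
    if doc_count <= 0:
        # Nothing to compare against, so keep the current recovery going.
        return False
    return pending_ops > threshold_ratio * doc_count


def replay_concurrently(batches, send_batch, max_in_flight: int = 4):
    """Send operation batches with at most max_in_flight requests outstanding.

    send_batch is a hypothetical callable that ships one batch to the target
    and returns whatever the target acknowledges for that batch.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # map() returns results in input order, even though the individual
        # calls may complete out of order.
        return list(pool.map(send_batch, batches))
```

With a ratio of 0.1, for example, 200 pending operations against 1000 documents would trigger a restart, while 50 would proceed to phase 2. The bound on in-flight batches mirrors how file chunks are limited, so a slow target is not flooded.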

dnhatn added the >enhancement, :Distributed/Recovery, and team-discuss labels Jun 11, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Recovery)

@dnhatn
Member Author

dnhatn commented Jun 30, 2020

We discussed this two weeks ago. The root cause of the related issue is that the setting cluster.routing.allocation.node_concurrent_recoveries was set too high. We agreed to introduce a limit for that setting. We also agreed to support concurrency in phase 2 of peer recoveries but disable it by default. I will close this issue and continue working on the PR.
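
For illustration only, a minimal sketch of what a bounded value for that setting could look like when building a cluster settings update. The upper bound used here (`HYPOTHETICAL_MAX`) and the helper name are assumptions, not values or code from this issue, and rejecting out-of-range values (rather than clamping them) is also an assumption.

```python
import json

SETTING = "cluster.routing.allocation.node_concurrent_recoveries"
HYPOTHETICAL_MAX = 8  # hypothetical upper bound; the issue does not state the real limit


def recoveries_settings_body(requested: int, upper: int = HYPOTHETICAL_MAX) -> str:
    """Build a JSON body for a cluster settings update, rejecting out-of-range values."""
    if requested < 0 or requested > upper:
        raise ValueError(f"{SETTING} must be between 0 and {upper}, got {requested}")
    return json.dumps({"persistent": {SETTING: requested}})
```

The resulting string is the kind of body one would send to the cluster update settings API (`PUT /_cluster/settings`).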
