Skip to content

Conversation

@DaveCTurner
Copy link
Contributor

Follow-on from #95115 using the new async runUnderPrimaryPermit to remove almost all the remaining blocking in RecoverySourceHandler.

@DaveCTurner DaveCTurner added >non-issue :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v8.8.0 labels Apr 17, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 17, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Copy link
Contributor Author

I hope that this is easy enough to review as a sequence of individual commits (ignoring whitespace changes). If not I can make a sequence of PRs.

Note that after these changes the only blocking during recovery happens at relocation handoff when waiting for all ongoing operations to finish. I hope to fix that soon too, but it's a more involved change because the blocking is deep within IndexShardOperationPermits. Once that's done, we should be able to write deterministic tests that execute all kinds of recovery, including primary relocations.

@DaveCTurner DaveCTurner marked this pull request as draft April 18, 2023 07:46
@DaveCTurner
Copy link
Contributor Author

Marking this as WIP because this simple approach doesn't quite work, see #95309.

@DaveCTurner
Copy link
Contributor Author

The retention lease bits are a little tricky; we've fixed the create-lease step in #95309 and #95324 will fix the delete-lease step. Once that's done I'll update this PR because the remaining steps are much simpler.

@DaveCTurner DaveCTurner force-pushed the 2023-04-17-more-nonblocking-recovery branch from 914d522 to 897e9a2 Compare April 19, 2023 14:20
@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@DaveCTurner DaveCTurner force-pushed the 2023-04-17-more-nonblocking-recovery branch from 897e9a2 to 93d672a Compare May 2, 2023 14:51
@DaveCTurner DaveCTurner marked this pull request as ready for review May 2, 2023 15:37
@DaveCTurner
Copy link
Contributor Author

Ok this is good to review now

@DaveCTurner DaveCTurner removed the WIP label May 4, 2023
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks for splitting it up into individual commits, made it an easy review.

@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 9, 2023
@elasticsearchmachine elasticsearchmachine merged commit da1404e into elastic:main May 9, 2023
@DaveCTurner DaveCTurner deleted the 2023-04-17-more-nonblocking-recovery branch May 9, 2023 06:43
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 10, 2023
`IndexShard#markAllocationIdAsInSync` is interruptible because it may
block the thread on a monitor waiting for the local checkpoint to
advance, but we lost the ability to interrupt it on a recovery
cancellation in elastic#95270.

Closes elastic#96578
Closes elastic#100589
DaveCTurner added a commit that referenced this pull request Oct 11, 2023
`IndexShard#markAllocationIdAsInSync` is interruptible because it may
block the thread on a monitor waiting for the local checkpoint to
advance, but we lost the ability to interrupt it on a recovery
cancellation in #95270.

Closes #96578
Closes #100589
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Oct 11, 2023
`IndexShard#markAllocationIdAsInSync` is interruptible because it may
block the thread on a monitor waiting for the local checkpoint to
advance, but we lost the ability to interrupt it on a recovery
cancellation in elastic#95270.

Closes elastic#96578
Closes elastic#100589
elasticsearchmachine pushed a commit that referenced this pull request Oct 11, 2023
`IndexShard#markAllocationIdAsInSync` is interruptible because it may
block the thread on a monitor waiting for the local checkpoint to
advance, but we lost the ability to interrupt it on a recovery
cancellation in #95270.

Closes #96578
Closes #100589
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.9.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants