Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent invalid renewals of PRRLs #43898

Conversation

DaveCTurner
Copy link
Contributor

Today when a PRRL is created during peer recovery it retains history from the
sequence number provided by the recovering peer. However this sequence number
may be greater than the primary's knowledge of that peer's persisted global
checkpoint. Subsequent renewals of this lease will attempt to set the retained
sequence number back to the primary's knowledge of that peer's persisted global
checkpoint tripping an assertion that retention leases must only advance. This
commit accounts for this.

Caught by a failure of RecoveryWhileUnderLoadIT.testRecoverWhileRelocating

Relates #41536

Today when a PRRL is created during peer recovery it retains history from the
sequence number provided by the recovering peer. However this sequence number
may be greater than the primary's knowledge of that peer's persisted global
checkpoint. Subsequent renewals of this lease will attempt to set the retained
sequence number back to the primary's knowledge of that peer's persisted global
checkpoint tripping an assertion that retention leases must only advance.  This
commit accounts for this.

Caught by [a failure of `RecoveryWhileUnderLoadIT.testRecoverWhileRelocating`](https://scans.gradle.com/s/wxccfrtfgjj3g/console-log?task=:server:integTest#L14)

Relates elastic#41536
@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jul 3, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywelsch
Copy link
Contributor

ywelsch commented Jul 3, 2019

@DaveCTurner and I discussed this one after raising concerns that the change here could possibly hide future bugs. Instead, we want to know about cases where the assertion that retention leases must only advance trips, and then address those cases. For example, we noticed that we should probably remove leases for shards that are undergoing file-based recovery (as these recoveries are destructive in nature), which could be the cause for the linked test failure.

@DaveCTurner
Copy link
Contributor Author

Thanks both for your inputs. As Yannick says, we want more nuance here. I opened #43928 to address this particular case, so will close this.

@DaveCTurner DaveCTurner closed this Jul 3, 2019
@DaveCTurner DaveCTurner deleted the 2019-07-03-invalid-prrl-renewal branch July 3, 2019 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >test Issues or PRs that are addressing/adding tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants