Do not allow stale replicas to automatically be promoted to primary #14671

Closed
jasontedor opened this issue Nov 11, 2015 · 4 comments
Comments

@jasontedor
Member

Consider a primary shard P hosted on node p and its replica shard Q hosted on node q. If p is isolated from the cluster (e.g., through node failure, a flapping NIC, or an excessively long garbage collection pause), indexing operations can continue on q after Q is promoted to primary; these indexing operations will be acknowledged to the requesting clients. If q is subsequently isolated before p rejoins and before a new replica is assigned to another node in the cluster, the subsequent rejoining of p can currently lead to P being promoted to primary again. The indexing operations acknowledged by q will be lost.

A mechanism needs to be built to prevent the automatic promotion of a stale shard in such a scenario and instead only promote a non-stale shard to primary (if a non-stale shard is available). The only scenario in which a stale shard should be promoted to primary is through manual intervention by a system operator (e.g., in cases when q suffers a total hardware failure).
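
To make the scenario concrete, here is a minimal toy sketch in Python. It is not Elasticsearch code: the `ShardGroup` class, its `in_sync` set, and the `allow_stale` flag are all hypothetical names chosen for illustration. It assumes the cluster can track which copies received every acknowledged write, and it refuses to automatically promote any copy outside that set.

```python
class ShardGroup:
    """Toy model of one shard's copies (hypothetical, for illustration only)."""

    def __init__(self, nodes, primary):
        self.docs = {n: [] for n in nodes}   # documents held by each copy
        self.reachable = set(nodes)          # nodes currently in the cluster
        self.in_sync = set(nodes)            # copies holding every acked write
        self.primary = primary

    def isolate(self, node):
        self.reachable.discard(node)
        if self.primary == node:
            self.primary = None

    def rejoin(self, node):
        # A real system would also recover the copy from the primary and
        # re-add it to in_sync; that step is omitted here.
        self.reachable.add(node)

    def index(self, doc):
        if self.primary is None:
            raise RuntimeError("no primary available")
        targets = self.in_sync & self.reachable
        for n in targets:
            self.docs[n].append(doc)
        # Any copy that missed this acknowledged write is now stale.
        self.in_sync = targets
        return True  # acknowledged to the client

    def promote(self, allow_stale=False):
        if self.primary is not None:
            return self.primary
        candidates = sorted(self.reachable & self.in_sync)
        if not candidates and allow_stale:
            # Manual operator override: accepts losing acknowledged writes.
            candidates = sorted(self.reachable)
            self.in_sync = set(candidates[:1])
        self.primary = candidates[0] if candidates else None
        return self.primary


def run_scenario():
    c = ShardGroup(["p", "q"], primary="p")
    c.index("doc-1")        # both copies receive doc-1
    c.isolate("p")          # p drops out
    c.promote()             # Q on q becomes primary
    c.index("doc-2")        # acknowledged by q only; P is now stale
    c.isolate("q")          # q drops out before p returns
    c.rejoin("p")
    automatic = c.promote()                  # stale P is not promoted
    forced = c.promote(allow_stale=True)     # operator accepts data loss
    return automatic, forced, c.docs
```

In this sketch the automatic promotion after p rejoins leaves the shard without a primary, since only q holds `doc-2`. Only the explicit `allow_stale=True` call, standing in for operator intervention, promotes P and accepts the loss.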

Relates #10933

@bleskes
Contributor

bleskes commented Nov 11, 2015

Thanks @jasontedor. Can we also update the resiliency page?

@jasontedor
Member Author

@bleskes Added to the Resiliency page in #14681.

@bleskes
Contributor

bleskes commented Nov 11, 2015

Thanks Jason!

On 11 nov. 2015 4:35 PM +0100, Jason Tedor notifications@github.com, wrote:

@bleskes Added to the Resiliency page in #14681.



@clintongormley

Closed by #15281

bleskes added a commit that referenced this issue Apr 7, 2016
#14252 , #7572 , #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.