Wait on shard failures #14252

jasontedor opened this Issue Oct 22, 2015 · 4 comments



Currently, when executing an action (e.g., bulk, delete, or indexing operations) on all shards, if an exception occurs while executing the action on a replica shard, we send a shard failure message to the master. However, we do not wait for the master to acknowledge this message, and we do not handle failures in sending it. This is problematic because we still acknowledge the action to the client, which can result in lost writes. For example, when a primary is isolated from the master and its replicas, the following sequence of events can occur:

  1. we write to the local primary
  2. we fail to write to the replicas
  3. we fail in notifying the master to fail the replicas
  4. the primary acknowledges the write to the client
  5. the master notices the primary is gone and promotes one of the replicas to be primary

In this case, the promoted replica will not have the write that was acknowledged to the client, which amounts to data loss.

If we instead waited for the master to acknowledge the shard failures, we would never have acknowledged the write to the client in this case.
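
To make the difference concrete, here is a minimal simulation of the five-step scenario above. It is only a sketch: `Master`, `Replica`, `replicate`, and the `wait_for_master_ack` flag are hypothetical names invented for illustration and are not Elasticsearch classes or settings.

```python
from dataclasses import dataclass, field


class MasterUnreachable(Exception):
    """Raised when the shard failure message cannot be delivered to the master."""


@dataclass
class Master:
    # Hypothetical stand-in for the elected master node.
    reachable: bool = True
    failed_replicas: set = field(default_factory=set)

    def fail_shard(self, shard_id: str) -> None:
        if not self.reachable:
            raise MasterUnreachable(shard_id)
        self.failed_replicas.add(shard_id)


@dataclass
class Replica:
    # Hypothetical stand-in for a replica shard copy.
    shard_id: str
    reachable: bool = True
    docs: list = field(default_factory=list)

    def write(self, doc: str) -> bool:
        if self.reachable:
            self.docs.append(doc)
        return self.reachable


def replicate(doc, primary_docs, replicas, master, wait_for_master_ack):
    """Return True if the write would be acknowledged to the client."""
    primary_docs.append(doc)  # step 1: write to the local primary
    for replica in replicas:
        if replica.write(doc):
            continue
        # step 2: the replica write failed, so ask the master to fail the replica
        try:
            master.fail_shard(replica.shard_id)
        except MasterUnreachable:
            # step 3: the notification itself failed
            if wait_for_master_ack:
                return False  # proposed behaviour: do not acknowledge the write
            # current behaviour: the delivery failure is silently ignored
    return True  # step 4: acknowledge the write to the client


# The primary is isolated from both the master and its only replica.
replica = Replica("replica-0", reachable=False)
master = Master(reachable=False)

acked_without_waiting = replicate("doc-1", [], [replica], master, wait_for_master_ack=False)
acked_with_waiting = replicate("doc-1", [], [replica], master, wait_for_master_ack=True)
```

Without waiting, the write is acknowledged even though the replica that the master may later promote (step 5) never received it. With waiting, the client is not told the write succeeded. A real implementation would be asynchronous and would retry rather than fail immediately; the sketch only shows where the acknowledgement decision changes.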

  • Create listener mechanism for executing callbacks when exceptions occur sending a shard failure message to the master #14295
  • Add unit tests that show we wait until failure or success (do not have to handle the failures yet) #14707
  • Add general support for cluster state batch updates #14899
  • Apply cluster state batch updates to shard failures #15016
  • Handle when the node we thought was the master is no longer the master (e.g., master might have stepped down) -> find the actual master (e.g., wait for a new master to be elected) and retry the failed shard notice #15748
  • Fail shard failure requests from illegal sources #16275
  • Master tells us we are no longer the primary -> fail the local shard, retry request on new primary #16415
  • Handle failed shard has already been removed from the routing table -> okay #16089
  • Handle master side of shard failures (do not respond to the node until the new cluster state is published, otherwise report failure or allow the node to timeout) #15468
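
The node-side cases in the list above (stale master, shard already removed from the routing table, sender demoted from primary) can be sketched as a small retry loop. This is an illustrative sketch under assumptions: `MasterReply`, `Outcome`, `notify_shard_failed`, and the `max_attempts` bound are hypothetical and do not correspond to Elasticsearch's actual classes or to the exact behaviour of the linked changes.

```python
from enum import Enum, auto


class MasterReply(Enum):
    # Hypothetical replies to a shard failure message.
    ACKED = auto()               # the new cluster state was published
    NOT_MASTER = auto()          # the contacted node is no longer the master
    SHARD_ALREADY_GONE = auto()  # the shard is no longer in the routing table
    NO_LONGER_PRIMARY = auto()   # the sender has been demoted


class Outcome(Enum):
    REPLICA_FAILED = auto()
    FAIL_LOCAL_SHARD_AND_REROUTE = auto()
    GAVE_UP = auto()


def notify_shard_failed(send, find_master, max_attempts=3):
    """Send a shard failure notice, re-resolving the master between attempts.

    `send(master)` returns a MasterReply; `find_master()` returns the node
    currently believed to be the master. Both are caller-supplied callables.
    """
    for _ in range(max_attempts):
        reply = send(find_master())
        if reply in (MasterReply.ACKED, MasterReply.SHARD_ALREADY_GONE):
            # Either way the failed copy is out of the routing table: okay.
            return Outcome.REPLICA_FAILED
        if reply is MasterReply.NO_LONGER_PRIMARY:
            # Fail the local shard; the request is retried on the new primary.
            return Outcome.FAIL_LOCAL_SHARD_AND_REROUTE
        # NOT_MASTER: loop, look up the actual master, and retry the notice.
    return Outcome.GAVE_UP


# Scripted example: the first node has stepped down, the second one acks.
replies = iter([MasterReply.NOT_MASTER, MasterReply.ACKED])
masters = iter(["node-a", "node-b"])
contacted = []


def scripted_send(master):
    contacted.append(master)
    return next(replies)


outcome = notify_shard_failed(scripted_send, lambda: next(masters))
demoted = notify_shard_failed(lambda m: MasterReply.NO_LONGER_PRIMARY, lambda: "node-b")
```

The write is acknowledged to the client only when the outcome is `REPLICA_FAILED`; any other outcome means the request must be retried or failed rather than acknowledged.
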
bleskes commented Oct 23, 2015


s1monw commented Oct 23, 2015

sounds good to me too

jasontedor added a commit that closed this issue Feb 10, 2016
Fail demoted primary shards and retry request
This commit handles the scenario where a replication action fails on a
replica shard, the primary shard attempts to fail the replica shard
but the primary shard is notified of demotion by the master. In this
scenario, the demoted primary shard must be failed, and then the
request rerouted again to the new primary shard.

Closes #16415, closes #14252
makeyang commented Apr 5, 2016

Will this one resolve issue #7572?

bleskes commented Apr 5, 2016

@makeyang Yes, you are correct. We are holding off on closing that issue until #17038 is in.

bleskes added a commit that referenced this issue Apr 7, 2016
Update resiliency page
#14252, #7572, #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.
bleskes added a commit that referenced this issue Apr 7, 2016
Update resiliency page (#17586)
#14252, #7572, #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.