Disconnect between coordinating node and shards can cause duplicate updates or wrong status code #9967

Open
brwe opened this Issue Mar 3, 2015 · 0 comments

@brwe
Contributor
brwe commented Mar 3, 2015

A document update can be sent to any node in the cluster (the coordinating node), which forwards it to the node that holds the shard (the executing node). If the update fails, then under certain conditions the coordinating node sends the update again (for example https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/support/replication/TransportShardReplicationOperationAction.java#L447). However, the executing node might already have applied the update, in which case it simply applies it a second time. This is a problem for non-idempotent updates, for example one that increments a counter. The same effect can cause the wrong status code to be returned for versioned indexing requests. A real-world scenario where this can happen is when nodes that hold shards without replicas are restarted and updates are sent to the restarted node.
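To make the failure mode concrete, here is a minimal toy simulation in Python. It is not Elasticsearch code: `ExecutingNode`, `send_with_retry`, `ConnectionLost`, and `VersionConflict` are hypothetical names invented for this sketch, and it assumes the disconnect happens after the executing node has applied the operation but before the response reaches the coordinating node. The versioned case shows one plausible way the status code could end up wrong: the retry hits a version conflict even though the first attempt succeeded.

```python
class ConnectionLost(Exception):
    """The response never reached the coordinating node."""


class VersionConflict(Exception):
    """The supplied version does not match the stored one."""


class ExecutingNode:
    """Toy stand-in for the node that holds the shard (hypothetical)."""

    def __init__(self):
        self.counter = 0
        self.version = 1
        self.drop_next_response = False

    def _respond(self, value):
        # Simulate a disconnect AFTER the operation has been applied.
        if self.drop_next_response:
            self.drop_next_response = False
            raise ConnectionLost()
        return value

    def increment(self):
        self.counter += 1
        self.version += 1
        return self._respond(self.counter)

    def index_if_version(self, expected_version):
        if expected_version != self.version:
            raise VersionConflict()
        self.version += 1
        return self._respond(self.version)


def send_with_retry(operation, retries=1):
    """Toy coordinating node: resend the operation if the connection drops."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionLost:
            if attempt == retries:
                raise


def demo_duplicate_increment():
    node = ExecutingNode()
    node.drop_next_response = True
    send_with_retry(node.increment)  # the client asked for ONE increment
    return node.counter


def demo_version_conflict():
    node = ExecutingNode()
    node.drop_next_response = True
    try:
        send_with_retry(lambda: node.index_if_version(1))
        outcome = "ok"
    except VersionConflict:
        outcome = "conflict"
    return outcome, node.version
```

In this sketch, `demo_duplicate_increment()` leaves the counter at 2 after a single requested increment, and `demo_version_conflict()` reports a conflict to the caller although the write was applied on the first attempt.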
