VersionConflictEngineException with script update in cluster #13619

Closed
falkorichter opened this issue Sep 16, 2015 · 2 comments
Comments

@falkorichter

We're having problems with VersionConflictEngineExceptions all the time. Each update runs as a script that increments a numeric value (see the sample document below).

We're running a cluster of two Elasticsearch instances, and I can only imagine that synchronization between the nodes is causing the version conflict on one of them. I would not expect an update to throw this kind of exception in a cluster, since each update is atomic. We are running four application servers that execute this code, and the exceptions are thrown randomly on all of them.

stacktrace:

Caused by: org.elasticsearch.index.engine.VersionConflictEngineException: [kpi][4] [opportunity][1442415600000]: version conflict, current [5933], provided [5932]
        at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:582) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:522) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:425) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:193) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:512) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.doStart(TransportShardReplicationOperationAction.java:426) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.start(TransportShardReplicationOperationAction.java:342) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction.doExecute(TransportShardReplicationOperationAction.java:97) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.innerExecute(TransportIndexAction.java:134) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:112) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:60) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:217) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:170) [elasticsearch-1.4.4.jar:]
        at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$1.run(TransportInstanceSingleOperationAction.java:187) [elasticsearch-1.4.4.jar:]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_20]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_20]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_20]

sample document:

{
  "_index": "kpi",
  "_type": "opportunity",
  "_id": "1442412000000",
  "_version": 14742,
  "found": true,
  "_source": {
    "timestamp": "2015-09-16T14:00:00.249+0000",
    "own": 224,
    "shared": 2,
    "network": 3941,
    "unknown": 10575
  }
}

each update script looks like one of the following lines (only one increment per script); a sketch of the full request is shown after the script lines:

ctx._source.own+=1;
ctx._source.shared+=1;
ctx._source.network+=1;
ctx._source.unknown+=1;
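
For reference, each counter update is sent roughly like this (a sketch of the equivalent REST update call; the host, port, and document id are placeholders, and inline scripts like this require dynamic scripting to be enabled):

curl -XPOST 'http://localhost:9200/kpi/opportunity/1442415600000/_update' -d '
{
  "script": "ctx._source.own += 1"
}'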
@jasontedor
Member

Can you confirm that you are not setting the retry_on_conflict parameter? This parameter is zero by default and is designed exactly for use cases like yours, where the order of updates (say, incrementing a counter) isn't important.

If that is the case, this behavior is expected when you have multiple writers attempting to update the same document. You can address it by using the retry_on_conflict parameter to retry when a version conflict occurs. You can read more about this in the documentation on partial updates, including the specific section on conflicts.
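
For example, something along these lines should work (a sketch; the retry count of 5, the host, and the document id are placeholders):

curl -XPOST 'http://localhost:9200/kpi/opportunity/1442415600000/_update?retry_on_conflict=5' -d '
{
  "script": "ctx._source.own += 1"
}'

With the Java client, the equivalent is setRetryOnConflict(int) on the UpdateRequestBuilder.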

@falkorichter
Author

I can confirm I'm not setting the retry_on_conflict parameter, but it sounds exactly like the parameter I want to use. I deployed with the parameter set and the exceptions seem to be gone.
