Item status code not checked by Elasticsearch Bulk API client #192

Closed
andrewkroh opened this Issue Oct 21, 2015 · 1 comment

@andrewkroh
Member

andrewkroh commented Oct 21, 2015

Summary: The Elasticsearch Bulk API client in libbeat does not check the status codes returned in the items array. This could result in data loss if an insert fails, because the Beat will think the insert was properly acknowledged and will not retry the delivery.

Background: The Elasticsearch Bulk API returns an "items" array containing one entry for every document that was indexed. There is a status code for each element. There is also an HTTP status code for the whole bulk API request. The HTTP response code can be 201 even though an item failed to insert.

Relevant Code: https://github.com/urso/libbeat/blob/master/outputs/elasticsearch/bulkapi.go#L42

Example response containing a failure:

{  
   "took":6477,
   "errors":true,
   "items":[  
      {  
         "create":{  
            "_index":"eventbeat-2014.07.24",
            "_type":"eventlog",
            "_id":"AVCMajFei1FaLysRhu6k",
            "_version":1,
            "status":201
         }
      },
      {  
         "create":{  
            "_index":"eventbeat-2014.07.24",
            "_type":"eventlog",
            "_id":"AVCMajFei1FaLysRhu6l",
            "status":429,
            "error":"EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@1420e113]"
         }
      }
   ]
}
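
To illustrate the kind of check that is missing, here is a minimal sketch that decodes a bulk response like the one above and reports which items did not return a 2xx status. The names `bulkItemResult`, `bulkResponse`, and `failedItems` are hypothetical and are not taken from libbeat; this is only one possible shape for the check.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkItemResult holds the per-document fields of interest.
// (Hypothetical type, not part of libbeat.)
type bulkItemResult struct {
	Index  string          `json:"_index"`
	ID     string          `json:"_id"`
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error"` // kept raw so any JSON value is accepted
}

// bulkResponse mirrors the "items" array of the response shown above.
type bulkResponse struct {
	Items []map[string]bulkItemResult `json:"items"`
}

// failedItems returns the positions (in request order) of all items
// whose status code is outside the 2xx range.
func failedItems(body []byte) ([]int, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding bulk response: %w", err)
	}
	var failed []int
	for i, item := range resp.Items {
		// Each item is an object with a single key such as "create".
		for _, res := range item {
			if res.Status < 200 || res.Status > 299 {
				failed = append(failed, i)
			}
		}
	}
	return failed, nil
}
```

For the example response above, `failedItems` would return `[1]`, since only the second item carries status 429. The caller could then resend just those events instead of treating the whole batch as acknowledged.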

@andrewkroh andrewkroh added the bug label Oct 21, 2015

@kimchy

Member

kimchy commented Oct 21, 2015

It also returns a status code (503) when the thread pool rejects the execution because it is overloaded. In this case, I think the Beat should log a message and do some sort of back off, retrying from the same "location" when applicable.
