
Queue size? #427

Closed
bytenik opened this issue Dec 18, 2013 · 4 comments

@bytenik (Contributor) commented Dec 18, 2013

I suspect this isn't a NEST-specific issue, but I was hoping to get some help interpreting this error since I haven't looked deeply into how NEST works with bulk:

System.InvalidOperationException: Unknown error came back during bulk operation: RemoteTransportException[[Death][inet[/172.18.0.9:9300]][bulk/shard]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12ae9af];

Does this mean I'm doing too many operations in one bulk at one time, or too many bulks in a row, or what? Is there a setting I should be increasing or something I should be doing differently?

@Mpdreamz (Member)

You are doing too many bulk requests concurrently, possibly with too much load per request, which is why you see calls being queued for work.

Elasticsearch maintains different threadpools for different kinds of actions; see:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html

The bulk threadpool has a default maximum queue size of 50.

NEST's async calls are unbounded by default, meaning it will spawn as many calls on IO completion ports as it can. That is what you want for searching, getting, updating, and posting.

For indexing in bulk you can instantiate a client like so:

var settings = new ConnectionSettings(....)
    .SetMaximumConnections(25);   // cap NEST at 25 concurrent requests to Elasticsearch
var client = new ElasticClient(settings);

This way NEST will make sure that it won't have more than 25 concurrent requests to Elasticsearch.

@bytenik (Contributor, Author) commented Dec 20, 2013

To clarify, is this the total number of bulk calls, or the number of bulk operations inside each request? I.e., do 2 bulk calls that contain 30 ops each cause this issue, or does it only happen with 50+ bulk calls of any size?

I do in fact routinely issue a ton of index operations at once via bulk requests. I'm also OK with these getting queued up if necessary. Honestly, though, I'm surprised that 200 index ops can cause enough load that they all get queued up. Or are we saying that I've got 50 bulk calls with hundreds of index ops in each one?

@Mpdreamz (Member)

It's the total number of bulk calls that are allowed to be queued for execution, regardless of how many operations you put in each.

Each Elasticsearch node has N cores * 5 fixed dedicated threads doing the bulk index work. If all of them are busy, calls are put in the queue, and once that queue reaches 50 it starts bouncing requests with the exception you just saw.

Without knowing the exact details of your requests (how big the JSON is, how many ops are in each bulk request), my main suspect for poor indexing performance is that you still have index.refresh_interval set to 1s, which means it will refresh the data every second. Out of the box this is what you want, and it gives Elasticsearch its NRT characteristics with regard to POST'ing a document and having it turn up in your searches. But if you are doing bulk operations, it is probably good practice to turn this off prior to indexing (by setting it to -1) and turn it on again afterwards.

var settings = new IndexSettings();
settings["refresh_interval"] = "-1";   // disable periodic refreshes while bulk indexing
client.UpdateSettings(indexName, settings);

Only do this, though, if you are indexing into a clean new index; if it's an index that is actively receiving new documents/updates, those might not be available until the next refresh.
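
Once the bulk run is finished, a minimal sketch of turning it back on, reusing the same client and indexName as in the snippet above ("1s" is the out-of-the-box default):

// Restore the default 1s refresh interval after bulk indexing has finished.
var restoreSettings = new IndexSettings();
restoreSettings["refresh_interval"] = "1s";
client.UpdateSettings(indexName, restoreSettings);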

http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/ gives a good indication of what the refresh_interval does to indexing performance.

Having said all of that, if your batch size is 200 it seems very strange that you are hitting the queue limit.

How many documents are you indexing, and what is your batch size? It also helps to log the bulk response times (the .Took property on the IBulkResponse) to see if there is a pattern, i.e. is it a gradual ramp-up, or are there a couple of rogue calls?
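
As a rough sketch of that kind of logging (batches, MyDocument and indexName are placeholders here, and the exact bulk descriptor syntax depends on your NEST version):

// Sketch only: one bulk request per batch, logging how long Elasticsearch
// reports each call took, to spot a gradual ramp-up or a few rogue calls.
foreach (var batch in batches)   // 'batches' = your own batching, e.g. 200 docs each
{
    var response = client.Bulk(b =>
    {
        foreach (var doc in batch)
            b.Index<MyDocument>(i => i.Object(doc).Index(indexName));
        return b;
    });

    Console.WriteLine("Bulk took {0} ms (valid: {1})", response.Took, response.IsValid);
}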

Also, following the calls with Fiddler might give some more insight into what's going on.
http://fiddler2.com/documentation/Configure-Fiddler/Tasks/ConfigureDotNETApp

@deads2k commented Jul 21, 2014

I just hit this, figured out why, and decided that other people might hit the same issue. It's not exactly the number of bulk requests made; it is actually the total number of shards that will be updated on a given node by the bulk calls. This means the contents of the bulk operations inside the bulk request actually matter. For instance, if you have a single node with a single index of 60 shards, running on an 8-core box, and you issue a bulk request with indexing operations that affect all 60 shards, you will get this error message from a single bulk request.

If anyone wants to change this, you can see the splitting happening inside org.elasticsearch.action.bulk.TransportBulkAction.executeBulk() near the comment "go over all the request and create a ShardId". The individual requests are created a few lines down, around line 293 in version 1.2.1.
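
To put that in rough code terms, here is a conceptual C# sketch (this is not Elasticsearch's actual routing hash, and all the numbers are illustrative) of why a single bulk request can fan out to many shard-level tasks:

using System;
using System.Linq;

class ShardFanOutSketch
{
    // Stand-in for Elasticsearch's routing hash; illustrative only.
    static int ShardFor(string id, int numberOfShards)
    {
        return (id.GetHashCode() & 0x7fffffff) % numberOfShards;
    }

    static void Main()
    {
        // One bulk request with 200 index ops against an index with 60 shards.
        var ids = Enumerable.Range(0, 200).Select(i => "doc-" + i);
        var shardsTouched = ids.Select(id => ShardFor(id, 60)).Distinct().Count();

        // 200 ids typically spread across most of the 60 shards, so this single
        // request becomes roughly 60 shard-level tasks on a one-node cluster,
        // which is enough to overflow a 50-slot bulk queue once the bulk threads are busy.
        Console.WriteLine("Distinct shards touched: " + shardsTouched);
    }
}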
