
Queue size? #427

Closed
bytenik opened this issue Dec 18, 2013 · 4 comments

@bytenik (Contributor) commented Dec 18, 2013

I suspect this isn't a NEST-specific issue, but I was hoping to get some help interpreting this error since I haven't looked deeply into how NEST works with bulk:

System.InvalidOperationException: Unknown error came back during bulk operation: RemoteTransportException[[Death][inet[/172.18.0.9:9300]][bulk/shard]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@12ae9af];

Does this mean I'm doing too many operations in one bulk at one time, or too many bulks in a row, or what? Is there a setting I should be increasing or something I should be doing differently?

@Mpdreamz (Member)

You are doing too many bulk requests concurrently, possibly with too much load per request, which is why you see calls being queued for work.

Elasticsearch maintains different threadpools for different kinds of actions; see:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html

The bulk threadpool has a default maximum queue size of 50.

NEST's async calls are unbounded by default, meaning it will spawn as many calls on IO completion ports as it can. That is what you want for searching, getting, updating, and posting.

For indexing in bulk you can instantiate a client like so:

var settings = new ConnectionSettings(....)
    .SetMaximumConnections(25);   // cap NEST at 25 concurrent requests to Elasticsearch
var client = new ElasticClient(settings);

This way NEST will make sure that it won't have more than 25 concurrent requests to Elasticsearch.

@bytenik (Contributor, Author) commented Dec 20, 2013

To clarify, is this the total number of bulk calls, or the number of bulk operations inside each request? I.e., do 2 bulk calls that contain 30 ops each cause this issue, or does it only happen with 50+ bulk calls of any size?

I do in fact routinely issue a ton of index operations at once via bulk requests. I'm also OK with these getting queued up if necessary. Honestly, though, I'm surprised that 200 index ops can cause enough load that they all get queued up. Or are we saying that I've got 50 bulk calls with hundreds of index ops in each one?

@Mpdreamz (Member)

It's the total number of bulk calls that are allowed to be queued for execution, regardless of how many operations you put in each.

Each Elasticsearch node has N cores * 5 fixed dedicated threads doing the bulk index work. If all of them are busy, calls are put in the queue, and once that queue reaches 50 it starts bouncing requests with the exception you just saw.

Without knowing the exact details of your requests (how big the JSON is, how many ops are in each bulk request), my main suspect for poor indexing performance is that you still have index.refresh_interval set to 1s, which means it will refresh the data every second. Out of the box this is what you want, and it gives Elasticsearch its NRT characteristics with regard to POST'ing a document and having it turn up in your searches. But if you are doing bulk operations, it is probably good practice to turn this off prior to indexing (by setting it to -1) and turn it on again afterwards.

var settings = new IndexSettings();
settings["refresh_interval"] = "-1";   // disable periodic refreshes while bulk indexing
client.UpdateSettings(indexName, settings);

Only do this, though, if you are indexing into a clean new index; if it's an index that is actively receiving new documents/updates, those might not be available until the next refresh.
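
Once the bulk run is finished, a minimal sketch of turning it back on, reusing the same client and indexName as in the snippet above ("1s" is the out-of-the-box default):

// Restore the default 1s refresh interval after bulk indexing has finished.
var restoreSettings = new IndexSettings();
restoreSettings["refresh_interval"] = "1s";
client.UpdateSettings(indexName, restoreSettings);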

http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/ gives a good indication of what the refresh_interval does to indexing performance.

Having said all of that, if your batch size is 200 it seems very strange that you are hitting the queue limit.

How many documents are you indexing, and what is your batch size? It also helps to log the bulk response times (the .Took property on the IBulkResponse) to see if there is a pattern, i.e. is it a gradual ramp-up, or are there a couple of rogue calls?
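
As a rough sketch of that kind of logging (batches, MyDocument and indexName are placeholders here, and the exact bulk descriptor syntax depends on your NEST version):

// Sketch only: one bulk request per batch, logging how long Elasticsearch
// reports each call took, to spot a gradual ramp-up or a few rogue calls.
foreach (var batch in batches)   // 'batches' = your own batching, e.g. 200 docs each
{
    var response = client.Bulk(b =>
    {
        foreach (var doc in batch)
            b.Index<MyDocument>(i => i.Object(doc).Index(indexName));
        return b;
    });

    Console.WriteLine("Bulk took {0} ms (valid: {1})", response.Took, response.IsValid);
}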

Also, following the calls with Fiddler might give some more insight into what's going on.
http://fiddler2.com/documentation/Configure-Fiddler/Tasks/ConfigureDotNETApp

@deads2k commented Jul 21, 2014

I just hit this, figured out why, and decided that other people might hit the same issue. It's not exactly the number of bulk requests made; it is actually the total number of shards that will be updated on a given node by the bulk calls. This means the contents of the bulk operations inside the bulk request actually matter. For instance, if you have a single node with a single index of 60 shards, running on an 8-core box, and you issue a bulk request with indexing operations that affect all 60 shards, you will get this error message from a single bulk request.

If anyone wants to change this, you can see the splitting happening inside org.elasticsearch.action.bulk.TransportBulkAction.executeBulk() near the comment "go over all the request and create a ShardId". The individual requests are created a few lines down, around line 293 in version 1.2.1.
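
To put that in rough code terms, here is a conceptual C# sketch (this is not Elasticsearch's actual routing hash, and all the numbers are illustrative) of why a single bulk request can fan out to many shard-level tasks:

using System;
using System.Linq;

class ShardFanOutSketch
{
    // Stand-in for Elasticsearch's routing hash; illustrative only.
    static int ShardFor(string id, int numberOfShards)
    {
        return (id.GetHashCode() & 0x7fffffff) % numberOfShards;
    }

    static void Main()
    {
        // One bulk request with 200 index ops against an index with 60 shards.
        var ids = Enumerable.Range(0, 200).Select(i => "doc-" + i);
        var shardsTouched = ids.Select(id => ShardFor(id, 60)).Distinct().Count();

        // 200 ids typically spread across most of the 60 shards, so this single
        // request becomes roughly 60 shard-level tasks on a one-node cluster,
        // which is enough to overflow a 50-slot bulk queue once the bulk threads are busy.
        Console.WriteLine("Distinct shards touched: " + shardsTouched);
    }
}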
