Warning Courier Fetch: #3221

Closed
akivaElkayam opened this issue Mar 1, 2015 · 35 comments · May be fixed by huytquach-snyk/kibana#8

Comments

@akivaElkayam

Recently I got the warning:
Courier Fetch: 17 of 100 shards failed.

What causes it, and what can I do to fix it?

Thanks, Akiva

@monotek

monotek commented Mar 2, 2015

same here: "Courier Fetch: 5 of 129 shards failed."

elasticsearch.log says:

[2015-03-02 11:08:02,345][DEBUG][action.search.type ] [es1] [otrs-2015.03][0], node[cmGk6z9BQXyycYMURhmu8A], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@3b6d8170] lastShard [true]
java.lang.ClassCastException
[2015-03-02 11:08:02,348][DEBUG][action.search.type ] [es1] [otrs-2015.03][3], node[snnf_KsrTfyU-P5njE6Uhw], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@3b6d8170] lastShard [true]

How do I fix the failed shards?
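
For anyone who wants to check whether shards are actually broken, as opposed to searches being rejected, the cluster can be asked directly. This is only a minimal sketch against a local Elasticsearch 1.x node; adjust host and port for your setup:

# Overall cluster state (green/yellow/red) and shard counts
curl 'localhost:9200/_cluster/health?pretty'

# List any shards that are not in the STARTED state
curl -s 'localhost:9200/_cat/shards?v' | grep -v STARTED

If everything is STARTED and the cluster is green, the "shards failed" warning usually comes from rejected or failed search requests rather than from broken shards, as the later comments in this thread show.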

@monotek

monotek commented Mar 2, 2015

Update: I just found out that this may be a problem caused by our move to another Elasticsearch cluster. All old indexes had only 1 primary shard; now we are using the default of 5 shards.

So the new automatically created index "otrs-2015.03" is using 5 shards.

Is this a problem for Kibana?

@monotek

monotek commented Mar 2, 2015

Just solved my problem.
The new ES cluster was missing some templates.
After adding the templates, deleting the otrs-2015.03 index and recreating it, everything worked again...
If you want to check whether you have the same problem, go to Kibana settings and reload the field list of your index.
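
To verify the template situation from the command line as well, the index template API shows what the cluster actually knows about. A minimal sketch for Elasticsearch 1.x; the template name "otrs" is only an example and will differ in your setup:

# List all index templates known to the cluster
curl 'localhost:9200/_template?pretty'

# Inspect a single template, e.g. one expected to match otrs-*
curl 'localhost:9200/_template/otrs?pretty'

If the template that sets your field types is missing, newly created indices fall back to dynamic mappings, which is what caused the failures described above.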

@rashidkpc
Contributor

This is not a Kibana error, but rather an Elasticsearch issue. You have failing or otherwise unavailable primary shards.

@monotek

monotek commented Mar 2, 2015

Not completely true.

All of my shards were OK.
Only the template which sets the field type was missing.

I also saw the same error when the elasticsearch.yml option "threadpool.search.queue_size" is set too low...

Maybe Kibana's error message could be improved...

@2xmax

2xmax commented Mar 6, 2015

+1 to @monotek's statement.
I had two different indices with different templates, and the user set up the Index Pattern as *. This leads to the OP's error.

E.g. if you have only two indexes, one with one template and the other with a different one, and you set up the "*" pattern in settings, then you will receive the "Courier Fetch: 1 of 2 shards failed." warning. Yes, the settings are incorrect, but at the very least the message is misleading.
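
One way to confirm this kind of mismatch is to compare how the same field is mapped across every index the pattern matches. A hedged sketch using the get field mapping API (Elasticsearch 1.x syntax; "timestamp" is only an example field name):

# Show the mapping of one field across all indices matched by *
curl 'localhost:9200/*/_mapping/field/timestamp?pretty'

If the field comes back as, say, date in one index and string in another, queries and aggregations sent to the combined pattern will fail on the shards with the incompatible mapping, producing exactly this kind of partial "shards failed" warning.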

@2xmax

2xmax commented Mar 6, 2015

...and it is definitely not an Elasticsearch issue.

@forzagreen

Even though the state of my cluster is green, it's still returning Courier Fetch: 17 of 37 shards failed.

@xo4n

xo4n commented Mar 17, 2015

and it doesn't happen with Kibana 3

@dpb587

dpb587 commented Apr 1, 2015

We ran into this as well. By using the developer tools in the browser to look at the response of the _msearch AJAX request, we were able to see the failure messages that Elasticsearch was sending back. In our case, the queue capacity was misconfigured and too low.

.responses[1]._shards.failures[0]
{
  "index": "logstash-2015.03.07",
  "shard": 0,
  "status": 429,
  "reason": "RemoteTransportException[[esdata-1a/0][inet[/192.0.2.1:9300]][indices:data/read/search[phase/query]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 128) on org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler@5ce86c9c]; "
}
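
The same information is available without the browser by sending a search to the affected index pattern and reading the _shards section of the response. A minimal sketch (Elasticsearch 1.x; the index pattern is only an example):

# size 0 keeps the response small; look at _shards.failed and _shards.failures
curl -s 'localhost:9200/logstash-*/_search?pretty' -d '{"size": 0, "query": {"match_all": {}}}'

When requests are being rejected, the failures array carries the same EsRejectedExecutionException text shown above.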

@spuder

spuder commented Apr 23, 2015

If you run into this problem, you can troubleshoot further as @dpb587 suggested, using the Chrome developer tools.

In my case, the index 'logstash-2015.04.16' is failing with this error:

 [FIELDDATA] Data too large, data for [timeStamp] would be larger than limit of [2991430041/2.7gb]];

[screenshot omitted: 2015-04-23 10 57 33]
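
That particular error comes from the fielddata circuit breaker rather than from the search queue. A hedged sketch of things that have helped in similar cases (Elasticsearch 1.4+ syntax; the 75% value is only an example, and giving the node more heap or switching the field to doc_values is the longer-term fix):

# See how much fielddata each node currently holds
curl 'localhost:9200/_cat/fielddata?v'

# Drop the fielddata cache as short-term relief
curl -XPOST 'localhost:9200/_cache/clear?fielddata=true'

# Raise the fielddata breaker limit (default is 60% of heap)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"persistent": {"indices.breaker.fielddata.limit": "75%"}}'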

@zitang

zitang commented May 2, 2015

I've encountered the same issue; you can check the Elasticsearch log for details.
In my case, the search queue was too small; after increasing it, the issue was solved.

@Yzord

Yzord commented Jun 16, 2015

And what if I don't have threadpool.search.queue_size in my elasticsearch.yml? I can't find it, but I also have this shards-failed error, which has ruined my indexes :(

@monotek

monotek commented Jun 16, 2015

Just add it...

Regards

André Bauer

@Yzord

Yzord commented Jun 16, 2015

Do you know the right string to add? Is it

threadpool.search.queue_size: high

@monotek

monotek commented Jun 16, 2015

I use it like this:

# Search pool

threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 10000

Regards

André Bauer

@ChastinaLi

Actually, setting threadpool.search.queue_size: 10000 fixes it. Changing threadpool.search.size is probably not a good idea, and the type already defaults to fixed.

@guayan2003

Hi all, I have the same problem here. I posted the issue in the Elasticsearch forum (https://discuss.elastic.co/t/metric-aggregations-how-to-divide-value/27630/1); can someone help?

Jason

@phutchins

threadpool.search.queue_size: 10000 fixed it for me also. I'd love to know more about how to tune the queue_size parameter for search instead of arbitrarily using 10k, if anyone has some insight into where to start...
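
One way to approach the tuning question, rather than guessing a number, is to watch the search thread pool while your heaviest dashboards refresh. A hedged sketch for the 1.x/2.x versions discussed in this thread:

# Per-node search pool: active threads, queued requests, rejections since node start
curl 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'

If search.rejected keeps climbing at the default queue of 1000, either the queries are hitting too many shards at once (see the discussion further down) or the queue genuinely needs to be larger.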

@jasonngpt

Ditto here... adding the line below fixed this issue for me.

threadpool.search.queue_size: 10000

It's mentioned in the ES docs too: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html

@muddman

muddman commented Oct 23, 2015

threadpool.search.queue_size: 2000 did it for me. Thanks for the screenshot @spuder, the Chrome debug messages were extremely helpful in figuring out what I needed to set the queue_size to. I was able to increment it slowly from 200 up to 2000 until it was resolved on my system, without having to jump up to 10K, and I confirmed the issue each time with the log message:

EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action...

[screenshot omitted: kibana4-es-error-queue_size]

@Inderjeet26

+1 from me too: threadpool.search.queue_size: 10000

@robin13
Contributor

robin13 commented Jan 12, 2016

To clarify: this is neither a Kibana nor an Elasticsearch problem. The root cause of these errors is missing (or badly allocated) resources for Elasticsearch.
By increasing the search queue_size you are in effect increasing the buffer of search queries which each node can hold before executing them, resulting in slower response times for all queries (bad) and harboring the danger of an OOME if the sum of the queued queries is too large (e.g. 10k queries of 1MB each (yes, 1MB would already be a massive query, but maybe that is why the queries are taking so long...) == 10GB of memory consumed just for the queue).
If you experience this issue, please do not just blindly increase the queue_size; investigate the root cause of why a queue of 1000 is not enough and address that.

Some questions to start with:

  • How many shards is each query hitting (each shard queried == 1 thread)? Can this be reduced? (See the sketch after this list for a quick way to count them.)
    • Reduce the total number of shards?
    • Ensure the query hits fewer indices/shards with better time range filters?
  • How many cores do you have (number of search threads available == ( ( cores * 3 ) / 2 ) + 1)? Can one search request be serviced with the threads available?
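
For the first question, here is a quick way to count how many shards a given index pattern (and therefore a single Kibana query against it) expands to. A minimal sketch, with logstash-* standing in for whatever your Kibana index pattern matches:

# Count primary shards only: a search hits one copy of each shard
curl -s 'localhost:9200/_cat/shards/logstash-*' | awk '$3 == "p"' | wc -l

# Shard count and size per index, useful when deciding how far to shrink
curl 'localhost:9200/_cat/indices/logstash-*?v'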

@monotek

monotek commented Jan 12, 2016

At the moment I have queries which hit 53 indexes (* 5 shards = 265 shards).

So if 1 shard means 1 thread, would it be a good idea to move away from the default of 5 shards per index?

I started with 1- or 2-core VMs but now have 2 nodes with 4 cores and 16 GB RAM.

So maybe I could go down from threadpool.search.queue_size: 10000 anyway.

@robin13
Contributor

robin13 commented Jan 12, 2016

So if 1 shard means 1 thread, would it be a good idea to move away from the default of 5 shards per index?

Yes - but also aim to remain under ~30 GB per shard.

I started with VMs but now have 2 nodes with 4 cores and 16 GB RAM.

Which means you have 2 * ( ( ( 4 * 3 ) / 2 ) + 1 ) = 14 search threads (over both nodes).
Your query which hits 190 shards would hence consume all 14 search threads, and push 176 into the queue. The 176 queries in the queue will be processed after the first 14 have been completed.

@monotek

monotek commented Jan 12, 2016

This would mean the default threadpool.search.queue_size of 1000 should be enough, shouldn't it?

I just tried it and commented the setting out, but instantly got the old errors in Kibana again, like:

"Courier Fetch: 55 of 265 shards failed."

I use threadpool.search.queue_size: 5000 now.
That also works.
2000 was not enough.

@robin13
Contributor

robin13 commented Jan 12, 2016

How many queries are being sent in parallel? If you have a dashboard with 4 visualizations, this will be (at least) 4 * 265 threads.

@monotek

monotek commented Jan 12, 2016

Ah, good to know!
I have dashboards with up to 35 visualisations...

So for long-term usage I will consider going back to 1 shard per index and maybe also reindexing my old data to 1 index per year instead of 1 index per month.
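
For reference, the shard count for newly created indices can be set through an index template, so the daily or monthly indices created from then on pick it up automatically. A minimal sketch in the 1.x template syntax used at the time; the template name and pattern are only examples, and existing indices still have to be reindexed:

# New otrs-* indices will be created with a single primary shard
curl -XPUT 'localhost:9200/_template/one_shard_per_index' -d '
{
  "template": "otrs-*",
  "settings": {
    "number_of_shards": 1
  }
}'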

@CVTJNII

CVTJNII commented Nov 10, 2016

On which node should threadpool.search.queue_size be applied? Client nodes? Or is this queue on the data nodes?

@Kulasangar

Where should this be applied?

@raulvc

raulvc commented Oct 17, 2017

I know this thread is closed, but just for the record: I also got a misleading failed-shards message that was actually a template error in a scripted field.

@crazyacking

coooool~

@bobby259

bobby259 commented Apr 3, 2018

thread_pool.search.queue_size: 10000
Note the underscore.
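
For anyone landing here on a newer release: Elasticsearch 5.0 renamed the threadpool.* settings to thread_pool.*, so the elasticsearch.yml line looks like the sketch below (the value is only an example; check the thread pool docs for your version before raising it):

# elasticsearch.yml, 5.x and later
thread_pool.search.queue_size: 10000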

@devantoine

I had this issue because I already had a Metricbeat index before importing the Kibana dashboards.

After deleting all the dashboards, visualizations and indexes, I reimported the dashboards from Metricbeat and then started it again to create a fresh index.

@rocketraman

I encountered this error message as well, on a green cluster. By using dev tools as per #3221 (comment), I saw the error message:

Failed to parse query [...]

This is certainly a poor error message, given that the underlying problem was just a query entry issue...
