Warning Courier Fetch: #3221

Closed
akivaElkayam opened this issue Mar 1, 2015 · 35 comments · May be fixed by huytquach-snyk/kibana#8

Comments

@akivaElkayam

Recently I got the warning:
Courier Fetch: 17 of 100 shards failed.

What causes it, and what can I do to fix it?

Thanks, Akiva

@monotek

monotek commented Mar 2, 2015

same here: "Courier Fetch: 5 of 129 shards failed."

elasticsearch.log says:

[2015-03-02 11:08:02,345][DEBUG][action.search.type ] [es1] [otrs-2015.03][0], node[cmGk6z9BQXyycYMURhmu8A], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@3b6d8170] lastShard [true]
java.lang.ClassCastException
[2015-03-02 11:08:02,348][DEBUG][action.search.type ] [es1] [otrs-2015.03][3], node[snnf_KsrTfyU-P5njE6Uhw], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@3b6d8170] lastShard [true]

How do I fix the failed shards?
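
For anyone who wants to check whether shards are actually broken, as opposed to searches being rejected, the cluster can be asked directly. This is only a minimal sketch against a local Elasticsearch 1.x node; adjust host and port for your setup:

# Overall cluster state (green/yellow/red) and shard counts
curl 'localhost:9200/_cluster/health?pretty'

# List any shards that are not in the STARTED state
curl -s 'localhost:9200/_cat/shards?v' | grep -v STARTED

If everything is STARTED and the cluster is green, the "shards failed" warning usually comes from rejected or failed search requests rather than from broken shards, as the later comments in this thread show.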

@monotek

monotek commented Mar 2, 2015

Update: I just found out that this may be a problem caused by our move to another Elasticsearch cluster. All old indexes had only 1 primary shard; now we are using the default of 5 shards.

So the new automatically created index "otrs-2015.03" is using 5 shards.

Is this a problem for Kibana?

@monotek

monotek commented Mar 2, 2015

Just solved my problem.
The new ES cluster was missing some templates.
After adding the templates, deleting the otrs-2015.03 index and recreating it, everything worked again...
If you want to check whether you have the same problem, go to Kibana settings and reload the field list of your index.
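
To verify the template situation from the command line as well, the index template API shows what the cluster actually knows about. A minimal sketch for Elasticsearch 1.x; the template name "otrs" is only an example and will differ in your setup:

# List all index templates known to the cluster
curl 'localhost:9200/_template?pretty'

# Inspect a single template, e.g. one expected to match otrs-*
curl 'localhost:9200/_template/otrs?pretty'

If the template that sets your field types is missing, newly created indices fall back to dynamic mappings, which is what caused the failures described above.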

@rashidkpc
Contributor

This is not a Kibana error, but rather an Elasticsearch issue. You have failing or otherwise unavailable primary shards.

@monotek

monotek commented Mar 2, 2015

Not completely true.

All of my shards were OK.
Only the template which sets the field type was missing.

I also saw the same error when the elasticsearch.yml option "threadpool.search.queue_size" is set too low...

Maybe Kibana's error message could be improved...

@2xmax

2xmax commented Mar 6, 2015

+1 to @monotek's statement.
I had two different indices with different templates, and the user set up the Index Pattern as *. This leads to the OP's error.

E.g. if you have only two indexes, one with one template and the other with a different one, and you set up the "*" pattern in settings, then you will receive the "Courier Fetch: 1 of 2 shards failed." warning. Yes, the settings are incorrect, but at the very least the message is misleading.
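
One way to confirm this kind of mismatch is to compare how the same field is mapped across every index the pattern matches. A hedged sketch using the get field mapping API (Elasticsearch 1.x syntax; "timestamp" is only an example field name):

# Show the mapping of one field across all indices matched by *
curl 'localhost:9200/*/_mapping/field/timestamp?pretty'

If the field comes back as, say, date in one index and string in another, queries and aggregations sent to the combined pattern will fail on the shards with the incompatible mapping, producing exactly this kind of partial "shards failed" warning.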

@2xmax

2xmax commented Mar 6, 2015

...and it is definitely not an Elasticsearch issue.

@forzagreen

Even though the state of my cluster is green, it's still returning Courier Fetch: 17 of 37 shards failed.

@xo4n

xo4n commented Mar 17, 2015

and it doesn't happen with Kibana 3

@dpb587

dpb587 commented Apr 1, 2015

We ran into this as well. By using the developer tools in the browser to look at the response of the _msearch AJAX request, we were able to see the failure messages that Elasticsearch was sending back. In our case, the queue capacity was misconfigured and too low.

.responses[1]._shards.failures[0]
{
  "index": "logstash-2015.03.07",
  "shard": 0,
  "status": 429,
  "reason": "RemoteTransportException[[esdata-1a/0][inet[/192.0.2.1:9300]][indices:data/read/search[phase/query]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 128) on org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler@5ce86c9c]; "
}
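
The same information is available without the browser by sending a search to the affected index pattern and reading the _shards section of the response. A minimal sketch (Elasticsearch 1.x; the index pattern is only an example):

# size 0 keeps the response small; look at _shards.failed and _shards.failures
curl -s 'localhost:9200/logstash-*/_search?pretty' -d '{"size": 0, "query": {"match_all": {}}}'

When requests are being rejected, the failures array carries the same EsRejectedExecutionException text shown above.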

@spuder

spuder commented Apr 23, 2015

If you run into this problem, you can troubleshoot further as @dpb587 suggested, using the Chrome developer tools.

In my case, the index 'logstash-2015.04.16' is failing with this error:

 [FIELDDATA] Data too large, data for [timeStamp] would be larger than limit of [2991430041/2.7gb]];

[screenshot omitted: 2015-04-23 10 57 33]
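
That particular error comes from the fielddata circuit breaker rather than from the search queue. A hedged sketch of things that have helped in similar cases (Elasticsearch 1.4+ syntax; the 75% value is only an example, and giving the node more heap or switching the field to doc_values is the longer-term fix):

# See how much fielddata each node currently holds
curl 'localhost:9200/_cat/fielddata?v'

# Drop the fielddata cache as short-term relief
curl -XPOST 'localhost:9200/_cache/clear?fielddata=true'

# Raise the fielddata breaker limit (default is 60% of heap)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"persistent": {"indices.breaker.fielddata.limit": "75%"}}'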

@zitang

zitang commented May 2, 2015

I've encountered the same issue; you can check the Elasticsearch log for details.
In my case, the search queue was too small; after increasing it, the issue was solved.

@Yzord

Yzord commented Jun 16, 2015

And what if I don't have threadpool.search.queue_size in my elasticsearch.yml? I can't find it, but I also have this shards-failed error, which has ruined my indexes :(

@monotek

monotek commented Jun 16, 2015

Just add it...

Regards

André Bauer

@Yzord

Yzord commented Jun 16, 2015

Do you know the right string to add? Is it

threadpool.search.queue_size: high

@monotek

monotek commented Jun 16, 2015

I use it like this:

# Search pool

threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 10000

Regards

André Bauer

@ChastinaLi

Actually, setting threadpool.search.queue_size: 10000 fixes it. Changing threadpool.search.size is probably not a good idea, and the type already defaults to fixed.

@guayan2003

Hi all, I have the same problem here. I posted the issue in the Elasticsearch forum (https://discuss.elastic.co/t/metric-aggregations-how-to-divide-value/27630/1); can someone help?

Jason

@phutchins

threadpool.search.queue_size: 10000 fixed it for me also. I'd love to know more about how to tune the queue_size parameter for search instead of arbitrarily using 10k, if anyone has some insight into where to start...
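
One way to approach the tuning question, rather than guessing a number, is to watch the search thread pool while your heaviest dashboards refresh. A hedged sketch for the 1.x/2.x versions discussed in this thread:

# Per-node search pool: active threads, queued requests, rejections since node start
curl 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'

If search.rejected keeps climbing at the default queue of 1000, either the queries are hitting too many shards at once (see the discussion further down) or the queue genuinely needs to be larger.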

@jasonngpt

Ditto here... adding the line below fixed this issue for me.

threadpool.search.queue_size: 10000

It's mentioned in the ES docs too: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html

@muddman

muddman commented Oct 23, 2015

threadpool.search.queue_size: 2000 did it for me. Thanks for the screenshot @spuder, the Chrome debug messages were extremely helpful in figuring out what I needed to set the queue_size to. I was able to increment it slowly from 200 up to 2000 until it was resolved on my system, without having to jump up to 10K, and I confirmed the issue each time with the log message:

EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action...

[screenshot omitted: kibana4-es-error-queue_size]

@Inderjeet26

+1 from me too: threadpool.search.queue_size: 10000

@robin13
Contributor

robin13 commented Jan 12, 2016

To clarify: this is neither a Kibana nor an Elasticsearch problem. The root cause of these errors is missing (or badly allocated) resources for Elasticsearch.
By increasing the search queue_size you are in effect increasing the buffer of search queries which each node can hold before executing them, resulting in slower response times for all queries (bad) and harboring the danger of an OOME if the sum of the queued queries is too large (e.g. 10k queries of 1MB each (yes, 1MB would already be a massive query, but maybe that is why the queries are taking so long...) == 10GB of memory consumed just for the queue).
If you experience this issue, please do not just blindly increase the queue_size; investigate the root cause of why a queue of 1000 is not enough and address that.

Some questions to start with:

  • How many shards is each query hitting (each shard queried == 1 thread)? Can this be reduced? (See the sketch after this list for a quick way to count them.)
    • Reduce the total number of shards?
    • Ensure the query hits fewer indices/shards with better time range filters?
  • How many cores do you have (number of search threads available == ( ( cores * 3 ) / 2 ) + 1)? Can one search request be serviced with the threads available?
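
For the first question, here is a quick way to count how many shards a given index pattern (and therefore a single Kibana query against it) expands to. A minimal sketch, with logstash-* standing in for whatever your Kibana index pattern matches:

# Count primary shards only: a search hits one copy of each shard
curl -s 'localhost:9200/_cat/shards/logstash-*' | awk '$3 == "p"' | wc -l

# Shard count and size per index, useful when deciding how far to shrink
curl 'localhost:9200/_cat/indices/logstash-*?v'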

@monotek

monotek commented Jan 12, 2016

At the moment I have queries which hit 53 indexes (* 5 shards = 265 shards).

So if 1 shard means 1 thread, would it be a good idea to move away from the default of 5 shards per index?

I started with 1- or 2-core VMs but now have 2 nodes with 4 cores and 16 GB RAM.

So maybe I could go down from threadpool.search.queue_size: 10000 anyway.

@robin13
Contributor

robin13 commented Jan 12, 2016

So if 1 shard means 1 thread, would it be a good idea to move away from the default of 5 shards per index?

Yes - but also aim to remain under ~30 GB per shard.

I started with VMs but now have 2 nodes with 4 cores and 16 GB RAM.

Which means you have 2 * ( ( ( 4 * 3 ) / 2 ) + 1 ) = 14 search threads (over both nodes).
Your query which hits 190 shards would hence consume all 14 search threads, and push 176 into the queue. The 176 queries in the queue will be processed after the first 14 have been completed.

@monotek

monotek commented Jan 12, 2016

This would mean the default threadpool.search.queue_size of 1000 should be enough, shouldn't it?

I just tried it and commented the setting out, but instantly got the old errors in Kibana again, like:

"Courier Fetch: 55 of 265 shards failed."

I use threadpool.search.queue_size: 5000 now.
That also works.
2000 was not enough.

@robin13
Contributor

robin13 commented Jan 12, 2016

How many queries are being sent in parallel? If you have a dashboard with 4 visualizations, this will be (at least) 4 * 265 threads.

@monotek

monotek commented Jan 12, 2016

Ah, good to know!
I have dashboards with up to 35 visualisations...

So for long-term usage I will consider going back to 1 shard per index and maybe also reindexing my old data to 1 index per year instead of 1 index per month.
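
For reference, the shard count for newly created indices can be set through an index template, so the daily or monthly indices created from then on pick it up automatically. A minimal sketch in the 1.x template syntax used at the time; the template name and pattern are only examples, and existing indices still have to be reindexed:

# New otrs-* indices will be created with a single primary shard
curl -XPUT 'localhost:9200/_template/one_shard_per_index' -d '
{
  "template": "otrs-*",
  "settings": {
    "number_of_shards": 1
  }
}'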

@CVTJNII

CVTJNII commented Nov 10, 2016

On which node should threadpool.search.queue_size be applied? Client nodes? Or is this queue on the data nodes?

@Kulasangar

Where should this be applied?

@raulvc

raulvc commented Oct 17, 2017

I know this thread is closed, but just for the record: I also got a misleading failed-shards message that was actually a template error in a scripted field.

@crazyacking

coooool~

@bobby259

bobby259 commented Apr 3, 2018

thread_pool.search.queue_size: 10000
Note the underscore.
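
For anyone landing here on a newer release: Elasticsearch 5.0 renamed the threadpool.* settings to thread_pool.*, so the elasticsearch.yml line looks like the sketch below (the value is only an example; check the thread pool docs for your version before raising it):

# elasticsearch.yml, 5.x and later
thread_pool.search.queue_size: 10000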

@devantoine

I had this issue because I already had a Metricbeat index before importing the Kibana dashboards.

After deleting all the dashboards, visualizations and indexes, I reimported the dashboards from Metricbeat and then started it again to create a fresh index.

@rocketraman

I encountered this error message as well, on a green cluster. By using dev tools as per #3221 (comment), I saw the error message:

Failed to parse query [...]

This is certainly a poor error message, given that the underlying problem was just a query entry issue...
