
WriteTimeout is too short for index-pattern creation in Kibana #69

Closed · eldis80 opened this issue Dec 29, 2020 · 8 comments

eldis80 commented Dec 29, 2020

Describe the bug
Index-pattern creation fails in Kibana because no indices are listed with the default query

Environment

  • OpenShift 4.5.15
  • CLO version 4.5.0-202012120433.p0

Logs
Couldn't find much relevant information in the logs.

Expected behavior
List of available indices when creating an index-pattern in Kibana

Actual behavior
When creating an index-pattern in Kibana, it queries the indices with a POST like this:
URL: https://kibana-openshift-logging.apps./elasticsearch/*/_search?ignore_unavailable=true
Payload: {"size":0,"aggs":{"indices":{"terms":{"field":"_index","size":200}}}}

After a while a toast pops up saying Kibana was unable to fetch indices.

Same query using Kibana's Dev Tools gives:
{
  "message": "Client request error: socket hang up",
  "statusCode": 502,
  "error": "Bad Gateway"
}

To Reproduce
Steps to reproduce the behavior:

  1. Create an Elasticsearch cluster with a large number of docs and shards.
  2. Try to create an index-pattern in Kibana.
  3. No indices are returned, so the index-pattern cannot be created.

Additional context
I believe this happens because the query goes through elasticsearch-proxy, and a WriteTimeout of 5 seconds was introduced in #57. This WriteTimeout effectively closes the connection if the response takes longer than 5 seconds to write.
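For reference, setting a write timeout on a Go http.Server looks roughly like the sketch below; this is only an illustration of the mechanism, not the actual code in elasticsearch-proxy's http.go, and the handler is a placeholder:

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8443",
		Handler: http.NotFoundHandler(), // placeholder handler for illustration
		// With a 5-second WriteTimeout, a response that is not fully
		// written within 5 seconds is cut off and the connection closed.
		WriteTimeout: 5 * time.Second,
	}
	_ = srv.ListenAndServe()
}
```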

We have so many docs and shards because we have set the application log retention to 30 days. The other logs (infra and audit) have a retention of 7 days.

The beginning of the response, when the same query is run from within the ES pod using the es_util tool, shows that our query takes about 8 seconds:
{
  "took" : 8072,
  "timed_out" : false,
  "_shards" : {
    "total" : 223,
    "successful" : 223,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1213667064,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "indices" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "app-000050",
          "doc_count" : 58109669
        },
        {
          "key" : "app-000053",
          "doc_count" : 41653740

jcantrill (Contributor) commented

This may be a side effect of another issue we discovered, where memory is consumed by a query across all indices to get metadata. Maybe your cluster needs more memory. Please consider posting the results of running a must-gather; instructions can be found at openshift/cluster-logging-operator.

eldis80 (Author) commented Dec 30, 2020

I already increased the memory for elasticsearch-proxy, as I saw those other issues, but that didn't help. Even before that, we hadn't run into OOM situations. As I understand from Go's http documentation, with TLS enabled the WriteTimeout covers the time from when the request headers are read until the response is completely written. In our case the response from ES itself takes ~8 seconds, so elasticsearch-proxy has already closed the connection. I've tested this quite extensively with Kibana's Dev Tools, and any query taking longer than 5 seconds fails.
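To illustrate that last point, here is a minimal, self-contained Go sketch (plain HTTP rather than TLS, and not the actual elasticsearch-proxy code) showing that a handler which takes longer than the server's WriteTimeout never gets its response to the client:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

func main() {
	// Simulate an upstream Elasticsearch query that takes ~6 seconds.
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(6 * time.Second)
		fmt.Fprintln(w, `{"took":6000}`)
	})

	srv := httptest.NewUnstartedServer(slow)
	srv.Config.WriteTimeout = 5 * time.Second // the value introduced in #57
	srv.Start()
	defer srv.Close()

	// The write deadline expires before the handler responds, so the server
	// drops the connection and the client gets an error (typically EOF)
	// instead of the JSON body -- analogous to Kibana's "socket hang up".
	_, err := http.Get(srv.URL)
	fmt.Println("client error:", err)
}
```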

In any case, I think the WriteTimeout shouldn't be less than what is configured as Kibana's elasticsearch.requestTimeout (with CLO it's 300000 ms). Otherwise, requests from Kibana going through elasticsearch-proxy will be disconnected once the WriteTimeout elapses.

P.S. We will provide the must-gather information through a support ticket shortly.

eldis80 (Author) commented Jan 7, 2021

Could you comment on the reasoning why the elasticsearch-proxy's Go http.Server.WriteTimeout is set to 5 seconds when the requestTimeout in Kibana is set to 300 seconds?

eldis80 (Author) commented Jan 7, 2021

I have now tested this by compiling a new version of this elasticsearch-proxy with http.Server.WriteTimeout in http.go set to 600 seconds. Instead of getting the "socket hang up" error described in the bug description, I now get this (in Kibana's Dev Tools tab):
{
  "took" : 15495,
  "timed_out" : false,
  "_shards" : {
    "total" : 235,
    "successful" : 235,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1134384077,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "indices" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,

This means that queries from Kibana to ES that take more than 5 seconds now succeed. Previously, anything that took longer than 5 seconds failed because the connection was closed by elasticsearch-proxy.

jcantrill (Contributor) commented

Closing; fixed by #73, which bumps the timeout to a minute and additionally allows overriding the default configuration.
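For context, making the timeout overridable could look roughly like the hypothetical sketch below; the flag name, default, and wiring here are illustrative only and not taken from #73:

```go
package main

import (
	"flag"
	"net/http"
	"time"
)

func main() {
	// Hypothetical flag; the real option added in #73 may be named and
	// wired differently.
	writeTimeout := flag.Duration("http-write-timeout", time.Minute, "HTTP server write timeout")
	flag.Parse()

	srv := &http.Server{
		Addr:         ":8443",
		Handler:      http.NotFoundHandler(), // placeholder handler for illustration
		WriteTimeout: *writeTimeout,          // defaults to one minute, can be overridden
	}
	_ = srv.ListenAndServe()
}
```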

eldis80 (Author) commented Feb 10, 2021

Great. I still don't understand why you would have a shorter timeout at the proxy than what is configured in Kibana as the ElasticSearch queryTimeout. As I understand it, all queries from Kibana go through this elasticsearch-proxy in your implementation of ClusterLogging.

eldis80 (Author) commented Feb 10, 2021

And is the fix also coming to the 4.5 or 4.6 releases? I can only see it in master and 4.7.

jcantrill (Contributor) commented

> Great. I still don't understand why you would have a shorter timeout at the proxy than what is configured in Kibana as the ElasticSearch queryTimeout. As I understand it, all queries from Kibana go through this elasticsearch-proxy in your implementation of ClusterLogging.

The timeout was modified to address a memory issue when FIPS was enabled. I did not pick the current value myself, so I can't speak to the motivation behind it. Regardless, I'm certain it was chosen from the perspective of the heavy write traffic from the collector, not the read traffic from Kibana. This change will be cherry-picked back to 4.5 and is awaiting verification from QE.
