
All Kibana requests with a client timeout should propagate these to elasticsearch #13342

Closed
markharwood opened this issue Aug 4, 2017 · 10 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Search Querying infrastructure in Kibana Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@markharwood
Contributor

Elasticsearch should not run indefinitely computing results for Kibana when Kibana already knows that no one will ever see them.

This situation arises because, in its default configuration, elasticsearch runs search requests with no time limit, while the Kibana client gives up waiting for a response after a known period of time.

The solution is for every Kibana request with a known timeout to pass that value through in the `timeout` setting of the search API.
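A minimal sketch of that idea, assuming the search body is a plain JSON object and the client timeout is held in milliseconds (as `elasticsearch.requestTimeout` is). The helper `withSearchTimeout` and the `SearchBody` type are hypothetical illustrations, not Kibana's actual API. The only Elasticsearch-specific part is the body's `timeout` field, which accepts time-unit strings such as `30000ms`.

```typescript
// Hypothetical shape of a search request body; only `timeout` matters here.
interface SearchBody {
  query?: Record<string, unknown>;
  timeout?: string;
  [key: string]: unknown;
}

// Hypothetical helper: copy the client-side timeout (milliseconds) into the
// search body so Elasticsearch can stop work once the client has given up.
function withSearchTimeout(body: SearchBody, requestTimeoutMs: number): SearchBody {
  // 0, negative, NaN or Infinity means "no client timeout": leave the body as-is.
  if (!isFinite(requestTimeoutMs) || requestTimeoutMs <= 0) {
    return { ...body };
  }
  // Elasticsearch time units: "ms" suffix for milliseconds.
  return { ...body, timeout: `${Math.floor(requestTimeoutMs)}ms` };
}

const body = withSearchTimeout({ query: { match_all: {} } }, 30000);
console.log(JSON.stringify(body));
// prints {"query":{"match_all":{}},"timeout":"30000ms"}
```

Returning a copy rather than mutating the caller's body keeps the helper safe to apply to shared request templates.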

I imagine this requirement could touch several parts of the Kibana codebase, so this may well become a meta issue that lists other, more detailed issues like #8544.

@markharwood markharwood added the bug Fixes for quality problems that affect the customer experience label Aug 4, 2017
@jbudz jbudz added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Aug 7, 2017
@markharwood
Contributor Author

Just leaving a note here for added impetus to fixing this.
I've recently seen a number of support issues with OOMs caused by killer Kibana queries where the elasticsearch circuit breaker failed to trip. Circuit breaking is a hard problem. However, my assumption is that these OOM situations take a while to develop as the JVM gradually descends into GC hell, and this gives a query an opportunity to exit early based on search timeouts, but only if the client (i.e. Kibana) passes them through in its search requests.

I wouldn't be surprised if a high percentage of elasticsearch OOMs occur some time after the Kibana client has already chosen to give up waiting for the results. Kibana just needs to let us know what time limits it is using for the end user.

@jimgoodwin is adding this on the radar?

@epixa
Contributor

epixa commented Oct 17, 2017

We can also send the timeout as a query parameter, which makes it easier to append it to proxied search requests on the server itself rather than passing the configuration up to the browser.
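A minimal sketch of the server-side approach, assuming the proxy sees the outgoing search path as a string and already knows the configured request timeout in milliseconds. `appendTimeoutParam` and the example index pattern `logstash-*` are hypothetical; `timeout` itself is a standard query parameter of the Elasticsearch search API.

```typescript
// Hypothetical server-side helper: append Elasticsearch's `timeout` query
// parameter to a proxied search path, so the browser never needs the config value.
function appendTimeoutParam(path: string, requestTimeoutMs: number): string {
  if (!isFinite(requestTimeoutMs) || requestTimeoutMs <= 0) {
    return path; // no client timeout configured: forward the request unchanged
  }
  // Respect a timeout the caller has already set explicitly.
  if (/[?&]timeout=/.test(path)) {
    return path;
  }
  const separator = path.indexOf("?") === -1 ? "?" : "&";
  const value = encodeURIComponent(`${Math.floor(requestTimeoutMs)}ms`);
  return `${path}${separator}timeout=${value}`;
}

console.log(appendTimeoutParam("/logstash-*/_search", 30000));
// prints /logstash-*/_search?timeout=30000ms
console.log(appendTimeoutParam("/logstash-*/_search?size=0", 30000));
// prints /logstash-*/_search?size=0&timeout=30000ms
```

Not overriding an explicit `timeout` is a design choice in this sketch; a stricter proxy might instead clamp a caller-supplied value to the configured maximum.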

@markharwood
Contributor Author

So I had a quick look at Kibana's default settings. I have no idea to what extent they are honoured in the Kibana codebase, but we have:

`elasticsearch.requestTimeout:`:: *Default: 30000* Time in milliseconds to wait for responses from the back end

and

`elasticsearch.shardTimeout:`:: *Default: 0* Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.

I might be missing something in the meaning of these settings and how they map to elasticsearch calls, but it is arguably not very useful to have:

  1. shardTimeout disabled if requestTimeout is enabled (means infinite es overruns)
  2. shardTimeout > requestTimeout if both enabled (means finite es overruns)
  3. shardTimeout < requestTimeout if both enabled (means partial results from es?)

Of these, option 3 is arguably the only vaguely justifiable scenario, and even then only on something like the Discover tab, where you want to see some docs and care more about speed of response than about seeing the genuinely globally top-ranked docs. Any aggs showing the max or sum of something will have unbounded errors when computed from partial results, so partial results are really not very useful there.
Option 1 is the default config today and is really inexcusable.
Option 2 at least bounds the es computation but is still knowingly asking for results that will not be presented.
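The three combinations above can be made concrete with a small classifier. This is a hypothetical sketch that only mirrors the reasoning in this comment: both inputs are in milliseconds with 0 meaning "disabled", as in the two settings quoted above, and the labels are invented for illustration. It adds two edge cases the list does not name: `requestTimeout` disabled, and both values equal.

```typescript
// Labels for the combinations of requestTimeout and shardTimeout discussed above.
type TimeoutCase =
  | "no-client-timeout" // requestTimeout disabled: Kibana itself waits indefinitely
  | "unbounded-overrun" // option 1: shardTimeout disabled, requestTimeout enabled
  | "bounded-overrun"   // option 2: shardTimeout > requestTimeout
  | "partial-results"   // option 3: shardTimeout < requestTimeout
  | "aligned";          // both enabled and equal

// Hypothetical helper: classify a pair of timeout settings (ms, 0 = disabled).
function classifyTimeouts(requestTimeoutMs: number, shardTimeoutMs: number): TimeoutCase {
  if (requestTimeoutMs <= 0) return "no-client-timeout";
  if (shardTimeoutMs <= 0) return "unbounded-overrun";
  if (shardTimeoutMs > requestTimeoutMs) return "bounded-overrun";
  if (shardTimeoutMs < requestTimeoutMs) return "partial-results";
  return "aligned";
}

// The defaults quoted above (30000 and 0) fall into option 1.
console.log(classifyTimeouts(30000, 0));
// prints unbounded-overrun
```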

@dndlion

dndlion commented Nov 24, 2017

I have also seen this issue; it causes users to submit slow queries more often than is helpful!

@epixa epixa added :Discovery and removed Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Feb 15, 2018
@epixa
Contributor

epixa commented Feb 15, 2018

I'm moving this over to Discovery because it really seems to be more of an issue with our search requests than with any core platform capability. There are probably several layers to this issue, but we should start with a baseline: when we time out a request through the Kibana server, the same timeout should be applied on elasticsearch's end, which essentially means we need to attach the timeout value to the request itself as well.

@epixa
Contributor

epixa commented Feb 15, 2018

cc @Bargs

@markharwood
Contributor Author

FYI: starting in 6.3, clients such as Kibana can choose whether they want partial results or errors from elasticsearch in the event of timeouts or other scenarios that would produce partial results.

It's worth noting that in 7.0 we are considering the option of failing searches by default rather than returning any partial results.
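A minimal sketch of how a client could combine the two, assuming requests are sent with a hand-built query string. `buildSearchQueryString` is a hypothetical helper; `timeout` and `allow_partial_search_results` are search API query parameters, the latter available from Elasticsearch 6.3. Opting out of partial results means a timed-out search returns an error instead of, say, a sum computed from only the shards that finished in time.

```typescript
// Hypothetical helper: build the query string for a search request that carries
// the client timeout and states explicitly whether partial results are acceptable.
function buildSearchQueryString(requestTimeoutMs: number, allowPartialResults: boolean): string {
  const params: string[] = [];
  if (isFinite(requestTimeoutMs) && requestTimeoutMs > 0) {
    params.push(`timeout=${Math.floor(requestTimeoutMs)}ms`);
  }
  // `allow_partial_search_results=false` asks Elasticsearch to fail the search
  // rather than return results from only some shards.
  params.push(`allow_partial_search_results=${allowPartialResults}`);
  return params.join("&");
}

console.log(buildSearchQueryString(30000, false));
// prints timeout=30000ms&allow_partial_search_results=false
```

Sending the flag explicitly, rather than relying on the cluster default, keeps the client's behaviour stable if that default changes in a later release.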

@spalger
Contributor

spalger commented Apr 4, 2018

Reopening until we come to an agreement around #17420 (comment)

@spalger spalger reopened this Apr 4, 2018
@Bargs Bargs added the Feature:Visualizations Generic visualization features (in case no more specific feature label is available) label Apr 5, 2018
@Bargs
Contributor

Bargs commented Apr 5, 2018

I created some issues to track areas that don't use courier and thus weren't fixed by #17420.

#17577
#17576
#17575

Probably not exhaustive, but it's a start.

@timroes timroes added Feature:Search Querying infrastructure in Kibana Team:Visualizations Visualization editors, elastic-charts and infrastructure and removed :Discovery Feature:Visualizations Generic visualization features (in case no more specific feature label is available) labels Sep 16, 2018
@timroes
Contributor

timroes commented Mar 27, 2019

Most requests we know about now have the timeout correctly applied (Courier, TSVB, Timelion). I would close this meta issue; if we discover any more requests without it, let's open individual issues for those.

@timroes timroes closed this as completed Mar 27, 2019

7 participants