Improve dashboard load performance #14750

Closed

stacey-gammon opened this issue Nov 3, 2017 · 15 comments
Labels
Feature:Dashboard · performance · release_note:enhancement · Team:Visualizations

Comments

stacey-gammon (Contributor) commented Nov 3, 2017

Currently we send out a request for every embeddable on a dashboard in a single _msearch. This means one or two slow visualizations or saved searches can bog down an entire dashboard.

I'd like to explore ways to improve the performance and split up the requests.

One idea is to do a single _msearch for all requests but only ask for the hit count, then make subsequent requests by batching the individual requests based on their hit count.
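
To make that concrete, here's a rough sketch of what a count-only first pass could look like (the panel plumbing and the proxy path are just illustrative, not actual courier code):

```ts
// Hypothetical sketch of the count-only first pass. PanelQuery, panelId, and
// the /elasticsearch proxy path are illustrative, not actual Kibana code.
interface PanelQuery {
  panelId: string;
  index: string;
  body: Record<string, unknown>; // the query DSL the panel would normally send
}

async function fetchHitCounts(panels: PanelQuery[]): Promise<Map<string, number>> {
  // _msearch takes newline-delimited JSON: one header line plus one body line per search.
  // Sending only the query with size: 0 means each response carries just the hit count.
  const ndjson =
    panels
      .map(
        p =>
          JSON.stringify({ index: p.index }) +
          '\n' +
          JSON.stringify({ query: p.body.query ?? { match_all: {} }, size: 0 })
      )
      .join('\n') + '\n';

  const res = await fetch('/elasticsearch/_msearch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-ndjson' },
    body: ndjson,
  });
  const { responses } = await res.json();

  const counts = new Map<string, number>();
  responses.forEach((r: any, i: number) => {
    // hits.total is a plain number in older ES versions, an object in newer ones.
    const total = typeof r.hits.total === 'number' ? r.hits.total : r.hits.total.value;
    counts.set(panels[i].panelId, total);
  });
  return counts;
}
```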

I'm not sure hit count is the right metric to use, though. It works in my sample cases, where saved searches take the longest and their hit count is 500, but on some datasets an aggregation over a long time span with a ton of data might return only a handful of hits yet take a long time to complete.

cc @pickypg - do you know if hit count correlates with query performance? Or am I off base? Maybe it's a combination of hit count plus index size (not sure if there is a way to get that information quickly).

Another idea thrown around was to use the scroll API to get data chunked by time, not visualization, and display the intermediate results. I'm not sure how useful this would be to people, though. Would partial data be at all worthwhile to see while a slow query finishes loading, or would people find it more useful to see visualizations complete one at a time (or one group at a time), but with the full data?

somewhat related: #7215

cc @elastic/kibana-sharing

nreese (Contributor) commented Nov 3, 2017

Why not, as a simple first step, just separate saved searches into their own _msearch request?
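
For illustration, a minimal sketch of that partitioning, assuming each pending request is tagged with the embeddable type that issued it (the tagging itself is hypothetical):

```ts
// A minimal sketch of splitting saved searches away from everything else.
interface PendingRequest {
  type: string; // e.g. 'search' for a saved search, 'visualization' otherwise
  body: Record<string, unknown>;
}

function partitionRequests(requests: PendingRequest[]): PendingRequest[][] {
  const savedSearches = requests.filter(r => r.type === 'search');
  const everythingElse = requests.filter(r => r.type !== 'search');
  // Two _msearch calls instead of one, so slow saved searches no longer
  // hold up the visualizations in the other batch.
  return [savedSearches, everythingElse].filter(batch => batch.length > 0);
}
```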

stacey-gammon (Contributor, Author) commented:

Maybe, but I think that starts us down a path of making too many assumptions: about what we expect courier to handle (assuming we do this in courier rather than having the dashboard drive it), about what the embeddable types are, and about how long they will take.

What if someone adds a new embeddable type that takes a long time too?

Then again, embeddables still aren't a first-class concept, so maybe I'm thinking too far into the future.

Nothing is simple with courier, and I'm nervous about throwing in more one off code that doesn't fully solve the problem. But, still worthwhile to explore despite my initial misgivings!

trevan (Contributor) commented Nov 3, 2017

I'm pretty sure that hit count isn't the best metric, but I don't know what it would be. Looking at one of my dashboards, I have a visualization that took 224 ms and it has a hit count of ~400,000 (it is a big number metric). Another one took 2s and it has a hit count of ~15,000 (a data table with 3 aggregations).

One problem with a separate _msearch for each embeddable is that large dashboards will generate a lot of requests and I think you'll start to hit browser limits (I think FF is 6 and Chrome is 10). We have a very common dashboard that has 20 normal Kibana visualizations plus 5 TSVB.

A crazy idea would be to use one request for all of the visualizations (including TSVB), then after the first load figure out which visualizations were really slow, store that information on the dashboard somewhere, and in future requests split those away. Kind of a self-learning dashboard.
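
Something like this rough sketch, with all names hypothetical:

```ts
// Remember how long each panel took last time (persisted with the dashboard),
// and give anything over a threshold its own batch on the next load.
const SLOW_PANEL_MS = 2000;
const lastDurationMs = new Map<string, number>(); // panelId -> observed duration

function planBatches(panelIds: string[]): string[][] {
  const fast = panelIds.filter(id => (lastDurationMs.get(id) ?? 0) < SLOW_PANEL_MS);
  const slow = panelIds.filter(id => (lastDurationMs.get(id) ?? 0) >= SLOW_PANEL_MS);
  // Fast panels share one _msearch; each known-slow panel is split away.
  return [fast, ...slow.map(id => [id])].filter(batch => batch.length > 0);
}

function recordTiming(panelId: string, startedAtMs: number): void {
  lastDurationMs.set(panelId, performance.now() - startedAtMs);
}
```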

stacey-gammon (Contributor, Author) commented:

Interesting. I wonder if aggregation type makes a difference.

Agree on the issue with a single separate search per embeddable - we'd need to chunk it up somehow in batches.

Definitely an interesting thought re: the self-learning dashboard. I worry about that route getting complicated. E.g. you're on a slow network and your dashboard learns to do a single panel per batch (unless we could split out network latency vs. es response time...) - how long would it take the dashboard to "unlearn" that and batch things up again once you're on a fast network? Or your es gets bogged down during a busy part of the day with a lot of requests from various sources - does your dashboard learn quickly enough to keep up, or does it end up falling behind, so that while traffic is busy it's still learning to make smaller batches, and by the time traffic is low your dashboard needs time to learn to make bigger batches again?

It sounds like a really interesting experiment; I just worry about maintainability, finding the right algorithm, and how long of an effort that would take. We do have machine learning experts at Elastic, but if there were some other metric we could use, we might be able to improve things with a simpler method.

IMO, the best scenario would be if es implemented streaming; then they would be in charge of figuring out how to batch up the returned responses, not us on the client, and we'd only have to send out a single request for all the data.

I wonder what would happen if we put the streaming logic on the kibana server side. The client handles streamed responses, and the server handles querying es. Feels like this would be faster going from Kibana server -> es server rather than kibana client -> es server... but I have no data to back that up. Maybe the Kibana server would end up being a bottleneck with multiple clients if we did it that way.
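
A very rough sketch of that server-side variant (plain Node, not Kibana's actual server code, with esSearch standing in for whatever ES client call the server would use):

```ts
// The browser posts all panel queries in one request; the server fans them out
// to Elasticsearch individually and streams each result back as a line of NDJSON
// the moment it finishes, so fast panels can render while slow ones still run.
import * as http from 'http';

declare function esSearch(query: object): Promise<object>; // placeholder

http
  .createServer((req, res) => {
    let raw = '';
    req.on('data', chunk => (raw += chunk));
    req.on('end', async () => {
      const panels: Array<{ panelId: string; query: object }> = JSON.parse(raw);
      res.writeHead(200, { 'Content-Type': 'application/x-ndjson' });

      await Promise.all(
        panels.map(async ({ panelId, query }) => {
          const result = await esSearch(query);
          // Flush each panel's result as soon as it arrives.
          res.write(JSON.stringify({ panelId, result }) + '\n');
        })
      );
      res.end();
    });
  })
  .listen(5602);
```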

trevan (Contributor) commented Nov 3, 2017

With timelion, tsvb, kibana visualizations, and other embeddables, you'd probably want to do the streaming logic on the Kibana server side.

alepuccetti commented:

Quoting @trevan:

> One problem with a separate _msearch for each embeddable is that large dashboards will generate a lot of requests and I think you'll start to hit browser limits (I think FF is 6 and Chrome is 10). We have a very common dashboard that has 20 normal Kibana visualizations plus 5 TSVB.

What about running a maximum number of _msearch requests at a time, prioritizing the visualizations that are actually displayed on the screen? The challenge would be handling dates that use now, to keep them consistent when the queued _msearch requests are fired.

As for my personal experience: we have dashboards doing a lot of aggregations over hundreds of millions of documents, and having Kibana's responsiveness tied to the slowest one is not ideal. So having multiple _msearch requests, or even a queue of them, would be nice. I would go as far as having one request per visualization and resolving them in parallel, prioritizing the ones on the screen at the moment.
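
One hedged sketch of how the `now` problem could be handled, with a toy date-math helper standing in for the real parsing:

```ts
// Resolve the relative range to absolute timestamps once, before the first
// batch is sent, and reuse them for every queued _msearch.
function resolveRange(from: string, to: string): { gte: number; lte: number } {
  const nowMs = Date.now();
  // Only the trivial cases are handled here, for illustration.
  const parse = (expr: string): number =>
    expr === 'now'
      ? nowMs
      : expr === 'now-15m'
        ? nowMs - 15 * 60 * 1000
        : Date.parse(expr);
  return { gte: parse(from), lte: parse(to) };
}

const frozenRange = resolveRange('now-15m', 'now');
// Every queued query filters on the same absolute window, no matter when it fires.
const timeFilter = {
  range: { '@timestamp': { ...frozenRange, format: 'epoch_millis' } },
};
```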

trevan (Contributor) commented Nov 21, 2017

@alepuccetti, we use really tiny visualizations to pack them onto our screen. Of those 20+5 visualizations, 12 are visible. So we'd still hit the browser limit.

I kind of like the streaming idea, though it is a bigger change: all requests for visualizations/embeddables would be sent as one request to Kibana's backend, each of those requests would then be sent individually to ES, and the results would be streamed back as they come in.

alepuccetti commented:

> So we'd still hit the browser limit.

Well, we could have multiple queries in one _msearch; at least that would be better than one big request. Choosing 6 requests (the minimum of the FF and Chrome limits) would improve responsiveness, and even better, we could detect the browser and tune the number of requests used to split the queries. This would also make it easier to evaluate which visualization is slower, or at least help narrow it down.

I am not sure I fully understand the streaming idea, but it seems to require a bigger redesign. Using multiple _msearch requests would be a first step.
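
As a sketch of that first step (sendMsearch is a placeholder for whatever actually posts the NDJSON body):

```ts
// Split the dashboard's queries across a fixed number of _msearch calls
// (six here, matching the connection-limit numbers above).
declare function sendMsearch(queries: object[]): Promise<object[]>;

function splitIntoBatches<T>(items: T[], batchCount: number): T[][] {
  const batches: T[][] = Array.from({ length: batchCount }, () => []);
  items.forEach((item, i) => batches[i % batchCount].push(item));
  return batches.filter(batch => batch.length > 0);
}

async function loadDashboard(queries: object[]): Promise<void> {
  const batches = splitIntoBatches(queries, 6);
  // Each batch resolves on its own, so one slow query only delays its own batch,
  // and per-batch timings give a rough clue about which panels are slow.
  await Promise.all(batches.map(batch => sendMsearch(batch)));
}
```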

alepuccetti commented:

I filed an issue about _msearch on the elasticsearch repo (elastic/elasticsearch#27775) which, if addressed, could mitigate the responsiveness problem. However, I am still convinced that Kibana dashboards should be able to resolve each visualization separately.

alepuccetti commented:

Update from the _msearch issue.

As was explained to me (elastic/elasticsearch#27775 (comment)), the real culprit is actually the preference parameter.
Is there any way to configure Kibana to not use preferences when running _msearch?
Why was it decided to use this configuration in the first place?
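
For context, the preference parameter travels in the per-search header line of the _msearch body; leaving it out lets Elasticsearch pick shard copies freely. These are generic example payloads, not the exact ones Kibana builds:

```ts
// Header line carries preference; omit it and ES chooses shard copies on its own.
const withPreference =
  JSON.stringify({ index: 'logstash-*', preference: 'my-session-id' }) + '\n' +
  JSON.stringify({ size: 0, query: { match_all: {} } }) + '\n';

const withoutPreference =
  JSON.stringify({ index: 'logstash-*' }) + '\n' +
  JSON.stringify({ size: 0, query: { match_all: {} } }) + '\n';
```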

chrisdavies (Contributor) commented:

This sounds a bit like the head-of-line blocking problem. We have a batch of independent requests being held up by the slowest request. It seems to me that there is already a standard solution to this problem: http/2.

If we had an http/2 endpoint, we could possibly write our clients the same way we'd write them if we weren't optimizing at all. No manual batching or msearch or anything like that. We'd make data requests as a bunch of independent AJAX calls. Under the hood, in supporting browsers, the http/2 protocol will ensure these get multiplexed. We'd also be able to process responses out of order, which means fast requests will no longer be held up by slow ones.

We should make sure the requests are made in the same order as the visualizations, so the first visualizations on the screen are the first ones to make a request. This is a fairly easy tweak, and should improve perceived time-to-first-visualization.

http/2 requires https. This means anyone using unsecured connections will fall back to vanilla http and will have a degraded experience.

Unfortunately, Elasticsearch doesn't support http/2 yet. Until they do, we have to come up with alternative solutions. It might be worth benchmarking the current approach and comparing it to an http/2 approach (routed through an http/2 compatible proxy).
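
For illustration, the client under that model would be close to the naive version (renderPanel and the per-panel URLs here are made up):

```ts
// One plain request per panel, no manual batching, responses handled as they arrive.
declare function renderPanel(panelId: string, data: object): void;

async function loadPanels(
  panels: Array<{ panelId: string; url: string; body: object }>
): Promise<void> {
  await Promise.all(
    panels.map(async ({ panelId, url, body }) => {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
      });
      // Over HTTP/2 these calls share one multiplexed connection, so a slow
      // response does not block the rest; each panel renders when its data lands.
      renderPanel(panelId, await res.json());
    })
  );
}
```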

chrisdavies (Contributor) commented:

Talked to @stacey-gammon about this, and she suggested that we put some good instrumentation into Kibana so we can get actual stats on dashboard / visualization load times in the wild.

I think it would also be worth putting a handful of test scenarios together and doing some benchmarking:

  • Current msearch approach
  • Remove msearch, and proxy Elasticsearch access through an http/2-compatible proxy [1]
  • Unbatched, un-proxied http (might as well measure it)
  • Separate saved searches into their own msearch request

[1] This can fall into a head-of-line blocking problem, too, though we should be able to mitigate it in various ways.
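
A small sketch of the instrumentation side, with reportTiming as a hypothetical sink for the collected numbers:

```ts
// Wrap each panel's fetch with performance.now() timings so real-world load
// times can be collected and compared across the approaches above.
declare function reportTiming(metric: { panelId: string; ms: number }): void;

async function timedFetch<T>(panelId: string, doFetch: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await doFetch();
  } finally {
    reportTiming({ panelId, ms: performance.now() - start });
  }
}
```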

stacey-gammon (Contributor, Author) commented:

Chatted a bit with @epixa today... just want to jot down a note that we can't have a client-side-only solution if we want to support plugins that want to expose REST APIs.

If we have a client-side solution that ships queries to a Kibana server-side solution, we can use the same solution for both use cases (client side and REST APIs).

wylieconlon (Contributor) commented:

@stacey-gammon I think a lot of the original issues were resolved; should this issue be updated with any remaining issues, or closed?

stacey-gammon (Contributor, Author) commented:

I think it's safe to close this.
