
[Meta] Kibana platform performance #63848

Closed
17 of 29 tasks
mshustov opened this issue Apr 17, 2020 · 15 comments
Labels
discuss Meta performance Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team

Comments

@mshustov
Contributor

mshustov commented Apr 17, 2020

Introducing the Kibana platform changed the way Kibana applications are built, loaded, and run.
We didn't gather performance metrics before or during the migration, and we now find ourselves in a position where our customers have already started experiencing degraded page load times.
To improve the current situation, we can split our work into different categories:

Page loading time

All the plugins are built as separate packages in the Kibana platform. This increased both the size of each bundle downloaded at startup and the number of concurrent requests to the server.
Sub-tasks:

To catch problems at an early stage in the future, we are going to start tracking performance metrics during development (CI metric report)
I'm wondering if we can collaborate with Elastic Cloud / Telemetry / Pulse teams on creating a centralized performance dashboard for production load time metrics.
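A CI metric report like the one mentioned above could boil down to a budget check per bundle. This is a hypothetical sketch, not the actual Kibana CI implementation; the function name, the bundle names, and the byte budgets are all made up, and real sizes would come from the build output:

```typescript
// Hypothetical CI check: fail the build when a plugin bundle exceeds its
// size budget. Sizes and budgets are plain byte counts keyed by bundle name.
type SizeMap = Record<string, number>;

function checkBundleBudgets(sizes: SizeMap, budgets: SizeMap): string[] {
  const violations: string[] = [];
  for (const [bundle, limit] of Object.entries(budgets)) {
    const actual = sizes[bundle];
    if (actual === undefined) continue; // bundle not built in this run
    if (actual > limit) {
      violations.push(`${bundle}: ${actual} bytes exceeds budget of ${limit}`);
    }
  }
  return violations;
}

// Example: "dashboard" is over its (invented) budget, "discover" is under.
const report = checkBundleBudgets(
  { dashboard: 1_200_000, discover: 800_000 },
  { dashboard: 1_000_000, discover: 900_000 }
);
console.log(report);
```

Reporting violations as strings (rather than throwing on the first one) lets CI list every oversized bundle in a single run.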

Runtime performance

This falls into 2 sub-categories:

Memory

The Kibana platform was designed with SPA mode in mind. This means the lifetime of the Kibana app is much longer, as the page is reloaded less frequently, which puts increased demands on memory leak control. Kibana must remain operable both when one application runs for a long time and when the user switches between several applications. We should automate such a check on CI. @elastic/kibana-qa do you have a setup for this type of testing? I saw some dashboards for similar metrics in #59454
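One way such an automated check could flag a leak is to sample heap usage after each simulated app switch and require sustained growth, not just a GC-timing spike, before failing. This is a hypothetical sketch: the function name and threshold are invented, and real samples would come from the browser rather than being passed in as plain numbers:

```typescript
// Hypothetical leak heuristic: compare the average of the early heap
// samples against the average of the late ones. Heap usage fluctuates
// with GC, so a single spike should not trip the check.
function looksLikeLeak(samples: number[], ratio = 1.2): boolean {
  if (samples.length < 4) return false; // not enough data to judge a trend
  const half = Math.floor(samples.length / 2);
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const early = avg(samples.slice(0, half));
  const late = avg(samples.slice(-half));
  return late > early * ratio;
}

// Sustained growth across app switches vs. a flat profile.
console.log(looksLikeLeak([100, 105, 98, 140, 150, 160])); // true
console.log(looksLikeLeak([100, 105, 98, 102, 99, 101]));  // false
```

Averaging windows instead of comparing first and last samples makes the check far less sensitive to when garbage collection happens to run.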

CPU

That's tricky and might require setting up APM for Kibana.

Sub-tasks:

@mshustov mshustov added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team Meta performance labels Apr 17, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@LeeDr
Contributor

LeeDr commented Apr 17, 2020

I think we need at least 2 types of performance tests for Kibana.

  1. UI performance. How long does it take for the first page to load? How long does it take to switch between apps? How long does this dashboard take to load?
    The easiest way to get started on this is just to track the duration of all our existing UI tests. @stacey-gammon did it here: Create functional test suite for performance benchmarking #54626. And @wayneseymour has been looking into adding that to the code coverage job. Different tests do many different things, so on their own they may not be meaningful metrics, but they could at least show changes from one build or release to another. There are some efforts going on towards this right now: gathering it from Kibana CI builds by @brianseeders in [FTR] Add test suite metrics tracking/output #62515.
    We could very easily enhance the existing UI tests with tests designed specifically to measure initial page load time, app switching, etc. Anything we decide we need.
    We should also use Monitoring, Metricbeat, or both to gather stats on both the Kibana server and browser memory while tests are running, to look for problems. @marius-dr gained some experience doing this while investigating Firefox memory usage.
  2. Load tests. Kibana has a growing set of APIs for saved objects, user/role management, batch reindexing, task manager, etc. We need a test framework where all of these can be tested rigorously, simulating multiple users and high loads. We might use tools like JMeter, Gatling, Apache ab, Horde, etc. for this. @dmlemeshko is just getting started looking into these tools.
  3. APM. Yes. Getting Kibana instrumented for APM might be the hardest part, but it might also have a big payback. I don't know of any current efforts on this.

@TinaHeiligers
Contributor

@restrry The APM team is more appropriate for advice on setting up APM for Kibana. AFAIK, there's already an option to do so.

@afharo
Member

afharo commented Apr 17, 2020

I think we should have APM running across the Kibana platform for our perf tests. We can learn a lot about potential improvements there.

That said, would it make sense to have a Pulse channel with a minimum set of performance stats? We can't (and don't want to) go as deep as APM in the analysis, but something that goes into telemetry would let us understand the best and recommended hardware based on our users' experiences. For example, we could learn that users on hardware X perceive better behaviour than those running on hardware Y or provider Z.

@mshustov
Contributor Author

We could very easily enhance the existing UI tests with tests designed specifically to measure initial page load time, app switching, etc. Anything we decide we need.

@LeeDr We are working on #62263 and are interested in the time between a request being initiated and the page with an app being rendered.
Is it already technically possible to collect that information for tests?
Would it be possible to add a test suite / script / whatever for developers to run locally to measure how their changes impacted loading performance?

@LeeDr
Contributor

LeeDr commented Apr 20, 2020

@restrry it's possible to measure that with some caveats.

  1. With our current plan, we were just going to collect the duration of functional UI tests. Most of the tests do more than just open an app and wait for it to render so those times will include the app loading but also other steps. But we certainly could make a set of "performance" tests that just wait for the app to load.
  2. We need to understand how we "know" that an app has finished rendering. We can wait for a loading indicator to be hidden, and/or for one or more elements to appear on a page within the app. It would be nice to know when the last element of an app has loaded. But we don't want to use a technique that will be flaky or a maintenance issue.
  3. These measurements will only be accurate within about 1/2 second since that's how long we typically wait between retries. Even WebDriver uses a polling mechanism when looking for elements with some small delay between attempts. But I think loading an app is going to be several seconds so this level of accuracy may be OK. We may see significant differences when running tests locally vs on Jenkins. And even on Jenkins we may see significant variability between runs of the same build.
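The polling caveat in point 3 can be made concrete with a small sketch. This is a hypothetical helper, not the actual FTR retry service; the names and defaults are invented. The key point is that the elapsed time a polling wait reports is quantized by the polling interval, so a 500 ms retry delay caps measurement precision at roughly 500 ms:

```typescript
// Hypothetical polling wait: re-check a condition every `intervalMs` until
// it holds or `timeoutMs` elapses. The returned duration can only ever be
// accurate to within one polling interval.
async function waitFor(
  condition: () => boolean,
  { intervalMs = 500, timeoutMs = 30_000 } = {}
): Promise<number> {
  const start = Date.now();
  while (!condition()) {
    if (Date.now() - start > timeoutMs) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return Date.now() - start; // elapsed time, quantized by intervalMs
}

// With a 10 ms interval the error bound shrinks accordingly.
let checks = 0;
waitFor(() => ++checks >= 3, { intervalMs: 10 }).then((elapsed) => {
  console.log(`took ~${elapsed} ms after ${checks} checks`);
});
```

Making `intervalMs` configurable is exactly the knob discussed below: a dedicated performance suite could poll much more frequently than functional tests do.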

@mshustov
Contributor Author

But we certainly could make a set of "performance" tests that just wait for the app to load.

We can write a specific test case that loads only a pre-defined page x times to minimize the accidental impact of different applications and external environments. How do we push the data into external storage for future analysis on kibana-stats?

We need to understand how we "know" that an app has finished rendering. We can wait for a loading indicator to be hidden, and/or for one or more elements to appear on a page within the app

I believe we can treat the loading indicator being hidden as a reliable signal that all the resources were loaded and parsed. After that moment, it's up to the app logic to perform background requests to load data. So I'd say that part is application-specific, and we shouldn't take it into account as part of the current task.

These measurements will only be accurate within about 1/2 second

That's not great. Is it okay if we make the retry delay configurable? @dmlemeshko

And even on Jenkins we may see significant variability between runs of the same build.

Yes, that will require additional work (if it's even possible) to run the tests on the same hardware in an isolated environment. I'm not sure we have time to do it properly right now. IMO, running the tests on Jenkins several times is an acceptable solution for the moment.
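Aggregating those repeated Jenkins runs robustly matters as much as running them: a sketch, assuming we report the median rather than the mean so a single slow worker does not skew the number (the function names and sample values here are invented):

```typescript
// Median of repeated measurements: insensitive to a single outlier run.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Spread between runs, as a rough between-run variability signal.
function range(xs: number[]): number {
  return Math.max(...xs) - Math.min(...xs);
}

// Five runs of the same build; the one outlier barely moves the median,
// but the range exposes how noisy the environment was.
const loadTimesMs = [2100, 2250, 2180, 2160, 4900];
console.log(median(loadTimesMs)); // 2180
console.log(range(loadTimesMs));  // 2800
```

Reporting both numbers lets a dashboard distinguish "the build got slower" from "the CI workers were noisy that day".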

@mshustov
Contributor Author

@LeeDr I remember that some aspects of the perf testing were discussed at GAH. From the summary email:

- server load testing for concurrent requests, concurrent users ( coming )
- endurance testing for browser memory leaks ( coming )

However, I don't see any issues linked in the QA team roadmap https://github.com/elastic/kibana-team/issues/103. Would you mind adding the issues to the roadmap and to this perf meta issue?

@LeeDr
Contributor

LeeDr commented Jul 16, 2020

  • server load testing for concurrent requests, concurrent users ( coming )

@dmlemeshko is working on this one but is still comparing a couple of different tools to find what will work best long-term for Kibana. We'll get an issue created soon to document the plan and update the status.

  • endurance testing for browser memory leaks ( coming )

We haven't started anything on this one yet (Memory endurance testing (Marius)).

@mshustov
Contributor Author

mshustov commented Jul 19, 2020

Since the 7.7 release, the Reporting plugin has had an increasingly hard time completing reports in a reasonable amount of time, especially on machines with a busy CPU or low RAM.
We added documentation to inform readers that 1 GB of RAM is not enough for a Kibana instance running Reporting.
One of the biggest factors that still leads to slow report generation is:
Bundle sizes increasing release by release as more features are added. The various app teams need to help here by moving more UI code to be lazy-loaded on demand, instead of loading everything up-front.

from #71753
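The lazy-loading pattern quoted above can be sketched as a registry of async loaders: registering an app is cheap, and the bundle cost is paid only when the app is first opened. This is an illustrative sketch, not Kibana's actual application service; in real code the loader would be a dynamic `import()`:

```typescript
// Hypothetical app registry: apps register an async loader up-front,
// but no app code runs until the app is actually opened.
type AppModule = { render: () => string };

const loaders = new Map<string, () => Promise<AppModule>>();
const loaded = new Map<string, AppModule>();

function registerApp(id: string, loader: () => Promise<AppModule>) {
  loaders.set(id, loader); // cheap: stores a function, loads nothing
}

async function openApp(id: string): Promise<string> {
  let mod = loaded.get(id);
  if (!mod) {
    const loader = loaders.get(id);
    if (!loader) throw new Error(`unknown app: ${id}`);
    mod = await loader(); // pays the load cost on first use only
    loaded.set(id, mod);  // cached: later navigations are instant
  }
  return mod.render();
}

let loadCount = 0;
registerApp("dashboard", async () => {
  loadCount++; // stands in for downloading and parsing the bundle
  return { render: () => "dashboard ready" };
});
console.log(loadCount); // 0: registration alone loads nothing
```

The caching map also keeps the SPA behaviour intact: switching back to an already-visited app does not re-download its bundle.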

@marius-dr
Member

  • endurance testing for browser memory leaks ( coming )

  We haven't started anything on this one yet (Memory endurance testing (Marius)).

I have some plans for this and will start joining the performance group sync regularly.
I've normally been running it over the weekends on my desktop PC, mainly on BC builds for 7.x versions (not for minors). What I haven't managed to figure out is how to keep the tests relevant as new features land while also keeping them comparable with each other. My initial thought is to create suites of tests that cover scenarios/"user stories" and go from there. I'll put some updates in the kibana-qa issue for it.

@lizozom
Contributor

lizozom commented Oct 14, 2021

@mshustov I think this and this should address measuring page load time.

@tylersmalley
Contributor

@suchcodemuchwow is also currently working on capturing page load time in an isolated environment simulating a real-world user.

@lizozom lizozom changed the title Kibana platform performance [Meta] Kibana platform performance Nov 10, 2021
@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Feb 16, 2022
@tylersmalley tylersmalley removed loe:small Small Level of Effort impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. EnableJiraSync labels Mar 16, 2022
@lizozom
Contributor

lizozom commented Jul 20, 2022

Closing for now, let's reopen if needed

@lizozom lizozom closed this as completed Jul 20, 2022