
Optimize Canvas REST calls via batching #29847

Merged
29 commits merged into elastic:master on Feb 6, 2019

Conversation

@chrisdavies (Contributor) commented Feb 1, 2019

This adds batching to Canvas' remote function calls.

This builds on #29792, so reviewers may want to hold off on reviewing this until that PR is merged.
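To make the idea concrete, here is a minimal, hypothetical sketch of client-side batching of remote function calls. The function names, endpoint path, and payload shape are illustrative assumptions, not the actual Kibana implementation:

```ts
// Sketch only: queue each remote function call made within the same tick,
// then flush all of them in a single HTTP request.
interface QueuedCall {
  functionName: string;
  args: unknown;
  context: unknown;
  resolve: (value: unknown) => void;
  reject: (error: unknown) => void;
}

let queue: QueuedCall[] = [];
let flushScheduled = false;

export function runServerFunction(
  functionName: string,
  args: unknown,
  context: unknown
): Promise<unknown> {
  return new Promise((resolve, reject) => {
    queue.push({ functionName, args, context, resolve, reject });
    if (!flushScheduled) {
      flushScheduled = true;
      // Collect every call made in the same tick into one request.
      setTimeout(flush, 0);
    }
  });
}

async function flush() {
  const batch = queue;
  queue = [];
  flushScheduled = false;

  try {
    // Hypothetical endpoint: one POST carries all queued calls and returns
    // one result (or error) per entry, in order.
    const response = await fetch('/api/canvas/fns/batch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        functions: batch.map(({ functionName, args, context }) => ({
          functionName,
          args,
          context,
        })),
      }),
    });
    const { results } = await response.json();
    batch.forEach((call, i) => {
      const result = results[i];
      if (result.error) {
        call.reject(result.error);
      } else {
        call.resolve(result.value);
      }
    });
  } catch (err) {
    batch.forEach((call) => call.reject(err));
  }
}
```

Each caller still gets its own promise, but all calls issued in the same tick travel over the wire as a single request.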

@chrisdavies chrisdavies requested a review from a team as a code owner February 1, 2019 17:04
@elasticmachine (Contributor)

💚 Build Succeeded

@chrisdavies changed the title from "[WIP] Optimize Canvas REST calls via batching" to "Optimize Canvas REST calls via batching" on Feb 1, 2019
@chrisdavies added the v7.0.0 and Team:Presentation (Presentation Team for Dashboard, Input Controls, and Canvas) labels on Feb 1, 2019
@elasticmachine (Contributor)

Pinging @elastic/kibana-canvas

@elasticmachine (Contributor)

💚 Build Succeeded

@elasticmachine (Contributor)

💔 Build Failed

@chrisdavies (Contributor, Author)

retest

@elasticmachine (Contributor)

💚 Build Succeeded

@monfera (Contributor) left a comment

Fantastic work, Chris!!!

To lock in the performance gains, it would be good to precede this PR with a standalone, test-only PR that measures performance on a few expression chains, e.g. one straightforward chain and one chain with back-and-forth communication. It is best to instantiate a good number of them, e.g. a few dozen of each, to simulate a workpad with several dozen elements accessing data (it adds up quickly: 5 data-driven elements on 5 pages is already 25 expressions). That way the effect of batching becomes visible.

The reason for suggesting a preceding PR rather than a commit in this PR is that it could then be used to measure the earlier, non-batching version (and even be backported to the WS-based implementation as a useful watermark for further optimizations, though my understanding is that we're moving to HTTP regardless of whether it makes things slower).
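As a rough illustration of the kind of standalone timing harness meant here (a sketch only: `interpretExpression`, the expression string, and the element count are placeholders, not the actual Canvas test API):

```ts
// Hypothetical harness: run many expressions concurrently, as a workpad with
// many data-driven elements would, and report wall-clock time so batched and
// unbatched builds can be compared.
async function timeExpressions(
  interpretExpression: (expr: string) => Promise<unknown>
) {
  // Illustrative Canvas-style expression chain; the real test would cover
  // both a straightforward chain and one with client/server ping-pong.
  const simpleChain = 'filters | demodata | pointseries y="mean(price)" | plot';
  const expressions = Array.from({ length: 25 }, () => simpleChain);

  const start = performance.now();
  await Promise.all(expressions.map((expr) => interpretExpression(expr)));
  const elapsed = performance.now() - start;

  console.log(`Ran ${expressions.length} expressions in ${elapsed.toFixed(0)}ms`);
}
```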

@chrisdavies (Contributor, Author)

Good idea, @monfera. Do you know of any perf tests in our current test suite? I'm not aware of any, so this may be a first. The challenge with automated perf tests is that their speed varies across environments, so it's hard to know how to write them robustly. I'm open to suggestions / ideas here, though. I'm not sure if it should be a blocker on this.

I think we're moving forward with HTTP regardless, simply because of the HA Kibana cluster scenario and the fact that WebSocket support is not widespread in our enterprise environments.

@stacey-gammon (Contributor)

Instead of blocking on automated tests (agreed, nothing like this exists currently), how about running some manual tests and posting the results along with how you ran them? We can then have someone repeat them in their local environment just to verify that the results seem accurate. Then we can at least say, generally, that batching improves performance by X given a Y expression. There may also be tools we can use for this kind of thing; @jimgoodwin mentioned one of them, but I forget the name.

@monfera (Contributor) commented Feb 6, 2019

It'd be interesting to figure out what we even measure (repeating that it should be outside this PR). Example: when loading the eCommerce workpad with 5x the number of pages (i.e. 10 pages total), the randomized vertical line chart updates within two seconds of a refresh (it has timelion and a couple of the functions @rashidkpc mentioned). For the entire workpad, the last HTTP response arrives at the 10-second mark. Another tricky aspect is that ES and Kibana server response times are stochastic and may change beyond the control of the batched HTTP comms. If the measurement is automated, the threshold is either set too tight, causing intermittent failures, or too lenient, not signaling a slowdown until a git bisect becomes really painful.

@w33ble is spearheading another attack vector aimed at prioritizing completion of the current page. Right now the requests for the current page are issued first, and as the 2s vs. 10s measurement shows, that already does a good job; but if there's ping-pong between client and server, the subsequent requests will be filled in random order. So there is room for manually testing the best possible case, which is not loading the other pages at all.

To me, the performance requirements for GA aren't clear, if they exist at all (the decision I heard was for unconditional removal of WS). Even with WebSockets, we have users who break up a midsize workpad into a few one-pagers. See also #30095 and the links in its description.

@monfera (Contributor) commented Feb 6, 2019

P.S. I just used the network timing in Chrome Dev Tools; it seemed accurate enough for the purpose, since we're not measuring page jank or other 10ms-level details. The times here are on the order of many seconds.

@stacey-gammon (Contributor)

We could probably throw in some test expression functions that just sleep, and run them on the client vs. the server to fake the time it would take to execute something in ES, and then maybe return some amount of data.
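A possible shape for such a test-only function, assuming a Canvas-style expression function definition. The registration shape, argument names, and payload are assumptions rather than the exact plugin API:

```ts
// Hypothetical "sleep" expression function: waits a given number of
// milliseconds and returns a fake payload of a chosen size, so round-trip
// cost can be simulated deterministically on either the client or the server.
const sleep = () => ({
  name: 'sleep',
  help: 'Test-only function that simulates a slow Elasticsearch call',
  args: {
    ms: { types: ['number'], default: 1000, help: 'Milliseconds to wait' },
    bytes: { types: ['number'], default: 0, help: 'Size of the fake payload to return' },
  },
  fn: async (_context: unknown, { ms, bytes }: { ms: number; bytes: number }) => {
    await new Promise((resolve) => setTimeout(resolve, ms));
    return { type: 'string', value: 'x'.repeat(bytes) };
  },
});
```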

@monfera (Contributor) left a comment

Approving this nice optimization: it functions well, the code is clean and compact for what it does, and performance is almost exactly the same as the original refresh-to-current-page behavior before WS was deactivated. In addition, we don't have a perf target, the WS switch is binary, and this PR improves performance nicely over the baseline.

@chrisdavies chrisdavies merged commit e1430a2 into elastic:master Feb 6, 2019
@chrisdavies chrisdavies deleted the canvas/batch-without-stream branch February 6, 2019 20:50
chrisdavies added a commit to chrisdavies/kibana that referenced this pull request Feb 6, 2019
Optimize the expression interpreter by introducing batching of remote function calls.
chrisdavies added a commit to chrisdavies/kibana that referenced this pull request Feb 6, 2019
Optimize the expression interpreter by introducing batching of remote function calls.
chrisdavies added a commit that referenced this pull request Feb 7, 2019
Optimize the expression interpreter by introducing batching of remote function calls.
chrisdavies added a commit to chrisdavies/kibana that referenced this pull request Feb 7, 2019
Optimize the expression interpreter by introducing batching of remote function calls.
chrisdavies added a commit that referenced this pull request Feb 7, 2019
Optimize the expression interpreter by introducing batching of remote function calls.
chrisdavies added a commit to chrisdavies/kibana that referenced this pull request Feb 8, 2019
Optimize the expression interpreter by introducing batching of remote function calls.
chrisdavies added a commit that referenced this pull request Feb 8, 2019
Labels
release_note:fix, Team:Presentation (Presentation Team for Dashboard, Input Controls, and Canvas), v6.7.0, v7.0.0