
Improve performance of Discover with large fields #11457

Closed
Bargs opened this issue Apr 26, 2017 · 12 comments
Labels
Feature:Discover Discover Application performance release_note:enhancement Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@Bargs
Contributor

Bargs commented Apr 26, 2017

Kibana version: 5.3.1

Elasticsearch version: 5.3.1

Description of the problem including expected versus actual behavior:

Based on the conversation in #7755 (comment)

Documents with large fields cause sluggishness in various parts of the Discover UI. The initial render is slow, opening and closing individual documents is slow, and switching between Table/JSON tabs is slow. There are probably other areas that are slow too. #9014 improved things quite a bit, prior to that PR Discover couldn't even load a 1MB doc without crashing the browser. However, we should still try to improve things further.

Steps to reproduce:

  1. Create a doc with a big field. The following JS script may help:

     import { writeFileSync } from 'fs';

     let output = '';
     for (let i = 0; i < 400000; i++) {
       output += i.toString();
     }

     writeFileSync('path/to/file.json', JSON.stringify({ message: output }));

  2. Index the doc:

     curl -XPOST localhost:9200/bigtest/bigtest -d @file.json -H 'Content-Type: application/json'

  3. Create the index pattern, then go play with Discover.

Here's a demonstration of what happens on my machine when I load a doc with a 2.3MB message field.

(animated GIF: largefield)

@msporleder-work

Some additional details: I am using the latest Chrome on a newish MacBook Pro.

Is there a non-minified version of Kibana I can download to test? Right now the Chrome profile shows everything in commons.bundle.js, so it isn't very helpful. :)

Chances are my specific instance has a few of these larger docs showing up on Discover, causing the extra slowness per full page load.

If you want to know specifically, my examples of big messages tend to be giant SQL queries with megabytes' worth of comma-separated IDs: DELETE FROM foo WHERE id IN ( 12345,54331,968574,.... ) or similar

@Bargs
Contributor Author

Bargs commented Apr 27, 2017

@msporleder-work best way to get the un-minified source would be to clone the repo from github and start up Kibana in dev mode with npm start, which will generate sourcemaps for you.

I'll try indexing lots of large docs tomorrow and see how that affects things on my machine.

@Bargs
Contributor Author

Bargs commented Apr 28, 2017

Adding more docs (unsurprisingly) slows things down, probably linearly. With 50 docs (100MB total) Discover took at least 5 minutes to load; I stopped watching at one point. I'm surprised it didn't crash.

I don't think there will be any quick fix for this amount of data. @weltenwort Something to think about as you ponder the doc table refactor.

@msporleder-work can you tell us a bit more about your use case? Do you need to see those giant fields in their entirety? Do you just need to search on them? I'm trying to think of other ways you could accomplish your goals.

@Bargs Bargs added the Feature:Discover Discover Application label Apr 28, 2017
@msporleder-work

I can probably accomplish my goals and keep stability by figuring out a way for logstash to truncate the fields to < 256k.

If anyone is interested, my use case for these giant entries is streaming in MySQL's slow.log, one entry per SQL statement. This lets me quickly count slow queries per server/cluster and point analysts/devs to a Kibana query host:"^warehouse" AND source:"slow.log" (or whatever) to get a nice list of all the queries we need to fix. For whatever reason our queries tend to get big.
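
The truncation idea above can be sketched with Logstash's truncate filter (a sketch, assuming the logstash-filter-truncate plugin is installed; the field name and byte limit are illustrative):

```
filter {
  truncate {
    fields       => ["message"]
    length_bytes => 262144    # cap the field at 256 KB
  }
}
```

This keeps the oversized documents searchable while bounding what Discover has to render per hit.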

@weltenwort
Member

weltenwort commented May 2, 2017

@Bargs thanks for including such comprehensive instructions to reproduce the effect. I will try to diagnose whether the bottleneck is the loading/processing or the rendering - I suspect all of the above 😉

Based on my intuition I would say we want to consider the following improvements:

  • loading: load only the fields currently displayed, lazy load the documents on expansion
  • processing: determine the field list via some api call instead of iterating over all fields and their values client-side [edited for correctness]
  • rendering: avoid unnecessary re-rendering using react, redux and memoization

I would be very motivated to tackle those as soon as I have completed the next stage of the context view (I've already started on the react/redux aspect on the side).
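
The first bullet above (load only the fields currently displayed) roughly corresponds to Elasticsearch source filtering on the search request. A minimal sketch of the search body Discover could send, where `visibleColumns` is a hypothetical stand-in for the table's selected columns:

```javascript
// Sketch: fetch only the columns currently rendered in the doc table,
// instead of the full _source of every hit.
const visibleColumns = ['@timestamp', 'host', 'source']; // hypothetical column list

const searchBody = {
  size: 500,
  // Elasticsearch source filtering: only these fields come back per hit,
  // so a huge unselected field never reaches the browser.
  _source: visibleColumns,
  query: { match_all: {} },
  sort: [{ '@timestamp': { order: 'desc' } }],
};
```

Expanding a single row would then issue a follow-up request for that one document's full `_source`.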

@Bargs
Contributor Author

Bargs commented May 2, 2017

processing: determine the field list via the new field caps api instead of iterating over all fields and their values

As an aside, we need to make sure to communicate with @kobelb about this because he may be using the available field list in csv export.
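
For reference, the field caps API reports the mapped fields and their capabilities per index. A sketch of extracting field names from its response, where `sampleResponse` is a hand-written stand-in for the JSON returned by GET /bigtest/_field_caps?fields=* (note it describes the mapping, not which fields actually occur in the returned hits):

```javascript
// Hand-written stand-in for a field caps response body.
const sampleResponse = {
  fields: {
    message: {
      text: { type: 'text', searchable: true, aggregatable: false },
    },
    '@timestamp': {
      date: { type: 'date', searchable: true, aggregatable: true },
    },
  },
};

// The field list is just the keys of the `fields` object.
const fieldNames = Object.keys(sampleResponse.fields);
```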

@kobelb
Contributor

kobelb commented May 2, 2017

@Bargs thanks for looping me in on this! Obviously, I don't want the needs of any sharing integration to impose limitations on how you guys implement Discover. I was planning on using the available fields (currently on $scope.fieldCounts) to determine all of the columns that the user wishes to share when referring to the underlying data. As long as the field caps api gives us the same information, I don't foresee this causing any issues.

@weltenwort
Member

Actually, I was not quite correct: the field caps api does not provide the set of available columns. But the point still stands: We should get the information in a more scalable way than iterating client-side.

@kobelb I agree, sounds like that would be a subset of what discover needs anyway.

@kobelb
Contributor

kobelb commented May 2, 2017

@weltenwort yeah... it's not exactly what we want, but it was the closest that @Bargs and I were able to determine. Based on the order that the results are returned, it's possible that certain columns could be missed.

@jbudz
Member

jbudz commented Jun 28, 2018

Adding a raw field formatter that skips the whole processing chain may be helpful. I haven't done any investigation but the thought occurred to me.
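
The "raw formatter" idea could look roughly like this: escape the value for HTML and return it directly, bypassing the usual formatting pipeline. This is an illustrative sketch; the formatter shape shown here is hypothetical, not Kibana's actual field formatter API:

```javascript
// Minimal HTML escaping so the raw value can be dropped into the table cell.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical "raw" formatter: no highlighting, no type-specific
// processing, just the escaped value.
const rawFormatter = {
  id: 'raw',
  convert: (value) => escapeHtml(value),
};
```

For multi-megabyte strings the escaping itself still costs time, so this would likely be paired with truncation of what gets rendered.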

@timroes timroes added Team:Visualizations Visualization editors, elastic-charts and infrastructure and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure :Discovery labels Sep 16, 2018
@strawgate
Contributor

Just noticed this behavior: Kibana loads all data into the client for the first 500 entries, regardless of whether the columns are currently showing in the table. If I have a field with a 1MB value, this causes a 500MB request even though it's not shown in the table.

Would love Kibana to only load the fields that are displayed and lazy load the documents on expansion:

  • loading: load only the fields currently displayed, lazy load the documents on expansion
  • processing: determine the field list via some api call instead of iterating over all fields and their values client-side [edited for correctness]
  • rendering: avoid unnecessary re-rendering using react, redux and memoization
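
The rendering bullet, avoiding unnecessary re-rendering via memoization, amounts to caching the formatted output per document so repeated renders don't redo the expensive work. A sketch, where `formatDoc` is a hypothetical stand-in for Discover's row formatting:

```javascript
// Generic single-argument memoizer: compute once per key, then serve
// the cached result on every subsequent call.
function memoize(fn) {
  const cache = new Map();
  return (key) => {
    if (!cache.has(key)) cache.set(key, fn(key));
    return cache.get(key);
  };
}

let calls = 0;
const formatDoc = memoize((doc) => {
  calls += 1;               // track how often the expensive path runs
  return doc.toUpperCase(); // stand-in for expensive row formatting
});
```

With this in place, re-rendering the table after (say) a column toggle only pays the formatting cost for documents not yet in the cache.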

@timroes
Contributor

timroes commented Jun 1, 2021

Closing this in favor of #98497 and #101041

@timroes timroes closed this as completed Jun 1, 2021
8 participants