Streaming documents from ES to Discover? #170062

Open
afharo opened this issue Oct 27, 2023 · 3 comments
Labels
Feature:Discover Discover Application Feature:Search Querying infrastructure in Kibana impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:x-large Extra Large Level of Effort Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.

Comments

@afharo
Member

afharo commented Oct 27, 2023

I found an issue with /internal/bsearch that I think only affects Discover, because it retrieves the full JSON documents: the size of Elasticsearch's response depends on the user's data model, so documents with a large average size can cause Kibana to struggle when /internal/bsearch holds 500 of them in memory (to compress them and return them to the client).

For context: I recently ran some tests simulating the requests made by Discover in an index populated via makelogs.
The average size of the documents is 9kB, and Discover fetches 500 docs via /internal/bsearch. This results in Kibana using 4.4MB just to handle that request. Multiplied across concurrent users performing the same request, that adds up quickly (50 users can cause a memory usage spike of 220MB).
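
The figures above can be approximated with a back-of-the-envelope calculation. This is only a sketch: it assumes that memory per request is roughly average document size times document count, that 1MB = 1024kB, and that there is no per-request overhead. The function name `estimate_heap_mb` is hypothetical.

```shell
# Rough estimate: avg doc size (kB) x docs per request x concurrent users.
# Assumption: 1 MB = 1024 kB; compression buffers and other overhead are ignored.
estimate_heap_mb() {
  awk -v kb="$1" -v docs="$2" -v users="$3" 'BEGIN {
    per_request = kb * docs / 1024
    # First number: MB per request. Second number: MB across all users.
    printf "%.1f %.1f\n", per_request, per_request * users
  }'
}

estimate_heap_mb 9 500 50   # prints "4.4 219.7"
```

The second number is slightly below the 220MB quoted above because that figure multiplies the already rounded 4.4MB by 50.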

Since Discover is fetching the data (mostly) as-is from ES, we may want to stream the response to avoid overusing the server's resources.
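
As a loose shell analogy for the difference (this is not how /internal/bsearch is implemented, and both function names are hypothetical), compare compressing a payload after reading it fully into memory with compressing it as it arrives:

```shell
# Buffered: the whole payload is held in a shell variable before compression,
# so memory use grows with the payload size.
# Note: "$(cat)" strips trailing newlines, so this suits text such as JSON.
compress_buffered() {
  local body
  body="$(cat)"
  printf '%s' "$body" | gzip -c
}

# Streamed: gzip compresses bytes as they arrive on stdin,
# so memory use stays roughly constant regardless of payload size.
compress_streamed() {
  gzip -c
}

# Both produce output that decompresses to the same content.
printf '{"hits":[]}' | compress_buffered | gunzip -c
printf '{"hits":[]}' | compress_streamed | gunzip -c
```

With many concurrent requests, the buffered variant holds one full payload per request in memory at the same time, which is the pattern described above.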

I managed to replicate it locally on main:

  1. Start ES via yarn es snapshot
  2. Start Kibana with 550MB max heap size: NODE_OPTIONS="--max-old-space-size=550" yarn start --no-base-path
  3. While still in a clean state (no data in ES yet), confirm that hundreds of requests don't OOM Kibana (source request.sh 200)
  4. Load some data in the index: yarn makelogs --auth="system_indices_superuser:changeme" --host 127.0.0.1:9200 -d 90/1 -c 10000
  5. Run 1 request to confirm that now it takes a bit longer to respond (source request.sh 1)
  6. Run 5 requests and see how Kibana OOMs (source request.sh 5)

The content of request.sh is below:

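# Usage: source request.sh <number of concurrent requests>
# seq is used because {1..$1} is not expanded with a variable bound in bash.
for i in $(seq 1 "$1")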
do
  curl 'http://localhost:5601/internal/bsearch?compress=true' \
    -s -o /dev/null \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Basic c3lzdGVtX2luZGljZXNfc3VwZXJ1c2VyOmNoYW5nZW1l' \
    -H 'elastic-api-version: 1' \
    -H 'kbn-version: 8.12.0' \
    -H 'x-elastic-internal-origin: Kibana' \
    --data-raw '{"batch":[{"request":{"params":{"index":"logstash-*","body":{"sort":[{"@timestamp":{"order":"desc","format":"strict_date_optional_time","unmapped_type":"boolean"}},{"_doc":{"order":"desc","unmapped_type":"boolean"}}],"fields":[{"field":"*","include_unmapped":"true"},{"field":"@timestamp","format":"strict_date_optional_time"},{"field":"relatedContent.article:modified_time","format":"strict_date_optional_time"},{"field":"relatedContent.article:published_time","format":"strict_date_optional_time"},{"field":"utc_time","format":"strict_date_optional_time"}],"size":500,"version":true,"script_fields":{},"stored_fields":["*"],"runtime_mappings":{},"_source":false,"query":{"bool":{"must":[],"filter":[{"range":{"@timestamp":{"format":"strict_date_optional_time","gte":"2022-10-02T22:00:00.000Z","lte":"2023-10-31T16:14:53.238Z"}}}],"should":[],"must_not":[]}},"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{}},"fragment_size":2147483647}},"track_total_hits":false,"preference":1696349687154}},"options":{"sessionId":"7feb5c38-dbd2-4c20-83d6-a1ac9e54571'"$i"'","isRestore":false,"strategy":"ese","isStored":false,"isSearchStored":false,"executionContext":{"type":"application","name":"discover","url":"/jdm/app/discover","page":"app","id":"new","description":"fetch documents"}}},{"request":{"params":{"index":"logstash-*","body":{"aggs":{"0":{"date_histogram":{"field":"@timestamp","calendar_interval":"1w","time_zone":"Europe/Madrid","min_doc_count":1}}},"size":0,"fields":[{"field":"@timestamp","format":"date_time"},{"field":"relatedContent.article:modified_time","format":"date_time"},{"field":"relatedContent.article:published_time","format":"date_time"},{"field":"utc_time","format":"date_time"}],"script_fields":{},"stored_fields":["*"],"runtime_mappings":{},"_source":{"excludes":[]},"query":{"bool":{"must":[],"filter":[{"range":{"@timestamp":{"format":"strict_date_optional_time","gte":"2022-10-02T22:00:00.000Z","lte":"2023-10-31T16:14:53.238Z"}}}],"should":[],"must_not":[]}}},"preference":1696349687154}},"options":{"sessionId":"7feb5c38-dbd2-4c20-83d6-a1ac9e54571'"$i"'","isRestore":false,"strategy":"ese","isStored":false,"isSearchStored":false,"executionContext":{"type":"application","name":"discover","url":"/jdm/app/discover","page":"app","id":"new","description":"fetch chart data and total hits","child":{"type":"lens","name":"lnsXY","id":"unifiedHistogramLensComponent","description":"Edit visualization","url":"/jdm/app/lens#/edit_by_value"}}}}]}' &
done
wait
@afharo afharo added Feature:Discover Discover Application Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. labels Oct 27, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@kertal kertal added Feature:Search Querying infrastructure in Kibana impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:needs-research This issue requires some research before it can be worked on or estimated labels Nov 7, 2023
@lukasolson lukasolson added loe:x-large Extra Large Level of Effort and removed loe:needs-research This issue requires some research before it can be worked on or estimated labels Jan 3, 2024
@thomasneirynck
Contributor

thx @afharo, this applies to docs in Discover, but imho it would be equally beneficial for agg responses for dashboard visualizations

@thomasneirynck
Contributor

I believe elastic/elasticsearch#109576 is blocking this. Async-search requires reading some metadata about the response, and this cannot be done without unpacking the ES-response on the server at this point.

cc @lukasolson @ppisljar
