
[data views] async and partial loading of field list #152159

Closed
mattkime opened this issue Feb 25, 2023 · 13 comments

Comments

@mattkime
Contributor

mattkime commented Feb 25, 2023

Currently, the data views api loads the full field list when a data view is loaded and subsequent references to the field list are synchronous. This leads to slower performance with data views larger than 1k fields and can cause big problems above 6k fields.

Changing how data view fields are accessed touches nearly all kibana apps. The work to integrate the new api will be larger than that to create it.

To address this, the data views API should return a new type of data view called DataViewAsyncFields. It's just like a data view, but the fields array is replaced with a getFields method. DataViewAsyncFields.getDataView would return an old-style DataView to allow for migration, and would be removed once migration is complete.

The goal for the getFields API is to provide API consumers with a way of describing the fields they want, so they don't need to iterate over the results to find what they need.

getFields params
  fields: string or array of field names. Wildcards `*` supported
  includeUnmapped: boolean, defaults to false // maybe this should be a mapping type - mapped / unmapped / runtime / scripted / meta
  types: types of fields to be returned. ES types (long, integer, etc.) with a shim for Kibana types (number, string, etc.)
  perPage: number of fields to return (default 500)
  page: which page of fields to return
  searchable: boolean
  aggregatable: boolean
  isMetadataField: boolean // maybe this isn't needed if there's a more general tool that does the same
  indexFilter: query object
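
To make the proposal concrete, here is a rough TypeScript sketch of what the shape could look like. The names GetFieldsParams and DataViewAsyncFields follow the description above, but everything here is illustrative rather than a final API, and the DataView / DataViewField types are stand-ins for the real Kibana types.

```ts
// Illustrative sketch only - parameter names follow the list above, not a final API.
// DataView / DataViewField are stand-ins for the existing Kibana types.
type DataView = unknown;
type DataViewField = { name: string; type: string; searchable: boolean; aggregatable: boolean };

interface GetFieldsParams {
  fields?: string | string[]; // field names, wildcards `*` supported
  includeUnmapped?: boolean;  // defaults to false
  types?: string[];           // ES types (long, integer, ...) with a Kibana-type shim
  perPage?: number;           // defaults to 500
  page?: number;
  searchable?: boolean;
  aggregatable?: boolean;
  isMetadataField?: boolean;
  indexFilter?: Record<string, unknown>; // ES query DSL object
}

interface DataViewAsyncFields {
  // like a DataView, but the synchronous fields array is replaced with:
  getFields(params: GetFieldsParams): Promise<DataViewField[]>;
  // migration escape hatch, removed once all consumers are converted:
  getDataView(): Promise<DataView>;
}
```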

Todo

  • caching
  • stream based parsing - do we need it if we're loading smaller batches of fields?

Addresses #147484

Similar / related issues:
#134306
#139340
#151249

@mattkime mattkime added Feature:Data Views Data Views code and UI - index patterns before 8.0 Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. labels Feb 25, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@thomasneirynck
Contributor

thomasneirynck commented Mar 23, 2023

After some offline discussion with @mattkime, we thought it would be useful to relay that discussion here:

Using async/paging is a re-envisioning of how we retrieve fields in the stack. It may impact both Elasticsearch (with possible new API requirements) and Kibana. Besides new API requirements for Elasticsearch, the impact on Kibana will certainly have two components:

  • new internal field-fetching logic, to handle the paging/async request
  • new UX-patterns in relevant apps (Discover, ...) to deal with "incomplete" field-states

This can certainly be the best long-term way forward. That said, I wonder how much of the "Kibana not being able to deal with large field-lists" issue is related to lower-level implementation details of how Kibana handles large field_caps responses.

Generally speaking, browsers should be able to handle response sizes in the order of 10s to 100s of megabytes, which, from what I understand, is the range of field_caps responses for 1000s of fields.

(1) Let's take the following assumption: large field_caps responses are not an issue for Elasticsearch. The request neither times-out, nor does the cluster raise any errors/crashes/... (This is just the working assumption (!) @DaveCTurner may confirm/deny :) (?))

(2) On Kibana-side, I'd investigate following bottlenecks:

In Kibana-server:

  • there are known endpoints where Kibana decodes and re-encodes an ES response (_bsearch is an example); Kibana-server holds an entire JSON response in memory server-side.
  • We need to ensure that for field-caps, the Kibana endpoint:
    • only streams the raw ES response straight to the browser (i.e. it doesn't buffer the full response)
    • doesn't do any decoding/encoding on it (e.g. some Kibana endpoints re-encode to base64; I don't think this is the case here, but let's ensure) - a rough sketch of such a passthrough follows
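
To make the "stream it straight through" point concrete, here is a minimal sketch of a Kibana route that proxies the raw field_caps body without buffering or re-encoding it server-side. It assumes the Elasticsearch JS client's `asStream` transport option and Kibana's support for stream response bodies; the route path and hard-coded index pattern are made up for illustration.

```ts
// Hypothetical route, registered with a `router` from core.http.createRouter().
// Sketch only: streams the raw field_caps response to the browser without
// holding or re-encoding the full JSON on the Kibana server.
router.get(
  { path: '/internal/example/field_caps', validate: false },
  async (context, request, response) => {
    const { client } = (await context.core).elasticsearch;
    // With `asStream: true` the ES JS client hands back the raw response body
    // as a Node.js Readable stream instead of a parsed object.
    const stream = await client.asCurrentUser.fieldCaps(
      { index: 'my-index-*', fields: '*' }, // index pattern hard-coded for brevity
      { asStream: true }
    );
    // Pass the stream straight through - no JSON.parse / re-serialize server-side.
    return response.ok({
      body: stream,
      headers: { 'content-type': 'application/json' },
    });
  }
);
```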

In Kibana-browser:

  • parsing the response. For large JSON docs, we should not be using JSON.parse. JSON.parse is suboptimal for two reasons:
    • it requires the entire response to be serialized into a string object. This string object has no useful relevance whatsoever, yet it creates a ton of duplicate cruft.
    • Calling JSON.parse blocks the main thread while the data is unmarshalled. This is a common place where browsers "hang" or may even crash.
  • retaining field-list in memory
    • I don't think there should be an issue of heap-allocating thousands of fields as thousands of JS-objects, even in the order of 100s of megs. (I don't rule it out, but I don't see a big issue given the state of browsers and machines today).
    • Practically speaking, it should mean that DataView#getFields certainly can respond with 10000s of fields to the client.
  • DOM insertion
    • at some point, all these fields will be inserted somewhere in the DOM (e.g. as nodes in Discover left field-list sidebar). I am uncertain if this is really causing issues. I would expect "no", given that thousands of nodes is not that strange, but React may struggle with it nonetheless. If this is an issue (i.e. the browser borks because the DOM gets too large), we can think of client-side-only solutions (e.g. a client-side tabbing or paging mechanism; see the sketch below).
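
On the DOM-insertion point: if a huge field list ever did overwhelm the browser, client-side virtualization (only mounting the rows currently scrolled into view) is the usual fix, and later comments in this thread confirm the Unified Field List already does this. A minimal, purely illustrative sketch using react-window:

```tsx
// Minimal sketch: virtualize a huge field list so only the visible rows exist
// in the DOM. react-window is used purely as an illustration here.
import React from 'react';
import { FixedSizeList } from 'react-window';

interface FieldRow {
  name: string;
  type: string;
}

export const FieldListVirtualized = ({ fields }: { fields: FieldRow[] }) => (
  <FixedSizeList
    height={600}  // viewport height in px
    width={300}
    itemCount={fields.length}
    itemSize={32} // row height in px
  >
    {({ index, style }: { index: number; style: React.CSSProperties }) => (
      // Only rows currently scrolled into view are mounted.
      <div style={style}>
        {fields[index].name} ({fields[index].type})
      </div>
    )}
  </FixedSizeList>
);
```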

So I would first confirm we're not introducing bottlenecks on these lower levels. If we still have performance issues after addressing those, I would look into more sophisticated field-loading strategies like paging and/or async search.


All that said; I am certainly in favor of:

  • Apps should only load the fields they need (e.g. Lens chart on dashboard does not need all 10000s fields)
  • DataView APIs are expanded to have more fine-grained use-cases.

Those just impose more design requirements, more coordination-effort, and imho could divert attention from lower hanging fruit.

@mattkime
Contributor Author

@thomasneirynck

We should come up with a definition of what it means to support a large field list. We're seeing 10k-field lists more routinely. I think we had an SDH that involved 100k fields. It's definitely a judgment call to decide what we need to support. I'd like to make sure we support 250k fields.

src/plugins/data_views/server/fetcher/index_patterns_fetcher.ts does do some processing of the ES field_caps response. I think this is worth investigating if we find a delay in receiving the response.

Regarding dom insertion - I think anything dealing with a big field list is already using a virtualized rendering of some sort - I should verify this.


My action items -

  1. Check in with security solution devs to get a better problem definition from their perspective. I think they have a better idea regarding the number of fields and the specific impact.

  2. I should produce some ballpark numbers for the relationship between field count, payload size, and parsing time.
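
For action item 2, a rough, purely illustrative way to get those ballpark numbers: generate synthetic field_caps-like payloads and time the parse, in a browser console or Node.

```ts
// Rough micro-benchmark: field count vs. payload size vs. JSON.parse time.
// Synthetic data only - not a real field_caps response.
for (const fieldCount of [1_000, 10_000, 100_000, 250_000]) {
  const fields = Object.fromEntries(
    Array.from({ length: fieldCount }, (_, i) => [
      `field_${i}`,
      { keyword: { type: 'keyword', searchable: true, aggregatable: true } },
    ])
  );
  const payload = JSON.stringify({ indices: ['my-index'], fields });
  const start = performance.now();
  JSON.parse(payload);
  const parseMs = performance.now() - start;
  // payload.length counts characters, which approximates bytes for ASCII JSON
  console.log(
    `${fieldCount} fields -> ~${(payload.length / 1024 / 1024).toFixed(1)} MB, ` +
      `JSON.parse ${parseMs.toFixed(0)} ms`
  );
}
```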

@davismcphee
Contributor

at some point, all these fields will be inserted somewhere in the DOM (e.g. as nodes in Discover left field-list sidebar). I am uncertain if this is really causing issues.

Regarding dom insertion - I think anything dealing with a big field list is already using a virtualized rendering of some sort - I should verify this.

@thomasneirynck @mattkime I don't have much to add to this discussion overall, but just wanted to confirm that we use virtualization in the Unified Field List currently (both Discover and Lens) since it was brought up, so large field lists should perform well there at least.

@DaveCTurner

DaveCTurner commented Mar 24, 2023

Let's take the following assumption: large field_caps responses are not an issue for Elasticsearch. The request neither times-out, nor does the cluster raise any errors/crashes/... (This is just the working assumption (!) @DaveCTurner may confirm/deny :) (?))

We're pretty robust in this area in recent versions, but there are a few parts worthy of clarification:

  • re. crashes, we would consider it an ES bug if field caps (or any other) requests could be harmful to the cluster in this way.

  • re. errors, we may return 429 Too Many Requests if we are overloaded, and we may be unable to respond (or may only partially respond) if the cluster is unhealthy for some other reason. But it's an ES bug if this happens for no obvious reason (e.g. while the load on the cluster is within expectations and the cluster reports no health problems). Users will always find ways to push their clusters too hard and end up in these situations, so we will always need Kibana to do its best to react gracefully to errors.

  • re. timeouts, Elasticsearch does not itself have any timeouts in this area, but there may be timeouts imposed by intermediary proxies or by Kibana itself over which Elasticsearch has no control. We sometimes need to fan-out the field caps computation across tens-of-thousands of shards in hundreds of remote clusters so we're not offering any real-time guarantees here. As with errors, it's an ES bug if we take too long for no obvious reason, and likewise we still need Kibana to do its best to react gracefully to slow responses.

I thought of a couple of other things we could do in ES to help too:

  • I believe it's difficult to parse a pure JSON response in a streaming fashion within Kibana, but we are free to choose a different, more streamable format (e.g. NDJSON) if that would help - see the sketch after this list.

  • I think we don't have any caching in ES around field caps at the moment, but that's definitely an option if we encounter further performance constraints. IMO it's good to push caching into ES because this allows for a cache to be shared across multiple clients in ways that a Kibana-side cache cannot.
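
To sketch how the NDJSON idea could be consumed on the Kibana side: each line is a small JSON document (say, one field), so the browser never builds or parses one giant string. The endpoint and per-line format are assumptions, not an existing API; only standard web APIs are used.

```ts
// Sketch of consuming a hypothetical NDJSON field-caps response incrementally:
// each line is a small JSON document, so we never hold or parse the whole
// payload at once. The endpoint and format are assumptions.
async function* readFieldsNdjson(url: string): AsyncGenerator<unknown> {
  const response = await fetch(url);
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop()!; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line.trim()) yield JSON.parse(line); // many small parses, not one big one
    }
  }
  if (buffer.trim()) yield JSON.parse(buffer);
}
```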

Finally, to repeat my comments from earlier in the week, it's always going to be faster (and cheaper, and more scalable and robust) if ES is computing smaller responses, so it would be great if we could try and find ways to avoid needing to extract a list of All The Fields from ES wherever possible. I think we don't consider field caps to be an "interactive" API today and accept that it might take several hundred milliseconds to respond, but maybe we need to change our thinking in this area.

@kertal kertal added the discuss label Mar 28, 2023
@mattkime
Contributor Author

mattkime commented Apr 6, 2023

I'm currently investigating options for increasing the speed of loading fields in the current format.

@mattkime
Contributor Author

@aarju and the Infosec Detections and Analytics team have a use case where field caps takes 9s to resolve -
Screenshot 2023-04-25 at 17 13 37 (1)
kibana-app.whitesector.inf.elasticnet.co (2).har.zip

I spoke to @dnhatn and he explained

If a field-caps hits many indices and these indices don’t have identical mappings and contain multiple fields, the request will be slow. It is preferable to only ask for fields that we will actually use. This reduces the merging cost and response size. In some cases, requesting all fields might be faster, but that typically occurs when the response is very fast. In most cases, requesting specific fields will reduce the worst-case scenario.


As best I can tell this makes an argument for loading fields as needed.
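
dnhatn's point translated into a request: ask field_caps only for the fields an app will actually use instead of `*`. A small sketch using the Elasticsearch JS client; the index pattern and field names are made up for illustration.

```ts
// Sketch: request only the fields an app actually needs, instead of `*`.
// Index pattern and field names are made up for illustration.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

const fieldCaps = await client.fieldCaps({
  index: 'logs-*',
  fields: ['@timestamp', 'message', 'host.name'], // narrow list -> smaller merge and response
  // include_unmapped: true, // only when unmapped-field info is really needed
});

console.log(Object.keys(fieldCaps.fields).length, 'fields returned');
```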

@kertal
Member

kertal commented May 22, 2023

As best I can tell this makes an argument for loading fields as needed.

Definitely! I have a follow-up question for @dnhatn: would it actually help if we had an option in the fieldAPI request to just get fields that actually have values?

@dnhatn
Member

dnhatn commented May 22, 2023

would it actually help if we had an option in the fieldAPI request to just get fields that actually have values?

@kertal I am not sure if that option would help. ES will need to perform checks to determine if a field has values or not. It will also have to handle cases where one shard has values for a field while other shards do not.

@e40pud
Contributor

e40pud commented May 31, 2023

At Security Solution we would benefit a lot from any kind of improvement in field fetching.

In 8.7 we introduced a feature where we show users warnings about field conflicts and about fields that are unmapped in some of the indices. For that we have to fetch extended information about each field using the includeUnmapped flag. Apparently this dramatically slowed down the Rule Exception flyout, as we need to load all fields on opening. Among the first to come back to us were the InfoSec folks, as they have a huge number of fields in their indices.

Screenshot 2023-05-31 at 17 10 49

After some research, it seems like there is no way around this at the moment. We are fully on board with any changes that could improve that for us.

@e40pud
Contributor

e40pud commented Jun 28, 2023

We’ve got a couple more SDHs related to heavy _field_caps responses. This time it happens with EQL queries: https://github.com/elastic/security-team/issues/6962

@davismcphee davismcphee added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:large Large Level of Effort and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Sep 8, 2023
@dhurley14
Contributor

Another SDH came in for large field list loading issues: https://github.com/elastic/sdh-security-team/issues/737

@mattkime
Contributor Author

closed by #173948
