[data views] async and partial loading of field list #152159
Comments
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery) |
After some offline discussion with @mattk, we thought it would be useful to relay that discussion here: Using async/paging is a re-envisioning of how we retrieve fields in the stack. It may impact both Elasticsearch (with possible new API requirements) and Kibana. Besides new API requirements for Elasticsearch, the impact on Kibana will certainly have two components:
This can certainly be the best long-term way forward. That said, I wonder how much of the "Kibana not being able to deal with large field-lists" issue is related to lower-level implementation details of how Kibana handles large field_caps responses. Generally speaking, browsers should be able to handle response sizes on the order of 10s to 100s of megabytes, which is, from what I understand, the range of field_caps responses for 1000s of fields.
(1) Let's take the following assumption: large field_caps responses are not an issue for Elasticsearch. The request neither times out, nor does the cluster raise any errors/crashes/... (This is just the working assumption (!) @DaveCTurner may confirm/deny :) (?))
(2) On the Kibana side, I'd investigate the following bottlenecks: In Kibana-server:
In Kibana-browser:
So I would first confirm we're not introducing bottlenecks on these lower levels. If we still have performance issues after addressing those, I would look into more sophisticated field-loading strategies like paging and/or async search. All that said, I am certainly in favor of:
Those just impose more design requirements, more coordination-effort, and imho could divert attention from lower hanging fruit. |
We should come up with a definition of what it means to support a large field list. We're seeing 10k field lists more routinely. I think we had an SDH that involved 100k fields. It's definitely a judgement call to decide what we need to support. I'd like to make sure we support 250k fields.
Regarding DOM insertion - I think anything dealing with a big field list is already using a virtualized rendering of some sort - I should verify this. My action items -
|
@thomasneirynck @mattkime I don't have much to add to this discussion overall, but just wanted to confirm that we use virtualization in the Unified Field List currently (both Discover and Lens) since it was brought up, so large field lists should perform well there at least. |
We're pretty robust in this area in recent versions, but there are a few parts worthy of clarification:
I thought of a couple of other things we could do in ES to help too:
Finally, to repeat my comments from earlier in the week, it's always going to be faster (and cheaper, and more scalable and robust) if ES is computing smaller responses, so it would be great if we could try and find ways to avoid needing to extract a list of All The Fields from ES wherever possible. I think we don't consider field caps to be an "interactive" API today and accept that it might take several hundred milliseconds to respond, but maybe we need to change our thinking in this area. |
I'm currently investigating options for increasing the speed of loading fields in the current format. |
@aarju and the Infosec Detections and Analytics team have a use case where field caps takes 9s to resolve - I spoke to @dnhatn and he explained
As best I can tell this makes an argument for loading fields as needed. |
Definitely! I have a follow-up question for @dnhatn: would it actually help if we had an option in the field API request to get only fields that actually have values? |
@kertal I am not sure if that option would help. ES will need to perform checks to determine if a field has values or not. It will also have to handle cases where one shard has values for a field while other shards do not |
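To illustrate the shard-level concern raised above, here is a minimal sketch of how per-shard "has values" answers would need to be combined. It is not how Elasticsearch implements anything; the `ShardFieldValues` type and `mergeShardFieldValues` function are hypothetical names used only for illustration. The point is that a field counts as having values if any single shard reports values, so every shard has to be consulted before the answer is known.

```typescript
// Hypothetical per-shard result: field name -> whether that shard holds any values for it.
type ShardFieldValues = Record<string, boolean>;

// Combine per-shard answers. A field has values overall if at least one shard has values.
function mergeShardFieldValues(shards: ShardFieldValues[]): Record<string, boolean> {
  const merged: Record<string, boolean> = {};
  for (const shard of shards) {
    for (const [field, hasValues] of Object.entries(shard)) {
      // Logical OR across shards; fields unseen so far start as false.
      merged[field] = (merged[field] ?? false) || hasValues;
    }
  }
  return merged;
}
```

A field that is empty on one shard but populated on another still ends up marked as having values, which is why the check cannot be answered from a single shard.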
We’ve got a couple more SDHs related to heavy |
Another SDH came in for large field list loading issues: https://github.com/elastic/sdh-security-team/issues/737 |
closed by #173948 |
Currently, the data views api loads the full field list when a data view is loaded and subsequent references to the field list are synchronous. This leads to slower performance with data views larger than 1k fields and can cause big problems above 6k fields.
Changing how data view fields are accessed touches nearly all kibana apps. The work to integrate the new api will be larger than that to create it.
To address this, the data views api should return a new type of data view called `DataViewAsyncFields`. It's just like a data view, but the `fields` array is replaced with a `getFields` method. `DataViewAsyncFields.getDataView` would return an old style `DataView` to allow for migration. It would be removed once migration is complete.
The goal for the `getFields` api is to provide api consumers with a way of describing the fields they want so they don't need to iterate over the results to find what they need.
Todo
Addresses #147484
Similar / related issues:
#134306
#139340
#151249
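The `DataViewAsyncFields` proposal in this description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Kibana data views API: the `FieldSpec`, `GetFieldsOptions`, `LegacyDataView`, and `FieldFetcher` shapes, the filter options, and the in-memory fetcher are all hypothetical. Only the names `DataViewAsyncFields`, `getFields`, and `getDataView` come from the proposal.

```typescript
// Hypothetical field shape, reduced to a few properties for illustration.
interface FieldSpec {
  name: string;
  type: string;
  aggregatable: boolean;
}

// Hypothetical description of the fields a consumer wants,
// so it does not need to iterate over the full field list.
interface GetFieldsOptions {
  names?: string[]; // exact names or trailing-wildcard patterns such as "host.*"
  types?: string[]; // e.g. ["keyword", "date"]
  aggregatable?: boolean;
}

// Stand-in for an old style data view with a synchronous fields array.
interface LegacyDataView {
  title: string;
  fields: FieldSpec[];
}

type FieldFetcher = (options: GetFieldsOptions) => Promise<FieldSpec[]>;

class DataViewAsyncFields {
  readonly title: string;
  private readonly fetchFields: FieldFetcher;

  constructor(title: string, fetchFields: FieldFetcher) {
    this.title = title;
    this.fetchFields = fetchFields;
  }

  // Replaces the synchronous fields array: only the described fields are loaded.
  getFields(options: GetFieldsOptions = {}): Promise<FieldSpec[]> {
    return this.fetchFields(options);
  }

  // Migration aid: loads the full field list and returns the old style shape.
  async getDataView(): Promise<LegacyDataView> {
    return { title: this.title, fields: await this.fetchFields({}) };
  }
}

// Matches an exact name, or a prefix when the pattern ends with "*".
function matchesPattern(name: string, pattern: string): boolean {
  return pattern.endsWith("*") ? name.startsWith(pattern.slice(0, -1)) : name === pattern;
}

// In-memory fetcher used only to exercise the sketch; a real one would query the server.
function createInMemoryFetcher(all: FieldSpec[]): FieldFetcher {
  return async ({ names, types, aggregatable }) =>
    all.filter(
      (f) =>
        (names === undefined || names.some((p) => matchesPattern(f.name, p))) &&
        (types === undefined || types.includes(f.type)) &&
        (aggregatable === undefined || f.aggregatable === aggregatable)
    );
}
```

A consumer would then call something like `await dataView.getFields({ names: ["host.*"], aggregatable: true })` instead of filtering a fully loaded `fields` array, while code that has not yet migrated could fall back to `await dataView.getDataView()`.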