Add a bulk-loading mode to indexes #97534
Comments
Pinging @elastic/es-distributed (Team:Distributed)
I also wonder if there's a way we could auto-detect this mode switch. For instance, would it work to start a new (non-data-stream) index in bulk-load mode and then flip it into regular mode on the first search?
This sounds like it could be useful to avoid issues with users forgetting to switch back from bulk-load mode to regular.
In fact I wonder if we could piggy-back these things onto the existing search-idle mode. If a shard is search-idle it already skips scheduled refreshes. Could we also set the merge factor according to search-idleness? Does adjusting the flush interval or size make a meaningful performance difference in these cases? By default we flush every 12h or 512MiB which already seems pretty relaxed to me.
I guess we could, but I would be a bit reluctant to apply some of the above ideas to an index that is in a steady state, e.g. the increase in flush size/interval could make recoveries take significantly longer. Another challenge is that the more search-y users who do this kind of bulk load generally want their first query to be fast. So even if we made the switch automatic via the search-idle mechanism, they would still need somewhat complex workflows, e.g. waiting for big merges to complete before letting their indexes serve searches.
For reference, the flush default was changed to 1 min / 10 GiB in #93524, as flushing every 512MiB boils down to flushing every 3-5 seconds with the TSDB track on moderately powerful hardware. On some datasets, like those that have knn vectors that are expensive to merge, saving segment refreshes/flushes can make a noticeable difference.
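To make the arithmetic behind those numbers concrete, here is a rough back-of-the-envelope sketch. It assumes the ingest rate stays constant, so the time between size-triggered flushes scales linearly with the threshold; the function name is made up for illustration and is not part of Elasticsearch.

```python
# Back-of-the-envelope only: assumes a constant ingest rate, so the time
# between size-triggered flushes grows linearly with the size threshold.

OLD_THRESHOLD_MIB = 512          # old size threshold mentioned above
NEW_THRESHOLD_MIB = 10 * 1024    # 10 GiB expressed in MiB


def scaled_flush_interval(observed_seconds: float) -> float:
    """Scale an interval observed at the old threshold to the new one."""
    return observed_seconds * NEW_THRESHOLD_MIB / OLD_THRESHOLD_MIB


# The comment above reports a flush every 3-5 seconds at 512MiB.
for observed in (3, 5):
    print(f"{observed}s at 512MiB -> {scaled_flush_interval(observed):.0f}s at 10GiB")
```

Under that assumption, a 10 GiB threshold would be reached roughly every 60 to 100 seconds at the same ingest rate, i.e. about 20x fewer size-triggered flushes.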
Description
It is a frequent use-case to have an initial load of data when no searches are expected, followed by rare updates but heavy searches.
For such use-cases, it would be interesting to tune Elasticsearch appropriately for each of these two modes, e.g. the bulk load mode could:
And then we could also specialize the rare-update/frequent-search use-case. In addition to bringing the above values back to normal:
It's possible to do all these things manually today already, but it would be nice to package it better so that there's a single setting that needs to be updated to change the index "mode".
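As a rough illustration of what the manual approach looks like today, here is a minimal sketch that builds the request for the index update-settings API for each "mode". The choice of settings (refresh interval, flush threshold, merge policy) is an assumption based on the knobs discussed in the comments, the values are purely illustrative and not recommendations, and `settings_request` is a hypothetical helper, not an Elasticsearch API.

```python
import json

# Illustrative values only; the issue does not prescribe any of them.
BULK_LOAD_SETTINGS = {
    "index.refresh_interval": "-1",                 # disable scheduled refreshes
    "index.translog.flush_threshold_size": "10gb",  # flush less often
    "index.merge.policy.segments_per_tier": 30,     # tolerate more segments per tier
}

# Sending null for a setting in the update-settings API resets it to its default.
REGULAR_SETTINGS = {name: None for name in BULK_LOAD_SETTINGS}


def settings_request(index: str, mode: str) -> tuple[str, str, str]:
    """Return (HTTP method, path, JSON body) for switching `index` to `mode`."""
    if mode == "bulk_load":
        settings = BULK_LOAD_SETTINGS
    elif mode == "regular":
        settings = REGULAR_SETTINGS
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return "PUT", f"/{index}/_settings", json.dumps(settings, sort_keys=True)


method, path, body = settings_request("my-index", "bulk_load")
print(method, path)
print(body)
```

A single index "mode" setting would replace this bookkeeping: the caller would no longer need to remember which individual settings were changed or to reset each one after the initial load.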