
Change search.max_buckets from Cluster Setting to Index Level Setting #61042

Closed

lanerjo opened this issue Aug 12, 2020 · 8 comments
Labels: :Analytics/Aggregations, >enhancement, Team:Analytics

Comments


lanerjo commented Aug 12, 2020

For tuning purposes, not all indices host "killer" queries. There are use cases where some datasets should not be allowed even 10,000 buckets, whereas other datasets need 10,000 or more.

I propose that users should be able to specify the search.max_buckets setting per index, in the index settings.

This would allow much finer tuning and more robust bucket limits.
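For concreteness, a sketch of the two forms. The first request is the real cluster-level setting as it exists today; the second is purely hypothetical (no index.search.max_buckets setting exists, and logs-heavy is an illustrative index name):

```
# Real API: search.max_buckets is a dynamic cluster-wide setting.
PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 10000
  }
}

# Hypothetical: the per-index override this issue proposes.
PUT logs-heavy/_settings
{
  "index": {
    "search.max_buckets": 20000
  }
}
```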

lanerjo added the >enhancement and needs:triage labels Aug 12, 2020
@polyfractal
Contributor

Hiya @lanerjo. First, we recently changed the semantics of how this setting works and bumped the threshold considerably at that time: #57042. This will take effect starting in 7.9.

Regarding index vs. cluster setting... I think that one's tricky, since search requests can span multiple indices. E.g. if a search hits one index that has a 10k limit and another that has a 20k limit, which do you use? Or a situation where two indices are each configured for 10k, which makes sense for their data, but when combined might really need 20k?

The original purpose of the setting was more for safety than anything else, so it made sense at the cluster level. Now that it's purely cosmetic (just keeping response size reasonable), I think cluster level is probably correct. Marking this for the analytics team to take a look, and see if anyone else has thoughts :)
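To make the ambiguity concrete: a single search can freely span indices, so any per-index limit needs defined merge semantics. The request below is valid API; the differing limits on index-a and index-b are the hypothetical part:

```
# One request, two indices with (hypothetically) different bucket limits:
# which limit should apply to the merged result?
GET index-a,index-b/_search
{
  "size": 0,
  "aggs": {
    "by_user": {
      "terms": { "field": "user.id", "size": 20000 }
    }
  }
}
```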

polyfractal added the :Analytics/Aggregations label and removed the needs:triage label Aug 13, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

elasticmachine added the Team:Analytics label Aug 13, 2020

lanerjo commented Aug 13, 2020

> Hiya @lanerjo. First, we recently changed the semantics of how this setting works and bumped the threshold considerably at that time: #57042. This will take effect starting in 7.9.
>
> Regarding index vs. cluster setting... I think that one's tricky, since search requests can span multiple indices. E.g. if a search hits one index that has a 10k limit and another that has a 20k limit, which do you use? Or a situation where two indices are each configured for 10k, which makes sense for their data, but when combined might really need 20k?
>
> The original purpose of the setting was more for safety than anything else, so it made sense at the cluster level. Now that it's purely cosmetic (just keeping response size reasonable), I think cluster level is probably correct. Marking this for the analytics team to take a look, and see if anyone else has thoughts :)

After reading through #57042: this would be very dangerous for a couple of the datasets that I manage. I have a dataset with millions of unique values across several fields per day. Even with a 10k search.max_buckets setting, some user queries would topple coordinating-only nodes with 30 GB of RAM.

Regarding searching multiple indices with different settings based on what is safe for each index: you limit the number of buckets per index. If a search spans several indices with different settings, each index's setting applies to the buckets that index produces. Example: index a has a limit of 10,000 buckets, index b a limit of 20,000, and index c a limit of 5,000. Each index has its own limit, and the query is cancelled once an index hits its limit, so you could potentially have up to 35,000 buckets returned.

Perhaps keeping search.max_buckets at the cluster level, enforced only during the merge phase of a query, and adding an index.max_buckets setting, evaluated during the shard reduce phase, would be a better option. This way admins could tune each index and still limit the total number of buckets that are returned.
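A hedged sketch of how those two tiers might compose (index.max_buckets is hypothetical and does not exist; search.max_buckets is the real cluster setting; index names are illustrative):

```
# Hypothetical per-index limits, enforced during shard reduce.
PUT index-a/_settings
{ "index": { "max_buckets": 10000 } }

PUT index-b/_settings
{ "index": { "max_buckets": 20000 } }

PUT index-c/_settings
{ "index": { "max_buckets": 5000 } }

# Real cluster-wide cap, enforced at the coordinator's final merge.
# Worst case reaching the coordinator: 10,000 + 20,000 + 5,000 = 35,000 buckets.
PUT _cluster/settings
{ "persistent": { "search.max_buckets": 35000 } }
```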

@polyfractal
Contributor

> This would be very dangerous for a couple of the datasets that I manage. I have a dataset with millions of unique values across several fields per day. Even with a 10k search.max_buckets setting, some user queries would topple coordinating-only nodes with 30 GB of RAM.

Sorry, I didn't fully explain the change: the reason we're comfortable increasing the limit is that we've improved memory accounting elsewhere (#46751 and others). The max_buckets threshold was a crude proxy for memory. It often under- or over-estimated the "cost" of an aggregation, because it looked only at the number of buckets, which only somewhat mirrors an aggregation's actual cost. It was generally effective, albeit with very confusing semantics (individual shards could trip the threshold despite the "size" of the overall agg being low, and aggs like rare_terms could trip it even though they prune their results later).

With the recent changes, we track memory much more closely and this proxy is no longer needed. You'll simply get a real CircuitBreakerException when running into memory pressure. So there shouldn't be a net change to the "danger" of a cluster, and the accounting should track the memory of any particular request more accurately, both at the shard and at the coordinator (which the old proxy was notoriously poor at protecting, because counting happened at a different time than when the coordinator started handling results).
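The memory-based protection described here is governed by the existing circuit breaker settings rather than a bucket count. For example, the request breaker, which accounts for per-request structures such as aggregation buckets, has a real, dynamically adjustable limit (the 50% below is an illustrative value; the default is 60% of the heap):

```
# Real API: tighten the request circuit breaker cluster-wide.
PUT _cluster/settings
{
  "persistent": {
    "indices.breaker.request.limit": "50%"
  }
}
```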


lanerjo commented Aug 18, 2020

Has this been tested against datasets with 6 TB of data per day, 95 primary shards, 3 billion records per day, 60 days of retention, and around 600 fields, nearly half of them high-cardinality? With a few of those fields having millions of unique values per day, across, say, 60 days of data?

I would say most likely not. Datasets of this size have a tendency to run even dedicated coordinating-only nodes OOM if aggregations are too large. And let's face it: even with the circuit breaker set to 1%, the wrong query will still run a coordinating-only node OOM.

Improving real-memory accounting is very likely to help, but I do not believe it is the whole answer.

With that being said, adding the max_buckets setting at the index level, while still keeping it at the cluster level, would further allow administrators to limit the impact an individual user can have on the entire cluster.

@jimczi
Contributor

jimczi commented Aug 18, 2020

> I would say most likely not. Datasets of this size have a tendency to run even dedicated coordinating-only nodes OOM if aggregations are too large. And let's face it: even with the circuit breaker set to 1%, the wrong query will still run a coordinating-only node OOM.

We're working on a change to enforce the circuit breaker on the coordinating node: #37182. That would be much more reliable, in my opinion, since the max buckets option takes into account neither the size of each bucket nor the memory available on the node.
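For reference, when a breaker trips, the request fails with a circuit_breaking_exception instead of taking the node down with an OOM. An abbreviated sketch of the error shape (field values are illustrative):

```
{
  "error": {
    "type": "circuit_breaking_exception",
    "reason": "[request] Data too large, data for [<agg>] would be larger than the limit",
    "bytes_wanted": 1234567890,
    "bytes_limit": 1073741824
  },
  "status": 429
}
```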

@wchaparro
Member

@lanerjo just checking to see if the circuit breaker change described above was able to solve your use case.

@wchaparro
Member

Closing as not planned.

wchaparro closed this as not planned Dec 14, 2022