
Bucket Aggregation size setting should never throw too_many_buckets_exception if size is less than or equal to search.max_buckets #51559

Closed
niemyjski opened this issue Jan 28, 2020 · 5 comments
Labels: :Analytics/Aggregations, Team:Analytics

Comments

niemyjski (Contributor) commented Jan 28, 2020

Elasticsearch version (bin/elasticsearch --version): 7.5.2

Plugins installed: []

JVM version (java -version): 7.5.2

OS version (uname -a if on a Unix-like system): docker

Description of the problem including expected versus actual behavior:
Bucket aggregation size setting should never throw too_many_buckets_exception if size is less than or equal to search.max_buckets. If I have a simple terms aggregation (no nesting), I'd expect it to always return at most the number of buckets set by the size property. I understand that, for accuracy, more records might be returned from the various shards, which may exceed the 10k limit, but the end result returned to me should be <= 10k as defined by the size property.

TLDR: I don't care what queries happen behind the scenes to get me my 10k buckets. All I care about is that I get my 10k buckets, which is a valid size since it's <= search.max_buckets :-)

Steps to reproduce:
Assuming I have more than 10k unique document ids:

POST /events/_search
{
  "aggs": {
    "terms_id": {
      "meta": {
        "@field_type": "keyword"
      },
      "terms": {
        "field": "id",
        "size": 10000
      }
    }
  }
}

This should return 10k unique buckets, with an id in each bucket.

What happens is:

{
  "error": {
    "root_cause": [
      {
        "type": "too_many_buckets_exception",
        "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
        "max_buckets": 10000
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "events",
        "node": "8v-46gnFQWGp2EsalBzwYw",
        "reason": {
          "type": "too_many_buckets_exception",
          "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
          "max_buckets": 10000
        }
      }
    ]
  },
  "status": 503
}
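
As the error message notes, the limit itself is a dynamic cluster setting. Purely as a stopgap illustration (the value 20000 below is an arbitrary example, not a recommendation), raising it would look something like:

PUT /_cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}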

Reasoning:

I'd love to learn more about why this happens; a detailed response on this choice would be greatly appreciated. I know I wasn't the only one wondering, as it was discussed here too: https://discuss.elastic.co/t/large-aggregate-too-many-buckets-exception/189091/15

If I'm understanding this issue correctly, wouldn't the following scenario also throw this error? Let's say I have two shards, where shard 1 contains 10k+ unique ids and shard 2 contains 10k+ different unique ids. Querying both of them would produce 20k buckets that need to be merged down to the requested bucket size of 10k, but creating even one bucket over the max behind the scenes would throw this error.
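
(Side note on the repro above: if I'm reading the terms agg docs right, shard_size defaults to size * 1.5 + 10, so a size of 10000 already asks each shard for up to 15010 candidate buckets. A rough sketch of pinning shard_size explicitly, which trades some accuracy for fewer per-shard buckets, might look like the request below; whether it also avoids the coordinator-side count across shards is exactly the question above.)

POST /events/_search
{
  "aggs": {
    "terms_id": {
      "terms": {
        "field": "id",
        "size": 10000,
        "shard_size": 10000
      }
    }
  }
}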

niemyjski added a commit to FoundatioFx/Foundatio.Parsers that referenced this issue Jan 28, 2020
cbuescher added the :Analytics/Aggregations label Jan 30, 2020
elasticmachine (Collaborator) commented

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

polyfractal (Contributor) commented

Hey @niemyjski :) I just recently replied to a different issue explaining a little about how the setting works: #34209 (comment). At a minimum, it seems we need to document how this setting works internally so users have a better understanding.

I think the main difficulty is that the setting serves two purposes today, which confuses the semantics a little. First, it's a soft-limit for response size. Easy enough to reason about.

But the other purpose (and arguably the more important one to us) is that it acts as a "breaker" to help kill execution of expensive aggregations. There are several ways aggs can go off the rails and cause issues, but one of the predominant mechanisms is "abusive" aggs that generate too many buckets. The max_buckets threshold allows us to abort aggs that are generating an excessive number of buckets. This is helpful in two areas:

  • On the shard while actually aggregating the data
  • On the coordinator, while incrementally reducing shard responses and during the final reduction.

The coordinator can't return results until all shards have reported in, so it has to buffer the intermediate reductions in memory. In an extreme example, if you ask for size: 1, shard_size: 1000000 and have ten shards all with unique values (say, aggregating on an ID field or similar), the coordinator may need to buffer 1m * 10 == 10m buckets before it can select the single, top bucket to return. This is super expensive, especially when you get to real-world scenarios where aggs are complex and multi-layered.

So the max_buckets threshold is a soft limit that allows us to proactively abort aggs that are growing past the point of safety.

I agree that does make it confusing/complicated for users, particularly since some aggs (like rare_terms) actually reduce their bucket count on the coordinator but might trip the limit on the shards. There is some work going on elsewhere (like #46751) that hopefully will help decouple the memory aspects of max_buckets and leave it purely as a response-size limiter.
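
To make the "complex and multi-layered" point concrete with a hypothetical sketch (the field names user_id and @timestamp are made up for illustration): bucket counts multiply across nested aggs, so a request like the one below can produce up to users * days buckets, which is the kind of growth the breaker is meant to catch early.

POST /events/_search
{
  "size": 0,
  "aggs": {
    "by_user": {
      "terms": { "field": "user_id", "size": 1000 },
      "aggs": {
        "by_day": {
          "date_histogram": { "field": "@timestamp", "calendar_interval": "day" }
        }
      }
    }
  }
}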

polyfractal (Contributor) commented

Small update: there is discussion ongoing about deprecating or changing the behavior of max_buckets, with the tentative plan leaning towards only counting buckets at the very end of reduction (i.e. so that it behaves purely as a bucket size limiter on the response), and potentially changing the default to a much higher value.

#51731

rjernst added the Team:Analytics label May 4, 2020
polyfractal (Contributor) commented

This should now be resolved by #57042. The limit has been drastically increased, and bucket counts are now only tallied after all reductions are complete. So the various and confusing edge cases that I described above no longer apply, and the setting really does what it says on the tin. :)

🎉
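
(For anyone landing here later: one way to check the effective limit on your cluster is to fetch the settings including defaults and look for search.max_buckets, e.g. with the sketch below; flat_settings just makes the key easier to spot.)

GET /_cluster/settings?include_defaults=true&flat_settings=true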

niemyjski (Contributor, Author) commented

Thanks!
