Add a new cluster setting to limit the total number of buckets returned by a request #27581

Merged
merged 13 commits into elastic:master from jimczi:enhancements/max_bucket on Dec 6, 2017

Conversation

Contributor

@jimczi commented Nov 29, 2017

This commit adds a new dynamic cluster setting named `search.max_buckets` that can be used to limit the number
of buckets created per shard or by the reduce phase. Each multi-bucket aggregator can consume buckets during the final build
of the aggregation at the shard level, or during the reduce phase (final or not) on the coordinating node. When an aggregator consumes a bucket,
a global count for the request is incremented; if this count exceeds the limit, an exception is thrown (a TooManyBuckets exception).
This change adds the ability for multi-bucket aggregators to "consume" buckets against the global limit; the default is 10,000.
The consumer is opt-in, so each multi-bucket aggregator must explicitly call it when a bucket is added to the response.

Closes #27452 #26012
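
For illustration, a minimal sketch of the opt-in consumer pattern described above; the class name, field names, and the use of `IllegalStateException` as a stand-in for the TooManyBuckets exception are assumptions for the example, not the actual implementation:

```java
import java.util.function.IntConsumer;

// Sketch of a per-request bucket consumer. Multi-bucket aggregators report
// each bucket they add; the shared count is checked against the
// `search.max_buckets` limit and the request fails once it is crossed.
class MaxBucketsConsumer implements IntConsumer {
    private final int limit; // value of `search.max_buckets`, 10,000 by default
    private int count;       // buckets consumed so far by this request

    MaxBucketsConsumer(int limit) {
        this.limit = limit;
    }

    @Override
    public void accept(int value) {
        count += value;
        if (count > limit) {
            // stand-in for the TooManyBuckets exception mentioned above
            throw new IllegalStateException(
                "Trying to create too many buckets, limit is [" + limit + "]");
        }
    }
}
```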

Contributor

@colings86 left a comment
Left some minor comments but LGTM (once the build passes ;) )

this(settings, (b) -> new ReduceContext(bigArrays, scriptService, b));
}

public SearchPhaseController(Settings settings, Function<Boolean, ReduceContext> reduceContextFunction) {
Contributor

Could you add some JavaDocs here explaining what the reduceContextFunction is?
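
For example, such a Javadoc might read as follows (one possible wording inferred from the PR description; whether the boolean flags the final reduce is an assumption):

```java
/**
 * @param reduceContextFunction a function that builds the {@code ReduceContext}
 *        used to reduce aggregations; the boolean argument presumably selects
 *        between a final reduce (true) and a partial reduce (false)
 */
```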

import java.util.function.BiFunction;
import java.util.function.IntConsumer;

public class MultiBucketConsumerService {
Contributor

Can we add JavaDocs to this class to explain what it is used for?
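
One possible wording, based on the PR description (a suggestion, not the merged Javadoc):

```java
/**
 * A service that creates the consumers multi-bucket aggregators use to count
 * the buckets they add to a response. The running per-request count is checked
 * against the dynamic cluster setting {@code search.max_buckets}, and the
 * request fails once the limit is exceeded.
 */
```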

this.maxBucket = maxBucket;
}

public static class TooManyBuckets extends ElasticsearchException {
Contributor

Can we name this TooManyBucketsException? Also should this extend AggregationExecutionException?


@Override
public void accept(int value) {
count += value;
Contributor

Maybe add a note explaining that it's OK that the count is not an AtomicInteger, since aggregations execute in a single thread?
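
For instance, such a note could read (a suggested comment, not the merged wording):

```java
@Override
public void accept(int value) {
    // `count` is deliberately a plain int rather than an AtomicInteger:
    // the aggregations of a request are built and reduced on a single
    // thread, so the counter is never updated concurrently.
    count += value;
}
```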

@jimczi removed the WIP label Nov 30, 2017

Contributor Author

@jimczi commented Nov 30, 2017

Thanks @colings86
I pushed more commits to address your review and to handle all multi-bucket aggregations. Can you take another look?

Contributor

@colings86 left a comment

@jimczi I left some more comments.

@@ -131,6 +131,9 @@ public SignificantStringTerms buildAggregation(long owningBucketOrdinal) throws
// global stats
spare.updateScore(significanceHeuristic);
spare = ordered.insertWithOverflow(spare);
if (spare == null) {
consumeBucketsAndMaybeBreak(1);
}
Contributor

Should there be an else here to remove the buckets inside spare if it's not null?

Contributor Author

There is no bucket inside spare yet. We are just selecting the top terms here. When the selection is done we build the bucket aggregation for each top term only, so we count the inner buckets only for the final top terms.
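
To make that accounting concrete, here is a self-contained sketch of the pattern under discussion (a hypothetical bounded top-N queue standing in for Lucene's PriorityQueue; all names are illustrative):

```java
import java.util.PriorityQueue;

// `insertWithOverflow` returns null exactly when the queue grew by one
// entry, i.e. when one more top term will eventually become a bucket, so
// that is the only case where a bucket should be consumed.
class TopNSketch {
    private final PriorityQueue<Long> queue = new PriorityQueue<>(); // min-heap
    private final int capacity;
    private int consumedBuckets; // stand-in for consumeBucketsAndMaybeBreak(1)

    TopNSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Returns null if the entry took a new slot, otherwise the rejected entry. */
    Long insertWithOverflow(long score) {
        if (queue.size() < capacity) {
            queue.add(score);
            return null;           // the queue grew: one more future bucket
        } else if (score > queue.peek()) {
            Long evicted = queue.poll();
            queue.add(score);
            return evicted;        // size unchanged: no new bucket yet
        }
        return score;              // rejected: no new bucket either
    }

    void offer(long score) {
        if (insertWithOverflow(score) == null) {
            consumedBuckets++;     // only count when the selection grew
        }
    }
}
```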

@@ -101,6 +101,9 @@ public SignificantLongTerms buildAggregation(long owningBucketOrdinal) throws IO

spare.bucketOrd = i;
spare = ordered.insertWithOverflow(spare);
if (spare == null) {
consumeBucketsAndMaybeBreak(1);
}
Contributor

as above, should there be an else here?

Contributor Author

Same here

@@ -107,6 +107,9 @@ public SignificantStringTerms buildAggregation(long owningBucketOrdinal) throws

spare.bucketOrd = i;
spare = ordered.insertWithOverflow(spare);
if (spare == null) {
consumeBucketsAndMaybeBreak(1);
}
Contributor

as above, should there be an else here?

@@ -204,6 +204,7 @@ public InternalAggregation buildAggregation(long owningBucketOrdinal) throws IOE
if (bucketCountThresholds.getShardMinDocCount() <= spare.docCount) {
spare = ordered.insertWithOverflow(spare);
if (spare == null) {
consumeBucketsAndMaybeBreak(1);
spare = new OrdBucket(-1, 0, null, showTermDocCountError, 0);
}
Contributor

as above, should there be an else here?

spare = ordered.insertWithOverflow(spare);
if (spare == null) {
consumeBucketsAndMaybeBreak(1);
}
Contributor

as above, should there be an else here?

@@ -144,6 +144,9 @@ public InternalAggregation buildAggregation(long owningBucketOrdinal) throws IOE
spare.bucketOrd = i;
if (bucketCountThresholds.getShardMinDocCount() <= spare.docCount) {
spare = ordered.insertWithOverflow(spare);
if (spare == null) {
consumeBucketsAndMaybeBreak(1);
}
Contributor

as above, should there be an else here?

for (Aggregation agg : bucket.getAggregations().asList()) {
if (agg instanceof MultiBucketsAggregation) {
count += countInnerBucket((MultiBucketsAggregation) agg);
}
Contributor

I think we need to check single-bucket aggregations here too, especially as they might contain more multi-bucket aggregations below them.

Contributor Author

Right, I pushed 3fd2522
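
For reference, a hedged sketch of what the recursive count might look like after that commit (method placement and exact signatures are assumptions; `SingleBucketAggregation` is the interface implemented by single-bucket aggregations such as filter or nested):

```java
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.bucket.MultiBucketsAggregation;
import org.elasticsearch.search.aggregations.bucket.SingleBucketAggregation;

class BucketCountSketch {
    // Counts every bucket below `agg`, descending through multi-bucket
    // aggregations and also through single-bucket aggregations, since the
    // latter may hold more multi-bucket aggregations below them.
    static int countInnerBucket(Aggregation agg) {
        int count = 0;
        if (agg instanceof MultiBucketsAggregation) {
            for (MultiBucketsAggregation.Bucket bucket
                    : ((MultiBucketsAggregation) agg).getBuckets()) {
                ++count;
                for (Aggregation sub : bucket.getAggregations().asList()) {
                    count += countInnerBucket(sub);
                }
            }
        } else if (agg instanceof SingleBucketAggregation) {
            for (Aggregation sub
                    : ((SingleBucketAggregation) agg).getAggregations().asList()) {
                count += countInnerBucket(sub);
            }
        }
        return count;
    }
}
```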

for (Aggregation bucketAgg : bucket.getAggregations().asList()) {
if (bucketAgg instanceof MultiBucketsAggregation) {
size += countInnerBucket((MultiBucketsAggregation) bucketAgg);
}
Contributor

Same as above.

Contributor

@colings86 left a comment

LGTM

…ed by a request

The `date_histogram` is the only aggregator that respects this limit in this change. Support for the other multi-bucket aggregators
will be added if the global method is approved.
@jimczi merged commit caea6b7 into elastic:master Dec 6, 2017
@jimczi deleted the enhancements/max_bucket branch December 6, 2017 08:15
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 6, 2017
* master:
  Add a new cluster setting to limit the total number of buckets returned by a request (elastic#27581)
  Allow index settings to be reset by wildcards (elastic#27671)
  Fix UpdateMappingIntegrationIT test failures
  Correct docs for binary fields and their default for doc values (elastic#27680)
  Clarify IntelliJ IDEA Jar Hell fix (elastic#27635)
  Add validation of keystore setting names (elastic#27626)
  Prevent constructing index template without patterns (elastic#27662)
  [DOCS] Fixed typos and broken attribute.
  Add support for filtering mappings fields (elastic#27603)
  [DOCS] Added link to upgrade guide and bumped the upgrade topic up to the top level (elastic#27621)
  [Geo] Add Well Known Text (WKT) Parsing Support to ShapeBuilders
  Fix up tests now that GeoDistance.*.calculate works (elastic#27541)
  [Docs] Fix parameter name (elastic#27656)
jimczi added a commit that referenced this pull request Dec 11, 2017
…ed by a request (#27581)

@IdanWo commented Mar 17, 2018

Does this mean that the following terms aggregation query will get an error when hitting the limit of 10,000?

GET products/_search
{
  "aggs": {
    "companies": {
      "terms": {
        "field": "Company",
        "size": 10,
        "shard_size": 2147483647  // Integer.MAX_VALUE
      }
    }
  }
}
