[7.x] Add support for range aggregations on histogram mapped fields (#74146) (#74682)

* Add support for range aggregations on histogram mapped fields (#74146)

This adds support for the range aggregation over `histogram` mapped fields.

Decisions made for implementation:

 - Sub-aggregations are not allowed. This simplifies the implementation and follows the prior art set by the `histogram` aggregation.
 - Nothing fancy is done with the ranges. There are no filter translations, as we cannot easily run a `range` filter query against histogram fields. This may be added as an optimization in the future.
 - Ranges check the histogram value ONLY. No interpolation of values is done. If we gain better statistics around the histogram, this MAY become possible.
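The last decision can be illustrated with a minimal sketch (plain Python, not from this PR): each histogram value is tested directly against half-open range bounds, mirroring `Range#matches` (`value >= from && value < to`), with no interpolation between neighboring histogram values.

```python
def matches(value, frm=float("-inf"), to=float("inf")):
    """Half-open range test mirroring Range#matches: from <= value < to."""
    return frm <= value < to

# No interpolation: a histogram value of 3 counts toward [3, 10) only,
# and never partially toward [2, 3).
print(matches(3, 2, 3), matches(3, 3, 10))  # False True
```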
benwtrent committed Jun 29, 2021
1 parent 0a5d4e7 commit 7f4df16
Showing 9 changed files with 973 additions and 50 deletions.
100 changes: 100 additions & 0 deletions docs/reference/aggregations/bucket/range-aggregation.asciidoc
@@ -324,3 +324,103 @@ Response:
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\.//]
[[search-aggregations-bucket-range-aggregation-histogram-fields]]
==== Histogram fields

Running a range aggregation over histogram fields computes the total number of counts for each configured range.

This is done without interpolating between the histogram field values. Consequently, it is possible to have a range
that is "in-between" two histogram values. The resulting range bucket would have a zero doc count.

Here is an example, executing a range aggregation against the following index that stores pre-aggregated histograms
with latency metrics (in milliseconds) for different networks:

[source,console]
--------------------------------------------------
PUT metrics_index
{
  "mappings": {
    "properties": {
      "network.name": {
        "type": "keyword"
      },
      "latency_histo": {
        "type": "histogram"
      }
    }
  }
}

PUT metrics_index/_doc/1
{
"network.name" : "net-1",
"latency_histo" : {
"values" : [1, 3, 8, 12, 15],
"counts" : [3, 7, 23, 12, 6]
}
}
PUT metrics_index/_doc/2
{
"network.name" : "net-2",
"latency_histo" : {
"values" : [1, 6, 8, 12, 14],
"counts" : [8, 17, 8, 7, 6]
}
}
POST /metrics_index/_search?size=0&filter_path=aggregations
{
"aggs": {
"latency_ranges": {
"range": {
"field": "latency_histo",
"ranges": [
{"to": 2},
{"from": 2, "to": 3},
{"from": 3, "to": 10},
{"from": 10}
]
}
}
}
}
--------------------------------------------------


For each range, the `range` aggregation sums the counts of the histogram `values` that fall inside it and
returns the following output:

[source,console-result]
--------------------------------------------------
{
"aggregations": {
"latency_ranges": {
"buckets": [
{
"key": "*-2.0",
"to": 2,
"doc_count": 11
},
{
"key": "2.0-3.0",
"from": 2,
"to": 3,
"doc_count": 0
},
{
"key": "3.0-10.0",
"from": 3,
"to": 10,
"doc_count": 55
},
{
"key": "10.0-*",
"from": 10,
"doc_count": 31
}
]
}
}
}
--------------------------------------------------
// TESTRESPONSE[skip:test not setup]
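The arithmetic behind these bucket counts can be checked with a short standalone sketch (plain Python, not part of Elasticsearch). Range bounds are half-open, and an unset bound is unbounded:

```python
def range_counts(histograms, ranges):
    """Sum histogram counts into range buckets, as the range aggregation does.

    A value v falls in a range when from <= v < to (no interpolation), so a
    range sitting "in-between" two histogram values gets a zero doc count.
    """
    buckets = [0] * len(ranges)
    for values, counts in histograms:
        for v, c in zip(values, counts):
            for i, (lo, hi) in enumerate(ranges):
                if (lo is None or v >= lo) and (hi is None or v < hi):
                    buckets[i] += c
    return buckets

docs = [
    ([1, 3, 8, 12, 15], [3, 7, 23, 12, 6]),   # net-1
    ([1, 6, 8, 12, 14], [8, 17, 8, 7, 6]),    # net-2
]
ranges = [(None, 2), (2, 3), (3, 10), (10, None)]
print(range_counts(docs, ranges))  # [11, 0, 55, 31]
```

The `(2, 3)` bucket stays at zero because no histogram value lies in `[2, 3)`, matching the `"2.0-3.0"` bucket above.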

[IMPORTANT]
========
The range aggregation is a bucket aggregation, which partitions documents into buckets rather than calculating metrics over fields
as metric aggregations do. Each bucket represents a collection of documents on which sub-aggregations can run.
A histogram field, on the other hand, is a pre-aggregated field representing multiple values inside a single field:
buckets of numerical data, each with a count of items/documents. This mismatch between the range aggregation's expected input
(raw documents) and the histogram field (which provides only summary information) limits the outcome of the aggregation
to the doc counts for each bucket.
**Consequently, when executing a range aggregation over a histogram field, no sub-aggregations are allowed.**
========
1 change: 1 addition & 0 deletions docs/reference/mapping/types/histogram.asciidoc
@@ -44,6 +44,7 @@ following aggregations and queries:
* <<search-aggregations-metrics-percentile-rank-aggregation,percentile ranks>> aggregation
* <<search-aggregations-metrics-boxplot-aggregation,boxplot>> aggregation
* <<search-aggregations-bucket-histogram-aggregation-histogram-fields,histogram>> aggregation
* <<search-aggregations-bucket-range-aggregation-histogram-fields,range>> aggregation
* <<query-dsl-exists-query,exists>> query

[[mapping-types-histogram-building-histogram]]
@@ -164,7 +164,7 @@ public String getKey() {
return this.key;
}

- boolean matches(double value) {
+ public boolean matches(double value) {
return value >= from && value < to;
}

@@ -430,17 +430,17 @@ public static Aggregator buildWithoutAttemptedToAdaptToFilters(
);
}

- private final ValuesSource.Numeric valuesSource;
+ protected final ValuesSource valuesSource;
private final DocValueFormat format;
protected final Range[] ranges;
private final boolean keyed;
private final InternalRange.Factory rangeFactory;
private final double averageDocsPerRange;

- private RangeAggregator(
+ public RangeAggregator(
String name,
AggregatorFactories factories,
- ValuesSource.Numeric valuesSource,
+ ValuesSource valuesSource,
DocValueFormat format,
InternalRange.Factory rangeFactory,
Range[] ranges,
@@ -469,23 +469,6 @@ public ScoreMode scoreMode() {
return super.scoreMode();
}

- @Override
- public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCollector sub) throws IOException {
-     final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
-     return new LeafBucketCollectorBase(sub, values) {
-         @Override
-         public void collect(int doc, long bucket) throws IOException {
-             if (values.advanceExact(doc)) {
-                 final int valuesCount = values.docValueCount();
-                 for (int i = 0, lo = 0; i < valuesCount; ++i) {
-                     final double value = values.nextValue();
-                     lo = RangeAggregator.this.collect(sub, doc, value, bucket, lo);
-                 }
-             }
-         }
-     };
- }

protected long subBucketOrdinal(long owningBucketOrdinal, int rangeOrd) {
return owningBucketOrdinal * ranges.length + rangeOrd;
}
@@ -556,10 +539,61 @@ public InternalAggregation buildEmptyAggregation() {
}
}

- protected abstract int collect(LeafBucketCollector sub, int doc, double value, long owningBucketOrdinal, int lowBound)
-     throws IOException;
private abstract static class NumericRangeAggregator extends RangeAggregator {

NumericRangeAggregator(
String name,
AggregatorFactories factories,
ValuesSource.Numeric valuesSource,
DocValueFormat format,
Factory<?, ?> rangeFactory,
Range[] ranges,
double averageDocsPerRange,
boolean keyed,
AggregationContext context,
Aggregator parent,
CardinalityUpperBound cardinality,
Map<String, Object> metadata
) throws IOException {
super(
name,
factories,
valuesSource,
format,
rangeFactory,
ranges,
averageDocsPerRange,
keyed,
context,
parent,
cardinality,
metadata
);
}

@Override
public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCollector sub) throws IOException {
final SortedNumericDoubleValues values = ((ValuesSource.Numeric)this.valuesSource).doubleValues(ctx);
return new LeafBucketCollectorBase(sub, values) {
@Override
public void collect(int doc, long bucket) throws IOException {
if (values.advanceExact(doc)) {
final int valuesCount = values.docValueCount();
for (int i = 0, lo = 0; i < valuesCount; ++i) {
final double value = values.nextValue();
lo = NumericRangeAggregator.this.collect(sub, doc, value, bucket, lo);
}
}
}
};
}

protected abstract int collect(LeafBucketCollector sub, int doc, double value, long owningBucketOrdinal, int lowBound)
throws IOException;
}
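The `lo` index threaded through `collect` above relies on doc values arriving in ascending order: for sorted, non-overlapping ranges, matching can resume from the last matched range instead of restarting at range 0. A rough sketch of that idea (plain Python, illustrative only; the real `NoOverlap` implementation differs):

```python
def collect_sorted(values, ranges):
    """Bucket ascending values into sorted, non-overlapping [from, to) ranges.

    'lo' only ever moves forward, so the scan over ranges is linear overall
    rather than len(values) * len(ranges).
    """
    counts = [0] * len(ranges)
    lo = 0
    for v in values:  # doc values are ascending
        while lo < len(ranges) and v >= ranges[lo][1]:
            lo += 1  # v is past this range's 'to'; never look back
        if lo < len(ranges) and v >= ranges[lo][0]:
            counts[lo] += 1
    return counts

print(collect_sorted([1, 2, 5, 9], [(0, 3), (3, 7), (7, 10)]))  # [2, 1, 1]
```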

- static class NoOverlap extends RangeAggregator {
+ static class NoOverlap extends NumericRangeAggregator {
NoOverlap(
String name,
AggregatorFactories factories,
@@ -609,13 +643,13 @@ protected int collect(LeafBucketCollector sub, int doc, double value, long ownin
}
}

- private static class Overlap extends RangeAggregator {
+ private static class Overlap extends NumericRangeAggregator {
Overlap(
String name,
AggregatorFactories factories,
Numeric valuesSource,
DocValueFormat format,
- Factory rangeFactory,
+ Factory<?, ?> rangeFactory,
Range[] ranges,
double averageDocsPerRange,
boolean keyed,
@@ -690,7 +724,7 @@ protected int collect(LeafBucketCollector sub, int doc, double value, long ownin

for (int i = startLo; i <= endHi; ++i) {
if (ranges[i].matches(value)) {
collectBucket(sub, doc, subBucketOrdinal(owningBucketOrdinal, i));
}
}

@@ -759,7 +793,7 @@ public void collectDebugInfo(BiConsumer<String, Object> add) {
}
}

- private static boolean hasOverlap(Range[] ranges) {
+ public static boolean hasOverlap(Range[] ranges) {
double lastEnd = ranges[0].to;
for (int i = 1; i < ranges.length; ++i) {
if (ranges[i].from < lastEnd) {
@@ -172,7 +172,8 @@ public List<Consumer<ValuesSourceRegistry.Builder>> getAggregationExtentions() {
AnalyticsAggregatorFactory::registerHistoBackedAverageAggregator,
AnalyticsAggregatorFactory::registerHistoBackedHistogramAggregator,
AnalyticsAggregatorFactory::registerHistoBackedMinggregator,
- AnalyticsAggregatorFactory::registerHistoBackedMaxggregator
+ AnalyticsAggregatorFactory::registerHistoBackedMaxggregator,
+ AnalyticsAggregatorFactory::registerHistoBackedRangeAggregator
);
}

@@ -7,6 +7,7 @@
package org.elasticsearch.xpack.analytics.aggregations;

import org.elasticsearch.search.aggregations.bucket.histogram.HistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.range.RangeAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.MaxAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.MinAggregationBuilder;
@@ -18,6 +19,7 @@
import org.elasticsearch.search.aggregations.metrics.ValueCountAggregationBuilder;
import org.elasticsearch.search.aggregations.support.ValuesSourceRegistry;
import org.elasticsearch.xpack.analytics.aggregations.bucket.histogram.HistoBackedHistogramAggregator;
import org.elasticsearch.xpack.analytics.aggregations.bucket.range.HistoBackedRangeAggregator;
import org.elasticsearch.xpack.analytics.aggregations.metrics.HistoBackedAvgAggregator;
import org.elasticsearch.xpack.analytics.aggregations.metrics.HistoBackedHDRPercentileRanksAggregator;
import org.elasticsearch.xpack.analytics.aggregations.metrics.HistoBackedHDRPercentilesAggregator;
@@ -101,4 +103,13 @@ public static void registerHistoBackedMaxggregator(ValuesSourceRegistry.Builder
builder.register(MaxAggregationBuilder.REGISTRY_KEY, AnalyticsValuesSourceType.HISTOGRAM, HistoBackedMaxAggregator::new, true);
}

public static void registerHistoBackedRangeAggregator(ValuesSourceRegistry.Builder builder) {
builder.register(
RangeAggregationBuilder.REGISTRY_KEY,
AnalyticsValuesSourceType.HISTOGRAM,
HistoBackedRangeAggregator::build,
true
);
}

}
