New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline aggregations: Ability to perform computations on aggregations #10568

Merged
merged 78 commits into from Apr 29, 2015

Conversation

Projects
None yet
6 participants
@colings86
Member

colings86 commented Apr 13, 2015

Adds a new type of aggregation called 'reducers' which act on the output of aggregations and compute extra information that they add to the aggregation tree. Reducers look much like any other aggregation in the request but have a buckets_path parameter which references the aggregation(s) to use.

Internally there are two types of reducer; the first is given the output of its parent aggregation and computes new aggregations to add to the buckets of its parent, and the second (a specialisation of the first) is given a sibling aggregation and outputs an aggregation to be a sibling at the same level as that aggregation.

This PR includes the framework for the reducers, the derivative reducer (#9293), the moving average reducer(#10002) and the maximum bucket reducer(#10000). These reducer implementations are not all yet fully complete.

Known work left to do (these points will be done once this PR is merged into the master branch):

  • Add x-axis normalisation to the derivative reducer
  • Add lots more JUnit tests for all reducers

Contributes to #9876, #10002, #9293, and #10000

colings86 and others added some commits Feb 11, 2015

make InternalAggregation.reduce(ReduceContext) use template pattern
sub-classes of InternalAggregation now implement doReduce(ReduceContext) that is called from InternalAggregation.reduce(ReduceContext) which is now final
Adds reducers list to InternalAggregation.reduce()
The list of reducers is fed through from the AggregatorFactory
AggregatorFactories now stores reducers as well as aggregators
These reducers will be passed through from the AggregatorParser
Fixing compile issues after rebase with master
Mostly due to @jpountz's leaf collector changes
Tests for derivative reducer
Most tests have been marked with @AwaitsFix since they require functionality to be implemented before they will pass
InternalHistogram.Factory.create() can now work from prototype
Another InternalHistogram instance can be passed into the method with the buckets and the name and will be used to set all the options such as minDocCount, formatter, Order etc.
Merge branch 'master' into feature/aggs_2_0
Conflicts:
	src/main/java/org/elasticsearch/search/aggregations/bucket/geogrid/GeoHashGridParser.java
	src/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/InternalPercentileRanks.java
	src/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/InternalPercentiles.java
	src/main/java/org/elasticsearch/search/aggregations/metrics/tophits/InternalTopHits.java
	src/main/java/org/elasticsearch/search/aggregations/metrics/tophits/TopHitsAggregator.java
	src/test/java/org/elasticsearch/search/aggregations/bucket/significant/SignificanceHeuristicTests.java
Move GapPolicy and resolveBucketValues() to static helper methods
Will allow many reducers to share the same helper functionality without repeating code.  Chose
to put these in static helpers instead of adding to Reducer base class.  I can imagine other reducers
that aren't time-based (or don't care about contiguous buckets), which would make things like
gap policy useless.

Since these seemed more like helpers than inherent traits of a Reducer, they went into their own
static class.

Closes #9954
Merge branch 'master' into feature/aggs_2_0
Conflicts:
	src/main/java/org/elasticsearch/search/aggregations/metrics/scripted/ScriptedMetricAggregator.java
	src/test/java/org/elasticsearch/search/aggregations/bucket/significant/SignificanceHeuristicTests.java
+ " must reference either a number value or a single value numeric metric aggregation");
}
// doc count never has missing values so gap policy doesn't apply here
boolean isDocCountProperty = aggPathAsList.size() == 1 && "_count".equals(aggPathAsList.get(0));

This comment has been minimized.

@jpountz

jpountz Apr 24, 2015

Contributor

Should it be aggPathAsList.get(aggPathAsList.size() - 1).equals("_count") ?

@jpountz

jpountz Apr 24, 2015

Contributor

Should it be aggPathAsList.get(aggPathAsList.size() - 1).equals("_count") ?

This comment has been minimized.

@colings86

colings86 Apr 27, 2015

Member

Not sure it matters? If the whole path is _count then it will work as is, and you can't get deeper buckets if there is a doc count of 0 in the top level anyway?

@colings86

colings86 Apr 27, 2015

Member

Not sure it matters? If the whole path is _count then it will work as is, and you can't get deeper buckets if there is a doc count of 0 in the top level anyway?

}
}
} catch (InvalidAggregationPathException e) {
return null;

This comment has been minimized.

@jpountz

jpountz Apr 24, 2015

Contributor

I'm concerned this might hide actual issues?

@jpountz

jpountz Apr 24, 2015

Contributor

I'm concerned this might hide actual issues?

This comment has been minimized.

@colings86

colings86 Apr 27, 2015

Member

Should we add a debug log here then? it still needs to return null as a bucket may not have an entry for a particular aggregation name

@colings86

colings86 Apr 27, 2015

Member

Should we add a debug log here then? it still needs to return null as a bucket may not have an entry for a particular aggregation name

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Apr 24, 2015

Contributor

I like the fact that it's not too invasive and makes efforts to not have side-effects on other aggs by copying the agg trees instead of modifying them in-place.

Most of my comments are cosmetics, in particular there are lots of indentation issues in aggregator constructor and anonymous LeafCollector definitions.

Regarding documentation, I think we need to move general informations to the main page about aggs/reducers instead of having it scattered across individual reducers/aggs.

Contributor

jpountz commented Apr 24, 2015

I like the fact that it's not too invasive and makes efforts to not have side-effects on other aggs by copying the agg trees instead of modifying them in-place.

Most of my comments are cosmetics, in particular there are lots of indentation issues in aggregator constructor and anonymous LeafCollector definitions.

Regarding documentation, I think we need to move general informations to the main page about aggs/reducers instead of having it scattered across individual reducers/aggs.

@colings86

This comment has been minimized.

Show comment
Hide comment
@colings86

colings86 Apr 29, 2015

Member

@jpountz I pushed an update and replied to some of your comments. Any chance you could have another look?

Member

colings86 commented Apr 29, 2015

@jpountz I pushed an update and replied to some of your comments. Any chance you could have another look?

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Apr 29, 2015

Contributor

It looks to me like there are still some open TODOs about documentation but other than that LGTM. I pushed commits for some indentation issues I found.

Contributor

jpountz commented Apr 29, 2015

It looks to me like there are still some open TODOs about documentation but other than that LGTM. I pushed commits for some indentation issues I found.

@polyfractal

This comment has been minimized.

Show comment
Hide comment
@polyfractal

polyfractal Apr 29, 2015

Member

I'm working on the larger doc TODOs (e.g. centralize docs about _count, buckets_path and path syntax, gap_policy, etc). I don't think it should block merging this though, especially since I'm restructuring part of the aggs docs as a whole and want some eyeballs...don't want to hold up this PR.

Member

polyfractal commented Apr 29, 2015

I'm working on the larger doc TODOs (e.g. centralize docs about _count, buckets_path and path syntax, gap_policy, etc). I don't think it should block merging this though, especially since I'm restructuring part of the aggs docs as a whole and want some eyeballs...don't want to hold up this PR.

colings86 added some commits Apr 29, 2015

Merge branch 'master' into feature/aggs_2_0
# Conflicts:
#	src/main/java/org/elasticsearch/index/query/CommonTermsQueryBuilder.java
#	src/main/java/org/elasticsearch/search/aggregations/AggregationModule.java
#	src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java
#	src/main/java/org/elasticsearch/search/aggregations/AggregatorParsers.java
#	src/main/java/org/elasticsearch/search/aggregations/InternalMultiBucketAggregation.java
#	src/main/java/org/elasticsearch/search/aggregations/bucket/nested/NestedAggregator.java
#	src/main/java/org/elasticsearch/search/aggregations/metrics/InternalNumericMetricsAggregation.java
#	src/test/java/org/elasticsearch/search/aggregations/bucket/nested/NestedAggregatorTest.java
Muted intermittently failing tests
To reproduce the failures use `-Dtests.seed=D9EF60095522804F`
Merge branch 'master' into feature/aggs_2_0
# Conflicts:
#	src/main/java/org/elasticsearch/search/builder/SearchSourceBuilder.java

@colings86 colings86 merged commit 0589adb into master Apr 29, 2015

@kevinkluge kevinkluge removed the review label Apr 29, 2015

@colings86 colings86 deleted the feature/aggs_2_0 branch May 19, 2015

@clintongormley clintongormley changed the title from Ability to perform computations on aggregations to Pipeline aggregations: Ability to perform computations on aggregations Jun 6, 2015

@eskibars eskibars referenced this pull request Aug 27, 2015

Open

Histogram Forecast #1381

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment