Optimize the composite aggregation for match_all and range queries #28745
Merged
Commits (24)
d46143c Optimize the composite aggregation for match_all and range queries (jimczi)
6a8e866 fix checkstyle (jimczi)
0bc679e handle null point values (jimczi)
0581226 restore global ord execution for normal execution (jimczi)
8fd25e1 add missing change (jimczi)
0d32ab0 fix checkstyle (jimczi)
32f0904 fix global ord comparaison (jimczi)
0634551 Merge branch 'master' into composite_sort_optim (jimczi)
a7f8ffe add tests for the composite queue and address review comments (jimczi)
35c017e Merge branch 'master' into composite_sort_optim (jimczi)
841118c cosmetics (jimczi)
6445f23 adapt heuristic to disable sorted docs producer (jimczi)
4437f2e protect against empty reader (jimczi)
cdba4c2 Add missing license (jimczi)
93e3345 refactor the composite source to create the sorted docs producer dire… (jimczi)
cc6539c Merge branch 'master' into composite_sort_optim (jimczi)
fc91434 fail composite agg that contains an unmapped field and no missing value (jimczi)
f9d1eeb implement deferring collection directly in the collector (jimczi)
f2588f2 Merge branch 'master' into composite_sort_optim (jimczi)
eb61b02 line len (jimczi)
f22dd2a Merge branch 'master' into composite_sort_optim (jimczi)
8bf9703 more javadocs and cleanups (jimczi)
e58e540 make sure that the cost is within the integer range when building the… (jimczi)
1d71d98 Merge branch 'master' into composite_sort_optim (jimczi)
198 changes: 198 additions & 0 deletions
...icsearch/search/aggregations/bucket/composite/BestCompositeBucketsDeferringCollector.java
@@ -0,0 +1,198 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.search.aggregations.bucket.composite;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;
import org.apache.lucene.util.DocIdSetBuilder;
import org.apache.lucene.util.RoaringDocIdSet;
import org.elasticsearch.search.aggregations.BucketCollector;
import org.elasticsearch.search.aggregations.LeafBucketCollector;
import org.elasticsearch.search.aggregations.bucket.DeferringBucketCollector;
import org.elasticsearch.search.internal.SearchContext;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * A specialization of {@link DeferringBucketCollector} that collects all
 * matches and then, in a second pass, replays the documents that contain a top bucket
 * selected during the first pass.
 */
final class BestCompositeBucketsDeferringCollector extends DeferringBucketCollector {
    private static class Entry {
        final LeafReaderContext context;
        final DocIdSet docIdSet;

        Entry(LeafReaderContext context, DocIdSet docIdSet) {
            this.context = context;
            this.docIdSet = docIdSet;
        }
    }

    private final SearchContext searchContext;
    private final CompositeValuesCollectorQueue queue;
    private final boolean isCollectionSorted;
    private final List<Entry> entries = new ArrayList<>();

    private BucketCollector collector;
    private LeafReaderContext context;
    private RoaringDocIdSet.Builder sortedBuilder;
    private DocIdSetBuilder builder;
    private boolean finished = false;

    /**
     * Sole constructor.
     * @param context The search context.
     * @param queue The queue that is used to record the top composite buckets.
     * @param isCollectionSorted true if the parent aggregator will pass documents sorted by doc_id.
     */
    BestCompositeBucketsDeferringCollector(SearchContext context, CompositeValuesCollectorQueue queue, boolean isCollectionSorted) {
        this.searchContext = context;
        this.queue = queue;
        this.isCollectionSorted = isCollectionSorted;
    }

    @Override
    public boolean needsScores() {
        if (collector == null) {
            throw new IllegalStateException();
        }
        return collector.needsScores();
    }

    /** Set the deferred collectors. */
    @Override
    public void setDeferredCollector(Iterable<BucketCollector> deferredCollectors) {
        this.collector = BucketCollector.wrap(deferredCollectors);
    }

    private void finishLeaf() {
        if (context != null) {
            DocIdSet docIdSet = isCollectionSorted ? sortedBuilder.build() : builder.build();
            entries.add(new Entry(context, docIdSet));
        }
        context = null;
    }

    @Override
    public LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
        finishLeaf();

        context = ctx;
        if (isCollectionSorted) {
            sortedBuilder = new RoaringDocIdSet.Builder(ctx.reader().maxDoc());
        } else {
            builder = new DocIdSetBuilder(ctx.reader().maxDoc());
        }

        return new LeafBucketCollector() {
            int lastDoc = -1;
            @Override
            public void collect(int doc, long bucket) throws IOException {
                if (lastDoc != doc) {
                    if (isCollectionSorted) {
                        sortedBuilder.add(doc);
                    } else {
                        builder.grow(1).add(doc);
                    }
                }
                lastDoc = doc;
            }
        };
    }

    @Override
    public void preCollection() throws IOException {
        collector.preCollection();
    }

    @Override
    public void postCollection() throws IOException {
        finishLeaf();
        finished = true;
    }

    @Override
    public void prepareSelectedBuckets(long... selectedBuckets) throws IOException {
        // the selected buckets are extracted directly from the queue
        assert selectedBuckets.length == 0;
        if (!finished) {
            throw new IllegalStateException("Cannot replay yet, collection is not finished: postCollect() has not been called");
        }

        final boolean needsScores = needsScores();
        Weight weight = null;
        if (needsScores) {
            Query query = searchContext.query();
            weight = searchContext.searcher().createNormalizedWeight(query, true);
        }
        for (Entry entry : entries) {
            DocIdSetIterator docIdSetIterator = entry.docIdSet.iterator();
            if (docIdSetIterator == null) {
                continue;
            }
            final LeafBucketCollector subCollector = collector.getLeafCollector(entry.context);
            final LeafBucketCollector collector =
                queue.getLeafCollector(entry.context, getSecondPassCollector(subCollector));
            DocIdSetIterator scorerIt = null;
            if (needsScores) {
                Scorer scorer = weight.scorer(entry.context);
                // We don't need to check if the scorer is null
                // since we are sure that there are documents to replay (docIdSetIterator is not empty).
                // [Inline review comment on the line above: "how do we know it is not empty?"]
                scorerIt = scorer.iterator();
                subCollector.setScorer(scorer);
            }
            int docID;
            while ((docID = docIdSetIterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
                if (needsScores) {
                    assert scorerIt.docID() < docID;
                    scorerIt.advance(docID);
                    // aggregations should only be replayed on matching documents
                    assert scorerIt.docID() == docID;
                }
                collector.collect(docID);
            }
        }
        collector.postCollection();
    }

    /**
     * Replay the top buckets from the matching documents.
     */
    private LeafBucketCollector getSecondPassCollector(LeafBucketCollector subCollector) {
        return new LeafBucketCollector() {
            @Override
            public void collect(int doc, long bucket) throws IOException {
                Integer slot = queue.compareCurrent();
                if (slot != null) {
                    // The candidate key is a top bucket.
                    // We can defer the collection of this document/bucket to the sub collector
                    subCollector.collect(doc, slot);
                }
            }
        };
    }
}
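
The two-pass flow in the class above (record matching doc ids per segment, skip a doc id repeated in consecutive calls, then replay only the documents whose key ended up among the top buckets) can be illustrated without Lucene. The following is a minimal sketch under stated assumptions, not the code from this pull request: `DeferredReplaySketch`, `KeyLookup`, `SegmentDocs`, the integer segment ids and the string keys are all hypothetical, and `java.util.BitSet` is swapped in for `RoaringDocIdSet` / `DocIdSetBuilder`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical, Lucene-free sketch of "record doc ids first, replay the winners later". */
final class DeferredReplaySketch {
    /** Hypothetical stand-in for reading a document's bucket key during the second pass. */
    interface KeyLookup {
        String keyOf(int segment, int doc);
    }

    /** Doc ids recorded for one segment; BitSet stands in for Lucene's DocIdSet. */
    private static final class SegmentDocs {
        final int segment;
        final BitSet docs;

        SegmentDocs(int segment, int maxDoc) {
            this.segment = segment;
            this.docs = new BitSet(maxDoc);
        }
    }

    private final List<SegmentDocs> entries = new ArrayList<>();
    private SegmentDocs current;
    private int lastDoc = -1;

    /** First pass: close the previous segment, then start recording a new one. */
    void startSegment(int segment, int maxDoc) {
        finish();
        current = new SegmentDocs(segment, maxDoc);
        lastDoc = -1;
    }

    /** First pass: record the doc id once, even if it is reported several times in a row. */
    void collect(int doc) {
        if (doc != lastDoc) {
            current.docs.set(doc);
        }
        lastDoc = doc;
    }

    /** Ends the first pass for the segment being recorded, if any. */
    void finish() {
        if (current != null) {
            entries.add(current);
        }
        current = null;
    }

    /** Second pass: visit recorded docs in order and keep those whose key is a top bucket. */
    List<String> replay(KeyLookup lookup, Set<String> topKeys) {
        List<String> replayed = new ArrayList<>();
        for (SegmentDocs entry : entries) {
            for (int doc = entry.docs.nextSetBit(0); doc >= 0; doc = entry.docs.nextSetBit(doc + 1)) {
                if (topKeys.contains(lookup.keyOf(entry.segment, doc))) {
                    replayed.add(entry.segment + ":" + doc);
                }
            }
        }
        return replayed;
    }

    public static void main(String[] args) {
        DeferredReplaySketch sketch = new DeferredReplaySketch();
        sketch.startSegment(0, 8);
        sketch.collect(1);
        sketch.collect(1); // same doc reported twice in a row, recorded once
        sketch.collect(3);
        sketch.startSegment(1, 8);
        sketch.collect(2);
        sketch.finish();
        KeyLookup parity = (segment, doc) -> doc % 2 == 0 ? "even" : "odd";
        System.out.println(sketch.replay(parity, Collections.singleton("odd")));
    }
}
```

In this sketch the set of top keys is passed in explicitly; in the class above that role is played by the `CompositeValuesCollectorQueue`, which is why `prepareSelectedBuckets` expects an empty `selectedBuckets` argument.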
36 changes: 0 additions & 36 deletions
...va/org/elasticsearch/search/aggregations/bucket/composite/CompositeAggregationPlugin.java
This file was deleted.
It seems that this should really never occur? Should we make this an assertion instead?
++, I replaced it with an assert.
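
For readers weighing the same trade-off, here is a small sketch of what changes when a runtime check becomes an assertion. Which check the two comments above refer to is not visible in this excerpt, so the class `AssertVersusThrowSketch` and the "positive value" condition are purely hypothetical. The part that is standard Java behaviour: an `assert` statement is evaluated only when assertions are enabled (for example with the `-ea` JVM flag), while an explicit `throw` is always enforced.

```java
/** Hypothetical sketch contrasting an always-on check with a Java assertion. */
final class AssertVersusThrowSketch {
    /** Always enforced: fails with an exception in every JVM configuration. */
    static int checkedPositive(int value) {
        if (value <= 0) {
            throw new IllegalStateException("expected a positive value, got " + value);
        }
        return value;
    }

    /** Enforced only when assertions are enabled (for example with the -ea JVM flag). */
    static int assertedPositive(int value) {
        assert value > 0 : "expected a positive value, got " + value;
        return value;
    }

    public static void main(String[] args) {
        System.out.println(checkedPositive(5));
        System.out.println(assertedPositive(5));
        try {
            checkedPositive(0);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An assertion suits a condition that is believed to be impossible and is mainly worth catching in tests; an explicit exception suits a condition that could plausibly be reached in production.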