New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip DisiPriorityQueue on single filter agg #99215
Conversation
The iterator is used to combine filtering with querying in leaf collection. Its benefit is that rangers with docs that are filtered out by all filters are skipped from doc collection. The competitive iterator is restricted to FiltersAggregator, not used in FilterByFilterAggregator that's already optimized. It only applies to top-level filter aggregations with no "other" bucket defined; the latter leads to collecting all docs so there's no point in skipping doc ranges. Fixes elastic#97544
# Conflicts: # server/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/FiltersAggregator.java
When FiltersAggregator has a single filter, there is no benefit in using a DisiPriorityQueue as the heap will only contain values from a single iterator. In such a case, it's preferable to use the filtering approximation iterator directly as competitive iterator. Fixes elastic#99202
# Conflicts: # server/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/FiltersAggregator.java
Hi @kkrik-es, I've created a changelog YAML for you. |
Rally results on track [noaa], challenge [filter-aggs] look promising:
|
@elasticsearchmachine run elasticsearch-ci/part-1 |
@elasticsearchmachine run elasticsearch-ci/part-2 |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo, the speedup is greater than I expected, nice! It looks good, I just left two minor suggestions.
} | ||
return (lastMatchingDoc == doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud, we could change this logic to have a single comparison in the (not uncommon) case when twoPhaseView
is null
:
boolean checkDocForMatch(int doc) throws IOException {
if (twoPhaseView == null) {
return true;
}
if (lastCheckedTwoPhaseDoc < doc) {
lastCheckedTwoPhaseDoc = doc;
if (twoPhaseView.matches()) {
lastMatchingTwoPhaseDoc = doc;
}
}
return lastMatchingTwoPhaseDoc == doc;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I missed this. I created separate subclasses to avoid the branch - Martijn suggested this is hot code.
SingleFilterLeafCollector( | ||
LeafBucketCollector sub, | ||
LeafReaderContext leafReaderContext, | ||
QueryToFilterAdapter filter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should convert QueryToFilterAdapter
objects to Scorer
s up-front. This way we could also use the Single*
implementation if there are multiple filters but only one of them has matches on the segment, and we could use the Multi*
implementation when no filters have matches on the segments, which would in-turn help remove the filterIterator != null
check in SingleFilterLeafCollector#collect
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this indeed can improve performance in more cases, while the code is cleaner since more logic is moved outside the subclasses.
* Use a competitive iterator in FiltersAggregator. The iterator is used to combine filtering with querying in leaf collection. Its benefit is that rangers with docs that are filtered out by all filters are skipped from doc collection. The competitive iterator is restricted to FiltersAggregator, not used in FilterByFilterAggregator that's already optimized. It only applies to top-level filter aggregations with no "other" bucket defined; the latter leads to collecting all docs so there's no point in skipping doc ranges. Fixes elastic#97544 * Fix function name. * Advance iterator on two-phase mismatch. * Restore docid tracking. * Fix failing tests. * Fix failing test. * Fix more tests. * Update docs/changelog/98360.yaml * More test fixes. * Update docs/changelog/98360.yaml * Skip checking useCompetitiveIterator in collect * Find approximate matches in CompetitiveIterator * Use DisiPriorityQueue to simplify FiltersAggregator * Skip competitive iterator when all docs match. * Check for empty priority queue. * Skip DisiPriorityQueue on single filter agg. When FiltersAggregator has a single filter, there is no benefit in using a DisiPriorityQueue as the heap will only contain values from a single iterator. In such a case, it's preferable to use the filtering approximation iterator directly as competitive iterator. Fixes elastic#99202 * Update docs/changelog/99215.yaml * Use FilterMatchingDisiWrapper in leaf collectors.
When FiltersAggregator has a single filter, there is no benefit in using
a DisiPriorityQueue as the heap will only contain values from a single
iterator. In such a case, it's preferable to use the filtering
approximation iterator directly as competitive iterator.
Fixes #99202