
Introduce incremental reduction of TopDocs #23946

Merged
s1monw merged 3 commits into elastic:master on Apr 10, 2017

Conversation

@s1monw s1monw commented Apr 6, 2017

This commit adds support for incremental top N reduction if the number of
expected shards in the search request is high enough. The changes here
also clean up more code in SearchPhaseController to make the separation
between values that are the same on each search result and values that
are per response clearer. The reduced search phase result no longer holds on to an
arbitrary shard result just to obtain values like `from`, `size` or sort values;
these are now cleanly encapsulated.

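For context, here is a minimal sketch of what incremental top N reduction means (illustrative only, not the SearchPhaseController implementation; it leans on Lucene's TopDocs.merge, and the class name and buffer threshold are assumptions):

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Sketch: buffer per-shard top docs and collapse the buffer with a partial merge
// once it fills up, so memory is bounded by the buffer size rather than by the
// number of shards. The real code also has to keep per-shard indices straight
// across partial merges.
class IncrementalTopDocsReducer {
    private final int topN;        // from + size requested by the search
    private final int bufferSize;  // hypothetical threshold, e.g. 512 expected shards
    private final List<TopDocs> buffer = new ArrayList<>();
    private int numReducePhases = 0;

    IncrementalTopDocsReducer(int topN, int bufferSize) {
        this.topN = topN;
        this.bufferSize = bufferSize;
    }

    // called once per shard result
    void consume(TopDocs shardTopDocs) {
        buffer.add(shardTopDocs);
        if (buffer.size() >= bufferSize) {
            // partial (incremental) reduce: only the merged top N survives
            TopDocs merged = TopDocs.merge(topN, buffer.toArray(new TopDocs[0]));
            buffer.clear();
            buffer.add(merged);
            numReducePhases++;
        }
    }

    // final reduce over whatever is left in the buffer
    ScoreDoc[] reduce() {
        numReducePhases++;
        return TopDocs.merge(topN, buffer.toArray(new TopDocs[0])).scoreDocs;
    }

    int numReducePhases() {
        return numReducePhases;
    }
}
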
@s1monw s1monw added the :Search/Search, >enhancement, review, v5.4.0 and v6.0.0-alpha1 labels Apr 6, 2017
@s1monw s1monw requested review from jpountz and jimczi April 6, 2017 15:13
try {
// the search context should inherit the default timeout
assertThat(contextWithDefaultTimeout.timeout(), equalTo(TimeValue.timeValueSeconds(5)));
} finally {

s1monw (author):

these tests were annoyingly slow since they waited for timeouts while shards were still locked - this shaved 10 seconds off the test

@jimczi jimczi left a comment

And that's how you incrementally introduced the incremental reduction of search results!
Looks great!

* @param bufferdAggs a list of pre-collected / buffered aggregations. if this list is non-null all aggregations have been consumed
* @param bufferedAggs a list of pre-collected / buffered aggregations. if this list is non-null all aggregations have been consumed
* from all non-null query results.
* @param bufferedAggs a list of pre-collected / buffered top docs. if this list is non-null all top docs have been consumed

Review comment:
nit: bufferedTopDocs

// the top docs sort fields used to sort the score docs, <code>null</code> if the results are not sorted
final SortField[] sortField;
// <code>true</code> iff the result score docs is sorted
final boolean isSorted;

Review comment:
nit: hasFields ? score docs are always sorted ;)

s1monw (author):
I renamed it to isSortedByField since this is really what it is :)

@jpountz jpountz left a comment

I left some comments, mostly about readability, but I like the change in general. In many cases where you get many shard results, I think most of them would be empty, so I'm wondering whether we should optimize for that case (in another PR).

if (size != -1) {
final ScoreDoc[] mergedScoreDocs = mergeTopDocs(topDocs, size, ignoreFrom ? 0 : from);
final boolean hasNoHits = groupedCompletionSuggestions.isEmpty() && topDocs.isEmpty();
if (hasNoHits == false) {

Review comment:
can we avoid the double negation that makes things a bit harder to read by calling the var hasHits?
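
For example (hypothetical rewrite of the snippet above, just inverting the flag as suggested):

final ScoreDoc[] mergedScoreDocs = mergeTopDocs(topDocs, size, ignoreFrom ? 0 : from);
final boolean hasHits = groupedCompletionSuggestions.isEmpty() == false || topDocs.isEmpty() == false;
if (hasHits) {
    // same body as before, no double negation in the condition
}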

if (results.isEmpty()) {
return EMPTY_DOCS;
return null;

Review comment:
out of curiosity, why was it an issue to return EMPTY_DOCS?

s1monw (author):
because the return value is again a TopDocs and not a ScoreDoc[], so `private static final ScoreDoc[] EMPTY_DOCS = new ScoreDoc[0];` wouldn't cut it

List<InternalAggregations> bufferdAggs, int numReducePhases) {
List<InternalAggregations> bufferedAggs,
List<TopDocs> bufferedTopDocs, TopDocsStats topDocsStats, int numReducePhases,
boolean isScrollRequest) {

Review comment:
can you fix the indentation here? all parameters do not seem to start on the same column

@@ -204,23 +209,35 @@ private static long optionalSum(long left, long right) {
}
}
}
return scoreDocs;
final boolean isSorted;

Review comment:
is it what is called isSortedByField elsewhere?

static class SortedTopDocs {
static final SortedTopDocs EMPTY = new SortedTopDocs(EMPTY_DOCS, false, null);
final ScoreDoc[] scoreDocs;
final boolean sorted;

Review comment:
is it what is called isSortedByField elsewhere?
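
For reference, a sketch of the holder with the rename applied consistently (the fields mirror the snippets above; the constructor and comments are assumptions):

static class SortedTopDocs {
    static final SortedTopDocs EMPTY = new SortedTopDocs(EMPTY_DOCS, false, null);
    final ScoreDoc[] scoreDocs;
    // true iff the score docs are sorted by sort fields rather than by score
    final boolean isSortedByField;
    // the sort fields used to sort the score docs, null if the results are sorted by score
    final SortField[] sortFields;

    SortedTopDocs(ScoreDoc[] scoreDocs, boolean isSortedByField, SortField[] sortFields) {
        this.scoreDocs = scoreDocs;
        this.isSortedByField = isSortedByField;
        this.sortFields = sortFields;
    }
}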

s1monw commented Apr 7, 2017

@jpountz I pushed a new commit

s1monw commented Apr 7, 2017

In many cases where you get many shard results, I think most of them would be empty, so I'm wondering whether we should optimize for that case (in another PR).

can you elaborate on this a bit? I am not sure I am following.

@s1monw s1monw merged commit 1f40f8a into elastic:master Apr 10, 2017
@s1monw s1monw deleted the incrementally_reduce_top_n branch April 10, 2017 07:37

jpountz commented Apr 10, 2017

I was just thinking about the fact that adding more reductions can make results less accurate, eg. for terms aggregations. Yet, I expect that incremental reduction would be especially useful in cases where you have time-based indices (eg. one per day). But if you set a date filter that only matches documents in a limited number of indices (eg. the last 30 days while you have a year's worth of data), most shard responses would be empty? So it is a bit disappointing to trade accuracy because of shard responses that do not contribute anything to the final (merged) response?

s1monw commented Apr 10, 2017

So it is a bit disappointing to trade accuracy because of shard responses that do not contribute anything to the final (merged) response?

Oh I see what you mean: today if we get a result we don't check whether it had any hits at all, and in such a case we can just skip it (not buffer it). Is that what you mean? That is low-hanging fruit I guess...
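
Sketched on top of the reducer example further up (hypothetical, not what this PR implements), skipping empty shard results before they are buffered would look roughly like:

void consume(TopDocs shardTopDocs) {
    if (shardTopDocs == null || shardTopDocs.scoreDocs.length == 0) {
        return; // nothing to contribute: skip buffering, no accuracy traded away
    }
    buffer.add(shardTopDocs);
    // ... partial reduce once the buffer is full, as before
}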

s1monw added a commit that referenced this pull request Apr 10, 2017

jpountz commented Apr 10, 2017

Is that what you mean?

Yes. I'm wondering whether there might be issues with the min_doc_count:0 option, since an empty result set could still create non-empty aggregations, but hopefully we'll find a way to work around this.

final boolean hasTopDocs = source == null || source.size() != 0;

if (isScrollRequest == false && (hasAggs || hasTopDocs)) {
// no incremental reduce if scroll is used - we only hit a single shard or sometimes more...

Review comment:
@s1monw Would you mind explaining why we cannot use incremental reduce if scroll is used? This confuses me.
