
Always rewrite search shard request outside of the search thread pool #51708

Merged
jimczi merged 12 commits into elastic:master from jimczi:rewrite_shard_request_no_rejection on Feb 6, 2020

Conversation

@jimczi (Contributor) commented Jan 30, 2020

This change ensures that the rewrite of the shard request is executed on the network thread or in the refresh listener when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool entirely, even if the can_match phase was skipped (pre_filter_shard_size > number of shards).
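
A minimal, self-contained sketch of that dispatch logic (all names here are hypothetical stand-ins, not the actual Elasticsearch classes): the rewrite runs on the calling thread, and the search thread pool is only used when the rewritten query can still match documents.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical stand-ins, not the real Elasticsearch classes: the point is
// that rewrite() runs on the calling (network) thread, and the search pool
// is only used when the rewritten query can still match documents.
public class RewriteBeforeFork {
    record ShardRequest(String query) {}
    record ShardResult(String summary) {}

    static final ExecutorService SEARCH_POOL = Executors.newFixedThreadPool(4);

    static void execute(ShardRequest request, Consumer<ShardResult> listener) {
        ShardRequest rewritten = rewrite(request); // stays on the network thread
        if ("match_no_docs".equals(rewritten.query())) {
            // No search worker is consumed; respond immediately.
            listener.accept(new ShardResult("empty"));
            return;
        }
        SEARCH_POOL.execute(() -> listener.accept(search(rewritten)));
    }

    static ShardRequest rewrite(ShardRequest r) {
        // Placeholder: e.g. a range query outside the shard's bounds would
        // rewrite to match_no_docs here.
        return r;
    }

    static ShardResult search(ShardRequest r) {
        return new ShardResult("hits for " + r.query());
    }

    public static void main(String[] args) {
        execute(new ShardRequest("match_no_docs"), r -> System.out.println(r.summary()));
        SEARCH_POOL.shutdown();
    }
}
```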

Coordinating nodes don't have the ability to create empty responses, so this change also ensures that at least one shard creates a full empty response while the others can return null ones. This is needed because creating true empty responses on shards requires building concrete aggregators, which would be too costly on a network thread. We should move this functionality to aggregation builders in a follow-up, but that would be a much bigger change.
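
A sketch of the coordinator-side bookkeeping this implies (hypothetical types; the real per-shard flag and its placement differ): exactly one shard request is denied the null-response shortcut, so a full empty response always exists for the reduce phase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types: the coordinator denies the null-response shortcut to
// exactly one shard, so a full empty response always exists for the reduce.
public class NullResponseMarking {
    static class ShardRequest {
        final int shardId;
        boolean canReturnNullResponse; // set by the coordinator
        ShardRequest(int shardId) { this.shardId = shardId; }
    }

    static List<ShardRequest> buildShardRequests(int numShards) {
        List<ShardRequest> requests = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            ShardRequest r = new ShardRequest(i);
            // One shard must return a real (empty) response; the others may
            // answer with null if they rewrite to match_no_docs.
            r.canReturnNullResponse = i > 0;
            requests.add(r);
        }
        return requests;
    }

    public static void main(String[] args) {
        buildShardRequests(3).forEach(r ->
            System.out.println("shard " + r.shardId + " canReturnNull=" + r.canReturnNullResponse));
    }
}
```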

This change is also important for #49601, since we want to add the ability to use the results of some shards to rewrite the requests of subsequent ones. For instance, once the first M shards have their top N computed, the worst document in the global queue can be passed to subsequent shards, which can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided one.

@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.7.0 labels Jan 30, 2020
@jimczi jimczi requested review from jpountz and dnhatn January 30, 2020 21:55
@elasticmachine (Collaborator)

Pinging @elastic/es-search (:Search/Search)

@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Engine)

@dnhatn (Member) left a comment

I understand this PR, and it looks great. Thanks, Jim. I left a question to discuss.

@dnhatn (Member) left a comment

Two minor nits, LGTM.

@jpountz (Contributor) left a comment

I believe this change could help a lot in certain cases.

@jimczi jimczi merged commit eb69c6f into elastic:master Feb 6, 2020
@jimczi jimczi deleted the rewrite_shard_request_no_rejection branch February 6, 2020 07:55
@jimczi (Contributor, Author) commented Feb 6, 2020

Thanks @dnhatn and @jpountz!

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Feb 6, 2020
…elastic#51708)

jimczi added a commit that referenced this pull request Feb 6, 2020
…#51708) (#51979)

jimczi added a commit that referenced this pull request Feb 6, 2020
jimczi added a commit that referenced this pull request Feb 6, 2020
jimczi added a commit that referenced this pull request Feb 6, 2020
dnhatn added a commit that referenced this pull request Feb 8, 2020
We need to either exclude null responses from the scroll search response
or always create a search context for every target shard, even though the
scroll query can be rewritten to match_no_docs. Otherwise, we won't find
the search_context for subsequent scroll requests.

This commit implements the latter option as it's less error-prone.

Relates #51708
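
A sketch of that scroll special case (hypothetical types, as above): for scroll searches a context is opened even when the query rewrites to match_no_docs, so that follow-up scroll rounds can resume on every target shard.

```java
// Hypothetical types again: scroll requests (and the one shard that must
// build a full empty response) always open a context, even when the query
// rewrites to match_no_docs, so later scroll rounds can resume on every shard.
public class ScrollAlwaysOpensContext {
    record ShardRequest(boolean matchesNoDocs, boolean isScroll, boolean canReturnNullResponse) {}
    record SearchContext(ShardRequest request) {}
    record ShardResult(String summary) {}

    static ShardResult executeShard(ShardRequest request) {
        if (request.matchesNoDocs() && !request.isScroll() && request.canReturnNullResponse()) {
            return null; // regular search: skip the shard without opening a context
        }
        SearchContext context = new SearchContext(request); // kept for follow-up rounds
        return new ShardResult("searched with " + context);
    }

    public static void main(String[] args) {
        System.out.println(executeShard(new ShardRequest(true, true, true)));  // context opened
        System.out.println(executeShard(new ShardRequest(true, false, true))); // null response
    }
}
```
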
dnhatn added a commit that referenced this pull request Feb 8, 2020
dnhatn added a commit that referenced this pull request Feb 10, 2020
We might leak a searcher if the target shard is removed (i.e., its index 
is deleted) or relocated while we are creating a SearchContext from a
SearchRewriteContext.

Relates #51708
Closes #52021

I labelled this a non-issue because it's an unreleased bug introduced in #51708.
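
A self-contained sketch of the leak pattern being fixed (hypothetical types): if building the context fails partway, the searcher acquired for it must be released.

```java
// Hypothetical types: if building the context throws (e.g. the shard was
// deleted or relocated in between), the searcher acquired for it is released
// instead of leaking.
public class ReleaseSearcherOnFailure {
    interface Searcher extends AutoCloseable {
        @Override void close();
    }

    record SearchContext(Searcher searcher) {}

    static SearchContext createContext(Searcher searcher) {
        boolean success = false;
        try {
            SearchContext context = new SearchContext(searcher); // may throw in real code
            success = true;
            return context;
        } finally {
            if (success == false) {
                searcher.close(); // release on any failure path
            }
        }
    }

    public static void main(String[] args) {
        createContext(() -> System.out.println("searcher released"));
    }
}
```
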
dnhatn added a commit that referenced this pull request Feb 10, 2020
jimczi added a commit that referenced this pull request Mar 17, 2020
This commit, built on top of #51708, allows shard search requests to be modified based on information collected on other shards. It is intended to speed up sorted queries on time-based indices, for queries that are only interested in the top documents.

This change rewrites the shard queries to match none if the bottom sort value computed on prior shards is better than all values in the shard. For queries that mix top documents and aggregations, this change resets the size of the top documents to 0 instead of rewriting to match none. This means that we don't need to keep a search context open for such a shard, since we know in advance that it doesn't contain any competitive hit.
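
A sketch of that pruning decision, assuming a descending sort on a timestamp field (hypothetical names and values, not the actual implementation):

```java
// Hypothetical names and values, assuming a descending sort on a timestamp
// field: a shard whose newest value cannot beat the bottom of the global
// top-N queue holds no competitive hit.
public class BottomSortPruning {
    static String rewrite(long shardMaxTimestamp, long bottomSortValue, boolean hasAggregations) {
        if (shardMaxTimestamp <= bottomSortValue) {
            // With aggregations the shard must still run, but hits are reset
            // to size 0 so no search context has to stay open for this shard.
            return hasAggregations ? "size=0" : "match_no_docs";
        }
        return "unchanged"; // the shard may hold competitive documents
    }

    public static void main(String[] args) {
        long bottom = 1_700_000_000_000L; // worst sort value in the global top-N (assumed)
        System.out.println(rewrite(1_600_000_000_000L, bottom, false)); // match_no_docs
        System.out.println(rewrite(1_800_000_000_000L, bottom, true));  // unchanged
    }
}
```
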
jimczi added a commit that referenced this pull request Mar 18, 2020
ywelsch added a commit that referenced this pull request Jan 13, 2022
Field level security was interacting in bad ways with the can-match phase on frozen tier shards (an interaction between
FieldSubsetReader and RewriteCachingDirectoryReader). This made the can-match phase fail, which in the normal case
would result in extra load on the frozen tier, and in the extreme case (in interaction with #51708) made searches fail.

This is a bug that was indirectly introduced by #78988.

Closes #82044
ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Jan 13, 2022
ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Jan 13, 2022
ywelsch added a commit that referenced this pull request Jan 14, 2022
ywelsch added a commit that referenced this pull request Jan 14, 2022