Skip to content

Commit

Permalink
Add heuristics to compute pre_filter_shard_size when unspecified (ela…
Browse files Browse the repository at this point in the history
…stic#53873)

This commit changes the pre_filter_shard_size default from 128 to unspecified.
This allows to apply heuristics based on the request and the target indices when deciding
whether the can match phase should run or not. When unspecified, this pr runs the can match phase
automatically if one of these conditions is met:
  * The request targets more than 128 shards.
  * The request contains read-only indices.
  * The primary sort of the query targets an indexed field.
Users can opt-out from this behavior by setting the `pre_filter_shard_size` to a static value.

Closes elastic#39835
  • Loading branch information
jimczi committed Mar 23, 2020
1 parent 965af3a commit 09853ba
Show file tree
Hide file tree
Showing 14 changed files with 290 additions and 129 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -426,7 +426,9 @@ static void addSearchRequestParams(Params params, SearchRequest searchRequest) {
params.withIndicesOptions(searchRequest.indicesOptions());
params.withSearchType(searchRequest.searchType().name().toLowerCase(Locale.ROOT));
params.putParam("ccs_minimize_roundtrips", Boolean.toString(searchRequest.isCcsMinimizeRoundtrips()));
params.putParam("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
if (searchRequest.getPreFilterShardSize() != null) {
params.putParam("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
}
params.withMaxConcurrentShardRequests(searchRequest.getMaxConcurrentShardRequests());
if (searchRequest.requestCache() != null) {
params.withRequestCache(searchRequest.requestCache());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2043,7 +2043,9 @@ private static void setRandomSearchParams(SearchRequest searchRequest,
if (randomBoolean()) {
searchRequest.setPreFilterShardSize(randomIntBetween(2, Integer.MAX_VALUE));
}
expectedParams.put("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
if (searchRequest.getPreFilterShardSize() != null) {
expectedParams.put("pre_filter_shard_size", Integer.toString(searchRequest.getPreFilterShardSize()));
}
}

public static void setRandomIndicesOptions(Consumer<IndicesOptions> setter, Supplier<IndicesOptions> getter,
Expand Down
13 changes: 2 additions & 11 deletions docs/reference/frozen-indices.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -74,8 +74,8 @@ POST /twitter/_forcemerge?max_num_segments=1
== Searching a frozen index

Frozen indices are throttled in order to limit memory consumptions per node. The number of concurrently loaded frozen indices per node is
limited by the number of threads in the <<search-throttled,search_throttled>> threadpool, which is `1` by default.
Search requests will not be executed against frozen indices by default, even if a frozen index is named explicitly. This is
limited by the number of threads in the <<search-throttled,search_throttled>> threadpool, which is `1` by default.
Search requests will not be executed against frozen indices by default, even if a frozen index is named explicitly. This is
to prevent accidental slowdowns by targeting a frozen index by mistake. To include frozen indices a search request must be executed with
the query parameter `ignore_throttled=false`.

Expand All @@ -85,15 +85,6 @@ GET /twitter/_search?q=user:kimchy&ignore_throttled=false
--------------------------------------------------
// TEST[setup:twitter]

[IMPORTANT]
================================
While frozen indices are slow to search, they can be pre-filtered efficiently. The request parameter `pre_filter_shard_size` specifies
a threshold that, when exceeded, will enforce a round-trip to pre-filter search shards that cannot possibly match.
This filter phase can limit the number of shards significantly. For instance, if a date range filter is applied, then all indices (frozen or unfrozen) that do not contain documents within the date range can be skipped efficiently.
The default value for `pre_filter_shard_size` is `128` but it's recommended to set it to `1` when searching frozen indices. There is no
significant overhead associated with this pre-filter phase.
================================

[role="xpack"]
[testenv="basic"]
[[monitoring_frozen_indices]]
Expand Down
37 changes: 21 additions & 16 deletions docs/reference/search/multi-search.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ GET twitter/_msearch
==== {api-description-title}

The multi search API executes several searches from a single API request.
The format of the request is similar to the bulk API format and makes use
The format of the request is similar to the bulk API format and makes use
of the newline delimited JSON (NDJSON) format.

The structure is as follows:
Expand Down Expand Up @@ -85,7 +85,7 @@ Maximum number of concurrent searches the multi search API can execute.
--
(Optional, integer)
Maximum number of concurrent shard requests that each sub-search request
executes per node. Defaults to `5`.
executes per node. Defaults to `5`.

You can use this parameter to prevent a request from overloading a cluster. For
example, a default request hits all indices in a cluster. This could cause shard
Expand All @@ -103,8 +103,13 @@ Defines a threshold that enforces a pre-filter roundtrip to prefilter search
shards based on query rewriting if the number of shards the search request
expands to exceeds the threshold. This filter roundtrip can limit the number of
shards significantly if for instance a shard can not match any documents based
on it's rewrite method i.e., if date filters are mandatory to match but the
shard bounds and the query are disjoint. Defaults to `128`.
on its rewrite method i.e., if date filters are mandatory to match but the
shard bounds and the query are disjoint.
When unspecified, the pre-filter phase is executed if any of these
conditions is met:
- The request targets more than `128` shards.
- The request targets one or more read-only index.
- The primary sort of the query targets an indexed field.

`rest_total_hits_as_int`::
(Optional, boolean)
Expand All @@ -121,7 +126,7 @@ to a specific shard.
--
(Optional, string)
Indicates whether global term and document frequencies should be used when
scoring returned documents.
scoring returned documents.

Options are:

Expand All @@ -134,7 +139,7 @@ This is usually faster but less accurate.
Documents are scored using global term and document frequencies across all
shards. This is usually slower but more accurate.
--

`typed_keys`::
(Optional, boolean)
Specifies whether aggregation and suggester names should be prefixed by their
Expand Down Expand Up @@ -196,7 +201,7 @@ to a specific shard.
--
(Optional, string)
Indicates whether global term and document frequencies should be used when
scoring returned documents.
scoring returned documents.

Options are:

Expand Down Expand Up @@ -234,18 +239,18 @@ Number of hits to return. Defaults to `10`.
==== {api-response-body-title}

`responses`::
(array) Includes the search response and status code for each search request
matching its order in the original multi search request. If there was a
complete failure for a specific search request, an object with `error` message
and corresponding status code will be returned in place of the actual search
(array) Includes the search response and status code for each search request
matching its order in the original multi search request. If there was a
complete failure for a specific search request, an object with `error` message
and corresponding status code will be returned in place of the actual search
response.


[[search-multi-search-api-example]]
==== {api-examples-title}

The header part includes which index / indices to search on, the `search_type`,
`preference`, and `routing`. The body includes the typical search body request
The header part includes which index / indices to search on, the `search_type`,
`preference`, and `routing`. The body includes the typical search body request
(including the `query`, `aggregations`, `from`, `size`, and so on).

[source,js]
Expand Down Expand Up @@ -308,7 +313,7 @@ See <<url-access-control>>
==== Template support

Much like described in <<search-template>> for the _search resource, _msearch
also provides support for templates. Submit them like follows for inline
also provides support for templates. Submit them like follows for inline
templates:

[source,console]
Expand Down Expand Up @@ -377,6 +382,6 @@ GET _msearch/template
[[multi-search-partial-responses]]
==== Partial responses

To ensure fast responses, the multi search API will respond with partial results
if one or more shards fail. See <<shard-failures, Shard failures>> for more
To ensure fast responses, the multi search API will respond with partial results
if one or more shards fail. See <<shard-failures, Shard failures>> for more
information.
Loading

0 comments on commit 09853ba

Please sign in to comment.