Change pre_filter_shard_size default to 1 for frozen index searches #39835

Closed
timroes opened this issue Mar 8, 2019 · 5 comments · Fixed by #53873
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories

Comments

@timroes
Contributor

timroes commented Mar 8, 2019

We currently have an open issue in Kibana (elastic/kibana#32742) about setting pre_filter_shard_size to 1 for all requests in case you want to query frozen indexes. This raised a couple of questions, which I am copying over from that issue:

As far as I understand the documentation, the pre-filtering phase does not have any "significant overhead". Why is this value 128 by default, and not 1? If there is no significant overhead, and we know that we gain a lot of performance when a frozen index is included, what is the reason for not enabling that phase by default? Once you enable querying frozen indexes in the advanced setting, we would set this to 1 in every request (since we don't know whether a request includes a frozen index or not), so I would really like to understand the drawbacks of setting this to 1.

Would it make more sense to default this to 1 on the Elasticsearch side once ignore_throttled=false is set? Or could ES even determine whether a frozen index will be hit by a query and then use the appropriate value? Or (much like the question above): could it be 1 for all requests, and what would the drawbacks be?
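For readers unfamiliar with the two parameters, here is a minimal sketch of what "setting this to 1 in every request" could look like on the client side. `build_search_path` and the `logs-*` pattern are hypothetical names for illustration; only the `ignore_throttled` and `pre_filter_shard_size` query parameters come from the discussion above, and this is not how Kibana actually builds its requests.

```python
from urllib.parse import urlencode


def build_search_path(index_pattern, include_frozen):
    """Return a _search path with its query-string parameters.

    Assumption: whenever frozen indices may be included, the client also
    lowers pre_filter_shard_size to 1, because it cannot know in advance
    whether a frozen index will actually be hit.
    """
    params = {}
    if include_frozen:
        # ignore_throttled=false allows the search to hit frozen (throttled) indices.
        params["ignore_throttled"] = "false"
        # Lower the shard-count threshold from its default so that the
        # pre-filter phase is not skipped for small shard counts.
        params["pre_filter_shard_size"] = "1"
    query = urlencode(params)
    return f"/{index_pattern}/_search" + (f"?{query}" if query else "")


print(build_search_path("logs-*", include_frozen=True))
```

The drawback raised in the question is visible here: the client has to add the override to every request as soon as frozen indices are allowed, whether or not one is hit.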

@elasticmachine
Collaborator

Pinging @elastic/es-search

@javanna javanna self-assigned this May 24, 2019
@javanna javanna removed the stalled label May 24, 2019
javanna added a commit to javanna/elasticsearch that referenced this issue May 24, 2019
When a search on some indices may take a long time, it may cause problems for other indices that are being searched as part of the same search request, because the search context needs to stay open for a long time, in case the faster indices are being written to. This is especially a problem when searching against throttled and non-throttled indices as part of the same request.

This commit splits the search into two sub-searches in this case: one for throttled indices, and one for non-throttled indices. This way the two don't interfere with each other.

Also, the sub-search against the throttled indices can have pre_filter_shard_size set to 1 automatically, which is what we currently recommend to our users.

Closes elastic#39835
Closes elastic#40900
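The split described in this commit message can be pictured roughly as follows. This is only an illustrative sketch, not the code from the commit: `split_search` and the `throttled` set are hypothetical, and a real implementation would resolve the throttled state from index settings rather than from a set of names.

```python
def split_search(indices, throttled):
    """Partition target indices into non-throttled and throttled sub-searches.

    `throttled` is a hypothetical set of index names known to be frozen.
    Each sub-search is a dict; pre_filter_shard_size=None means "no override".
    """
    fast = [name for name in indices if name not in throttled]
    slow = [name for name in indices if name in throttled]

    sub_searches = []
    if fast:
        # Non-throttled indices keep the request's own settings.
        sub_searches.append({"indices": fast, "pre_filter_shard_size": None})
    if slow:
        # Throttled indices get the recommended value of 1 automatically.
        sub_searches.append({"indices": slow, "pre_filter_shard_size": 1})
    return sub_searches


print(split_search(["logs-live", "logs-2018"], throttled={"logs-2018"}))
```

When every target index falls on one side, only one sub-search is produced, so the request is not split at all.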
javanna added a commit to javanna/elasticsearch that referenced this issue Jun 19, 2019
Now that we split the search execution into two whenever searching read-only and
write indices as part of the same request (see elastic#42510), we can also automatically
set `pre_filter_shard_size` to the appropriate value whenever not explicitly
provided: `1` for read-only indices, and `128` (like before this change) for write
indices.

Note that we may still end up searching write and read-only indices as part of the
same search execution, for instance when a scroll is provided or size is set to `0`,
in which case we set `pre_filter_shard_size` to `128` when not explicitly set.

Closes elastic#39835
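The defaulting rule in this commit message could be sketched like this. The function and argument names are hypothetical; only the values `1` and `128` and the three cases (read-only, write, mixed) come from the text above.

```python
READ_ONLY_DEFAULT = 1
WRITE_DEFAULT = 128


def resolve_pre_filter_shard_size(explicit, targets_read_only, targets_write):
    """Pick pre_filter_shard_size for one search execution.

    explicit: the value from the request, or None when not provided.
    targets_read_only / targets_write: whether this execution touches
    read-only and/or write indices (hypothetical flags for illustration).
    """
    if explicit is not None:
        # An explicitly provided value always wins.
        return explicit
    if targets_read_only and not targets_write:
        # Execution against read-only indices only.
        return READ_ONLY_DEFAULT
    # Write-only executions, and mixed executions that could not be split
    # (e.g. scroll or size 0), keep the previous default.
    return WRITE_DEFAULT


print(resolve_pre_filter_shard_size(None, targets_read_only=True, targets_write=False))
```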
@javanna javanna removed their assignment Feb 28, 2020
@jimczi
Contributor

jimczi commented Mar 13, 2020

Since we reverted #42510, I think it makes sense to re-evaluate a simple solution to automatically execute the can match phase despite the default. This shouldn't be an issue going forward in Kibana, since they plan to use the new async search, which sets pre_filter_shard_size to 1 by default, but we should make the change in blocking _search nevertheless. We've improved the handling of queries on frozen indices that don't run the can match phase: shards that cannot match the date range filter no longer use the throttled search thread pool, so running the can match phase for these indices shouldn't be required anymore. However, the can match phase is now also used to pre-sort shards on sorted queries, so we could run it automatically whenever it can significantly optimize the query phase.
@javanna WDYT ?
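To illustrate why pre-sorting shards can help sorted queries, here is a rough sketch. It assumes each shard can report the minimum and maximum value of the primary sort field, which is not stated in this thread; `ShardRange` and `pre_sort_shards` are hypothetical names, not Elasticsearch APIs.

```python
from typing import NamedTuple


class ShardRange(NamedTuple):
    shard: str
    min_value: int
    max_value: int


def pre_sort_shards(ranges, descending):
    """Order shards so the ones most likely to hold top hits are queried first.

    Ascending sorts visit shards with the smallest minimum first;
    descending sorts visit shards with the largest maximum first.
    """
    if descending:
        return sorted(ranges, key=lambda r: r.max_value, reverse=True)
    return sorted(ranges, key=lambda r: r.min_value)


shards = [
    ShardRange("a", 10, 20),
    ShardRange("b", 0, 5),
    ShardRange("c", 30, 40),
]
print([r.shard for r in pre_sort_shards(shards, descending=True)])
```

Querying the most competitive shards first means later shards can often be answered cheaply, which is the kind of query-phase optimization the comment refers to.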

@javanna
Member

javanna commented Mar 13, 2020

heya @jimczi what do you mean by "automatically execute the can match phase despite the default"?

@jimczi
Contributor

jimczi commented Mar 13, 2020

Defaulting to a static value of 128 is not flexible enough, so today we require our users to set this value correctly. What I meant is that we should treat the default (pre_filter_shard_size is not present in the request) as dynamic, so that we can automatically run the can_match phase:

  • if frozen indices are part of the query
  • or if a primary field sort is used (sort by a field)
  • ... (we can add more conditions here)

Users would be able to opt out by setting pre_filter_shard_size to a static value in their requests, but that shouldn't be needed in the majority of cases.

@javanna
Member

javanna commented Mar 13, 2020

++ to having a more dynamic default, good idea @jimczi. We could then probably open a discussion on the need for the request parameter, and whether it still needs to be a threshold based on the number of shards. Maybe it should become something to force enabling/disabling the execution of the can match phase.

jimczi added a commit to jimczi/elasticsearch that referenced this issue Mar 20, 2020
This commit changes the pre_filter_shard_size default from 128 to unspecified.
This allows applying heuristics based on the request and the target indices when deciding
whether the can match phase should run or not. When unspecified, this PR runs the can match phase
automatically if one of these conditions is met:
  * The request targets more than 128 shards.
  * The request contains read-only indices.
  * The primary sort of the query targets an indexed field.
Users can opt out of this behavior by setting the `pre_filter_shard_size` to a static value.

Closes elastic#39835
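The heuristics listed in this commit message can be summarized in a short sketch. This is a simplified illustration rather than the actual Elasticsearch code: the function and argument names are hypothetical, and treating an explicit value as a plain shard-count threshold is an assumption based on the "more than 128 shards" wording.

```python
DEFAULT_SHARD_THRESHOLD = 128


def should_run_can_match(num_shards, has_read_only_indices,
                         primary_sort_on_indexed_field,
                         pre_filter_shard_size=None):
    """Decide whether the can match phase should run for a request.

    pre_filter_shard_size is None when the request leaves it unspecified.
    """
    if pre_filter_shard_size is not None:
        # Opt-out path: a static value disables the heuristics and is
        # used as a plain shard-count threshold (assumption).
        return num_shards > pre_filter_shard_size
    # Unspecified: any one of the three conditions is enough.
    return (
        num_shards > DEFAULT_SHARD_THRESHOLD
        or has_read_only_indices
        or primary_sort_on_indexed_field
    )


print(should_run_can_match(10, has_read_only_indices=True,
                           primary_sort_on_indexed_field=False))
```

With the default left unspecified, a 10-shard request that includes a read-only index runs the phase, while the same request with `pre_filter_shard_size=128` set explicitly does not.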
jimczi added a commit that referenced this issue Mar 23, 2020
)

jimczi added a commit to jimczi/elasticsearch that referenced this issue Mar 23, 2020
…stic#53873)

jimczi added a commit that referenced this issue Mar 24, 2020
) (#54007)

5 participants