Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a cluster setting to disallow expensive queries #51385

Merged
merged 31 commits into from
Feb 12, 2020

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Jan 23, 2020

Add a new cluster setting search.allow_expensive_queries which by
default is true. If set to false, certain queries that have
usually slow performance cannot be executed and an error message
is returned.

  • Queries that need to do linear scans to identify matches:
    • Script queries
  • Queries that have a high up-front cost:
    • Fuzzy queries
    • Regexp queries
    • Prefix queries (without index_prefixes enabled
    • Wildcard queries
    • Range queries on text and keyword fields
  • Joining queries
    • HasParent queries
    • HasChild queries
    • ParentId queries
    • Nested queries
  • Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
  • Queries that may have a high per-document cost:
    • Script score queries
    • Percolate queries

Closes: #29050

@matriv matriv added >feature :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.7.0 labels Jan 23, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@matriv matriv force-pushed the impl-29050 branch 2 times, most recently from bb294f8 to 3a219ca Compare January 24, 2020 15:29
Add a new cluster setting `search.disallow_slow_queries` which by
default is `false`. If set to `true` then certain queries (prefix,
fuzzy, regexp and wildcard) that have usually slow performance cannot
be executed and an exception is thrown.

Closes: elastic#29050
@matriv matriv requested review from jimczi and jpountz January 24, 2020 17:00
@matriv matriv marked this pull request as ready for review January 24, 2020 17:00
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new setting looks good, although I am still not decided whether we should add more granularity to the detection of slow queries. One possibility would be to have some static limits, like number of characters in the prefix query, wildcard with leading wildcards, ... Another possibility could be to hook into the rewrite of multi-terms to limit the expansion to few terms but the first option might be enough for most of the cases.
Should we also add the exception for script and script_score queries ? We could also try to restrict them only if they are the only required clause in the query. This would allow users to continue to use script but only if they have another required clause based on inverted lists.
I'd also like @jpountz and @romseygeek to take a look to validate the approach and the limitations that we want to enforce in this new setting.

@@ -223,6 +226,8 @@ public IndexService(
this.globalCheckpointTask = new AsyncGlobalCheckpointTask(this);
this.retentionLeaseSyncTask = new AsyncRetentionLeaseSyncTask(this);
updateFsyncTaskIfNecessary();


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove change

<<query-dsl-regexp-query,`regexp`>> and <<query-dsl-bool-query,`wildcard`>> ,
that are usually slow performance can affect the cluster performance.
The execution of such queries can be prevented by setting the value of the `search.disallow_slow_queries`
setting to `true` (defaults tp `false`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/tp/to

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general suggestion: can we use positive logic instead of double negative? ie have the setting be allow-slow-queries with a default of true.

@jasontedor
Copy link
Member

Putting aside @rjernst's thoughts about the double negative, I have some general concerns about the name. The problem that I have is that this setting does not forbid slow queries (i.e., queries that from a user-facing perspective are taking a long period of time, which then get aborted if this setting is set) but rather queries that are "expensive" to execute, because they scale poorly. Perhaps the naming could reflect that, like allow_expensive_queries.

@matriv
Copy link
Contributor Author

matriv commented Jan 27, 2020

I agree with both suggestions regarding the negative naming and the expensive vs slow.
@jimczi @jpountz @romseygeek what do you think?

@matriv
Copy link
Contributor Author

matriv commented Jan 27, 2020

Currently I have added integration tests for each one of the disallowed queries. For the Script/ScriptScore it's not possible to include them in the yml test since the required ScriptPlugins are not loaded for those tests, so instead I added a test method in the corresponding java xxxIT tests.

After some discussion I had with @rjernst, he suggested to just have one integ test, since the path of updating/passing the setting down to the QueryShardContext is the same for all the queries.

@jimczi @jpountz @romseygeek What's your opinion?

@matriv
Copy link
Contributor Author

matriv commented Jan 28, 2020

@elasticmachine run elasticsearch-ci/2

@matriv
Copy link
Contributor Author

matriv commented Jan 28, 2020

@rjernst @jasontedor Regarding the negative name and semantics, I have to add that the idea is as a next step to have more fine-grain control on the disallowed queries, so something like search.disallow_expensive_queries : "fuzzy, regexp, wildcard" which imo seems to promote the usage of a negative semantics setting. Otherwise we will have a setting with default value like this: search.allow_expensive_queries : "fuzzy, joining, regexp, prefix, script, script_score, wildcard" where one should start removing values from the setting.

@rjernst
Copy link
Member

rjernst commented Jan 28, 2020

I think allow/disallow should really be a boolean. If we want the user to control the exact queries allowed, it should be a different setting. I also think an inclusive setting, while more verbose, is much simpler for a user to understand and less error prone in the future. If the list setting has negative semantics, new queries which we deem expensive may automatically be included on upgrade when the user already decided which expensive queries to allow. It is also more direct for a user or support to look at an inclusive setting and know what is allowed, instead of needing to consult code or documentation on what the default list of expensive queries are and then mentally remove the values of this setting.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts about this PR:

  • Since I initially opened the issue that this PR addresses, the company seems to have embraced slow operations rather than prevented them (see e.g. searchable snapshots and the async search API) so I wonder whether this is still something we want to do. cc @giladgal
  • Regardless I think there is a theme around slow queries and we need to provide users with a better experience:
    • Make multi-term queries honor the timeout setting and task cancellation by leveraging ExitableDirectoryReader.
    • Better warn users when they run slow queries that have faster alternatives, such as running prefix queries on a text field that doesn't have index_prefixes enabled. There aren't many examples right now, but we expect several ones to come, e.g. when we enable querying numeric fields that have doc values but are not indexed, when we add scripted fields, or when the wildcard_keyword field comes out.

<<query-dsl-regexp-query,`regexp`>> and <<query-dsl-bool-query,`wildcard`>> ,
that are usually slow performance can affect the cluster performance.
The execution of such queries can be prevented by setting the value of the `search.disallow_slow_queries`
setting to `true` (defaults to `false`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to expand a bit more here on what qualifies as a slow query. This will help users understand why some queries are protected by this setting while other queries are not. And this will also help make a decision whether a query qualifies as slow or not as we add more queries in the future:

  • Queries that need to do linear scans to identify matches:
    • script queries
  • Queries that have a high up-front cost:
    • fuzzy queries
    • prefix queries without index_prefixes
    • wildcard queries
    • range queries on keyword fields
    • join queries
    • queries on 6.x geo shapes
  • Queries that may have a high per-document cost
    • percolate queries

@giladgal
Copy link
Contributor

  • Since I initially opened the issue that this PR addresses, the company seems to have embraced slow operations rather than prevented them (see e.g. searchable snapshots and the async search API) so I wonder whether this is still something we want to do. cc @giladgal

I think the PR still makes sense. Giving users the tools to run resource-draining slow queries can make it easier for users to get themselves into troubles, which only makes such a setting more important as a safety mechanism.

@matriv matriv changed the title Add a cluster setting to disallow slow queries Add a cluster setting to disallow expensive queries Jan 29, 2020
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments, apart from that the change looks good to me.

Those queries can be categorised as follows:
* Queries that need to do linear scans to identify matches:
** <<query-dsl-script-query, `script queries`>>
** <<query-dsl-script-score-query, `script score queries`>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting this one to be fine since it finds matches using another query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a suggestion by @jimczi to also include those, since I guess the custom score calculation could decrease performance ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this can be true with a complex score function. Let's move it to the Queries that may have a high per-document cost section, since it's about scoring, not matching?

final AnalysisRegistry analysisRegistry,
final EngineFactory engineFactory,
final Map<String, IndexStorePlugin.DirectoryFactory> directoryFactories) {
this(indexSettings, analysisRegistry, engineFactory, directoryFactories, () -> true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we have tests call the other constructor instead and pass ()->true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's easy to do.

scriptService, xContentRegistry, namedWriteableRegistry, client, searcher, nowInMillis, indexNameMatcher,
new Index(RemoteClusterAware.buildRemoteIndexName(clusterAlias, indexSettings.getIndex().getName()),
indexSettings.getIndex().getUUID()), isAllowExpensiveQueries);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid duplicating constructors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are ~20 usages of this in tests, so no big deal, I can remove the constructor.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit around naming, but LGTM otherwise

@@ -192,6 +197,10 @@ public BitSetProducer bitsetFilter(Query filter) {
return bitsetFilterCache.getBitSetProducer(filter);
}

public boolean isAllowExpensiveQueries() {
return isAllowExpensiveQueries.getAsBoolean();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be allowExpensiveQueries()? Otherwise it reads very strangely.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of documentation changes to make it read slightly more naturally.


[[query-dsl-allow-expensive-queries]]
Allow expensive queries::
Execution of certain types of queries have usually slow performance, which can affect the cluster performance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certain types of queries will generally execute slowly due to the way they are implemented, which can affect the stability of your cluster.

[[prefix-query-allow-expensive-queries]]
===== Allow expensive queries
Prefix queries will not be executed if <<query-dsl-allow-expensive-queries, `search.allow_expensive_queries`>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if <<index-prefixes, index_prefixes>> are enabled, an optimised query is built which is not considered slow, and will be executed in spite of this setting.

@matriv matriv requested a review from jpountz February 10, 2020 11:20
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment about the docs, LGTM otherwise.

Those queries can be categorised as follows:
* Queries that need to do linear scans to identify matches:
** <<query-dsl-script-query, `script queries`>>
** <<query-dsl-script-score-query, `script score queries`>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this can be true with a complex score function. Let's move it to the Queries that may have a high per-document cost section, since it's about scoring, not matching?

@matriv matriv merged commit a8b39ed into elastic:master Feb 12, 2020
@matriv matriv deleted the impl-29050 branch February 12, 2020 17:06
matriv added a commit that referenced this pull request Feb 12, 2020
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.

- Queries that need to do linear scans to identify matches:
  - Script queries
- Queries that have a high up-front cost:
  - Fuzzy queries
  - Regexp queries
  - Prefix queries (without index_prefixes enabled
  - Wildcard queries
  - Range queries on text and keyword fields
- Joining queries
  - HasParent queries
  - HasChild queries
  - ParentId queries
  - Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
  - Script score queries
  - Percolate queries

Closes: #29050
(cherry picked from commit a8b39ed)
@matriv
Copy link
Contributor Author

matriv commented Feb 12, 2020

master : a8b39ed
7.x : dac720d

cluster.get_settings:
flat_settings: true

- match: {search.allow_expensive_queries: null}
Copy link
Member

@javanna javanna Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heya @matriv I think this assertion is problematic, it works with the java runner due to the semantics of HashMap.get , but it may fail with test runners written in other languages. I believe that the setting is not returned, is it? I was chatting to @karmi about this and the proper way to do this null check would be is_false: search.allow_expensive_queries . Would you mind please changing this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, this fails in the Go client runner, what works is is_false: { search.allow_expensive_queries: null }.

Copy link
Member

@javanna javanna Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does is_false accept field values like match? I thought the right syntax would be is_false: search.allow_expensive_queries

Copy link
Contributor Author

@matriv matriv Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_false: search.allow_expensive_queries works with Java.
is_false: { search.allow_expensive_queries: null }doesn't and also seems semantically wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, sorry — is_false: search.allow_expensive_queries is correct.

@javanna
Copy link
Member

javanna commented Jul 20, 2020

heya, we were looking with @nik9000 at failing queries against runtime fields (under development in a feature branch) when expensive queries are disallowed. Looking at how the check is performed, we noticed that it is currently executed on each shard, which translates to each shard returning the same error (which then get de-duplicated in the coordinating node). Shall we move these checks to the coordinating node? Ideally, expensive queries would be rejected straight-away before being sent to the shards as part of the query phase.

@matriv
Copy link
Contributor Author

matriv commented Jul 20, 2020

@javanna This sounds good to me.
@jimczi Do you maybe have something to add on this proposed change/improvement?

@matriv
Copy link
Contributor Author

matriv commented Jul 21, 2020

@javana Jim reminded me of the reason we didn't do it in the first place: Currently the mapping is resolved on every shard and for example: a prefix query on a field with some options enabled can be turned into a simple term query so with the proposed approach we would return an error on queries that we accept today (because of the rewrite simplification) when the option is enabled.

Maybe, in the future we could enhance this behaviour by resolving the mapping the co-ordinating node, possibly using field caps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a switch to disallow slow queries