Add a cluster setting to disallow expensive queries #51385

matriv · 2020-01-23T23:15:31Z

Add a new cluster setting `search.allow_expensive_queries`, which defaults to
`true`. If set to `false`, certain queries that usually perform slowly cannot
be executed and an error message is returned.

- Queries that need to do linear scans to identify matches:
  - Script queries
- Queries that have a high up-front cost:
  - Fuzzy queries
  - Regexp queries
  - Prefix queries (without index_prefixes enabled)
  - Wildcard queries
  - Range queries on text and keyword fields
- Joining queries:
  - HasParent queries
  - HasChild queries
  - ParentId queries
  - Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
  - Script score queries
  - Percolate queries

Closes: #29050
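
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `ExpensiveQueryCheck`, the `QueryType` enum, the exception type, and the exact error text are all assumptions made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; names are illustrative, not the Elasticsearch API. */
final class ExpensiveQueryCheck {

    /** Query kinds from the description above, plus two cheap ones for contrast. */
    enum QueryType {
        SCRIPT, FUZZY, REGEXP, PREFIX_WITHOUT_INDEX_PREFIXES, WILDCARD,
        RANGE_ON_TEXT_OR_KEYWORD, HAS_PARENT, HAS_CHILD, PARENT_ID, NESTED,
        LEGACY_GEO_SHAPE, SCRIPT_SCORE, PERCOLATE,
        TERM, MATCH
    }

    /** Everything from SCRIPT through PERCOLATE (inclusive) counts as expensive here. */
    static final Set<QueryType> EXPENSIVE = EnumSet.range(QueryType.SCRIPT, QueryType.PERCOLATE);

    /** Throws when the query is expensive and the setting is false; otherwise returns normally. */
    static void ensureAllowed(QueryType type, boolean allowExpensiveQueries) {
        if (!allowExpensiveQueries && EXPENSIVE.contains(type)) {
            throw new IllegalArgumentException("[" + type + "] queries cannot be executed when "
                + "'search.allow_expensive_queries' is set to false");
        }
    }

    public static void main(String[] args) {
        ensureAllowed(QueryType.FUZZY, true);  // allowed: the setting defaults to true
        ensureAllowed(QueryType.TERM, false);  // allowed: not an expensive query
        try {
            ensureAllowed(QueryType.FUZZY, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```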

elasticmachine · 2020-01-23T23:15:34Z

Pinging @elastic/es-search (:Search/Search)

Add a new cluster setting `search.disallow_slow_queries` which by default is `false`. If set to `true` then certain queries (prefix, fuzzy, regexp and wildcard) that have usually slow performance cannot be executed and an exception is thrown. Closes: elastic#29050

jimczi

The new setting looks good, although I am still not decided whether we should add more granularity to the detection of slow queries. One possibility would be to have some static limits, like the number of characters in the prefix query, wildcard queries with leading wildcards, ... Another possibility could be to hook into the rewrite of multi-terms to limit the expansion to a few terms, but the first option might be enough for most cases.
Should we also add the exception for script and script_score queries? We could also try to restrict them only if they are the only required clause in the query. This would allow users to continue to use script but only if they have another required clause based on inverted lists.
I'd also like @jpountz and @romseygeek to take a look to validate the approach and the limitations that we want to enforce in this new setting.
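
For illustration, the "static limits" option mentioned above might look something like the sketch below. Nothing here comes from the PR: the class name `StaticQueryLimits` and the threshold `MIN_PREFIX_LENGTH` are hypothetical, and the value 3 is an arbitrary assumption.

```java
/** Hypothetical static limits; the threshold is an illustrative assumption. */
final class StaticQueryLimits {

    /** Assumed minimum prefix length; not a value taken from the discussion. */
    static final int MIN_PREFIX_LENGTH = 3;

    /** True when the prefix is shorter than the assumed minimum. */
    static boolean isPrefixTooShort(String prefix) {
        return prefix.length() < MIN_PREFIX_LENGTH;
    }

    /** True when the wildcard pattern starts with '*' or '?'. */
    static boolean hasLeadingWildcard(String pattern) {
        if (pattern.isEmpty()) {
            return false;
        }
        char first = pattern.charAt(0);
        return first == '*' || first == '?';
    }

    public static void main(String[] args) {
        System.out.println("prefix 'ab' too short: " + isPrefixTooShort("ab"));
        System.out.println("'*foo' has leading wildcard: " + hasLeadingWildcard("*foo"));
        System.out.println("'foo*' has leading wildcard: " + hasLeadingWildcard("foo*"));
    }
}
```

A check like this needs no mapping information, which is part of what makes it simpler than hooking into the multi-term rewrite.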

jimczi · 2020-01-27T11:06:15Z

server/src/main/java/org/elasticsearch/index/IndexService.java

@@ -223,6 +226,8 @@ public IndexService(
        this.globalCheckpointTask = new AsyncGlobalCheckpointTask(this);
        this.retentionLeaseSyncTask = new AsyncRetentionLeaseSyncTask(this);
        updateFsyncTaskIfNecessary();
+
+


nit: remove change

jimczi · 2020-01-27T11:20:08Z

docs/reference/query-dsl.asciidoc

+<<query-dsl-regexp-query,`regexp`>> and <<query-dsl-bool-query,`wildcard`>> ,
+that are usually slow performance can affect the cluster performance.
+The execution of such queries can be prevented by setting the value of the `search.disallow_slow_queries`
+setting to `true` (defaults tp `false`).


nit: s/tp/to

rjernst

One general suggestion: can we use positive logic instead of a double negative? I.e., have the setting be allow-slow-queries with a default of true.

jasontedor · 2020-01-27T18:48:05Z

Putting aside @rjernst's thoughts about the double negative, I have some general concerns about the name. The problem that I have is that this setting does not forbid slow queries (i.e., queries that from a user-facing perspective are taking a long period of time, which then get aborted if this setting is set) but rather queries that are "expensive" to execute, because they scale poorly. Perhaps the naming could reflect that, like allow_expensive_queries.

matriv · 2020-01-27T19:16:22Z

I agree with both suggestions regarding the negative naming and the expensive vs slow.
@jimczi @jpountz @romseygeek what do you think?

matriv · 2020-01-27T19:26:00Z

Currently I have added integration tests for each of the disallowed queries. For Script/ScriptScore it's not possible to include them in the yml tests, since the required ScriptPlugins are not loaded for those tests, so instead I added a test method in the corresponding Java xxxIT tests.

After some discussion with @rjernst, he suggested having just one integration test, since the path of updating/passing the setting down to the QueryShardContext is the same for all the queries.

@jimczi @jpountz @romseygeek What's your opinion?

matriv · 2020-01-28T16:01:49Z

@elasticmachine run elasticsearch-ci/2

matriv · 2020-01-28T16:36:56Z

@rjernst @jasontedor Regarding the negative name and semantics, I have to add that the idea is as a next step to have more fine-grain control on the disallowed queries, so something like search.disallow_expensive_queries : "fuzzy, regexp, wildcard" which imo seems to promote the usage of a negative semantics setting. Otherwise we will have a setting with default value like this: search.allow_expensive_queries : "fuzzy, joining, regexp, prefix, script, script_score, wildcard" where one should start removing values from the setting.

rjernst · 2020-01-28T17:51:04Z

I think allow/disallow should really be a boolean. If we want the user to control the exact queries allowed, it should be a different setting. I also think an inclusive setting, while more verbose, is much simpler for a user to understand and less error prone in the future. If the list setting has negative semantics, new queries which we deem expensive may automatically be included on upgrade when the user already decided which expensive queries to allow. It is also more direct for a user or support to look at an inclusive setting and know what is allowed, instead of needing to consult code or documentation on what the default list of expensive queries are and then mentally remove the values of this setting.
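
One reading of the upgrade concern above, as a small sketch. The class `ListSemanticsDemo` and both methods are hypothetical, and `percolate` stands in for any query type that a later release newly classifies as expensive.

```java
import java.util.Set;

/** Hypothetical comparison of deny-list and allow-list semantics. */
final class ListSemanticsDemo {

    /** Deny-list: a query runs unless the user explicitly listed it. */
    static boolean allowedUnderDenyList(String query, Set<String> disallowed) {
        return !disallowed.contains(query);
    }

    /** Allow-list: an expensive query runs only if the user explicitly listed it. */
    static boolean allowedUnderAllowList(String query, Set<String> allowed) {
        return allowed.contains(query);
    }

    public static void main(String[] args) {
        // Settings chosen by the user before the upgrade.
        Set<String> userDenyList = Set.of("fuzzy", "regexp", "wildcard");
        Set<String> userAllowList = Set.of("prefix", "script");

        // After the upgrade, "percolate" is newly considered expensive.
        System.out.println("deny-list lets percolate run: "
            + allowedUnderDenyList("percolate", userDenyList));
        System.out.println("allow-list lets percolate run: "
            + allowedUnderAllowList("percolate", userAllowList));
    }
}
```

With the allow-list, the setting alone says what may run; with the deny-list, one also has to know the current default list of expensive queries.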

jpountz

Some thoughts about this PR:

Since I initially opened the issue that this PR addresses, the company seems to have embraced slow operations rather than prevented them (see e.g. searchable snapshots and the async search API) so I wonder whether this is still something we want to do. cc @giladgal
Regardless I think there is a theme around slow queries and we need to provide users with a better experience:
- Make multi-term queries honor the timeout setting and task cancellation by leveraging ExitableDirectoryReader.
- Better warn users when they run slow queries that have faster alternatives, such as running prefix queries on a text field that doesn't have index_prefixes enabled. There aren't many examples right now, but we expect several ones to come, e.g. when we enable querying numeric fields that have doc values but are not indexed, when we add scripted fields, or when the wildcard_keyword field comes out.
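
As a rough illustration of the first point, a term enumeration could check for cancellation on every step. The sketch below is a generic wrapper and does not use or reproduce the ExitableDirectoryReader API; the class `CancellableIterator` and the exception type are assumptions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper that aborts iteration once a cancellation flag is set. */
final class CancellableIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final BooleanSupplier cancelled;

    CancellableIterator(Iterator<T> delegate, BooleanSupplier cancelled) {
        this.delegate = delegate;
        this.cancelled = cancelled;
    }

    /** Called before every step so a long enumeration can stop early. */
    private void checkCancelled() {
        if (cancelled.getAsBoolean()) {
            throw new IllegalStateException("search cancelled or timed out");
        }
    }

    @Override
    public boolean hasNext() {
        checkCancelled();
        return delegate.hasNext();
    }

    @Override
    public T next() {
        checkCancelled();
        return delegate.next();
    }

    public static void main(String[] args) {
        Iterator<String> terms =
            new CancellableIterator<>(List.of("foo", "foobar").iterator(), () -> false);
        while (terms.hasNext()) {
            System.out.println(terms.next());
        }
    }
}
```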

jpountz · 2020-01-27T17:10:24Z

docs/reference/query-dsl.asciidoc

+<<query-dsl-regexp-query,`regexp`>> and <<query-dsl-bool-query,`wildcard`>> ,
+that are usually slow performance can affect the cluster performance.
+The execution of such queries can be prevented by setting the value of the `search.disallow_slow_queries`
+setting to `true` (defaults to `false`).


I think we need to expand a bit more here on what qualifies as a slow query. This will help users understand why some queries are protected by this setting while other queries are not. And this will also help make a decision whether a query qualifies as slow or not as we add more queries in the future:

- Queries that need to do linear scans to identify matches:
  - script queries
- Queries that have a high up-front cost:
  - fuzzy queries
  - prefix queries without index_prefixes
  - wildcard queries
  - range queries on keyword fields
  - join queries
  - queries on 6.x geo shapes
- Queries that may have a high per-document cost:
  - percolate queries

giladgal · 2020-01-28T21:04:34Z

> Since I initially opened the issue that this PR addresses, the company seems to have embraced slow operations rather than prevented them (see e.g. searchable snapshots and the async search API) so I wonder whether this is still something we want to do. cc @giladgal

I think the PR still makes sense. Giving users the tools to run resource-draining slow queries can make it easier for them to get themselves into trouble, which only makes such a setting more important as a safety mechanism.

jpountz

I left some minor comments, apart from that the change looks good to me.

jpountz · 2020-02-07T18:35:16Z

docs/reference/query-dsl.asciidoc

+Those queries can be categorised as follows:
+* Queries that need to do linear scans to identify matches:
+** <<query-dsl-script-query, `script queries`>>
+** <<query-dsl-script-score-query, `script score queries`>>


I was expecting this one to be fine since it finds matches using another query?

It was a suggestion by @jimczi to also include those, since I guess the custom score calculation could decrease performance ?

I can see how this can be true with a complex score function. Let's move it to the Queries that may have a high per-document cost section, since it's about scoring, not matching?

jpountz · 2020-02-07T18:37:35Z

server/src/main/java/org/elasticsearch/index/IndexModule.java

+            final AnalysisRegistry analysisRegistry,
+            final EngineFactory engineFactory,
+            final Map<String, IndexStorePlugin.DirectoryFactory> directoryFactories) {
+        this(indexSettings, analysisRegistry, engineFactory, directoryFactories, () -> true);
    }


can't we have tests call the other constructor instead and pass ()->true?

Yep, that's easy to do.

jpountz · 2020-02-07T18:38:34Z

server/src/main/java/org/elasticsearch/index/query/QueryShardContext.java

+                scriptService, xContentRegistry, namedWriteableRegistry, client, searcher, nowInMillis, indexNameMatcher,
+                new Index(RemoteClusterAware.buildRemoteIndexName(clusterAlias, indexSettings.getIndex().getName()),
+                        indexSettings.getIndex().getUUID()), isAllowExpensiveQueries);
+    }


can we avoid duplicating constructors?

There are ~20 usages of this in tests, so no big deal, I can remove the constructor.

romseygeek

One nit around naming, but LGTM otherwise

romseygeek · 2020-02-10T10:39:25Z

server/src/main/java/org/elasticsearch/index/query/QueryShardContext.java

@@ -192,6 +197,10 @@ public BitSetProducer bitsetFilter(Query filter) {
        return bitsetFilterCache.getBitSetProducer(filter);
    }

+    public boolean isAllowExpensiveQueries() {
+        return isAllowExpensiveQueries.getAsBoolean();


Can this just be allowExpensiveQueries()? Otherwise it reads very strangely.
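
A minimal sketch of the accessor with the suggested name. The diffs in this thread show the flag being passed as a supplier (`() -> true`, `getAsBoolean()`); the class `ShardContextSketch` is hypothetical, and the idea that the supplier exists so that setting updates are seen without rebuilding the context is an assumption, not something stated in the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a per-shard query context. */
final class ShardContextSketch {
    // Read on every call, so a changed setting is visible to existing contexts.
    private final BooleanSupplier allowExpensiveQueries;

    ShardContextSketch(BooleanSupplier allowExpensiveQueries) {
        this.allowExpensiveQueries = allowExpensiveQueries;
    }

    /** Accessor using the name suggested in the review. */
    boolean allowExpensiveQueries() {
        return allowExpensiveQueries.getAsBoolean();
    }

    public static void main(String[] args) {
        AtomicBoolean setting = new AtomicBoolean(true);
        ShardContextSketch context = new ShardContextSketch(setting::get);
        System.out.println("before update: " + context.allowExpensiveQueries());
        setting.set(false); // simulates the setting being changed at runtime
        System.out.println("after update: " + context.allowExpensiveQueries());
    }
}
```

Tests that do not care about the setting can pass `() -> true`, as the review comments on the duplicated constructors suggest.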

romseygeek

Couple of documentation changes to make it read slightly more naturally.

romseygeek · 2020-02-10T11:01:33Z

docs/reference/query-dsl.asciidoc

+
+[[query-dsl-allow-expensive-queries]]
+Allow expensive queries::
+Execution of certain types of queries have usually slow performance, which can affect the cluster performance.


Certain types of queries will generally execute slowly due to the way they are implemented, which can affect the stability of your cluster.

romseygeek · 2020-02-10T11:03:42Z

docs/reference/query-dsl/prefix-query.asciidoc

+[[prefix-query-allow-expensive-queries]]
+===== Allow expensive queries
+Prefix queries will not be executed if <<query-dsl-allow-expensive-queries, `search.allow_expensive_queries`>>


However, if <<index-prefixes, index_prefixes>> are enabled, an optimised query is built which is not considered slow, and will be executed in spite of this setting.

jpountz

I left a comment about the docs, LGTM otherwise.

jpountz · 2020-02-12T10:12:53Z

docs/reference/query-dsl.asciidoc

+Those queries can be categorised as follows:
+* Queries that need to do linear scans to identify matches:
+** <<query-dsl-script-query, `script queries`>>
+** <<query-dsl-script-score-query, `script score queries`>>


I can see how this can be true with a complex score function. Let's move it to the Queries that may have a high per-document cost section, since it's about scoring, not matching?

Add a new cluster setting `search.allow_expensive_queries` which by default is `true`. If set to `false`, certain queries that have usually slow performance cannot be executed and an error message is returned.

- Queries that need to do linear scans to identify matches:
  - Script queries
- Queries that have a high up-front cost:
  - Fuzzy queries
  - Regexp queries
  - Prefix queries (without index_prefixes enabled)
  - Wildcard queries
  - Range queries on text and keyword fields
- Joining queries:
  - HasParent queries
  - HasChild queries
  - ParentId queries
  - Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
  - Script score queries
  - Percolate queries

Closes: #29050

(cherry picked from commit a8b39ed)

matriv · 2020-02-12T21:56:39Z

master : a8b39ed
7.x : dac720d

javanna · 2020-03-13T10:19:26Z

rest-api-spec/src/main/resources/rest-api-spec/test/search/320_disallow_queries.yml

+      cluster.get_settings:
+        flat_settings: true
+
+  - match: {search.allow_expensive_queries: null}


heya @matriv I think this assertion is problematic: it works with the Java runner due to the semantics of HashMap.get, but it may fail with test runners written in other languages. I believe that the setting is not returned, is it? I was chatting to @karmi about this and the proper way to do this null check would be is_false: search.allow_expensive_queries. Would you mind please changing this?

To clarify, this fails in the Go client runner, what works is is_false: { search.allow_expensive_queries: null }.

does is_false accept field values like match? I thought the right syntax would be is_false: search.allow_expensive_queries

is_false: search.allow_expensive_queries works with Java.
is_false: { search.allow_expensive_queries: null } doesn't, and also seems semantically wrong?

My bad, sorry — is_false: search.allow_expensive_queries is correct.
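
The HashMap.get point can be seen in a few lines of Java. This is only a sketch: `NullMatchDemo` and its three methods are hypothetical, `matchStrict` is a guess at how a stricter runner might behave rather than the Go runner's actual logic, and `isFalse` is a simplified approximation of the `is_false` assertion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical illustration of why "match: {key: null}" passes on an absent key. */
final class NullMatchDemo {

    /** A match built on Map.get: an absent key yields null, which equals the expected null. */
    static boolean matchByGet(Map<String, Object> response, String key, Object expected) {
        return Objects.equals(response.get(key), expected);
    }

    /** A stricter match (assumption): the key must be present before values are compared. */
    static boolean matchStrict(Map<String, Object> response, String key, Object expected) {
        return response.containsKey(key) && Objects.equals(response.get(key), expected);
    }

    /** Simplified is_false: the key is absent, or its value is null or false. */
    static boolean isFalse(Map<String, Object> response, String key) {
        Object value = response.get(key);
        return value == null || Boolean.FALSE.equals(value);
    }

    public static void main(String[] args) {
        // The setting has not been set, so it is not part of the response.
        Map<String, Object> settings = new HashMap<>();
        String key = "search.allow_expensive_queries";
        System.out.println("matchByGet: " + matchByGet(settings, key, null));
        System.out.println("matchStrict: " + matchStrict(settings, key, null));
        System.out.println("isFalse: " + isFalse(settings, key));
    }
}
```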

javanna · 2020-07-20T12:12:36Z

heya, we were looking with @nik9000 at failing queries against runtime fields (under development in a feature branch) when expensive queries are disallowed. Looking at how the check is performed, we noticed that it is currently executed on each shard, which translates to each shard returning the same error (which then gets de-duplicated on the coordinating node). Shall we move these checks to the coordinating node? Ideally, expensive queries would be rejected straight away, before being sent to the shards as part of the query phase.
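
To make the duplication concrete: with the check running per shard, the coordinating node receives one identical failure per shard and collapses them. The sketch below de-duplicates by message text, which is an assumption about the mechanism; `ShardFailureDedup` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical de-duplication of identical per-shard failures. */
final class ShardFailureDedup {

    /** Keeps the first occurrence of each distinct message, preserving order. */
    static List<String> deduplicate(List<String> shardFailures) {
        return new ArrayList<>(new LinkedHashSet<>(shardFailures));
    }

    public static void main(String[] args) {
        String failure = "expensive queries are not allowed"; // illustrative message
        List<String> perShard = List.of(failure, failure, failure);
        System.out.println(perShard.size() + " shard failures, "
            + deduplicate(perShard).size() + " after de-duplication");
    }
}
```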

matriv · 2020-07-20T14:17:38Z

@javanna This sounds good to me.
@jimczi Do you maybe have something to add on this proposed change/improvement?

matriv · 2020-07-21T11:13:10Z

@javanna Jim reminded me of the reason we didn't do it in the first place: currently the mapping is resolved on every shard. For example, a prefix query on a field with some options enabled can be turned into a simple term query, so with the proposed approach we would return an error for queries that we accept today (because of the rewrite simplification) when the option is enabled.

Maybe in the future we could enhance this behaviour by resolving the mapping on the coordinating node, possibly using field caps.
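
The difference between the two placements can be sketched as follows, using the prefix query and `index_prefixes` as the example option. Both methods and the class `PrefixCheckSketch` are hypothetical; the coordinating-node variant assumes no mapping information is available there.

```java
/** Hypothetical comparison of a shard-level and a coordinating-node check. */
final class PrefixCheckSketch {

    /** Shard-level check: the mapping is known, so the optimised case can be let through. */
    static boolean rejectedOnShard(boolean allowExpensiveQueries, boolean indexPrefixesEnabled) {
        // With index_prefixes enabled the prefix query is rewritten to a cheaper query.
        boolean expensive = !indexPrefixesEnabled;
        return expensive && !allowExpensiveQueries;
    }

    /** Coordinating-node check without mapping information: every prefix query looks expensive. */
    static boolean rejectedOnCoordinator(boolean allowExpensiveQueries) {
        return !allowExpensiveQueries;
    }

    public static void main(String[] args) {
        boolean allowExpensive = false;
        System.out.println("shard, index_prefixes enabled: rejected="
            + rejectedOnShard(allowExpensive, true));
        System.out.println("coordinator, no mapping: rejected="
            + rejectedOnCoordinator(allowExpensive));
    }
}
```

A query that the shard-level check accepts would be rejected by the coordinating-node check in this sketch, which is the regression described above.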

matriv added >feature :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.7.0 labels Jan 23, 2020

matriv force-pushed the impl-29050 branch 2 times, most recently from bb294f8 to 3a219ca Compare January 24, 2020 15:29

matriv force-pushed the impl-29050 branch from 3a219ca to d9a14a6 Compare January 24, 2020 16:59

matriv requested review from jimczi and jpountz January 24, 2020 17:00

matriv marked this pull request as ready for review January 24, 2020 17:00

jimczi reviewed Jan 27, 2020

matriv mentioned this pull request Jan 27, 2020

Add a switch to disallow slow queries #29050

Closed

address comments

181d838

rjernst reviewed Jan 27, 2020

Add script and script_score queries to the disallowed ones

4ffc00f

Disallow joining queries

55200c7

jpountz reviewed Jan 28, 2020

Rename setting to positive semantics

16f6c8b

matriv changed the title ~~Add a cluster setting to disallow slow queries~~ Add a cluster setting to disallow expensive queries Jan 29, 2020

matriv added 2 commits February 5, 2020 12:46

Merge remote-tracking branch 'upstream/master' into impl-29050

126448f

Merge remote-tracking branch 'upstream/master' into impl-29050

b1ca9cb

remove unused import

d1c5e1c

jpountz reviewed Feb 7, 2020

matriv and others added 3 commits February 7, 2020 22:10

Remove duplicate constructors to ease up tests

78e3c9e

fix tests

1c64502

Merge remote-tracking branch 'upstream/master' into impl-29050

6fdab2b

romseygeek approved these changes Feb 10, 2020

romseygeek reviewed Feb 10, 2020

matriv requested a review from jpountz February 10, 2020 11:20

matriv and others added 2 commits February 10, 2020 12:33

Address comments

623846b

Merge remote-tracking branch 'upstream/master' into impl-29050

97d2ce4

jpountz approved these changes Feb 12, 2020

matriv and others added 2 commits February 12, 2020 17:01

fix docs

1759a84

Merge remote-tracking branch 'upstream/master' into impl-29050

4774178

matriv merged commit a8b39ed into elastic:master Feb 12, 2020

matriv deleted the impl-29050 branch February 12, 2020 17:06

matriv added backport pending v7.7.0 and removed v7.7.0 labels Feb 12, 2020

matriv removed the backport pending label Mar 3, 2020

javanna reviewed Mar 13, 2020

giladgal mentioned this pull request Mar 19, 2020

Configure search.allow_expensive_queries per role #53607

Open

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket elastic/elasticsearch-net#4525

Closed

38 tasks

javanna mentioned this pull request Jul 20, 2020

Reduce duplication in checks for expensive queries #59796

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
