Add safeguards to prevent simple user errors #11511

clintongormley opened this Issue Jun 5, 2015 · 22 comments

@clintongormley
Member
clintongormley commented Jun 5, 2015 edited

There are a number of places where a naive user can break Elasticsearch very easily. We should add more (dynamically overridable) safeguards that prevent users from hurting themselves.

Note:

  • We are adding high limits to start so that we don't suddenly disable things that users already do today, but so that sysadmins have tools that they can use to protect their clusters. We can revisit the limits later on.
  • All these settings should be prefixed by policy. to make them easier to document together and to understand their purpose.

Accepted limits:

  • #9311 Hard limit on from/size
  • #12149 Global default value for search timeouts (Could be ridiculously high like an hour and it would still help)
  • #17386 Disable fielddata-loading on analyzed text fields by default (Adrien)
  • #17396 Limit the max number of shards to 1000 (Adrien)
  • #17133 Limit the size of all in-flight requests (Daniel)
  • #17357 Limit the number of fields that can be added to a mapping to 1000
  • #17400 Add maximum mapping depth to 20
  • Add sane limits for thread size and queue size (Jim)
  • Don't allow search requests greater than (eg) 10MB (Colin)
  • #14983 Limit the number of nested fields per index to 50 (Yannick)
  • #17522 Limit window_size in rescore API (@nik9000)
  • #17558 Disable script access to _source fields by default
  • #18739 Limit the number of shards that can be rerouted at the same time
  • Hard limit on from/size in top hits (much smaller than a normal query)
  • #19694 Limit script compilation rate to avoid hard coding of params in scripts
  • #20705 Max number of shards per node (enforced as total shards per cluster)
  • #20760 Limit index creation rate
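
As a rough sketch of how a few of these safeguards are applied in practice (assuming the elasticsearch-py client and the per-index/cluster setting names the linked issues introduced, rather than the proposed policy. prefix; the index name and values are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Per-index, dynamically overridable mapping and search-window safeguards.
es.indices.put_settings(
    index="logs-2016.08",                           # illustrative index name
    body={
        "index.max_result_window": 10000,           # #9311: hard limit on from + size
        "index.max_rescore_window": 10000,          # #17522: window_size in rescore
        "index.mapping.total_fields.limit": 1000,   # #17357: fields per mapping
        "index.mapping.depth.limit": 20,            # #17400: maximum mapping depth
        "index.mapping.nested_fields.limit": 50,    # #14983: nested fields per index
    },
)

# Cluster-wide default search timeout (#12149): generous, but still a backstop.
es.cluster.put_settings(
    body={"transient": {"search.default_search_timeout": "1h"}}
)
```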

For discussion:

  • Disable certain query types, eg wildcard, span etc?
  • #14046 Limit on the number of buckets returned by aggs
  • #9310 Limit the size of the response (eg for very large doc bodies)
  • Kill slow scripts when the search timeout has lapsed, aka while(true) should not require a rolling restart to recover from; don't run a script a second time when the first execution takes longer than 1 second
  • #6470 Disable searching on all indices by default (handled by the max number of shards)

Any other ideas?

@jpountz
Contributor
jpountz commented Jun 5, 2015

Limit the max number of shards

I'm wondering if we should do it per index or per cluster. If we do it per index, then we might also want to have a max number of indices per cluster.

Limit the size of a bulk request

I guess it would also apply to multi-get and multi-search.

@alexbrasetvik
Member

Some of this could go into a "sanity checker" kind of plugin, akin to the migration plugin, that runs a bunch of tests as well.

That one could warn when e.g. minimum master nodes looks wrong, and when the number of shards/indexes/fields looks silly / approaches the above limits.
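
A rough sketch of the kind of check such a plugin might run, assuming the elasticsearch-py client and a 5.x-style nodes-info response; this is not an actual plugin API:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Count master-eligible nodes and derive the expected quorum.
nodes = es.nodes.info()["nodes"].values()
master_eligible = sum(1 for n in nodes if "master" in n.get("roles", []))
quorum = master_eligible // 2 + 1

# Compare against the configured value (it may also live only in
# elasticsearch.yml, in which case it won't show up in the cluster settings).
settings = es.cluster.get_settings(flat_settings=True)
configured = settings.get("persistent", {}).get("discovery.zen.minimum_master_nodes")

if configured is None or int(configured) < quorum:
    print(f"WARNING: minimum_master_nodes should be >= {quorum} "
          f"({master_eligible} master-eligible nodes), found {configured}")
```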

@clintongormley
Member

@alexbrasetvik that requires the user to actually run the check. Often poor sysadmins are at the mercy of their users. What I'd like to do is prevent users from blowing things up by mistake.

@alexbrasetvik
Member

@clintongormley Agreed! I still think there's room for both, though such a tool should be another issue.

For example, a high number of indices with few documents and identical mappings can be a sign that the user is doing per-user index partitioning when they shouldn't. That will turn into a problem, even if the current values are far from hitting the above-mentioned limits.

@pickypg
Member
pickypg commented Jul 15, 2015

Any other ideas?

  • Limit the max number of indices
    • It's effectively covered by limiting by shards, but touching too many indices may indicate more of a logical issue than the shard count (e.g., with daily indices, it's much easier to realize that sending a request to 5 indices represents five days rather than 25 shards with default counts).
  • Limit the concurrent request size
    • Request circuit breaker across all concurrent requests
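
A sketch of tightening the request-level and in-flight-request circuit breakers (the latter is what #17133 added), assuming the elasticsearch-py client; the percentages are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.cluster.put_settings(
    body={
        "transient": {
            "indices.breaker.request.limit": "40%",             # memory cap per request
            "network.breaker.inflight_requests.limit": "100%",  # all in-flight requests combined
        }
    }
)
```
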
@dakrone
Member
dakrone commented Jul 16, 2015

Limit the concurrent request size

This is already available with the thread pools and queue_sizes to limit the number of requests per-node and apply backpressure.

EDIT: I guess I am taking "size" as "count", is that what you mean?
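
For reference, a sketch of the thread pool / queue_size mechanism mentioned above as it worked in the 1.x/2.x era, when thread pool queue sizes were dynamically updatable via the cluster settings API (they became static thread_pool.* settings in 5.0); assumes the elasticsearch-py client, and the value is illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.cluster.put_settings(
    body={"transient": {"threadpool.search.queue_size": 1000}}
)
# Once the search queue is full, further search requests are rejected,
# which is the backpressure referred to above.
```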

@pickypg
Member
pickypg commented Jul 17, 2015

@dakrone Size of an actual request. For instance, if one request comes in with an aggregation that uses size: 0 at the same time as another, then maybe we should block the second one (or at least delay).
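
For illustration (not a safeguard), the kind of request being described: in the pre-5.0 terms aggregation, "size": 0 meant "return every bucket", which can be extremely expensive on high-cardinality fields. Assumes the elasticsearch-py client; index and field names are made up:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="events",
    body={
        "size": 0,
        "aggs": {"all_users": {"terms": {"field": "user_id", "size": 0}}},
    },
)
# Two such requests arriving concurrently is the scenario where blocking or
# delaying the second one, as suggested above, would help.
```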

@jpountz
Contributor
jpountz commented Oct 30, 2015

Another protection to add: check mapping depth #14370

@ppf2
Member
ppf2 commented Nov 2, 2015

Limit the max value that can be set for queue_size for our search, bulk, index, etc. thread pools, so users can't set them to unlimited, millions, etc.?

@makeyang
Contributor

Does the size in the terms aggregation get considered by this issue?

@clintongormley
Member

@makeyang it's covered by #14046, which is under discussion

@makeyang
Contributor

is it reasonable to add max_doc_number per index?
is it reasonable to add enable_all_for_search?

@clintongormley
Member

is it reasonable to add max_doc_number per index?

Well, there's already a hard limit but what are you trying to achieve with this one? And what is the user supposed to do instead of indexing into the same index?

is it reasonable to add enable_all_for_search?

What problem are you trying to prevent with disabling access to _all? Why not just disable the _all field if you don't want it used?
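
A minimal sketch of that suggestion, disabling the _all field in the mapping (2.x/5.x mapping syntax; index and type names are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="events",
    body={"mappings": {"event": {"_all": {"enabled": False}}}},
)
```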

@makeyang
Contributor

@clintongormley

  1. Some users even put daily rolling log data into one index, so with a max_doc_number parameter I actually want to force users to think about putting data into multiple indices.
  2. enable_all_for_search is not about the _all field; it is about 'http://localhost:9200/_all/_query?q=tag:wow'. When I put one cluster in front of multiple users, I really don't want users to search _all indices.
@clintongormley
Member

Some users even put daily rolling log data into one index, so with a max_doc_number parameter I actually want to force users to think about putting data into multiple indices.

OK, we have a better solution for this that we're thinking about - basically an alias that will generate a new index when it reaches a specified limit (eg size, number of docs, time).
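
A sketch of that idea along the lines of the index rollover API that later shipped, assuming the elasticsearch-py client; the alias name and thresholds are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "logs-write" is an alias pointing at the current write index, e.g. logs-000001.
es.indices.rollover(
    alias="logs-write",
    body={"conditions": {"max_age": "1d", "max_docs": 100_000_000}},
)
# If either condition is met, a new index (e.g. logs-000002) is created and the
# alias is switched over to it.
```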

enable_all_for_search is not about the _all field; it is about 'http://localhost:9200/_all/_query?q=tag:wow'. When I put one cluster in front of multiple users, I really don't want users to search _all indices.

Querying all indices is not a problem per se. Rather, the issue is the total number of shards, which is already handled by #17396.

@makeyang
Contributor

@clintongormley Thanks a lot, that's all I need.
btw: when will the better solution you mentioned above be turned into an issue?

@chenryn
chenryn commented May 17, 2016

Would you consider how to use cgroups to control the resource usage of search/index/percolator... threads?

Elasticsearch needs to run across Linux/Windows... so maybe there is a quick way: ES only needs to give every thread a thread name, for example a search thread named search-thread-1, etc. Then Linux users can get the thread ids by grepping for the thread name and put the tids into a cgroup.
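
A rough sketch of that workaround (Linux only; the Elasticsearch pid, cgroup path, and the "search" thread-name filter are assumptions, not an Elasticsearch feature):

```python
import glob

ES_PID = 12345                                        # pid of the Elasticsearch JVM
CGROUP_TASKS = "/sys/fs/cgroup/cpu/es_search/tasks"   # a pre-created cgroup

for comm_path in glob.glob(f"/proc/{ES_PID}/task/*/comm"):
    with open(comm_path) as f:
        thread_name = f.read().strip()
    if "search" in thread_name:                       # pick out the search threads
        tid = comm_path.split("/")[-2]
        with open(CGROUP_TASKS, "a") as tasks:
            tasks.write(tid + "\n")                   # move the thread into the cgroup
```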

@jonaf
jonaf commented Jun 6, 2016

I'd like to put in a vote for an additional safeguard: some kind of protection on terms queries that have hundreds or thousands of terms. I've seen many cases where applications produce terms queries with hundreds or thousands of terms, and it craters Elasticsearch very easily. It'd be nice to have a default cap that truncates the query, like a default terms limit (similar to default hits) that can be increased. Knowing early on that doing this is a problem can help application developers architect their applications to avoid needing terms queries that are so huge.
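
For illustration, the problematic pattern being described (not a safeguard); assumes the elasticsearch-py client, and the index/field names are made up:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

user_ids = [f"user-{i}" for i in range(50_000)]   # thousands of terms from the application
resp = es.search(
    index="events",
    body={"query": {"terms": {"user_id": user_ids}}},
)
# A default cap along the lines proposed here would reject or truncate such a
# query instead of letting it consume large amounts of memory and CPU.
```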

@clintongormley
Member

@jonaf I like the idea. Do you want to open a separate issue where we can discuss it? We can link it to this meta issue.

@s1monw
Contributor
s1monw commented Aug 5, 2016

I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?

@s1monw
Contributor
s1monw commented Aug 5, 2016

I also wonder if we should hard limit it, follow Moore's law, and increase it every N years? :) Let's start with 256 and force multiple indices?

@ppf2
Member
ppf2 commented Aug 5, 2016

I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?

Nice idea. Similarly, for multitenant use cases that may have a ton of single-sharded per-user indices, it would be nice to have a limit or warning when the number of shards per node becomes ridiculous. Not sure what this limit would be based on, perhaps a combination of the number of file descriptors, cores, and heap. But it would be nice to prevent users from having something like N shards per node, etc.
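
A sketch of that per-node ceiling, assuming the setting name later used for this limit (cluster.max_shards_per_node) and the elasticsearch-py client; the value is illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.cluster.put_settings(
    body={"persistent": {"cluster.max_shards_per_node": 1000}}
)
```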
