Improved Request Circuit Breaker #11070

Closed
pickypg opened this issue May 8, 2015 · 5 comments
Labels
:Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload discuss high hanging fruit

Comments

@pickypg
Member

pickypg commented May 8, 2015

In certain circumstances, the request circuit breaker does not block requests that are individually fine but collectively a problem. For example, if you have an aggregation on a very high-cardinality field and you allow the shard_size to become Integer.MAX_VALUE (either directly, or indirectly by setting it to 0), then you can create a lot of CPU and network congestion (this is documented behavior).

On a per-request basis, this may be caught and safely blocked. However, I have come across scenarios where multiple in-flight requests, each sneaking in under the request threshold, together crash the node that handles them.

In particular, I have seen a client node forced into OOM conditions due to parallel aggregations across a lot of shards:

/my-index/_search
{
  "aggs": {
    "agg-name": {
      "terms": {
        "field" : "high-cardinality",
        "size" : 10,
        "shard_size" : 0  // THIS IS THE CULPRIT!
      },
      "aggs": {
        // ...
      }
    }
  }
}

In this case, an individual shard response was only ~70 MB, but there were many shards. Worse, other aggregations were in-flight at the same time. Eventually the memory became too much, causing the client node (in this case) to drop out due to OOM. I suspect that a similar problem could surface if a data node were forced to handle the initial request.
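
A rough back-of-the-envelope sketch of why this adds up, assuming the node handling the request holds every shard response for every in-flight request at the same time. Only the ~70 MB figure comes from the case above; the shard count, the number of concurrent requests, and the threshold are hypothetical values picked for illustration.

```python
MB = 1024 * 1024


def coordinating_node_bytes(shard_response_bytes, shard_count, concurrent_requests):
    """Estimate bytes buffered on the node handling the request, assuming it
    holds every shard response for every in-flight request simultaneously."""
    return shard_response_bytes * shard_count * concurrent_requests


# ~70 MB per shard response is from the report; the other numbers are assumed.
per_request = coordinating_node_bytes(70 * MB, shard_count=20, concurrent_requests=1)
total = coordinating_node_bytes(70 * MB, shard_count=20, concurrent_requests=4)

# A made-up 2 GB per-request threshold, not a real default.
hypothetical_limit = 2 * 1024 * MB

print(per_request // MB, "MB for one request")
print(total // MB, "MB for four concurrent requests")
# Each request fits under the threshold, but the sum does not.
print(per_request < hypothetical_limit < total)
```

With these assumed numbers, every request passes a per-request check while the combined footprint is several times larger than the threshold.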

This is certainly not an easy problem to catch, nor will the solution to it be easy, but hopefully we can figure something out to combat the issue.

@pickypg pickypg added the :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload label May 8, 2015
@cfeio

cfeio commented May 13, 2015

Could the circuit breaker be enhanced with a threshold, so that any query that would be sent to more than X shards is blocked?

We have a similar issue where we have some search requests that are hitting every shard and we would like to block such requests from executing.
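
A minimal sketch of what such a shard-count guard might look like. The names `check_shard_fanout`, `max_shards`, and `TooManyShardsError` are hypothetical and are not part of any Elasticsearch API; the check is assumed to run before the request is sent to any shard.

```python
class TooManyShardsError(Exception):
    """Raised when a request would fan out to more shards than allowed."""


def check_shard_fanout(target_shard_count, max_shards):
    """Reject a request before execution if it targets more than max_shards."""
    if target_shard_count > max_shards:
        raise TooManyShardsError(
            f"request targets {target_shard_count} shards, limit is {max_shards}"
        )


# A request within the limit passes silently.
check_shard_fanout(5, max_shards=100)

# A request hitting every shard of a large cluster is rejected up front.
try:
    check_shard_fanout(500, max_shards=100)
except TooManyShardsError as err:
    print("blocked:", err)
```

A cap like this is coarse: it ignores how large each shard response is, so it would complement a memory-based breaker rather than replace it.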

@ppf2
Member

ppf2 commented Sep 30, 2015

I have a similar request, where the end user asked for circuit breaker functionality at the node level, i.e., limiting the cumulative memory used by all queries on a node.

@dakrone
Member

dakrone commented Sep 30, 2015

So, regarding the cardinality aggregation and request-level semantics: the circuit breaker (even though it's called the "request breaker") has no notion of the request itself. Instead, it is part of the BigArrays class, a generic class from which chunks of byte arrays are requested. This is nice at the programming level because the two are decoupled, but not as nice from an end-user perspective, because there is no notion of the actual search/agg request, just the byte array that was requested.

Now, multiple queries using BigArrays should be using the same circuit breaker and thus already be at the node level. However, for cases like the one @pickypg mentioned, the terms aggregation does not currently go through the request breaker (this needs to be added!), which is why it can still kill a node.
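
To illustrate the decoupling described above, here is a simplified sketch, not Elasticsearch's actual code. `MemoryBreaker` and `TrackedArrays` are hypothetical stand-ins for the breaker and for a BigArrays-style allocator. The point it shows: the breaker only counts bytes requested through the allocator, shared across all requests, and anything allocated outside it goes unnoticed.

```python
class CircuitBreakingError(Exception):
    """Raised when an allocation would push usage past the limit."""


class MemoryBreaker:
    """Node-wide byte counter; it has no idea which request is asking."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, num_bytes):
        if self.used_bytes + num_bytes > self.limit_bytes:
            raise CircuitBreakingError(
                f"would use {self.used_bytes + num_bytes} bytes, "
                f"limit is {self.limit_bytes}"
            )
        self.used_bytes += num_bytes

    def release(self, num_bytes):
        self.used_bytes -= num_bytes


class TrackedArrays:
    """Hypothetical allocator that charges the shared breaker per array."""

    def __init__(self, breaker):
        self.breaker = breaker

    def new_byte_array(self, size):
        self.breaker.add_estimate(size)  # accounted for before allocating
        return bytearray(size)


breaker = MemoryBreaker(limit_bytes=1000)
arrays = TrackedArrays(breaker)

first = arrays.new_byte_array(600)   # e.g. on behalf of one request
untracked = bytearray(5000)          # allocated directly: never counted

try:
    arrays.new_byte_array(600)       # another request trips the shared limit
except CircuitBreakingError as err:
    print("tripped:", err)

print(breaker.used_bytes)
```

In this sketch the second tracked allocation is rejected because the limit is shared, while the much larger untracked buffer never reaches the breaker at all, which mirrors the gap described for the terms aggregation.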

@pweerd

pweerd commented Mar 4, 2016

Does anyone know what the outlook for this feature is?
I would really like this feature, because currently some user requests blow up a client node. See aggregations-blowing-up-client-node

@clintongormley

Fixed by #19394
