Improved Request Circuit Breaker #11070
Comments
Could the circuit breaker be enhanced with a threshold so that any query that would be sent to more than X shards is blocked? We have a similar issue where some search requests hit every shard, and we would like to block such requests from executing.
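For what it's worth, a minimal sketch of what such a guard could look like on the coordinating node, after shard resolution but before fan-out. Everything here (`ShardCountGuard`, `maxShardsPerSearch`) is hypothetical, not an existing Elasticsearch API; I believe later releases added a soft limit in this spirit via the `action.search.shard_count.limit` cluster setting.

```java
// Hypothetical sketch, not Elasticsearch code: reject a search before
// fan-out when it would touch more shards than a configured threshold.
public final class ShardCountGuard {

    private final int maxShardsPerSearch; // e.g. sourced from a cluster setting

    public ShardCountGuard(int maxShardsPerSearch) {
        this.maxShardsPerSearch = maxShardsPerSearch;
    }

    /** Throws if the search resolved to more shards than the limit allows. */
    public void checkShardCount(int resolvedShardCount) {
        if (maxShardsPerSearch > 0 && resolvedShardCount > maxShardsPerSearch) {
            throw new IllegalStateException("search would hit ["
                    + resolvedShardCount + "] shards, above the limit of ["
                    + maxShardsPerSearch + "]");
        }
    }
}
```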
We have a similar request: an end user asked for circuit-breaker functionality at the node level, i.e. limiting the cumulative memory used by all queries on a node.
So, for the cardinality aggregation and request-level semantics: the circuit breaker (even though it's called the "request breaker") has no notion of the request itself. Instead, it is charged at allocation time by the per-request data structures themselves (the big arrays that aggregations use, for example). Multiple queries using those structures therefore all draw against the same node-wide budget, with no per-request attribution.
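To illustrate the accounting model being described here (which also shows the node-level accumulation the previous comment asks about), a stripped-down sketch. This is a simplification, not the actual Elasticsearch source; the method names mirror the real `CircuitBreaker` interface, but the body is invented.

```java
import java.util.concurrent.atomic.AtomicLong;

// Stripped-down sketch of the accounting described above (a simplification,
// not the actual Elasticsearch source): the "request" breaker is a node-wide
// byte counter that per-request data structures charge at allocation time.
// There is no per-request ledger, so all in-flight queries draw on one budget.
final class RequestBreakerSketch {

    private final AtomicLong used = new AtomicLong();
    private final long limitBytes;

    RequestBreakerSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Charge an allocation (e.g. a growing big array) and trip if over limit. */
    void addEstimateBytesAndMaybeBreak(long bytes, String label) {
        long newUsed = used.addAndGet(bytes);
        if (newUsed > limitBytes) {
            used.addAndGet(-bytes); // roll back the failed reservation
            throw new IllegalStateException("[request] data for [" + label
                    + "] would be larger than limit of [" + limitBytes + "] bytes");
        }
    }

    /** Adjust usage without tripping; pass negative bytes on release. */
    void addWithoutBreaking(long bytes) {
        used.addAndGet(bytes);
    }
}
```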
Does anyone know what the outlook for this feature is?
Fixed by #19394
In certain circumstances, the request circuit breaker does not block requests that are individually fine but holistically a problem. For example, if you have an aggregation on a very high-cardinality field and you allow the `shard_size` to become `Integer.MAX_VALUE` (either directly, or indirectly by setting it to `0`), then you can create a lot of CPU and network congestion (this is documented behavior).

On a per-request basis, this may be caught and safely blocked. However, for requests that manage to sneak in under the request threshold, I have come across scenarios where multiple in-flight requests together crash the node that handles them.
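For concreteness, this is roughly the request shape being described, written against the 5.x-era Java API (package paths and builders differ across versions, and the field name and sizes are invented):

```java
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// The kind of request being described (field name and sizes are made up):
// a terms aggregation whose shard_size is effectively unbounded, so every
// shard may ship back its entire term list to the coordinating node.
public class UnboundedShardSizeExample {
    public static void main(String[] args) {
        TermsAggregationBuilder terms = AggregationBuilders.terms("by_user")
                .field("user_id")              // assume a very high-cardinality field
                .size(10)                      // the final size looks harmless...
                .shardSize(Integer.MAX_VALUE); // ...but per-shard results do not

        SearchSourceBuilder source = new SearchSourceBuilder().aggregation(terms);
        System.out.println(source); // renders the JSON body of the search
    }
}
```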
In particular, I have seen a client node forced into OOM conditions due to parallel aggregations across a lot of shards. In this case, an individual shard response was only ~70 MB, but there were many shards, and other aggregations were in flight at the same time. Eventually the memory pressure became too much, causing the client node (in this case) to drop out due to OOM. I suspect that a similar problem could surface if a data node were forced to handle the initial request.
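Some back-of-the-envelope arithmetic makes the failure mode concrete. Only the ~70 MB per-shard figure comes from the report above; the shard and concurrency counts are assumptions.

```java
// Back-of-the-envelope arithmetic for the OOM above. Only the ~70 MB
// per-shard figure comes from the report; shard and concurrency counts
// are assumed. Partial shard responses are buffered on the coordinating
// node until the reduce phase, so memory scales with both factors.
public class CoordinatorMemoryEstimate {
    public static void main(String[] args) {
        long perShardResponseBytes = 70L * 1024 * 1024; // ~70 MB, as observed
        int shards = 50;                                // assumed shard count
        int concurrentRequests = 4;                     // assumed in-flight aggregations

        long buffered = perShardResponseBytes * shards * concurrentRequests;
        System.out.printf("~%.1f GB buffered before reduce%n", buffered / 1e9);
        // prints ~14.7 GB -- comfortably beyond a typical client-node heap
    }
}
```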
This is certainly not an easy problem to catch, nor will the solution to it be easy, but hopefully we can figure something out to combat the issue.