
Add single request circuit breaker. #46962

Closed

Conversation

@howardhuanghua (Contributor) commented Sep 23, 2019

Currently we have a request circuit breaker, but all query requests share its memory limit pool, so requests can impact each other.

This PR introduces a single-request circuit breaker that limits the memory usage of an individual request. A data analysis cluster may serve many regular small search requests, but a single query that consumes a lot of memory can cause other queries to fail. In that case, the single-request circuit breaker can prevent one memory-hungry search request from impacting the regular search traffic.

By default, the new setting indices.breaker.single_request.limit is 30%, half of the default indices.breaker.request.limit.
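As a rough illustration of the intended behavior (not the actual implementation in this PR or the Elasticsearch CircuitBreaker API — the class and method names below are assumptions), a per-request limit check might look like:

```java
// Illustrative sketch only: a breaker scoped to a single request, tripping
// when that one request's accounted memory exceeds its own limit.
public class SingleRequestBreaker {
    private final long limitBytes; // e.g. the indices.breaker.single_request.limit share of heap
    private long usedBytes;        // bytes charged to this one request so far

    public SingleRequestBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Charge bytes to this request; throw if the per-request limit is exceeded.
    public void addEstimateAndMaybeBreak(long bytes) {
        usedBytes += bytes;
        if (usedBytes > limitBytes) {
            throw new IllegalStateException(
                "[single_request] would use " + usedBytes + "b, limit " + limitBytes + "b");
        }
    }

    public long getUsed() {
        return usedBytes;
    }
}
```

Unlike the shared request breaker, each instance here tracks only one request, so a single oversized query trips its own breaker without consuming the pool shared with other requests.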

@original-brownbear added the :Core/Infra/Circuit Breakers (Track estimates of memory consumption to prevent overload) label Sep 23, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@original-brownbear (Member)

@howardhuanghua thanks for this suggestion.

I have a question: this change does not really add a per-request circuit breaker, does it?
As far as I can see, it simply limits the maximum size of any single allocation (at least as seen by the circuit breaker). I wonder what others think, but I'm not sure such a limit will have much practical impact: we rarely make these kinds of single, massive allocations, and per-request cost is mostly the sum of many smaller allocations that would have to be tracked for the lifetime of the request.

@howardhuanghua what kind of massively expensive request is this allocation limit catching for you in practice?

@howardhuanghua (Contributor, Author)

Hi @original-brownbear, sorry for the delay. Yes, you are right; this PR only accounts for a small piece of the memory used by a search request. The problem we have encountered is mainly with the coordinating node's breaker, and I also opened #46751 and #47806. Sometimes a single big range aggregation request consumes a lot of memory on the coordinating node and can impact other regular requests, so we are trying to add a per-request limit.

It is hard to collect every piece of memory used by a single search request, but I think we could account for the major consumers. I have some ideas for the coordinating node:

  1. Collect each shard result response in InboundHandler.java#handleResponse for a single search request before deserialization, as described in #47806.
  2. Estimate the response object size from the received message length of each shard result.
  3. Sum the estimated sizes at the request level (each search request has a unique AbstractSearchAsyncAction object) and compare the total against the single-request limit to decide whether to trip the breaker.
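The three steps above could be sketched roughly as follows. This is not actual Elasticsearch code: the class, the method names, and the INFLATION_FACTOR are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the proposed pre-deserialization accounting on the
// coordinating node, one instance per in-flight search request.
public class ShardResponseAccounting {
    // Assumed: a deserialized response object is somewhat larger than its wire bytes.
    static final double INFLATION_FACTOR = 1.5;

    private final long singleRequestLimitBytes;
    // One counter per search request (conceptually, per AbstractSearchAsyncAction).
    private final AtomicLong estimatedBytes = new AtomicLong();

    public ShardResponseAccounting(long singleRequestLimitBytes) {
        this.singleRequestLimitBytes = singleRequestLimitBytes;
    }

    // Step 2: estimate the response object size from the received message length.
    static long estimateFromMessageLength(long wireBytes) {
        return (long) (wireBytes * INFLATION_FACTOR);
    }

    // Steps 1 and 3: account one shard result before deserialization; returns
    // false when the accumulated estimate exceeds the single-request limit.
    public boolean tryAccountShardResult(long wireBytes) {
        long total = estimatedBytes.addAndGet(estimateFromMessageLength(wireBytes));
        return total <= singleRequestLimitBytes;
    }
}
```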

Alternatively, if we have already calculated each shard result's size in InboundHandler.java#handleResponse, we could sum the shard result sizes for the current search request as the results are consumed:

void consumeResult(Result result) {
    assert results.get(result.getShardIndex()) == null : "shardIndex: " + result.getShardIndex() + " is already set";
    results.set(result.getShardIndex(), result);
}

In this case, since all received shard result responses have already been deserialized, the calculated size reflects real memory usage rather than a pre-deserialization estimate.
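This consume-time accounting could look roughly like the sketch below, mirroring the consumeResult snippet above. The Long stand-in for a shard result and the ramBytesUsed parameter are illustrative assumptions, not actual Elasticsearch types.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the alternative: sum already-deserialized shard result sizes at
// consume time, giving real memory usage rather than a wire-size estimate.
public class ConsumingAccountant {
    private final AtomicReferenceArray<Long> results; // stand-in for the shard results array
    private long realBytesUsed; // actual memory of deserialized results

    public ConsumingAccountant(int numShards) {
        this.results = new AtomicReferenceArray<>(numShards);
    }

    // Mirrors consumeResult, additionally summing each result's real size.
    public void consumeResult(int shardIndex, long ramBytesUsed) {
        assert results.get(shardIndex) == null : "shardIndex: " + shardIndex + " is already set";
        results.set(shardIndex, Long.valueOf(ramBytesUsed));
        realBytesUsed += ramBytesUsed;
    }

    public long totalBytes() {
        return realBytesUsed;
    }
}
```

The running total could then be compared against the single-request limit each time a shard result is consumed.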

@rjernst added the Team:Core/Infra (Meta label for core/infra team) label May 4, 2020
@rjernst (Member) commented Feb 17, 2021

After reviewing this PR, we do not believe the change is worth the added complexity. With a setting value of 30%, it would not prevent many (if any) conceivable out-of-memory situations that are not already covered by the existing circuit breakers. Also, as pointed out above, there is not necessarily a correlation between message size and the amount of memory a request will actually consume, which limits the use cases for this approach.

For now we have decided to pursue limiting request size only at the REST layer (see #67804), which will also filter out large transport messages from indexing on the transport layer. Since we do not plan to move forward with the approach in this PR, I hope you don't mind that I close it.

@rjernst closed this Feb 17, 2021
Labels: :Core/Infra/Circuit Breakers, feedback_needed, Team:Core/Infra