Limit request size #16011

Closed
danielmitterdorfer opened this Issue Jan 15, 2016 · 17 comments


@danielmitterdorfer
Member

Overly large bulk requests can threaten the stability of Elasticsearch. Hence, we want to limit the size of a bulk request.

Although the need originally arose for bulk requests, the solution will apply to requests in general and not just bulk requests.

There will be two (configurable) limits:

  • A limit on the size (in bytes) of an individual request
  • A limit on the size (in bytes) of all requests that are currently in flight
@danielmitterdorfer danielmitterdorfer self-assigned this Jan 15, 2016
@dadoonet
Member

Just a thought: I wonder if the default limit should be something like the default http.max_content_length divided by two or something along those lines?
Maybe we should (also?) check that this max size is actually not bigger than http.max_content_length and reject the settings in such a case?

@danielmitterdorfer
Member

Thanks for your input @dadoonet! I am not sure about your suggestion to check against http.max_content_length though, as it relies on a protocol-level setting whereas I'd try to enforce the bulk request size limit in a protocol-independent fashion. I mean, this should also work if somebody uses only the transport protocol, and then I wouldn't want to validate this setting based on an HTTP-related one. But tbh I did not look into the code yet. wdyt?

@dadoonet
Member

ha! That is correct. I thought http.max_content_length was applied to the transport layer as well...
Ignore me then! :D

@danielmitterdorfer
Member

Ok, thanks for the clarification.

@danielmitterdorfer
Member

Design Parameters

After looking at the source, I have identified a few options which I'd like to discuss before implementing this feature. Below are some design parameters which are worth considering from my point of view.

Type of endpoint

A (bulk) request can hit the cluster via the following endpoints:

  1. HTTP: note that the REST handler reissues the HTTP request as a transport request via the client interface.
  2. Transport (or more precisely: remote transport)
  3. Local transport: An optimization to avoid the overhead of a remote call when just forwarding a request within the same process.

Applicability of the limit

This concerns whether we want to apply the limit to all request types or only to a limited number of explicitly defined request types (like bulk requests).

When to check the limit

  • Apply during deserialization: This would allow us to abort request handling as soon as we know the limit is hit but at the expense of implementing this for all protocols separately. For the transport protocol, the right place looks like MessageChannelHandler#handleRequest().
  • Apply before a (transport) request is handled (i.e. in the appropriate subclass of TransportAction): This would simplify the validation as we would check just once for all endpoint types (as we know that HTTP requests are reissued as transport requests). However, at this point the request is already deserialized and thus consuming memory, and all we can do is avoid spending additional system resources by aborting request processing.

Request size calculation

To know whether a request has breached the limit, we need to calculate its size. Considering BulkRequest for now, we have two options for determining the request size:

  • BulkRequest#estimatedSize(): This works but has shortcomings in corner cases (see e.g. #15589)
  • If we enforce the limit during deserialization, we can calculate the request size exactly, e.g. by writing a custom StreamInput implementation that counts the bytes deserialized (see the sketch after this list).
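
A minimal sketch of the byte-counting idea, written against the plain JDK FilterInputStream rather than Elasticsearch's StreamInput; the class and method names are illustrative assumptions, not the actual implementation:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts every byte read from the wrapped stream and fails fast once a
// configured per-request limit is exceeded. A real implementation would wrap
// StreamInput instead of InputStream.
public class ByteCountingLimitInputStream extends FilterInputStream {
  private final long limitInBytes;
  private long bytesRead;

  public ByteCountingLimitInputStream(InputStream in, long limitInBytes) {
    super(in);
    this.limitInBytes = limitInBytes;
  }

  @Override
  public int read() throws IOException {
    int b = super.read();
    if (b != -1) {
      account(1);
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = super.read(buf, off, len);
    if (n > 0) {
      account(n);
    }
    return n;
  }

  private void account(long n) throws IOException {
    bytesRead += n;
    if (bytesRead > limitInBytes) {
      // A real implementation would throw a dedicated exception that maps to a client error.
      throw new IOException("request exceeded the configured limit of " + limitInBytes + " bytes");
    }
  }
}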

Proposal

Based on the options above, I want to sketch a simple solution proposal:

Considering that it is likely we want limit checks not only for bulk requests but also for similar ones, such as multi-gets, we should not tie this too specifically to bulk requests.

Hence, for each request type that should be size-limited, the corresponding TransportRequest class implements a new interface, RequestSizeEstimator (draft name):

public interface RequestSizeEstimator {
  // Returns the (estimated) size of this request in bytes.
  int getRequestSizeInBytes();
}

We define one configurable request size limit (default e.g. 50MB) for all request types and implement limit breach detection in a high-level fashion as an ActionFilter. This ActionFilter checks whether the TransportRequest implements RequestSizeEstimator and, if so, validates its estimated size against the limit. If the limit is breached, we throw an appropriate exception indicating a client error (4xx in HTTP speak).
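
For illustration, the limit check itself could look roughly like the following; the helper class, the exception type, and how it is wired into an ActionFilter are assumptions, not the actual implementation:

// Hypothetical helper that an ActionFilter could invoke before passing the
// request down the filter chain. RequestSizeEstimator is the interface
// proposed above; requests that do not implement it are not checked.
public final class RequestSizeLimitCheck {
  private final long limitInBytes; // e.g. 50 * 1024 * 1024, read from a node setting

  public RequestSizeLimitCheck(long limitInBytes) {
    this.limitInBytes = limitInBytes;
  }

  public void check(Object request) {
    if (request instanceof RequestSizeEstimator) {
      long size = ((RequestSizeEstimator) request).getRequestSizeInBytes();
      if (size > limitInBytes) {
        // Should surface to the client as a 4xx error, e.g. 413 Request Entity Too Large.
        throw new IllegalArgumentException("request size [" + size
            + "] bytes exceeds the configured limit of [" + limitInBytes + "] bytes");
      }
    }
  }
}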

In summary, the pros and cons of this solution are:

Pros:

  • The limit is only checked in a single place in the code base
  • Easily extensible: We can add checks for new request types easily by having the corresponding transport request class implement RequestSizeEstimator.

Cons:

  • We deserialize the full request before we check the limit. This means we still hit a potential memory spike but we avoid using any other resources (CPU, I/O) after that point.
  • The size calculation may not be entirely accurate (see my comments above).

Feedback is very much appreciated.

@danielmitterdorfer
Member

Based on feedback I got by @bleskes, here's a revised approach:

We limit the size of requests on protocol level during deserialization. We consider two cases:

  • Receiving a request: We will apply one global limit for all requests that are in flight
  • Sending a request (internally in the cluster): We will apply a limit for each individual request

Limiting on protocol level has a couple of advantages:

  • We cancel request processing as soon as we hit the configured memory limit
  • It allows us to introduce a limit per action (not in the scope of this ticket though)

In the first step, we will implement an action-independent limit which applies to all actions, not just bulk actions.

Transports that need to be considered are Netty transport and local transport (for testing purposes). HTTP is unaffected because we just stream data from the HTTP layer to the transport layer (no up-front allocation).

We need to check whether circuit breakers are a feasible option to detect limit breaches.

@danielmitterdorfer
Member

@dakrone: Could you share your thoughts whether using CircuitBreaker is feasible for limiting memory usage in the following two scenarios?

  1. Limiting the memory usage of all in-flight requests on a single node level.
  2. Limiting the memory usage of each individual outgoing request of a node.

In both cases the memory usage will be determined by the size of the deserialized request payload.

(more details above; that's just the summary for you)

Scenario 1

I have seen that there is already a "request" circuit breaker but its BreakerSettings are (a) cluster-wide and (b) relative to the amount of available heap memory. I'd tackle this by introducing new BreakerSettings for all requests in flight at the node level (which would have to be a new value in org.elasticsearch.common.settings.Settings.Scope) and with an absolute but user-configurable limit (e.g. 50mb).

Scenario 2

We also want to limit memory usage on a per-request basis (not across all requests that are in flight), and tbh I think that circuit breakers are not a good fit because, as far as I understand it, we would need a new circuit breaker instance for each request. However, it would make the implementation more uniform.

It would be great to hear your input on this.

@dakrone
Member
dakrone commented Feb 1, 2016

@danielmitterdorfer

Could you share your thoughts whether using CircuitBreaker is feasible for
limiting memory usage in the following two scenarios?

Limiting the memory usage of all in-flight requests on a single node level.
Limiting the memory usage of each individual outgoing request of a node.

I have seen that there is already a "request" circuit breaker but its
BreakerSettings are (a) cluster-wide and (b) relative to the amount of
available heap memory. I'd tackle this by introducing new BreakerSettings for
all requests in flight at node level (which would have to be a new value in
org.elasticsearch.common.settings.Settings.Scope) and with an absolute but
user-configurable limit (e.g. 50mb).

This is exactly what the circuit breaker is for; you can easily define a new
breaker (and breaker type) in HierarchyCircuitBreakerService to do just that.
I would recommend this heartily :) (and I'm happy to help)

We also want to limit memory usage on a single request basis (not across all
requests that are in flight) and tbh I think that circuit breakers are not a
good fit because we would need a new circuit breaker instance for each request
as far as I understand it. However, it would make the implementation more
uniform.

To me, this doesn't sound like a good fit for the circuit breaker (since it is
designed to be per-node, not per-request). Instead, this sounds like a better
fit for the validation framework, since that works on the per-request level.

@danielmitterdorfer
Member

Thanks for your feedback! I'll implement it based on your pointers. It would be great if you could take a look at it then.

@danielmitterdorfer
Member

@dakrone,

I have started implementing request size limiting based on your pointers (see danielmitterdorfer/elasticsearch@802f6ed). I have added a new circuit breaker to HierarchyCircuitBreakerService but then stumbled over CircuitBreakerServiceIT#testParentChecking(). The test spuriously hangs because the parent circuit breaker has tripped and we cannot get any request through the transport layer afterwards. I am not sure whether we want the dependency among circuit breakers at all in this case. wdyt?

Other things I have noted:

  • I am setting an absolute limit for my new circuit breaker, but it looks to me as if the intention is that all leaf circuit breaker limits add up to 100%, so I think I should reduce the other values accordingly and create a relative setting instead of an absolute one, right?
  • I have introduced an adapter interface Limiter in my implementation and use one implementation that delegates to a circuit breaker (scenario 1, in-flight requests) and one where I manually check the bytes used (scenario 2, per-request limiting); see the sketch below. I throw a CircuitBreakingException in both implementations but for the second case I think it is a bit too implementation-specific and does not fit well. I could either introduce a new exception or rename/move CircuitBreakingException to something like QuotaExceededException, which would fit both cases.
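
For illustration, the adapter described in the second point could have roughly the following shape; the interface and both implementations are simplified assumptions (the actual branch delegates to Elasticsearch's circuit breaker for scenario 1):

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shape of the Limiter adapter: both variants account request
// bytes and throw when a limit is breached.
public interface Limiter {
  void addBytes(long bytes);
}

// Scenario 1: all in-flight requests share one node-level budget. In the real
// branch this would delegate to a circuit breaker instead of an AtomicLong.
final class InFlightRequestsLimiter implements Limiter {
  private final AtomicLong inFlightBytes = new AtomicLong();
  private final long limitInBytes;

  InFlightRequestsLimiter(long limitInBytes) {
    this.limitInBytes = limitInBytes;
  }

  @Override
  public void addBytes(long bytes) {
    long total = inFlightBytes.addAndGet(bytes);
    if (total > limitInBytes) {
      inFlightBytes.addAndGet(-bytes); // roll back before rejecting
      throw new IllegalStateException("in-flight requests would use [" + total
          + "] bytes which exceeds the limit of [" + limitInBytes + "] bytes");
    }
  }

  // Must be called once a request has been fully handled.
  void releaseBytes(long bytes) {
    inFlightBytes.addAndGet(-bytes);
  }
}

// Scenario 2: each individual request gets its own fixed budget.
final class PerRequestLimiter implements Limiter {
  private final long limitInBytes;
  private long bytesSoFar;

  PerRequestLimiter(long limitInBytes) {
    this.limitInBytes = limitInBytes;
  }

  @Override
  public void addBytes(long bytes) {
    bytesSoFar += bytes;
    if (bytesSoFar > limitInBytes) {
      throw new IllegalStateException("request uses [" + bytesSoFar
          + "] bytes which exceeds the limit of [" + limitInBytes + "] bytes");
    }
  }
}
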
@dakrone
Member
dakrone commented Feb 4, 2016

The test spuriously hangs and the reason is that the parent circuit breaker
tripped and we cannot get any request through the transport layer
afterwards. I am not sure whether we want the dependency among circuit
breakers at all in this case. wdyt?

This came up when I added the request breaker on BigArrays, because we always
want to be able to allocate a buffer to send an error response! The way that I
worked around it before was splitting it into a circuit-breaking and
non-circuit-breaking category, where the non-circuit-breaking did the
addWithoutBreaking(numBytes) so that the usage was still tracked; it just
didn't prevent requests from being sent out.

So in your case, I think we may want to limit which requests fall into the
circuit breaking category (index requests, bulk requests, etc), and which don't
(errors, cluster health, cluster state, etc).

What do you think?
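
A minimal sketch of that split, assuming a classification by action name and a breaker-like accounting interface; the class, the Accounting interface, and the chosen action names are illustrative assumptions, not the actual Elasticsearch code:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Decides per action whether request bytes may trip the breaker or are merely
// tracked, mirroring the circuit-breaking vs. non-circuit-breaking categories
// described above.
public final class RequestAccountingPolicy {
  public interface Accounting {
    void addAndMaybeBreak(long bytes);   // may throw once the limit is exceeded
    void addWithoutBreaking(long bytes); // tracks usage but never rejects
  }

  // Traffic that must always get through (e.g. cluster coordination, error
  // handling) only has its usage tracked.
  private static final Set<String> NON_BREAKING_ACTIONS = new HashSet<>(
      Arrays.asList("cluster:monitor/health", "cluster:monitor/state"));

  private final Accounting accounting;

  public RequestAccountingPolicy(Accounting accounting) {
    this.accounting = accounting;
  }

  public void account(String action, long bytes) {
    if (NON_BREAKING_ACTIONS.contains(action)) {
      accounting.addWithoutBreaking(bytes);
    } else {
      accounting.addAndMaybeBreak(bytes);
    }
  }
}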

I am setting an absolute limit for my new circuit breaker but it looks to me
that the intention is that all leaf circuit breaker limits add up to 100%, so
I think I should reduce the other values accordingly and create a relative
setting instead of an absolute one, right?

No, they definitely don't have to add up to 100%; that's why the parent is
there. The idea is that you can have something like this:

fielddata: 40%
big_arrays: 30%
requests (your new one): 50mb

parent: 60%

This allows us to limit individual parts to certain amounts (including absolute
amounts), while still limiting circuit-breaking memory across the node as a
whole (via the parent breaker).

I definitely think this new breaker should be absolute instead of relative.
You're on the right track with it.

I have introduced an adapter interface Limiter in my implementation and either
use one that delegates to a circuit breaker (for scenario 1; in-flight
requests) and one where I manually check the bytes used (scenario 2, per
request limiting). I throw a CircuitBreakingException in both implementations
but for the second case I think it is a bit too implementation specific and
does not fit well. I can now either introduce a new exception or rename / move
CircuitBreakingException to something like QuotaExceededException which would
fit in both cases.

What makes you think it is too implementation-specific? You could always
subclass CircuitBreakingException if you wanted it to be a separate type, but
I don't think it's necessarily bad to keep it as such. It is a circuit breaker,
just on the per-request side instead of the node side, so with a good message I
think that can be made clear to the user.

@bleskes
Member
bleskes commented Feb 4, 2016

because we always
want to be able to allocate a buffer to send an error response!

I think it makes sense to never circuit break a response (both when sending and receiving).

@danielmitterdorfer
Member

@dakrone: First of all, thanks for your thoughts.

So in your case, I think we may want to limit which requests fall into the circuit breaking category (index requests, bulk requests, etc), and which don't (errors, cluster health, cluster state, etc).

With the current implementation this is not so easy, as I've added a LimitingInputStream and a LimitingOutputStream which wrap the actual streams. Therefore, size limiting is transparent to the user. We could make the client aware though and expose an additional method for disabling limit tracking.

Another approach I am thinking of is to decide at stream creation time, based on the action, whether it needs to break or not. If it does, we wrap the original stream in a limiting stream; otherwise we leave it as is. For me, the most appropriate place to add this support seems to be TransportRequestOptions.
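
A sketch of the "decide at stream creation time" idea; the factory and the boolean flag are assumptions for illustration, and in the actual change the decision would be carried via something like TransportRequestOptions:

import java.io.InputStream;

// Hypothetical factory: wrap the raw stream in the byte-counting limiter only
// for actions that are subject to the size limit; leave it untouched otherwise.
public final class LimitedStreamFactory {
  private final long perRequestLimitInBytes;

  public LimitedStreamFactory(long perRequestLimitInBytes) {
    this.perRequestLimitInBytes = perRequestLimitInBytes;
  }

  public InputStream wrap(InputStream raw, boolean actionIsSizeLimited) {
    if (actionIsSizeLimited) {
      // ByteCountingLimitInputStream is the counting-stream sketch shown earlier in this discussion.
      return new ByteCountingLimitInputStream(raw, perRequestLimitInBytes);
    }
    return raw;
  }
}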

For now I have stabilized the test in question by increasing the limit so it still breaks as intended but does not hit the request size limit. I think this is ok given that we need some minimum amount of memory to handle the request at all.

[The child circuit breaker limits] definitely don't have to add up to 100%
[...]
I definitely think this new breaker should be absolute instead of relative.
You're on the right track with it.

That's great, then I'll leave that part as is. Thanks for the clarification. :)

I think it is a bit too implementation specific [to throw a CircuitBreakingException in all cases]
What makes you think it is too implementation specific?

My thought was that we should throw CircuitBreakingException only when a "proper" circuit breaker is involved (as the name of the exception indicates) but I am fine with using CircuitBreakingException in the other case as well.

I think it makes sense to never circuit break a response (both when sending and receiving).

@bleskes: Thanks for the hint. I have reduced the scope of size limiting to requests only (i.e. TransportStatus.isRequest(status) == true).

@danielmitterdorfer
Member

I have pushed another commit on the feature branch. We exhibit the following behaviour when a (bulk) request hits the limit:

  • Transport client: CircuitBreakingException is thrown
  • Node client: We get errors on the individual items of a bulk request (i.e. BulkItemResponse.isFailure() == true). This is due to the fact that the node client does not hit the transport layer but starts executing TransportBulkAction directly (i.e. in the same process).
  • REST API: This bypasses the transport layer, so the recommended practice is to use http.max_content_length. While I think we could add support in NettyHttpServerTransport, the two mechanisms are quite similar, so I'd avoid reimplementing it there.

I am not too happy that we behave differently depending on the protocol, but as this is implemented at the (transport) protocol level, it is not much of a surprise. Wdyt @bleskes?

@danielmitterdorfer danielmitterdorfer changed the title from Limit the size of a bulk request to Limit request size Feb 29, 2016
@Dieken
Dieken commented Mar 5, 2016

@danielmitterdorfer I think http.max_content_length limits the content length of any single HTTP request, not the total size of in-flight HTTP requests.

I made a patch Dieken@f2d487e against v2.2.0 to limit the total size of in-flight HTTP bulk requests as a temporary solution, because it seems your patch isn't finished yet and may not be merged into 2.x.

@danielmitterdorfer
Member

@Dieken You're right: http.max_content_length limits only individual requests, so we need to tackle this differently.

Regarding my changes: support on the transport level is finished, but we need to iron out some details (property names, default values).

Your patch looks fine for the bulk use case, but I want to raise one (minor) point: you use LongAdder, which is a JDK 8 class, whereas Elasticsearch 2.x should be built with JDK 7.

@bleskes
Member
bleskes commented Mar 7, 2016

For people following: @danielmitterdorfer and I went through the code together, and Daniel is working on another iteration.

@clintongormley clintongormley added v2.4.0 and removed v2.3.0 labels Mar 16, 2016
@danielmitterdorfer danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Apr 13, 2016
@danielmitterdorfer danielmitterdorfer Limit request size on transport level
With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.

By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting

network.breaker.inflight_requests.limit

Relates #16011
52b2016
@danielmitterdorfer danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Apr 13, 2016
@danielmitterdorfer danielmitterdorfer Limit request size on HTTP level
With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.

Relates #16011
117bc68
@clintongormley clintongormley removed the v2.4.0 label Apr 13, 2016
@danielmitterdorfer danielmitterdorfer added a commit that referenced this issue Apr 14, 2016
@danielmitterdorfer danielmitterdorfer Limit request size on transport level
With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.

By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting

network.breaker.inflight_requests.limit

Relates #16011
26ccd9a
@danielmitterdorfer danielmitterdorfer added a commit that referenced this issue Apr 14, 2016
@danielmitterdorfer danielmitterdorfer Limit request size on HTTP level
With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.

Relates #16011
1106f38
@danielmitterdorfer danielmitterdorfer removed their assignment Apr 14, 2016