Memory management: do not enforce the BigArrays limit on the network layer and the translog. #6374
Conversation
…layer and the translog. This commit disables memory circuit breaking in BigArrays on FsTranslog, NettyHttpServerTransport and NettyTransport so that these components are not impacted by heavy search requests. Close elastic#6332
 * used but has a different limit. This is typically useful for having component-specific limits that would be different
 * from the default one.
 */
public BigArrays limit(long newMaxSizeInBytes) {
I think the method name is a bit confusing, as it doesn't set the limit to a new value but rather returns a new BigArrays instance. I think we need a better name, or maybe a new constructor BigArrays(BigArrays arrays, long limit).
I'd rather go with another name, since a method allows me to override it in MockBigArrays and keep the sanity checks. What about clone(long newMaxSizeInBytes)?
sounds good to me.
limitedTo(long bytes)? Clone is kinda non-specific.
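The naming proposals above can be sketched as follows. This is a minimal, hypothetical illustration of the immutable-factory pattern being discussed (a limitedTo(bytes) method that returns a new instance with a different limit while sharing the same underlying byte accounting); class and field names are illustrative and do not reflect the actual Elasticsearch BigArrays implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: limitedTo(bytes) returns a NEW instance with its own
// limit, leaving the original untouched. Both instances share the same
// byte-usage counter, mirroring the "shared counter" concern in this thread.
class LimitedArrays {
    private final long maxSizeInBytes;     // per-instance limit
    private final AtomicLong sharedUsed;   // accounting shared across instances

    LimitedArrays(long maxSizeInBytes) {
        this(maxSizeInBytes, new AtomicLong());
    }

    private LimitedArrays(long maxSizeInBytes, AtomicLong sharedUsed) {
        this.maxSizeInBytes = maxSizeInBytes;
        this.sharedUsed = sharedUsed;
    }

    /** Returns a new instance enforcing a different limit; this one is unchanged. */
    LimitedArrays limitedTo(long bytes) {
        return new LimitedArrays(bytes, sharedUsed);
    }

    long maxSizeInBytes() {
        return maxSizeInBytes;
    }
}
```

Because the method never mutates the receiver, a subclass (such as a mock used for sanity checks) can still override it, which was the motivation for preferring a method over a constructor.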
The commit looks good, but I worry about the scenario where transmitting a large cluster state (for example) still counts towards all other requests because of the shared counter. This will cause all other requests to fail until the bytes are released. I wonder if we should just introduce a dedicated buffer for the cluster state/translog/critical memory users that shouldn't be blocked.
I think this is a good property? Otherwise the risk is that the node would go out-of-memory, which is worse? On a related note, the case that you just described was my motivation to open #6375 and try to make sure we do not set a limit that is too low, as @kevinkluge mentioned on #6332. Maybe the right fix for now would be to just completely back out this feature until we have a proper way to share memory across breakers.
What worries me is that if you have a constant load of requests that just works, and suddenly a big cluster state update pushes the buffer above the limit, you only see it in the requests and have no way of knowing what happened. This will be hard to debug because cluster state publishing is internal to ES. One alternative is to log a warning for that case (letting X through, but the limit has been breached). Another is to say the CS and translog use their own unlimited BigArrays, as the PR doesn't limit them anyway.
I think the safest thing to do is to leave the breaker as-is, but increase it to a very high amount until we're able to share memory across breakers. I think OOME and rejecting CS/translog requests are both bad, so I'd rather wait until we have a breaker that has the potential to shrink (since BigArrays pages don't currently ever shrink); otherwise we could end up overloading a cluster that could potentially recover while remaining under the memory limit.
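The failure mode discussed above (one large internal allocation tripping the breaker for all unrelated requests) can be sketched with a toy shared-counter breaker. This is a hedged illustration only; the class and method names are invented for this sketch and are not the Elasticsearch circuit-breaker API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy sketch of a shared-counter breaker: every component reserves bytes
// against the SAME counter, so a single large reservation (e.g. a big
// cluster-state publish) can cause unrelated small requests to be rejected
// until the bytes are released.
class SharedBreaker {
    private final long limitBytes;
    private final AtomicLong used = new AtomicLong();

    SharedBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Tries to reserve bytes; returns false (trips) when the shared limit would be exceeded. */
    boolean tryReserve(long bytes) {
        long newUsed = used.addAndGet(bytes);
        if (newUsed > limitBytes) {
            used.addAndGet(-bytes); // roll back the failed reservation
            return false;
        }
        return true;
    }

    void release(long bytes) {
        used.addAndGet(-bytes);
    }
}
```

With a 100-byte limit, an internal reservation of 90 bytes leaves only 10 bytes of headroom, so a subsequent 20-byte request-level reservation fails until the internal bytes are released; this is exactly why the PR exempts the translog and network layer from the shared limit.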
This commit disables memory circuit breaking in BigArrays on FsTranslog,
NettyHttpServerTransport and NettyTransport so that these components are not
impacted by heavy search requests.
Close #6332