Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory management: do not enforce the BigArrays limit on the network layer and the tranlog. #6374

Closed
wants to merge 1 commit into from

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Jun 2, 2014

This commit disables memory circuit breaking in BigArrays on FsTranslog,
NettyHttpServerTransport and NettyTransport so that these components are not
impacted by heavy search requests.

Close #6332

…layer and the tranlog.

This commit disables memory circuit breaking in BigArrays on FsTranslog,
NettyHttpServerTransport and NettyTransport so that these components are not
impacted by heavy search requests.

Close elastic#6332
@jpountz jpountz added the review label Jun 2, 2014
* used but has a different limit. This is typically useful for having component-specific limits that would be different
* from the default one.
*/
public BigArrays limit(long newMaxSizeInBytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the method name is a bit confusing as it doesn't set the limit to a new value but rather returns a new BigArray instance. I think we need a better name or maybe have new constructor BigArrays(BigArrays arrays, long limit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather go with another name, since a method allows me to override it in MockBigArrays and keep the sanity checks. What about clone(long newMaxSizeInBytes)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limitedTo(long bytes)? Clone is kinda non-specific.

@bleskes
Copy link
Contributor

bleskes commented Jun 2, 2014

The commit looks good but I worry about the scenario that transmitting a large cluster state (for example), still counts towards all other requests because of the shared counter. This will cause all other requests to fail until the bytes are released. I wonder if we should just introduce a dedicated buffer for the cluster state/translog/critical memory users that shouldn't be blocked.

@jpountz
Copy link
Contributor Author

jpountz commented Jun 2, 2014

This will cause all other requests to fail until the bytes are released.

I think this is a good property? Otherwise the risk is that the node would go out-of-memory which is worse? On a related note, the case that you just described was my motivation to open #6375 and try to make sure we do not set a limit that is too low as @kevinkluge mentioned on #6332.

Maybe the right fix for now would be to just completely back out this feature until we have a proper way to share memory across breakers.

@bleskes
Copy link
Contributor

bleskes commented Jun 2, 2014

what worries me is that if you have a constant load of request that just works, and suddenly there comes a big cluster state update that pushes the buffer above a limit, you only see it in the requests and will not have a way of knowing what happened. This will be hard to debug because cluster state publishing is internal to ES. One alternative is log a warning in the log for that case? (letting X through, but limit has been breached). Another is to say the CS and translog is using their own unlimitted BigArrays as the PR doesn't limit it anyway.

@dakrone
Copy link
Member

dakrone commented Jun 2, 2014

I think the safest thing to do is to leave the breaker as-is, but increase it to a very high amount until we're able to share memory across breakers. I think OOME and rejecting CS/translog are both bad, so I think I'd rather wait until we had it using a breaker that had the potential to shrink (since BigArray pages don't currently ever shrink), since otherwise we could end up overloading a cluster that could potentially recover while remaining under the memory limit.

@jpountz
Copy link
Contributor Author

jpountz commented Jun 2, 2014

After more discussions with @bleskes @dakrone and others, we decided to postpone it to 1.3. Closed in favor of #6381.

@jpountz jpountz closed this Jun 2, 2014
@s1monw s1monw removed the review label Jun 18, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory management: do not enforce the BigArrays limit on the network layer and the tranlog.
5 participants