Disable the Netty recycler #22452

Merged
merged 1 commit into from Jan 6, 2017

@jasontedor
Member

commented Jan 5, 2017

Netty plays a lot of games with recycling byte buffers in thread local
caches.

The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.
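To make the recycler discussion concrete, here is a minimal sketch of a per-thread object pool with a capacity cap. `ThreadLocalPool` is a hypothetical class, not Netty's `io.netty.util.Recycler`, which is considerably more involved. It only illustrates the idea that released objects are parked in a thread-local cache up to a per-thread limit, and that a limit of 0 means nothing is ever cached. That is the effect the `io.netty.recycler.maxCapacityPerThread=0` property quoted later in this thread is meant to have.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/**
 * Hypothetical, simplified per-thread object pool. This is NOT Netty's
 * io.netty.util.Recycler; it only illustrates caching released objects in a
 * thread-local stack with a per-thread capacity cap.
 */
final class ThreadLocalPool<T> {
    private final Supplier<T> factory;
    private final int maxCapacityPerThread;
    // Each thread gets its own stack of released objects.
    private final ThreadLocal<ArrayDeque<T>> cache = ThreadLocal.withInitial(ArrayDeque::new);

    ThreadLocalPool(Supplier<T> factory, int maxCapacityPerThread) {
        this.factory = factory;
        this.maxCapacityPerThread = maxCapacityPerThread;
    }

    /** Reuses an object cached on the current thread, or allocates a new one. */
    T get() {
        T cached = cache.get().pollFirst();
        return cached != null ? cached : factory.get();
    }

    /**
     * Offers an object back to the current thread's cache. Once the cap is
     * reached the object is dropped and left to the garbage collector; with a
     * cap of 0 nothing is ever cached, i.e. pooling is effectively disabled.
     */
    boolean release(T obj) {
        ArrayDeque<T> stack = cache.get();
        if (stack.size() >= maxCapacityPerThread) {
            return false;
        }
        stack.push(obj);
        return true;
    }

    /** Number of objects currently cached for the calling thread. */
    int cachedOnThisThread() {
        return cache.get().size();
    }

    public static void main(String[] args) {
        ThreadLocalPool<StringBuilder> pooled = new ThreadLocalPool<>(StringBuilder::new, 4);
        StringBuilder a = pooled.get();
        pooled.release(a);
        System.out.println("reused: " + (pooled.get() == a)); // reused: true

        ThreadLocalPool<StringBuilder> disabled = new ThreadLocalPool<>(StringBuilder::new, 0);
        StringBuilder b = disabled.get();
        disabled.release(b);
        System.out.println("reused: " + (disabled.get() == b)); // reused: false
    }
}
```

Anything parked in such a cache stays strongly reachable until the same thread asks for it again, so a cache that fills faster than it drains holds on to heap. A capacity of 0 trades that retention for more allocation and GC work.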

We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.

This change proposes to disable the recycler. I think that disabling
this feature will restore some of the stability that this feature appears
to be costing us.

I have done performance testing on my workstation with this disabled
and I do not see a difference in performance. I propose that we make
this change in master and let some nightly benchmarks run to confirm
that there is not a difference in performance. If we are comfortable
with the performance changes, I propose backporting this to all active
branches.

Relates netty/netty#5904, #22406, #22360, #22189

Disable the Netty recycler and pooled allocator
Netty plays a lot of games with recycling byte buffers in thread local
caches, and using a pooled byte buffer allocator to reduce pressure on
the garbage collector.

The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.

The pooled byte buffer allocator has problems itself. It sizes the pool
based on the number of runtime processors and can indeed grab a very
large percentage of the heap (in some cases 50% or more). Additionally,
the Netty project continues to struggle with leaks here.
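As a rough illustration of how processor-based sizing translates into heap usage, the sketch below estimates how many heap arenas a pooled allocator would create and how much memory they hold once each arena owns a single chunk. The numbers are assumptions modeled on Netty 4.1-era defaults (8 KiB pages, `maxOrder` 11, hence 16 MiB chunks, and a heap-arena count of `min(2 × cores, maxHeap / chunkSize / 2 / 3)`); verify them against the Netty version you actually run. `PooledArenaEstimate` is a hypothetical helper, not a Netty API.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope estimate of pooled-allocator heap usage.
 * ASSUMPTIONS (Netty 4.1-era defaults, for illustration only):
 * chunkSize = pageSize << maxOrder, and
 * heapArenas = min(2 * cores, maxHeap / chunkSize / 2 / 3).
 */
final class PooledArenaEstimate {
    static long chunkSize(int pageSize, int maxOrder) {
        return (long) pageSize << maxOrder;
    }

    static long heapArenas(int cores, long maxHeapBytes, long chunkSize) {
        long byCores = 2L * cores;                       // scales with processor count
        long byHeap = maxHeapBytes / chunkSize / 2 / 3;  // heap-based upper bound
        return Math.max(0, Math.min(byCores, byHeap));
    }

    public static void main(String[] args) {
        long chunk = chunkSize(8192, 11);          // 16 MiB
        long heap = 4L * 1024 * 1024 * 1024;       // hypothetical 4 GiB heap
        int cores = 16;                            // hypothetical 16-core host
        long arenas = heapArenas(cores, heap, chunk);
        long oneChunkEach = arenas * chunk;
        System.out.printf(Locale.ROOT, "arenas=%d, one chunk per arena=%d MiB (%.1f%% of heap)%n",
            arenas, oneChunkEach >> 20, 100.0 * oneChunkEach / heap);
        // arenas=32, one chunk per arena=512 MiB (12.5% of heap)
    }
}
```

Each arena can hold several chunks, so this is a floor rather than a ceiling: at three chunks per arena the same configuration would hold 1.5 GiB, about 37.5% of the heap.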

We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.

This change proposes to disable the recycler, and to disable the pooled
byte buffer allocator. I think that disabling these features will restore
some of the stability that these features appear to be costing us.

I have done performance testing on my workstation with these disabled
and I do not see a difference in performance. I propose that we make
this change in master and let some nightly benchmarks run to confirm
that there is not a difference in performance. If we are comfortable
with the performance changes, I propose backporting this to all active
branches.
@jaymode

jaymode approved these changes Jan 5, 2017

Member

left a comment

LGTM

@jpountz

jpountz approved these changes Jan 5, 2017

Contributor

left a comment

+1

@tbrooks8
Contributor

left a comment

LGTM

@s1monw
Contributor

left a comment

LGTM - man how much we gotta fiddle with this software after all...

@s1monw

s1monw approved these changes Jan 6, 2017

Contributor

left a comment

wrong checkbox LGTM again

@jasontedor jasontedor merged commit 9219d66 into elastic:master Jan 6, 2017

2 checks passed

CLA Commit author is a member of Elasticsearch
elasticsearch-ci Build finished.

jasontedor added a commit that referenced this pull request Jan 10, 2017

Disable the Netty recycler
Netty plays a lot of games with recycling byte buffers in thread local
caches.

The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.

We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.

This change proposes to disable the recycler. I think that disabling this
feature will restore some of the stability that this feature appears to
be costing us.

Relates #22452

jasontedor added a commit that referenced this pull request Jan 10, 2017

jasontedor added a commit that referenced this pull request Jan 10, 2017
@jasontedor

Member Author

commented Jan 10, 2017

Our nightly benchmarks show a small increase in GC times in a few (but not all) of the tracks. I'm going to back out the pooled-to-unpooled change to isolate whether this is the cause, and have only backported the Netty recycler change for now (as this is the most critical change for addressing the performance issues).

jasontedor added a commit that referenced this pull request Jan 10, 2017

Remove disabling Netty pooled allocator
This commit reverts switching to the unpooled allocator (for now) to let
some benchmarks run to see if this is the source of an increase in GC
times.

Relates #22452

@jasontedor jasontedor deleted the jasontedor:ttyl-netty-games branch Jan 10, 2017

@ywelsch

Contributor

commented Jan 11, 2017

@jasontedor can you update the version labels on this PR?

@jasontedor

Member Author

commented Jan 11, 2017

Thank you @ywelsch, sorry for missing that. I've updated the labels.

@jasontedor

Member Author

commented Jan 11, 2017

GC times on the few benchmarks where it increased went back down after pulling the disabling pooling change.

@jasontedor jasontedor changed the title Disable the Netty recycler and pooled allocator Disable the Netty recycler Jan 12, 2017

@jasontedor

Member Author

commented Jan 12, 2017

I've updated the title and body to reflect that disabling the pooled allocator has been reverted.

@duramen


commented Jan 18, 2017

ES 5.1.1: I've added the following to config/jvm.options and restarted ES, but I still have the problem.

-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.type=unpooled
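One way to confirm that flags added to `config/jvm.options` actually reached the JVM is to compare the JVM's input arguments against the flags you expect; each line needs its leading `-D` to define a system property. The sketch below does that for the current JVM via `RuntimeMXBean.getInputArguments()`; the same comparison can be applied to the `input_arguments` list that the nodes info API (`GET _nodes/jvm`) reports for a running node. `NettyFlagCheck` is a hypothetical helper, and the expected flags are simply the two quoted in this comment.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which expected Netty flags a JVM was NOT started with. */
final class NettyFlagCheck {
    // The flags quoted in this comment; adjust to whatever you intend to set.
    static final List<String> EXPECTED = Arrays.asList(
        "-Dio.netty.recycler.maxCapacityPerThread=0",
        "-Dio.netty.allocator.type=unpooled");

    /** Returns the expected flags that are absent (exact match) from the given JVM arguments. */
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : EXPECTED) {
            if (!jvmArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments this JVM was actually launched with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missingFlags(jvmArgs);
        System.out.println(missing.isEmpty()
            ? "all expected Netty flags present"
            : "missing: " + missing);
    }
}
```

The exact string match is deliberately strict: a line missing its leading `-` or carrying a different value is reported as missing.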

@clintongormley clintongormley removed the v5.3.0 label Jan 23, 2017

@dude0404


commented Feb 1, 2017

I'm still experiencing the issue described here: catastrophic GC causing node failure, with netty4utils logging fatal errors about running out of memory.
I'm running ES 5.1.2.
Log attached:
es512nettyerror.txt

@jasontedor

Member Author

commented Feb 1, 2017

@dude0404 I glanced at your logs. I think that your issue has nothing to do with what was occurring here; I think that your heap is full. You should inspect a heap dump to see why your heap appears to be full. If you have additional questions, I suggest that you open a topic on the forum; Elastic reserves GitHub for verified bug reports and feature requests, not general discussion.

@mozinrat


commented Feb 4, 2017

I am also facing the same issue on 5.1.1, with
-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.type=unpooled
Log attached: es.txt

@jasontedor

Member Author

commented Feb 6, 2017

It is not the case that disabling the recycler would eliminate all causes of out of memory errors in the network layer, only that it eliminated those that arose because of issues in Netty, not because, say, too much data is being pumped through Elasticsearch faster than it can be consumed. Thus, if you're still seeing out of memory issues after disabling the recycler, it means that you need to inspect a heap dump and see where your heap is being consumed.
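Since the recommendation here is to inspect a heap dump, this is a minimal sketch of capturing one programmatically with the HotSpot diagnostic MXBean (`com.sun.management.HotSpotDiagnosticMXBean.dumpHeap`). It is HotSpot/OpenJDK-specific, and `HeapDumper` is a hypothetical helper. Against a running node, the more usual routes are `jmap -dump:live,format=b,file=heap.hprof <pid>` or starting the JVM with `-XX:+HeapDumpOnOutOfMemoryError`. The resulting `.hprof` file can then be opened in a heap analyzer to see which objects dominate the heap.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes a heap dump of the current (HotSpot) JVM. */
final class HeapDumper {
    static File dumpToTempDir(boolean liveOnly) {
        try {
            // dumpHeap refuses to overwrite an existing file, so use a fresh directory.
            Path dir = Files.createTempDirectory("heapdump");
            File out = dir.resolve("heap.hprof").toFile();
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // liveOnly=true writes only objects that are still reachable.
            bean.dumpHeap(out.getAbsolutePath(), liveOnly);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File out = dumpToTempDir(true);
        System.out.println("wrote " + out.length() + " bytes to " + out);
    }
}
```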

@forestfantacy


commented May 2, 2018

https://github.com/elastic/elasticsearch/issues/25860
https://github.com/elastic/elasticsearch/issues/22189
This problem happened in ES 5.x with Netty 4.
It could be solved by downgrading to Netty 3,
but ES 6.2 no longer supports Netty 3, and Netty 4 still doesn't work, even after adding the config
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.type=unpooled

Could you tell me what I can do?

@divyanshsinghvi


commented May 22, 2018

@forestfantacy did you copy one Elasticsearch server to another?
Try deleting the data/nodes directory in both and run them again.
