EPollArrayWrapper.epollWait 100% CPU Usage #327

Closed
blucas opened this Issue May 16, 2012 · 55 comments

Contributor
blucas commented May 16, 2012

Hi,

I believe I have an issue similar to #302, but on Linux (Ubuntu 10.04) with JDK 1.6.0u30 and JDK 1.7.0u4, using Netty 4.0.0 (revision 52a7d28).

The app is proxying connections to backend systems. The proxy has a pool of channels it can use to send requests to the backend systems. If the pool is low on channels, new channels are spawned and added to the pool so that requests sent to the proxy can be serviced. The pools get populated on app startup, which is why it doesn't take long at all for the CPU to spike through the roof (22 seconds into the app lifecycle).
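Roughly, the pool logic has this shape (an illustrative sketch with made-up names, not the actual proxy code; it assumes the present-day Netty 4 Bootstrap API rather than the 2012 snapshot used here):

    import io.netty.bootstrap.Bootstrap;
    import io.netty.channel.Channel;

    import java.net.SocketAddress;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Illustrative sketch only: made-up names, not the reporter's actual code.
    class BackendChannelPool {
        private static final int LOW_WATER_MARK = 10;   // made-up threshold
        private final Queue<Channel> pool = new ConcurrentLinkedQueue<Channel>();
        private final Bootstrap bootstrap;               // client bootstrap for the backend
        private final SocketAddress backendAddress;      // backend system address

        BackendChannelPool(Bootstrap bootstrap, SocketAddress backendAddress) {
            this.bootstrap = bootstrap;
            this.backendAddress = backendAddress;
        }

        // Take a channel from the pool, connecting a new one if the pool is empty.
        Channel acquire() throws InterruptedException {
            Channel ch = pool.poll();
            if (ch == null) {
                ch = bootstrap.connect(backendAddress).sync().channel();
            }
            return ch;
        }

        // Called at startup and whenever the pool runs low: spawn replacement channels.
        void refill() throws InterruptedException {
            while (pool.size() < LOW_WATER_MARK) {
                pool.offer(bootstrap.connect(backendAddress).sync().channel());
            }
        }
    }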

The test box has two CPUs; the output from 'top' is below:

PID  USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
8220 root   20   0 2281m 741m  10m R 50.2 18.7  0:22.57 java                                                                             
8218 root   20   0 2281m 741m  10m R 49.9 18.7  0:22.65 java                                                                             
8219 root   20   0 2281m 741m  10m R 49.2 18.7  0:22.86 java                                                                             
8221 root   20   0 2281m 741m  10m R 49.2 18.7  0:22.20 java 

Thread dump for the four NioClient-based worker threads that are chewing up all the CPU:

"backend-worker-pool-7-thread-1" prio=10 tid=0x00007f5918015800 nid=0x201a runnable [0x00007f5924ba3000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be93580> (a sun.nio.ch.Util$2)
    - locked <0x000000008be93570> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be92548> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be00748> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"backend-worker-pool-7-thread-2" prio=10 tid=0x00007f5918012000 nid=0x201b runnable [0x00007f5924b82000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be94a28> (a sun.nio.ch.Util$2)
    - locked <0x000000008be94a18> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be90648> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be904c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"backend-worker-pool-7-thread-3" prio=10 tid=0x00007f5918007800 nid=0x201c runnable [0x00007f5924b61000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be952e0> (a sun.nio.ch.Util$2)
    - locked <0x000000008be952d0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be8f858> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be8f618> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"backend-worker-pool-7-thread-4" prio=10 tid=0x00007f5918019000 nid=0x201d runnable [0x00007f5924b40000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be003f8> (a sun.nio.ch.Util$2)
    - locked <0x000000008be003e8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be00408> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be004e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
Contributor
blucas commented May 16, 2012

I've done some googling and it looks like this could be a JDK bug.
Take a look at the thread dump supplied on 13-MAY-2009; it is exactly the same as mine! Although they say the bug has been fixed, it obviously hasn't. Could you please investigate the workaround supplied in the link? They mention cancelling the selector and invoking a selectNow() method. I have no clue where to do that myself to test whether it actually works, so if someone could change it in the repo, I'd be more than happy to give it a test.

In fact, I've found a post by @Trustin on this very subject, but it looks like no changes were made back then.
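As I understand it, that workaround boils down to cancelling the suspect SelectionKey and then calling selectNow() so the cancelled key actually gets flushed. A rough sketch of the idea; where to hook this into Netty's worker loop is exactly the part I don't know:

    import java.io.IOException;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;

    final class SelectNowWorkaround {
        // Cancel the key that epoll keeps reporting spuriously, then do a
        // non-blocking select so the cancelled key is actually removed from the selector.
        static void cancelAndFlush(Selector selector, SelectionKey suspectKey) throws IOException {
            suspectKey.cancel();
            selector.selectNow();
        }
    }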

Contributor
blucas commented May 18, 2012

@trustin @normanmaurer Any thoughts on this?

Please update. Merged onto a new server with j6u33 and now experiencing this bug. Will try to downgrade the JRE...

It happens on Netty 3.5.0.Final, 3.2.7.Final and 3.4.7 with j6u33, and maybe others.

Yeah, I see this too with UltraESB.

I think it's a problem with anything that uses the non-blocking I/O library.

incubos commented Jul 26, 2012

I confirm this bug on Netty 3.3.0 Final (from Akka 2.0.2) & j7u4.

vikiitd commented Aug 2, 2012

Hi,

We are also seeing this issue in our production environment under heavy concurrent load. Are there any updates or a fix for this issue? On some forums there are discussions about this issue being resolved by upgrading the JDK, etc. I am not seeing this issue with netty-3.2.4.Final.jar, but it is happening consistently with netty-3.4.4.Final.jar. Does anyone know of a workaround for this issue besides downgrading back to the 3.2.4 Netty jar?

Thanks in advance!

vikiitd commented Aug 8, 2012

I am observing this issue with netty-3.2.6.Final.jar as well, although less frequently. Is it possible that this issue was introduced before 3.3, or is it not a Netty issue at all?

Owner
trustin commented Aug 9, 2012

Is there any reliable way to reproduce this problem? That would make it much easier for me to fix this issue (probably it's more correct to say 'apply a workaround').

Hi. This issue is occurring with netty-3.2.4.Final.jar as well. There is no reliable way to reproduce it; it does not happen consistently.

normanmaurer was assigned on Aug 23, 2012
Owner

Let me see if I can fix this ...

normanmaurer added a commit that referenced this issue on Aug 25, 2012:
ff3f2b6: "Re-create Selector if we hit the epoll(..) jdk bug which leads to 100% cpu load. This is just a workaround but helps to recover. See #327"
Owner

@trustin I think this is because of an epoll bug which either still is not fixed, or was fixed and is now present again.

Please review the following workaround:
#565

So basically we do the following here:

  • check if Selector.select(timeout) returns 0 and completes before the timeout
  • if so, we increment a counter
  • if the counter == 10, we create a new Selector, register all the Channels from the old one to the new one, and close the old Selector. After that we reset the counter.
  • if the counter < 10, we just keep processing as usual
  • if Selector.select(timeout) returns > 0, or completes after the timeout, we reset the counter

The fix is basically the same as what Jetty, Grizzly and MINA do :) (A simplified sketch of the idea is below.) WDYT?
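In rough Java, the detection-and-rebuild loop looks something like this. This is only a sketch of the approach described above, not the actual patch in #565; the real code also has to carry over registration details and deal with explicit wakeup()s:

    import java.io.IOException;
    import java.nio.channels.SelectableChannel;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;

    final class EpollBugGuard {
        private static final int THRESHOLD = 10;     // rebuild after 10 suspicious selects in a row
        private static final long TIMEOUT_MS = 500;  // select timeout (illustrative value)

        private Selector selector;
        private int prematureReturns;

        EpollBugGuard(Selector selector) {
            this.selector = selector;
        }

        Selector select() throws IOException {
            long start = System.nanoTime();
            int selected = selector.select(TIMEOUT_MS);
            long elapsedMs = (System.nanoTime() - start) / 1000000L;

            if (selected == 0 && elapsedMs < TIMEOUT_MS) {
                // select() returned early with nothing selected: possible epoll bug.
                if (++prematureReturns >= THRESHOLD) {
                    rebuildSelector();
                    prematureReturns = 0;
                }
            } else {
                // Something was selected, or the timeout genuinely elapsed: reset the counter.
                prematureReturns = 0;
            }
            return selector;
        }

        private void rebuildSelector() throws IOException {
            Selector newSelector = Selector.open();
            for (SelectionKey key : selector.keys()) {
                if (key.isValid()) {
                    // Move the channel over, keeping its interest ops and attachment.
                    SelectableChannel ch = key.channel();
                    ch.register(newSelector, key.interestOps(), key.attachment());
                    key.cancel();
                }
            }
            Selector old = selector;
            selector = newSelector;
            old.close();
        }
    }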

vikiitd commented Aug 27, 2012

It would be great if we could have the workaround provided by @normanmaurer. The JDK bug seems to have been hanging around for some time now, and it's not clear whether it will be fixed anytime soon.

Owner

@vikiitd I lost hope of getting NIO bugs fixed in core Java a long time ago :(

Owner

The workaround for the epoll(..) bug has been committed.

normanmaurer reopened this on Aug 31, 2012
normanmaurer added a commit that referenced this issue on Aug 31, 2012:
63a234b: "Correctly detect if the Selector.select(..) unblocks because of a manual wakup or because of a closed channel. See #327"
normanmaurer added a commit that referenced this issue on Aug 31, 2012:
45829ff: "Disable epoll bug workaround by default. It can be enabled via -Dorg.jboss.netty.epollBugWorkaround=true. It will be enabled by default later if it has proven to be stable. See #327"
leahxschmidt added a commit to leahxschmidt/LittleProxy that referenced this issue on Jan 3, 2013:
9a67d42: "upgrade netty to fix CPU usage in Lantern refs netty/netty#327"
cmunger commented Jan 18, 2013

We had the same issue on a Solaris x86 platform with JDK 1.7.0_09, on multiple applications using the NIO API; Netty 3.5.11 was also affected. We ended up enabling the -Dorg.jboss.netty.epollBugWorkaround system property. This fixed the 100% CPU problem, and we saw 3-4 warnings a day saying the epoll bug was detected and the selector recreated. I then looked at the next Sun JVM release (1.7.0_10) and saw that they have fixed a similar bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7200742. The good news is that after upgrading our software to JDK 1.7.0_11 the 100% CPU bug has disappeared. We are now running Netty 3.5.11 fine without the org.jboss.netty.epollBugWorkaround system property. So upgrade your JDK.
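For anyone else trying this: the flag can be passed on the JVM command line as -Dorg.jboss.netty.epollBugWorkaround=true (as in the commit message above), or set programmatically. A sketch of the programmatic route, assuming the property is read only once when Netty 3's selector utility class initializes, so it has to run before any Netty channel classes are touched:

    // Equivalent to passing -Dorg.jboss.netty.epollBugWorkaround=true on the command line.
    // Assumption: the property is read once at class initialization, so set it before
    // creating any ChannelFactory / bootstrap.
    public final class EnableEpollWorkaround {
        public static void main(String[] args) {
            System.setProperty("org.jboss.netty.epollBugWorkaround", "true");
            // ... create the ClientBootstrap / ServerBootstrap afterwards ...
        }
    }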

@ghost
ghost commented May 3, 2013

I suspect I've also run into this with Netty 3.6.1.Final, on OpenStack, Linux 3.2.0-38-virtual #61-Ubuntu SMP, with IcedTea OpenJDK 1.7.0_21 (7u21-2.3.9-0ubuntu0.12.04.1). At 100% CPU, every thread is sleeping except for the Netty workers, which are all in state NATIVE:

Thread 22428: (state = IN_NATIVE)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=228 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=81 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=87 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=98 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.SelectorUtil.select(java.nio.channels.Selector) @bci=4, line=64 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(java.nio.channels.Selector) @bci=1, line=410 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioSelector.run() @bci=49, line=207 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioWorker.run() @bci=1, line=88 (Interpreted frame)
 - org.jboss.netty.channel.socket.nio.NioWorker.run() @bci=1, line=178 (Interpreted frame)
 - org.jboss.netty.util.ThreadRenamingRunnable.run() @bci=55, line=108 (Interpreted frame)
 - org.jboss.netty.util.internal.DeadLockProofWorker$1.run() @bci=14, line=42 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=722 (Interpreted frame)

strace shows heavy use of futex and epoll_wait, but I'm not sure what a reasonable baseline is here:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.23  195.596490        1578    123991     21585 futex
 17.01   40.453827        1508     26820           epoll_wait
  0.35    0.821100           1   1309799           gettimeofday
  0.23    0.552041       23002        24         7 restart_syscall
  0.08    0.185864          30      6144           sendto
  0.06    0.139350           6     22414           write
  0.04    0.096209           1    148338           clock_gettime
  0.00    0.010583           1      7175           recvfrom
  0.00    0.003684           0     25116      5638 read
  0.00    0.000000           0      3532           mprotect
  0.00    0.000000           0       156           rt_sigreturn
  0.00    0.000000           0      2230           sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00  237.859148               1675739     27230 total

Haven't been able to reproduce outside of production yet. I've added -Dorg.jboss.netty.epollBugWorkaround=true; we'll see if that prevents the issue from recurring.

Owner
trustin commented May 3, 2013

I'm investigating this issue at Twitter. I can provide a JAR with debug logging enabled to get a better understanding of the problem. Will you run it for me and get back with the log file? It might perform worse due to the large amount of log messages, though.


@ghost
ghost commented May 3, 2013

Trustin, would you like to shoot me an email? kingsbury@factual.com.

@ghost
ghost commented May 3, 2013

Swapped out the JVM for Oracle 1.7.0_21-b11. Still seeing surprisingly high CPU use, but it hasn't spiked to 100% yet. JVisualVM claims pretty much everything is idle except for the Netty worker threads, which are spending their CPU time in sun.nio.ch.SelectorImpl.select():

[screenshot from 2013-05-03 13:57:06]

I have been seeing a similar issue in my production environment too. It's Ubuntu 12.04 with JDK 1.6.0_38-b05, running Netty 3.6.2.

Is this patch not in that release, or am I missing something?

@kingsbury @trustin what was the outcome of your debugging? Any pointers?

@normanmaurer I saw your answer at hazelcast/hazelcast#81 and also tried the fix mentioned in that post, but it doesn't seem to have helped in my case.

I did a local test and bumped up my java version to 1.7 and the issue does not seem to have fixed itself.

My "New I/O worker"(s) keep on spinning even though there is no work to be done.

What else could I look at ?

s2oBCN commented Aug 3, 2013

I have the same issue in a production environment,
and I can't change the Java version, which is JRockit 1.5.0_26.
My Netty version is <netty.version>3.6.2.Final</netty.version>.
Is this problem fixed in newer versions?

Or is there any workaround?

Thank you!!!!!!!!!

[screenshot: nettycpu]

s2oBCN commented Aug 4, 2013

Hmm, and I have
Red Hat Enterprise Linux Server release 5.3,
Linux version 2.6.18-128.el5.

Maybe this is the epoll bug, but I thought that was already handled in Netty 3.6.2.Final?

I think one solution could be to change from NioClientSocketChannelFactory to OioClientSocketChannelFactory; in any case, in my clients I only use one connection to the server and they never read a response from the server (setReadable(false)). A sketch of that swap is below.
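A minimal sketch of that factory swap (Netty 3 API, made-up executors; only sensible when each client holds a single connection, as described above):

    import java.util.concurrent.Executors;

    import org.jboss.netty.bootstrap.ClientBootstrap;
    import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;
    import org.jboss.netty.channel.socket.oio.OioClientSocketChannelFactory;

    public final class OioSwapSketch {
        public static void main(String[] args) {
            // Before: NIO factory, whose shared worker selectors are the ones spinning here.
            ClientBootstrap nio = new ClientBootstrap(
                    new NioClientSocketChannelFactory(
                            Executors.newCachedThreadPool(),    // boss threads
                            Executors.newCachedThreadPool()));  // worker threads

            // After: OIO factory, one blocking thread per channel. Only reasonable when
            // each client holds a single connection to the server.
            ClientBootstrap oio = new ClientBootstrap(
                    new OioClientSocketChannelFactory(
                            Executors.newCachedThreadPool()));  // worker threads
        }
    }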

This is the trace:
"New I/O worker #1" id=57 idx=0xe0 tid=23001 prio=5 alive, in native
at sun/nio/ch/PollArrayWrapper.poll0(JIJ)I(Native Method)
at sun/nio/ch/PollArrayWrapper.poll(PollArrayWrapper.java:100)[inlined]
at sun/nio/ch/PollSelectorImpl.doSelect(PollSelectorImpl.java:56)[optimized]
at sun/nio/ch/SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)[inlined]
at sun/nio/ch/SelectorImpl.select(SelectorImpl.java:80)[inlined]
at org/jboss/netty/channel/socket/nio/SelectorUtil.select(SelectorUtil.java:64)[inlined]
at org/jboss/netty/channel/socket/nio/AbstractNioSelector.select(AbstractNioSelector.java:409)[optimized]
^-- Holding lock: sun/nio/ch/Util$1@ 0x87494e68[biased lock]
^-- Holding lock: java/util/Collections$UnmodifiableSet@ 0x87494e58[biased lock]
^-- Holding lock: sun/nio/ch/PollSelectorImpl@ 0x874926f8[biased lock]

Thank you!!

mzafir commented Jun 13, 2014

Is this issue fixed, or did it simply get closed without a resolution?

jhooda commented Jul 2, 2014

I'm seeing a similar issue on Ubuntu 12.04 with Sun/Oracle build 1.7.0_60-b19. Below is the stack trace:
---------------------------8<---------------
"New I/O worker #7" prio=10 tid=0x00007f31d8d88000 nid=0x2485 runnable [0x00007f31bce65000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000c0105168> (a sun.nio.ch.Util$2)
- locked <0x00000000c0105158> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c0105040> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
--------------------------8<-------------------------------------

Owner

What kernel version?


mzafir commented Jul 2, 2014

It's a RUNNABLE thread; what is the issue you are facing?


jhooda commented Jul 2, 2014

I'm using io.netty.netty-3.7.1.Final.jar

jhooda commented Jul 2, 2014

I have one boss thread and 9 worker threads in RUNNABLE state that are driving up CPU (my webapp is idle with zero requests). I was expecting the threads to be in WAITING state?

jhooda commented Jul 2, 2014

Sorry, I missed this info. The kernel is "3.13.0-24-generic".

mzafir commented Jul 2, 2014

Please run rpm -qa and let me know what Netty release you are using.


jhooda commented Jul 2, 2014

We are using io.netty 3.7.1.Final (http://mvnrepository.com/artifact/io.netty/netty/3.7.1.Final).
(This Netty version came bundled with Play 2.2.3, which we are using, from http://www.playframework.com/download.)
Thanks.

cbatson commented Feb 3, 2015

I have been trying to make sense of this issue for ages. I'm wondering whether it's actually an artifact of profiling: http://www.brendangregg.com/blog/2014-06-09/java-cpu-sampling-using-hprof.html

krasa commented Apr 13, 2015

@cbatson I have this problem now too, and it seems related to YourKit: my app goes nuts and eats CPU when sampling is enabled with the agent attached.
[screenshot]

I think it is running short of network resources.

I think I may be seeing this issue on Netty 4.0.33 with Java 8. Was the workaround for this issue removed in the 4.0 series?

ttashi commented Feb 24, 2016

I am seeing that too with Java 8 and Netty 3.10.5.Final

I have similar problems: Java 8, about 5000 I/O workers created, and CPU at 80%. After restarting Jetty everything is fine. It looks like the problem is in sun.nio.ch.SelectorImpl.select().
[screenshot from 2016-03-21 at 11:40:16 pm]

Hi,
My Netty server is crashing very often; please find the thread dump below:
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000d093b9a0> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000d0771468> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000d0771be8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:744)

"nioEventLoopGroup-5-238" prio=10 tid=0x00007f4b2c1d4000 nid=0x77a2 runnable [0x00007f4918911000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000d093b9c0> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000d0884ec8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000d073db50> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:744)
Did anyone else face this kind of issue?

ozgurunc commented Apr 9, 2016

Hi, I have a problem like this:

10.137.25.36.3869 10.133.142.93.3868 32768 0 102400 0 17/17 CLOSED

last pid: 8340; load avg: 1.06, 1.05, 1.03; up 3+07:02:58 23:16:48
82 processes: 80 sleeping, 2 on cpu
CPU states: 73.8% idle, 21.3% user, 4.9% kernel, 0.0% iowait, 0.0% swap
Memory: 4095M phys mem, 853M free mem, 16G total swap, 16G free swap

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
7651 xxxxxx 137 20 0 2489M 556M cpu/3 53:36 25.21% java

If the socket state changes to CLOSED, then CPU usage goes high for that thread, approximately 25% CPU. The thread dump for that process is below:

Thread t@123: (state = IN_NATIVE)

 - sun.nio.ch.DevPollArrayWrapper.poll0(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.DevPollArrayWrapper.poll(long) @bci=18, line=223 (Compiled frame)
 - sun.nio.ch.DevPollSelectorImpl.doSelect(long) @bci=28, line=84 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=87 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=98 (Compiled frame)
 - io.netty.channel.nio.NioEventLoop.select(boolean) @bci=62, line=622 (Compiled frame)
 - io.netty.channel.nio.NioEventLoop.run() @bci=25, line=310 (Compiled frame)
 - io.netty.util.concurrent.SingleThreadEventExecutor$2.run() @bci=13, line=111 (Interpreted frame)
 - io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() @bci=4, line=137 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Hi, is there any solution to this problem?

We are running on a Linux server using Java 8 and Netty 3.10.4.Final.

Hi, is there any solution to this problem?
We are running on a Linux server using Java 8 and Netty 4.1.5.Final.

"nioEventLoopGroup-3-2" #56 prio=10 os_prio=0 tid=0x0000000017f2b800 nid=0x46c7 runnable [0x00002af3fe020000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000006f3392138> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000006f33941c8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000006f33920f0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:731)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)

@normanmaurer

CPU is at 100%.

"nioEventLoopGroup-3-2" #56 prio=10 os_prio=0 tid=0x0000000017f2b800 nid=0x46c7 runnable [0x00002af3fe020000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

  • locked <0x00000006f3392138> (a io.netty.channel.nio.SelectedSelectionKeySet)
  • locked <0x00000006f33941c8> (a java.util.Collections$UnmodifiableSet)
  • locked <0x00000006f33920f0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:731)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)

@normanmaurer

kupci commented Sep 26, 2016

@adjsnlxx What exact version of the JDK are you using? I'm curious because we ran into a very similar issue, not with Netty but with Oracle Coherence, and it turned out to be a JDK bug. Our instances would lock up, hit the max server threads and stop responding. What drew me to this thread was the epollWait calls in the thread dumps. Here was our thread stack trace:
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <6adfad0b> (a sun.nio.ch.Util$2)
- locked <3a50f2b4> (a java.util.Collections$UnmodifiableSet)
- locked (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at com.oracle.common.internal.net.MultiplexedSocketProvider$MultiplexedSelector.select(MultiplexedSocketProvider.java:484)

Thread analysis showed a thread trying to lock an object (a DatagramPacket) that no other thread had locked. This is a soft reference, so it is not cleaned up until a full GC.

Again, this was Coherence, not Netty. I mention it because the issue is similar and turned out to be a bug in the JDK. The Coherence team filed a bug, but the Java team closed it because they couldn't reproduce it (we were able to reproduce it with sufficient load), so the Coherence team resolved the issue themselves.

Some details on this bug: the behavior is observed in JDK 1.7.0_80 or later and JDK 1.8.0_45 or later (though as early as JDK 1.8.0_25, but not as frequently).

Therefore it would be interesting to know the exact version of JDK you see this issue with.

@kupci I am using Java 1.8.0_101 and Netty 4.1.5.

kupci commented Sep 27, 2016

@adjsnlxx Are you able to test the issue with the JDKs mentioned above, i.e. a JDK before JDK 1.7.0_80 or before JDK 1.8.0_25? For example, we did not see the issue with JDK 1.7.0_25. I realize you may have Java 8 code, or there may be other bugs that prevent you from doing this, but it would help rule out the possibility that what you are seeing is a JDK bug.

hbprotoss commented Oct 14, 2016 edited

Same problem here. CentOS 7.2, kernel 3.10.0, JDK 1.8.0_77

"http-nio-8082-ClientPoller-1" #27 daemon prio=5 os_prio=0 tid=0x00007f7350162800 nid=0x738e runnable [0x00007f731d08a000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x000000008348ff50> (a sun.nio.ch.Util$2)
    - locked <0x000000008348ff40> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008348fcc8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(Redefined)
    at java.lang.Thread.run(Redefined)
Owner

@hbprotoss your stack shows Tomcat and not Netty, so it's not really about Netty. That said, I think it's a JDK / kernel issue.

Owner

@hbprotoss @kupci can either of you share a reproducer, maybe?

Hey guys, we are having the same problem; however, in our case the threads keep increasing and reach even 6000. We are using Netty 3.10.5.Final and JDK 1.8.101.

Do you guys have any work around for this?

Member

Do you guys have any work around for this?

@amit2103 - Netty 3.x went EOL a while ago, so I would recommend updating to 4.1. Also, I don't think we have been able to reproduce this issue, and no reproducer has been supplied. Can you provide a reproducer based upon 4.1 (or 4.x)?

endlesstian commented Jan 3, 2017 edited

We have the same problem. Netty 4.1.0.CR7, JDK 1.8. We have about 6000 nioEventLoopGroup threads and eventually hit an OOM.
`"nioEventLoopGroup-13832-2" #501109 prio=10 os_prio=0 cpu=0.00 [reset 0.00] ms elapsed=31.50 [reset 31.50] s allocated=259840 B (253.75 KB) [reset 259840 B (253.75 KB)] defined_classes=0
io= file i/o: 0/0 B, net i/o: 215/140 B, files opened:0, socks opened:-1 [reset file i/o: 0/0 B, net i/o: 215/140 B, files opened:0, socks opened:-1 ]
tid=0x00007f109499b800 nid=0x1ac82 / 109698 pthread-id=139697309652736 runnable [_thread_in_native (_at_safepoint), stack(0x00007f0dd0749000,0x00007f0dd084a000)] [0x00007f0dd0848000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(JIJI)I(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(J)I(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(J)I(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(J)I(SelectorImpl.java:86)
- locked <0x00000007c6625120> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000007c6624ea8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000007c6624d70> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(J)I(SelectorImpl.java:97)
at io.netty.channel.nio.NioEventLoop.select(Z)V(NioEventLoop.java:638)
at io.netty.channel.nio.NioEventLoop.run()V(NioEventLoop.java:324)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V(SingleThreadEventExecutor.java:742)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()V(DefaultThreadFactory.java:145)
at java.lang.Thread.run()V(Thread.java:805)

Locked ownable synchronizers:
- None

Member

@endlesstian - Can you provide a reproducer? Also can you update to the latest version of Netty (4.1.6) and list your version of JDK (also update to latest version if necessary).

@endlesstian - Did you find a solution to your problem? I am facing the same problem with Netty 4.1.0.CR7 and JDK 1.8, and almost 10k threads in Tomcat. Tomcat became unresponsive.
