EPollArrayWrapper.epollWait 100% CPU Usage #327

Closed
blucas opened this issue May 16, 2012 · 69 comments

@blucas
Contributor

commented May 16, 2012

Hi,

I believe I have an issue similar to #302, but on Linux (Ubuntu 10.04) with JDK 1.6.0u30 and JDK 1.7.0u4, using Netty 4.0.0 (revision 52a7d28).

The app is proxying connections to backend systems. The proxy has a pool of channels that it can use to send requests to the backend systems. If the pool is low on channels, new channels are spawned and put into the pool so that requests sent to the proxy can be serviced. The pools get populated on app startup, so that is why it doesn't take long at all for the CPU to spike through the roof (22 seconds into the app lifecycle).

The test box has two CPUs; the output from 'top' is below:

PID  USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
8220 root   20   0 2281m 741m  10m R 50.2 18.7  0:22.57 java                                                                             
8218 root   20   0 2281m 741m  10m R 49.9 18.7  0:22.65 java                                                                             
8219 root   20   0 2281m 741m  10m R 49.2 18.7  0:22.86 java                                                                             
8221 root   20   0 2281m 741m  10m R 49.2 18.7  0:22.20 java 
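The rows in this top output all share the same memory figures, which suggests top was showing individual threads (for example with `top -H`); that is an assumption. If so, the PID column holds native thread IDs, which jstack prints in hexadecimal as `nid=`. A small conversion, sketched in Java with a hypothetical `NidToTid` class, lines the two up:

```java
// Converts the hexadecimal "nid" values printed by jstack into the decimal
// native thread IDs that top shows in its per-thread view.
public class NidToTid {

    static long nidToTid(String nid) {
        // Accept both "0x201a" and a bare "201a".
        String hex = nid.startsWith("0x") ? nid.substring(2) : nid;
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        // nid values of backend-worker-pool-7-thread-1..4 from the dump in this comment
        String[] nids = {"0x201a", "0x201b", "0x201c", "0x201d"};
        for (String nid : nids) {
            System.out.println(nid + " -> " + nidToTid(nid));
        }
    }
}
```

Running it prints `0x201a -> 8218` through `0x201d -> 8221`. Those are exactly the four IDs that top reports at roughly 50% CPU each.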

Thread dump for the four NioClient-based worker threads that are chewing up all the CPU:

"backend-worker-pool-7-thread-1" prio=10 tid=0x00007f5918015800 nid=0x201a runnable [0x00007f5924ba3000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be93580> (a sun.nio.ch.Util$2)
    - locked <0x000000008be93570> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be92548> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be00748> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"backend-worker-pool-7-thread-2" prio=10 tid=0x00007f5918012000 nid=0x201b runnable [0x00007f5924b82000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be94a28> (a sun.nio.ch.Util$2)
    - locked <0x000000008be94a18> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be90648> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be904c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"backend-worker-pool-7-thread-3" prio=10 tid=0x00007f5918007800 nid=0x201c runnable [0x00007f5924b61000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be952e0> (a sun.nio.ch.Util$2)
    - locked <0x000000008be952d0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be8f858> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be8f618> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"backend-worker-pool-7-thread-4" prio=10 tid=0x00007f5918019000 nid=0x201d runnable [0x00007f5924b40000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x000000008be003f8> (a sun.nio.ch.Util$2)
    - locked <0x000000008be003e8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008be00408> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at io.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:55)
    at io.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
    at io.netty.channel.socket.nio.NioWorker.run(NioWorker.java:37)
    at io.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:43)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
    - <0x000000008be004e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

@blucas

Contributor Author

commented May 16, 2012

I've done some googling and it looks like this could be a JDK bug.
Take a look at the thread dump supplied on 13-MAY-2009: it is exactly the same as mine! Although they say the bug has been fixed, it obviously hasn't been. Could you please investigate the workaround supplied in the link? They mention cancelling the selector and invoking selectNow(). I have no idea where to do that myself to test whether it actually works, so if someone could change it in the repo, I'm more than happy to give it a test.

In fact, I've found a post by @trustin on this very subject, but it looks like no changes were made back then.

@blucas

Contributor Author

commented May 18, 2012

@trustin @normanmaurer Any thoughts on this?

@Eirenliel

commented Jun 14, 2012

Please update. I moved to a new server with j6u33 and am now experiencing this bug. Will try to downgrade the JRE...

@Eirenliel

commented Jun 14, 2012

It happens on Netty 3.5.0.Final, 3.2.7.Final and 3.4.7 with j6u33, and maybe others.

@twinforces

commented Jul 9, 2012

Yeah, I see this too with UltraESB.

I think it's a problem with anything that uses the non-blocking I/O (NIO) library.

@incubos

commented Jul 26, 2012

I can confirm this bug on Netty 3.3.0.Final (from Akka 2.0.2) with j7u4.

@vikiitd

commented Aug 2, 2012

Hi,

We are also seeing this issue in our production environment under heavy concurrent load. Are there any updates or fixes for this issue? On some forums there are discussions of this issue being resolved by upgrading the JDK, etc. I am not seeing this issue with netty.3.2.4.Final.jar, but it happens consistently with netty3.4.4.Final.jar. Does anyone know of a workaround for this issue besides downgrading back to the 3.2.4 Netty jar?

Thanks in advance!

@vikiitd

commented Aug 8, 2012

I am also observing this issue with Netty-3.2.6.Final.jar, although less frequently. Is it possible that this issue was introduced before 3.3, or that it is not a Netty issue at all?

@trustin

Member

commented Aug 9, 2012

Is there any reliable way to reproduce this problem? Then it would be much easier for me to fix this issue (probably it's more correct to say 'apply workaround').

@vrajesh1989

commented Aug 12, 2012

Hi. This issue also occurs in Netty-3.2.4.Final.jar. There is no reliable way to reproduce it; it does not happen consistently.

@ghost assigned normanmaurer Aug 23, 2012

@normanmaurer

Member

commented Aug 23, 2012

Let me see if I can fix this ...

@normanmaurer

Member

commented Aug 25, 2012

@trustin I think this is caused by an epoll bug that either was never fixed or was fixed and has since come back.

Please review the following workaround:
#565

So basically we do the following here:

  • check whether Selector.select(timeout) returns 0 and completes before the timeout
  • if so, we increment a counter
  • if the counter reaches 10, we create a new Selector, register all the Channels from the old one with the new one, and close the old Selector. After that we reset the counter.
  • if the counter is < 10, we just continue processing as usual
  • if Selector.select(timeout) returns > 0 or completes after the timeout, we reset the counter

The fix is essentially the same as what Jetty, Grizzly and MINA do :) WDYT?
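A minimal sketch of the steps above in plain java.nio, assuming a single selector thread. The class and method names (`RebuildingSelectLoop`, `selectOnce`, `rebuildSelector`) are hypothetical and are not Netty's API. Only the threshold of 10 and the rebuild procedure come from the comment. A production version would also need to ignore early returns caused by `Selector.wakeup()` or thread interruption, which this sketch counts as spins.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper illustrating spin detection plus selector rebuild.
// Not Netty's implementation; assumes it is only used from one selector thread.
public class RebuildingSelectLoop {
    // Threshold taken from the comment: rebuild after 10 premature empty returns.
    static final int SPIN_THRESHOLD = 10;

    private Selector selector;
    private int prematureReturns;

    public RebuildingSelectLoop() throws IOException {
        this.selector = Selector.open();
    }

    /** The current selector; it is replaced whenever a rebuild happens. */
    public Selector selector() {
        return selector;
    }

    /** One select(timeout) call plus the spin bookkeeping described above. */
    public int selectOnce(long timeoutMillis) throws IOException {
        long start = System.nanoTime();
        int selected = selector.select(timeoutMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;

        if (selected == 0 && elapsedMillis < timeoutMillis) {
            // Returned early with nothing selected: a possible epoll spin.
            // In this sketch, wakeup() or interruption also lands here.
            prematureReturns++;
            if (prematureReturns >= SPIN_THRESHOLD) {
                rebuildSelector();
                prematureReturns = 0;
            }
        } else {
            // Something was selected, or the full timeout elapsed: looks healthy.
            prematureReturns = 0;
        }
        return selected;
    }

    /** Moves every valid registration to a fresh Selector and closes the old one. */
    public void rebuildSelector() throws IOException {
        Selector oldSelector = selector;
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            if (!key.isValid()) {
                continue; // already cancelled, or its channel was closed
            }
            SelectableChannel channel = key.channel();
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            try {
                channel.register(newSelector, interestOps, attachment);
            } catch (ClosedChannelException e) {
                // Channel closed in the meantime; nothing to carry over.
            }
        }
        selector = newSelector;
        oldSelector.close();
    }
}
```

An event loop would call `selectOnce(...)` in place of `selector.select(...)`. It should always read the selector through `selector()`, because the instance changes after a rebuild.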

@vikiitd

commented Aug 27, 2012

It would be great if we could have the workaround provided by @normanmaurer. The JDK bug seems to have been hanging around for some time now, and I am not sure it will get fixed anytime soon.

@normanmaurer

Member

commented Aug 27, 2012

@vikiitd I lost hope of getting NIO bugs fixed in core Java a long time ago :(

@normanmaurer

Member

commented Aug 29, 2012

The workaround for the epoll(..) bug was committed.

@kupci

commented Sep 27, 2016

@adjsnlxx Are you able to test the issue with the JDKs mentioned above, i.e. a JDK before JDK1.7.0_80 or before JDK1.8.0_25? For example, we did not see the issue with JDK1.7.0_25. I realize you may have Java 8 code, or there may be other bugs that prevent you from doing this, but it would help rule out the possibility that what you are seeing is a JDK bug.

@hbprotoss

commented Oct 14, 2016

Same problem here. CentOS 7.2, kernel 3.10.0, JDK 1.8.0_77

"http-nio-8082-ClientPoller-1" #27 daemon prio=5 os_prio=0 tid=0x00007f7350162800 nid=0x738e runnable [0x00007f731d08a000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x000000008348ff50> (a sun.nio.ch.Util$2)
    - locked <0x000000008348ff40> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000008348fcc8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(Redefined)
    at java.lang.Thread.run(Redefined)
@normanmaurer

Member

commented Oct 14, 2016

@hbprotoss your stack shows Tomcat, not Netty, so it's not really a Netty issue. That said, I think it's a JDK / kernel issue.

@normanmaurer

Member

commented Nov 21, 2016

@hbprotoss @kupci can either of you share a reproducer?

@amit2103

commented Nov 30, 2016

Hey guys, we are having the same problem; however, in our case the thread count keeps increasing and even reaches 6000. We are using Netty 3.10.5.Final and JDK 1.8.101.

Do you guys have any work around for this?

@Scottmitch

Member

commented Dec 1, 2016

> Do you guys have any work around for this?

@amit2103 - Netty 3.x reached EOL a while ago; I would recommend updating to 4.1. Also, I don't think we have been able to reproduce this issue, and no reproducer has been supplied. Can you provide a reproducer based on 4.1 (or 4.x)?

@endlesstian

commented Jan 3, 2017

We have the same problem with Netty 4.1.0.CR7 and JDK 1.8. We have about 6000 nioEventLoopGroup threads and eventually run out of memory (OOM).
"nioEventLoopGroup-13832-2" #501109 prio=10 os_prio=0 cpu=0.00 [reset 0.00] ms elapsed=31.50 [reset 31.50] s allocated=259840 B (253.75 KB) [reset 259840 B (253.75 KB)] defined_classes=0
io= file i/o: 0/0 B, net i/o: 215/140 B, files opened:0, socks opened:-1 [reset file i/o: 0/0 B, net i/o: 215/140 B, files opened:0, socks opened:-1 ]
tid=0x00007f109499b800 nid=0x1ac82 / 109698 pthread-id=139697309652736 runnable [_thread_in_native (_at_safepoint), stack(0x00007f0dd0749000,0x00007f0dd084a000)] [0x00007f0dd0848000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(JIJI)I(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(J)I(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(J)I(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(J)I(SelectorImpl.java:86)
- locked <0x00000007c6625120> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000007c6624ea8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000007c6624d70> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(J)I(SelectorImpl.java:97)
at io.netty.channel.nio.NioEventLoop.select(Z)V(NioEventLoop.java:638)
at io.netty.channel.nio.NioEventLoop.run()V(NioEventLoop.java:324)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V(SingleThreadEventExecutor.java:742)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()V(DefaultThreadFactory.java:145)
at java.lang.Thread.run()V(Thread.java:805)

Locked ownable synchronizers:
- None
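Both this dump and the later Netty 4.1.25 dump in this thread show names such as nioEventLoopGroup-13832-2 and nioEventLoopGroup-8870-1. Assume Netty 4's `DefaultThreadFactory` names threads as pool name, then a process-wide pool counter, then a per-pool thread counter. On that assumption, a middle number in the thousands would suggest that thousands of thread factories (and likely event loop groups) were created over the life of the process. The counter is shared by every `DefaultThreadFactory`, so it is only an upper bound. This is a reading of the names, not a diagnosis made in this thread.

The sketch below counts live threads per pool number, which helps check whether many small groups are alive at the same time. `EventLoopThreadCensus` and its regex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups thread names of the assumed form
// <poolName>-<poolId>-<threadNo> (e.g. "nioEventLoopGroup-13832-2") by poolId.
public class EventLoopThreadCensus {
    private static final Pattern NAME = Pattern.compile("^(\\w+)-(\\d+)-(\\d+)$");

    /** Returns poolId -> thread count, for names whose pool name equals poolName. */
    static Map<Integer, Integer> threadsPerPool(Iterable<String> threadNames, String poolName) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String name : threadNames) {
            Matcher m = NAME.matcher(name);
            if (m.matches() && m.group(1).equals(poolName)) {
                counts.merge(Integer.parseInt(m.group(2)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Names of all live threads in this JVM.
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        Map<Integer, Integer> perPool = threadsPerPool(names, "nioEventLoopGroup");
        System.out.println("live nioEventLoopGroup pools: " + perPool.size());
        System.out.println("live nioEventLoopGroup threads: "
                + perPool.values().stream().mapToInt(Integer::intValue).sum());
    }
}
```

A large number of distinct live pools with one or two threads each would point towards groups being created repeatedly rather than shared.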

@Scottmitch

Member

commented Jan 3, 2017

@endlesstian - Can you provide a reproducer? Also can you update to the latest version of Netty (4.1.6) and list your version of JDK (also update to latest version if necessary).

@zaffargachal

commented Feb 22, 2017

@endlesstian - Did you find a solution to your problem? I am facing the same problem with Netty 4.1.0.CR7 and JDK 1.8, with almost 10k threads in Tomcat. Tomcat became unresponsive.

@senthilec566

commented Jul 4, 2017

It looks like recent versions of Netty also have the same issue!

OS :
Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-121-generic x86_64)

Java:
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

CPU Usage:
ps -o '%cpu,%mem' -p 2023
%CPU %MEM
287 7.9

Netty Version:
4.1.12.Final

jstack

"epollEventLoopGroup-3-8" #20 prio=10 os_prio=0 tid=0x00007f10ac016000 nid=0x811 runnable [0x00007f10b731a000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:117)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:226)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

"epollEventLoopGroup-3-7" #19 prio=10 os_prio=0 tid=0x00007f10ac014000 nid=0x810 runnable [0x00007f10b741b000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:117)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:226)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

"epollEventLoopGroup-3-6" #18 prio=10 os_prio=0 tid=0x00007f10ac012800 nid=0x80f runnable [0x00007f10b751c000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:117)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:226)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

"epollEventLoopGroup-3-5" #17 prio=10 os_prio=0 tid=0x00007f10ac010800 nid=0x80e runnable [0x00007f10b761d000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:117)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:226)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

"epollEventLoopGroup-3-4" #16 prio=10 os_prio=0 tid=0x00007f10ac00e800 nid=0x80d runnable [0x00007f10b771e000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:117)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:226)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

@senthilec566

commented Jul 4, 2017

@normanmaurer - does this issue still exist in 4.x?

@doom369

Contributor

commented Jul 4, 2017

@senthilec566 are you sure you have 4.1.12.Final? These line numbers don't seem to be from 4.1.12.Final.

@senthilec566

commented Jul 4, 2017

@doom369 - Yes, I double-checked the jar:

cat pom.properties
#Created by Apache Maven 3.3.9
version=4.1.12.Final
groupId=io.netty
artifactId=netty-all

ps -o '%cpu,%mem' -p 6696
%CPU %MEM
449 8.2

"epollEventLoopGroup-3-13" #25 prio=10 os_prio=0 tid=0x00007fcc3804e800 nid=0x1a6a runnable [0x00007fcc42dcb000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:111)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:230)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)

"epollEventLoopGroup-3-12" #24 prio=10 os_prio=0 tid=0x00007fcc3804c800 nid=0x1a69 runnable [0x00007fcc42ecc000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:111)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:230)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)

"epollEventLoopGroup-3-11" #23 prio=10 os_prio=0 tid=0x00007fcc3804a800 nid=0x1a68 runnable [0x00007fcc42fcd000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.unix.FileDescriptor.writeAddress(Native Method)
at io.netty.channel.unix.FileDescriptor.writeAddress(FileDescriptor.java:135)
at io.netty.channel.epoll.AbstractEpollChannel.doWriteBytes(AbstractEpollChannel.java:290)
at io.netty.channel.epoll.AbstractEpollStreamChannel.writeBytes(AbstractEpollStreamChannel.java:260)
at io.netty.channel.epoll.AbstractEpollStreamChannel.doWriteSingle(AbstractEpollStreamChannel.java:472)
at io.netty.channel.epoll.AbstractEpollStreamChannel.doWrite(AbstractEpollStreamChannel.java:457)

@normanmaurer

Member

commented Jul 4, 2017

@senthilec566

commented Jul 5, 2017

[Screenshot: 2017-07-05 at 12:29:29 PM]

@normanmaurer - Attached a screenshot for your reference. CPU usage often goes above 80%, but somehow on the command line I see a different result :(
ps -o '%cpu,%mem' -p 14442
%CPU %MEM
403 8.3

@normanmaurer

Member

commented Jul 5, 2017

@senthilec566 can you attach a JMH / YourKit profile?

@senthilec566

commented Jul 5, 2017

snapshot.zip

Attaching a YourKit snapshot. Hope this helps!

@senthilec566

commented Jul 5, 2017

Deadlock :-(

Potential deadlock: frozen threads found

It seems that the following threads have not changed their stack for more than 10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.

epollEventLoopGroup-2-1 <--- Frozen for at least 58m 12s
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java (native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:748

@senthilec566

commented Jul 5, 2017

@normanmaurer - Any workaround for this? Should I try an older version of Netty?

@senthilec566

commented Jul 6, 2017

@normanmaurer - Any findings? In the meantime I will try another version of Netty.

@Scottmitch

Member

commented Jul 6, 2017

@senthilec566 - your issue is using the Epoll JNI transport but this issue was opened against the NIO transport. Can you open another issue?

Also, what does your usage pattern look like? Do you have sockets being opened/closed constantly, or just a few that are active? If you take successive snapshots (e.g. 20 or so), do you see the stack trace change? Seeing the stack trace at epollWait is not necessarily a problem and is common (it's how the OS tells us which events are pending, and it is called frequently). Also, a reproducer would be nice.
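The check suggested above can be roughly automated inside the affected JVM. A thread that merely waits in epollWait accumulates little CPU time, whereas a spinning selector thread keeps returning to epollWait while burning close to a full core.

This is a sketch under several assumptions:
- It runs in-process, for example from a debug hook you add yourself.
- ThreadMXBean CPU-time measurement is supported and enabled.
- The sample count of 20 follows the suggestion above; the 500 ms interval and the 50% threshold are arbitrary.

`EpollWaitSampler` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Rough in-process sampler: flags threads whose top frame is epollWait*
// while they use more than half a core over the sampling interval.
public class EpollWaitSampler {

    /** CPU used over an interval as a fraction of one core (1.0 = one full core). */
    static double cpuShare(long cpuNanosBefore, long cpuNanosAfter, long wallNanos) {
        if (wallNanos <= 0) {
            throw new IllegalArgumentException("wallNanos must be positive");
        }
        return (double) (cpuNanosAfter - cpuNanosBefore) / wallNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        final int samples = 20;           // roughly the "20 ish" snapshots suggested above
        final long intervalMillis = 500;  // arbitrary sampling interval
        final double busyThreshold = 0.5; // arbitrary: more than half a core

        for (int i = 0; i < samples; i++) {
            // CPU time of every live thread at the start of the interval (-1 if unavailable).
            Map<Thread, Long> before = new HashMap<>();
            for (Thread t : Thread.getAllStackTraces().keySet()) {
                before.put(t, mx.getThreadCpuTime(t.getId()));
            }
            long start = System.nanoTime();
            Thread.sleep(intervalMillis);
            long wallNanos = System.nanoTime() - start;

            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                Long cpuBefore = before.get(t);
                long cpuAfter = mx.getThreadCpuTime(t.getId());
                if (cpuBefore == null || cpuBefore < 0 || cpuAfter < 0) {
                    continue; // thread started mid-interval, or CPU timing unavailable
                }
                StackTraceElement[] stack = e.getValue();
                String topFrame = stack.length > 0 ? stack[0].getMethodName() : "";
                double share = cpuShare(cpuBefore, cpuAfter, wallNanos);
                if (topFrame.startsWith("epollWait") && share > busyThreshold) {
                    System.out.printf("sample %d: %s is in %s and used %.0f%% of a core%n",
                            i, t.getName(), topFrame, share * 100);
                }
            }
        }
    }
}
```

Threads that show up in epollWait with near-zero CPU are simply idle. Only threads that are repeatedly flagged across samples look like the spin described in this issue.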

@umerpk

commented Oct 26, 2017

We are still having the same issue on Java 1.8:

"GS-LRMI-Selector-Read-Thread-0" daemon prio=10 tid=0x00007fdc3d550800 nid=0x7533 runnable [0x00007fdbbafdd000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x0000000572484890> (a sun.nio.ch.Util$2)
- locked <0x00000005724848a0> (a java.util.Collections$UnmodifiableSet)
- locked <0x0000000572484848> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.doSelect(AbstractSelectorThread.java:62)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.run(AbstractSelectorThread.java:51)
at java.lang.Thread.run(Thread.java:744)

Locked ownable synchronizers:
- None

@rohitsahay2000

commented Nov 13, 2018

We are facing the same issue with JDK 1.8 and Netty 4.1.25. The screenshot below was taken when the thread count had already reached 1450 (starting from 200).

[Screenshot: 2018-11-13 at 4:08:10 PM]

The event loop group thread count keeps increasing, leading to very high CPU usage and an unresponsive container, in spite of closing them regularly.

One of the typical threads (there are many instances of it) from jstack:
nioEventLoopGroup-8870-1 tid=13320
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:281)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:105)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:98)
- locked <0x4f8900cd> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x83732742> (a java.util.Collections$UnmodifiableSet)
- locked <0x288f21af> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:109)
at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:756)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:411)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:811)

Locked ownable synchronizers:
- None