EPollArrayWrapper.epollWait 100% CPU Usage #327
Comments
|
I've done some googling and it looks like this could be a JDK bug. In fact, I've found a post by @trustin on this very subject, but it looks like no changes were made back then. |
|
@trustin @normanmaurer Any thoughts on this? |
|
Please update. Migrated to a new server with j6u33 and now experiencing this bug. Will try to downgrade the JRE... |
|
It happens on Netty 3.5.0.Final, 3.2.7.Final, and 3.4.7 with j6u33, and maybe others. |
|
Yeah, I see this too with UltraESB. I think it's a problem with anything that uses the non-blocking I/O library. |
|
I confirm this bug on Netty 3.3.0 Final (from Akka 2.0.2) & j7u4. |
|
Hi, we are also seeing this issue in our production environment under heavy concurrent load. Are there any updates or fixes? Some forums mention this issue being resolved by upgrading the JDK, etc. I am not seeing the issue with netty.3.2.4.Final.jar, but it happens consistently with netty3.4.4.Final.jar. Does anyone know of a workaround besides downgrading back to the 3.2.4 Netty jar? Thanks in advance! |
|
I am also observing this issue with Netty-3.2.6.Final.jar, although less frequently. Is it possible that this issue was introduced before 3.3, or is it not a Netty issue at all? |
|
Is there any reliable way to reproduce this problem? Then it would be much easier for me to fix this issue (probably it's more correct to say 'apply a workaround'). |
|
Hi. This issue is occurring with Netty-3.2.4.Final.jar as well. There is no reliable way to reproduce it; it does not happen consistently. |
|
Let me see if I can fix this ... |
…% cpu load. This is just a workaround but helps to recover. See #327
|
@trustin I think this is because of an epoll bug which still is not fixed or was fixed and is now present again. Please review the following workaround: So basically we do the following here:
The fix is kind of the same as jetty,grizzly and mina do :) WDYT ? |
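For reference, here is a minimal sketch of that kind of workaround (premature-wakeup counting plus selector rebuild), written against plain java.nio with an illustrative spin threshold; this is not Netty's actual implementation:

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/**
 * Illustrative sketch of the workaround idea: if Selector.select(timeout)
 * keeps returning immediately with zero ready keys, assume the JDK epoll bug
 * and swap in a freshly built selector. SPIN_THRESHOLD is an assumed value
 * chosen for illustration, and timeoutMillis is assumed to be > 0.
 */
final class EpollBugWorkaroundSketch {
    private static final int SPIN_THRESHOLD = 512;

    private Selector selector;
    private int prematureWakeups;

    EpollBugWorkaroundSketch(Selector selector) {
        this.selector = selector;
    }

    /** Selects once; rebuilds the selector if it looks like it is spinning. */
    int select(long timeoutMillis) throws IOException {
        long start = System.nanoTime();
        int selected = selector.select(timeoutMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        if (selected == 0 && elapsedMillis < timeoutMillis) {
            // Woke up early with nothing ready: possible epoll spin.
            if (++prematureWakeups >= SPIN_THRESHOLD) {
                selector = rebuild(selector);
                prematureWakeups = 0;
            }
        } else {
            prematureWakeups = 0;
        }
        return selected;
    }

    /** Creates a new selector and moves every valid registration over to it. */
    private static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue;
            }
            int ops = key.interestOps();          // read before cancelling
            Object attachment = key.attachment();
            key.cancel();
            key.channel().register(fresh, ops, attachment);
        }
        old.close();
        return fresh;
    }
}
```

The idea is simply that a selector which repeatedly returns with zero ready keys well before its timeout is presumed broken and replaced, with all valid registrations carried over to the fresh selector.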
|
It would be great if we could have the workaround provided by @normanmaurer. The JDK bug has been hanging around for some time now; not sure if it is getting fixed anytime soon. |
|
@vikiitd I lost hope of getting NIO bugs fixed in core Java a long time ago :( |
|
Workaround for the epoll(..) bug was committed. |
…ual wakeup or because of a closed channel. See #327
…jboss.netty.epollBugWorkaround=true. It will be enabled by default later if it has proven to be stable. See #327
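For anyone wanting to try the flag, a hedged example follows. The property name in the commit message above is truncated; the assumption here is that the full name is org.jboss.netty.epollBugWorkaround (the usual Netty 3.x prefix). It could also be passed on the command line as -Dorg.jboss.netty.epollBugWorkaround=true.

```java
public final class EnableEpollWorkaround {
    public static void main(String[] args) {
        // Assumption: the full (truncated above) property name is
        // "org.jboss.netty.epollBugWorkaround". It must be set before any
        // Netty NIO selector loops are created.
        System.setProperty("org.jboss.netty.epollBugWorkaround", "true");
        // ... bootstrap the Netty 3.x application here ...
    }
}
```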
|
@endlesstian - Did you find a solution to your problem? I am facing the same problem with Netty 4.1.0.CR7, JDK 1.8, and almost 10k threads in Tomcat. Tomcat became unresponsive. |
|
Looks like a recent version of Netty also has the same issue!
OS:
Java:
CPU Usage:
Netty Version:
jstack:
"epollEventLoopGroup-3-8" #20 prio=10 os_prio=0 tid=0x00007f10ac016000 nid=0x811 runnable [0x00007f10b731a000]
"epollEventLoopGroup-3-7" #19 prio=10 os_prio=0 tid=0x00007f10ac014000 nid=0x810 runnable [0x00007f10b741b000]
"epollEventLoopGroup-3-6" #18 prio=10 os_prio=0 tid=0x00007f10ac012800 nid=0x80f runnable [0x00007f10b751c000]
"epollEventLoopGroup-3-5" #17 prio=10 os_prio=0 tid=0x00007f10ac010800 nid=0x80e runnable [0x00007f10b761d000]
"epollEventLoopGroup-3-4" #16 prio=10 os_prio=0 tid=0x00007f10ac00e800 nid=0x80d runnable [0x00007f10b771e000] |
|
@normanmaurer - does this issue still exist in 4.x? |
|
@senthilec566 are you sure you have 4.1.12.Final? These line numbers don't seem to be from 4.1.12.Final. |
|
@doom369 - Yes, double-checked the jar.
cat pom.properties
ps -o '%cpu,%mem' -p 6696
"epollEventLoopGroup-3-13" #25 prio=10 os_prio=0 tid=0x00007fcc3804e800 nid=0x1a6a runnable [0x00007fcc42dcb000]
"epollEventLoopGroup-3-12" #24 prio=10 os_prio=0 tid=0x00007fcc3804c800 nid=0x1a69 runnable [0x00007fcc42ecc000]
"epollEventLoopGroup-3-11" #23 prio=10 os_prio=0 tid=0x00007fcc3804a800 nid=0x1a68 runnable [0x00007fcc42fcd000] |
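As a sanity check beyond reading pom.properties, the version that is actually loaded can be printed at runtime; this sketch assumes netty-common (which ships io.netty.util.Version) is on the classpath:

```java
import java.util.Map;

import io.netty.util.Version;

// Prints every Netty artifact the JVM actually loaded, which rules out a
// stale or shaded jar shadowing the expected 4.1.12.Final classes.
public final class PrintNettyVersions {
    public static void main(String[] args) {
        for (Map.Entry<String, Version> e : Version.identify().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().artifactVersion());
        }
    }
}
```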
|
Can you please attach a profiler snapshot?
On 4 Jul 2017, at 18:46, Senthilkumar K wrote:
@doom369 - Yes, double-checked the jar.
cat pom.properties
#Created by Apache Maven 3.3.9
version=4.1.12.Final
groupId=io.netty
artifactId=netty-all
ps -o '%cpu,%mem' -p 6696
%CPU %MEM
449 8.2
epollEventLoopGroup-3-13" #25 <#25> prio=10 os_prio=0 tid=0x00007fcc3804e800 nid=0x1a6a runnable [0x00007fcc42dcb000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:111)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:230)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
"epollEventLoopGroup-3-12" #24 <#24> prio=10 os_prio=0 tid=0x00007fcc3804c800 nid=0x1a69 runnable [0x00007fcc42ecc000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:111)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:230)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
"epollEventLoopGroup-3-11" #23 <#23> prio=10 os_prio=0 tid=0x00007fcc3804a800 nid=0x1a68 runnable [0x00007fcc42fcd000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.unix.FileDescriptor.writeAddress(Native Method)
at io.netty.channel.unix.FileDescriptor.writeAddress(FileDescriptor.java:135)
at io.netty.channel.epoll.AbstractEpollChannel.doWriteBytes(AbstractEpollChannel.java:290)
at io.netty.channel.epoll.AbstractEpollStreamChannel.writeBytes(AbstractEpollStreamChannel.java:260)
at io.netty.channel.epoll.AbstractEpollStreamChannel.doWriteSingle(AbstractEpollStreamChannel.java:472)
at io.netty.channel.epoll.AbstractEpollStreamChannel.doWrite(AbstractEpollStreamChannel.java:457)
|
|
@normanmaurer - Attached a screenshot for your reference... many times it crosses 80%+, but somehow on the command line I see a different result :( |
|
@senthilec566 can you attach a JMH benchmark / YourKit profile? |
|
Attaching a YourKit snapshot. Hope this helps! |
|
Deadlock :-( Potential deadlock: frozen threads found. It seems that the following threads have not changed their stack for more than 10 seconds: epollEventLoopGroup-2-1 <--- frozen for at least 58m 12s |
|
@normanmaurer - Any workaround for this? Should I try an older version of Netty? |
|
@normanmaurer - Any findings? In the meantime I will try another version of Netty. |
|
@senthilec566 - your issue is using the epoll JNI transport, but this issue was opened against the NIO transport. Can you open another issue? Also, what does your usage pattern look like? Are sockets being opened/closed constantly, or are just a few active? If you take successive snapshots (e.g. ~20), do you see the stack trace change? Seeing the stack trace at epollWait is not necessarily a problem and can be common (it's how the OS tells us what events are pending, and it is called frequently). Also, a reproducer would be nice. |
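For reference, a rough in-process way to take those successive snapshots without a profiler, assuming it can be triggered inside the affected JVM (for example from an admin hook); the thread-name filter matches the event loop names seen in the dumps above:

```java
import java.util.Map;

// Dumps the event-loop stacks several times so they can be compared by eye:
// a stack that never changes across snapshots is suspicious, while merely
// seeing epollWait in a single snapshot is normal.
public final class EventLoopSnapshots {
    public static void dump(int snapshots, long pauseMillis) throws InterruptedException {
        for (int i = 0; i < snapshots; i++) {
            System.out.println("--- snapshot " + i + " ---");
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                if (t.getName().contains("EventLoopGroup")) { // matches nio* and epoll* loops
                    System.out.println(t.getName() + " [" + t.getState() + "]");
                    for (StackTraceElement frame : e.getValue()) {
                        System.out.println("    at " + frame);
                    }
                }
            }
            Thread.sleep(pauseMillis);
        }
    }
}
```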
|
We are still having the same issue on Java 1.8.
"GS-LRMI-Selector-Read-Thread-0" daemon prio=10 tid=0x00007fdc3d550800 nid=0x7533 runnable [0x00007fdbbafdd000]
Locked ownable synchronizers: |
|
Facing the same issue with JDK 1.8 and Netty 4.1.25. The screenshot below was taken when the thread count had already hit 1450 (starting from 200). The event loop group thread count keeps increasing, leading to very high CPU usage and an unresponsive container, in spite of closing them regularly. One of the typical threads (with lots of instances) from jstack: |
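A steadily growing epollEventLoopGroup thread count is often caused by creating a new EventLoopGroup per connection or per Bootstrap and never shutting it down; that is only an assumption about this report, not a confirmed diagnosis. A minimal sketch of the usual pattern, with one shared group released once at shutdown:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

// One EventLoopGroup for the whole application; its threads are reused by
// every outbound connection and released exactly once on shutdown. The same
// pattern applies to EpollEventLoopGroup / EpollSocketChannel.
public final class SharedEventLoop {
    private static final EventLoopGroup GROUP = new NioEventLoopGroup();

    public static Bootstrap newBootstrap() {
        return new Bootstrap()
                .group(GROUP)                       // reuse, never re-create
                .channel(NioSocketChannel.class)
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // add application handlers here
                    }
                });
    }

    public static void shutdown() {
        GROUP.shutdownGracefully();                 // stops the event loop threads
    }
}
```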
|
Hi All, We have hit this problem in a different project (not using Netty) and are looking for a fix/workaround. This only happens on Linux. We would really appreciate some help in learning how you solved this. Would someone please summarize the solution and source files involved? Sorry, it's difficult to piece together what finally worked given all the related discussions/issues. |
|
Hi all, the epoll bug is a well-known one in the Java NIO framework that seems to be fixed in NIO2. I hope Netty does not have to delve into JNI/JNA/etc. to fix it, but in that case maybe you can even improve on it. More info in #2515. |
|
We are still having the same issue. Development environment: JDK 1.8; operating environment: JRE 1.2. Locked ownable synchronizers: |
|
So how do we solve this issue? |
|
My application upgraded Netty from 3.6.0.Final to 3.10.6.Final and CPU usage returned to normal; the JDK version is 1.8.0_101.
Hi, I now have the same issue. Did you fix this? |
|
I updated to JDK 11; still not working... |


Hi,
I believe I have an issue similar to #302, but on Linux (Ubuntu 10.04) with JDK 1.6.0u30 and JDK 1.7.0u4, using Netty-4.0.0 (Revision: 52a7d28).
The app is proxying connections to backend systems. The proxy has a pool of channels that it can use to send requests to the backend systems. If the pool is low on channels, new channels are spawned and put into the pool so that requests sent to the proxy can be serviced. The pools get populated on app startup, so that is why it doesn't take long at all for the CPU to spike through the roof (22 seconds into the app lifecycle).
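For readers who want a concrete picture, here is a minimal sketch of the pooling pattern described above, using hypothetical names (BackendChannelPool, lowWaterMark) that are not from the reporter's actual code:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

import io.netty.channel.Channel;

// Idle backend channels sit in a queue; whenever the pool drops below a
// low-water mark it asks for more connections, which is why the load shows
// up shortly after startup while the pools are being filled.
final class BackendChannelPool {
    private final ConcurrentLinkedQueue<Channel> idle = new ConcurrentLinkedQueue<>();
    private final int lowWaterMark;

    BackendChannelPool(int lowWaterMark) {
        this.lowWaterMark = lowWaterMark;
    }

    Channel acquire() {
        Channel ch = idle.poll();
        if (idle.size() < lowWaterMark) {
            replenish();                  // keep the pool topped up
        }
        return ch;                        // may be null; caller connects on demand
    }

    void release(Channel ch) {
        if (ch != null && ch.isActive()) {
            idle.offer(ch);
        }
    }

    private void replenish() {
        // Connect new channels to the backend asynchronously and offer them
        // into 'idle'; omitted because it depends on the Bootstrap setup.
    }
}
```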
The test box has two CPUs, the output from 'top' is below:
Thread dump for the four NioClient-based worker threads that are chewing up all the CPU: