High CPU spikes with Jetty winstone threads #4717
Comments
…as it's happening when using Jenkins, which has a layer on top of Jetty.
Are you really running Jenkins server on OSX?
Yes, we run Jenkins on OSX. We start Jenkins using the args below:
Does it say what the thread is doing at sun.nio.ch.FileDispatcherImpl.preClose0(Native Method)? Blocked trying to get a lock? Googling for that, I do see a few reports of similar-sounding issues... can you have a look and tell us which are most similar?
sun.nio.ch.FileDispatcherImpl.preClose0(Native Method) is in the RUNNABLE state, but the thread never finishes closing; the stack trace is below. Similar threads keep accumulating and never close.
So it is runnable but not progressing, and not waiting for a lock? It must then be spinning, which would explain your high CPU spike.
We tried two different JVMs, JDK 1.8.0_192 and JDK 1.8.0_241 (the latest), and we see the issue with both versions. Up to some point in time the CPU runs stable; when these winstone threads start falling behind in processing, the CPU starts spiking.
@satyagv557 can you please try with JDK 11 or later? This is because the socket layer has been rewritten in JDK 11 and made more stable.
I can try that, but this issue never happened until we updated to Jenkins v2.219. We thought it was happening because of issue #4537, and to fix that we updated to Jenkins v2.225, which comes with Jetty 9.4.27, but the issue still remains. Do you think this is a Jetty issue?
@satyagv557 we are not saying that changes in Jetty have not indirectly caused the problem, just that we think it is an OS and/or JVM bug that is being tickled by the changes that we have made.
We're running Jenkins 2.226 (winstone) on the CI that does the Jetty builds.
I am happy to do the Java upgrade, but there are several reported bugs with JDK 11 and I am worried about running into new issues. Is there any way we can confirm what is causing this issue? If so, can you explain how we can determine whether the issue comes from the JVM, Jetty, or the OS? Thanks.
Here is a reference application where they saw the same issue and fixed it:
"Jetty (winstone)-290959" Id=290959 Group=main RUNNABLE (in native) |
@sbordet @lorban FYI Akka worked around this problem with: akka/akka#25923. I think the essence of it is to cancel the selection key before closing the channel.
@satyagv557 is that stack from Java 11?
Here's my theory of what's happening, based on the information at hand: Jetty's ChannelEndPoint.doClose() calls _channel.close(), which eventually calls the FileDispatcherImpl.preClose0() native method. I suspect that under certain rare circumstances preClose0() throws an IOException and leaves the file descriptor open. When the IOException is thrown, Jetty's ChannelEndPoint.doClose() catches it and logs it at debug level. My theory is that since the file descriptor wasn't closed, the selector is going to be notified again that it can be closed, Jetty tries to close it again, and the cycle repeats endlessly, which would explain the runnable-but-stuck threads and the CPU spike.

Assuming my theory is correct, explicitly cancelling the key before closing the channel would explain why the problem goes away, as that makes sure the selector won't wake up again for the unclosed file descriptor.

To confirm it, I would need to know if FileDispatcherImpl.preClose0() does indeed throw an IOException, and if so what the reported error message is, since the error message is based on errno. Maybe just enabling debug logs on org.eclipse.jetty.io.ChannelEndPoint and redirecting those logs to a specific file would generate little enough traffic to make it manageable?

We could call _key.cancel() before calling _channel.close() to make the problem go away, but I fear that might hide an OS resource leak, as I'm not certain the socket would eventually be closed. Maybe that would not happen, or rarely enough that it wouldn't be too important?
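A minimal, self-contained sketch of that idea in plain NIO terms (the class and field names are illustrative, mirroring the ones mentioned above; this is not the actual Jetty patch):

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Sketch of the proposed ordering: cancel the SelectionKey *before* closing
// the channel, so the selector never re-selects a file descriptor whose
// close() failed half-way through in preClose0().
final class EndPointCloseSketch
{
    private final SocketChannel _channel;
    private final SelectionKey _key;

    EndPointCloseSketch(SocketChannel channel, SelectionKey key)
    {
        _channel = channel;
        _key = key;
    }

    void doClose()
    {
        if (_key != null)
            _key.cancel(); // deregister first: no further selector wakeups for this fd

        try
        {
            _channel.close(); // may still throw and leave the fd only partially closed
        }
        catch (IOException e)
        {
            // current Jetty behavior: catch and log at debug level
            System.err.println("close failed: " + e);
        }
    }
}
```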
Oh, and assuming Jenkins won't serve a ton of traffic, maybe switching the SelectorProvider to a different implementation would make the problem go away? Since the problem seems to be somewhere between the JVM and the OS, this has some chances to work around the problem. If you want to try this out, just set the java.nio.channels.spi.SelectorProvider system property to an alternate provider implementation.
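For reference, a sketch of that workaround as a JVM flag. java.nio.channels.spi.SelectorProvider is the standard JDK property, but sun.nio.ch.PollSelectorProvider is my assumption of the alternate provider meant here (it is a JDK-internal class present in JDK 8 on Unix-like platforms):

```
# macOS defaults to the kqueue-based selector; this switches NIO to poll(2)
java -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.PollSelectorProvider \
     -jar jenkins.war
```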
Just want to chime in on this issue. We ran into this as well; changing the selector provider as @lorban suggested works. However, this has many impacts on NIO, and a real fix is needed in the Jetty lib.
@michaelkwan Thanks for trying out the suggested workaround. Could you please be more explicit about the NIO impacts this has in your experience? As far as I can tell, this seems to be caused by a bug somewhere between the JVM and the OS, and there is little Jetty can do to solve it. If you could collect the debug logs of org.eclipse.jetty.io.ChannelEndPoint, that would help confirm the theory.
Akka reports: "JDK 11: async channel closing semantics changed". This is something to confirm against our behavior as well.
We're seeing this happen consistently with Terracotta BigMemory 4.x (Jetty 9.4.45.v20220203) on macOS (Monterey 12.3.1) with the newest Zulu 8 (1.8.0_322), but not with any newer JDK versions.
@GaryWKeim can you enable debug logging temporarily and capture some activity? (maximum of 5 minutes)
Also, when this occurs, the process becomes a zombie that can't be terminated in any way, and a reboot is necessary to free up any files/ports the zombie is holding open:

% ps auxww | grep "(java)"

What packages would you like me to log at DEBUG?
Log at DEBUG level on the named logger org.eclipse.jetty.io.ChannelEndPoint.
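A sketch of how that could be enabled, assuming Jetty's default StdErrLog backend is in use (the per-logger LEVEL system property is a real StdErrLog switch; whether winstone routes Jetty logging through StdErrLog or through java.util.logging is an assumption worth checking):

```
# raise only the endpoint logger to DEBUG and capture stderr to a file
java -Dorg.eclipse.jetty.io.ChannelEndPoint.LEVEL=DEBUG \
     -jar jenkins.war 2> /tmp/channelendpoint-debug.log
```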
@GaryWKeim Have you tried the workaround I mentioned earlier, i.e. setting the java.nio.channels.spi.SelectorProvider system property to a different SelectorProvider implementation?
Yes, the workaround seems to work.
Your description seems to confirm what I previously stated: this seems to be caused by a bug somewhere between the JVM and the OS, and there is little Jetty can do to solve it. I'm afraid your only options are either to use the workaround or to move to a JDK more recent than 1.8.
I was only able to get some logging after the problem occurred. Maybe it will help.
@GaryWKeim since you seem to be able to reproduce this problem, maybe you could test a candidate fix? I've created a branch which contains a change that should/could solve this problem: https://github.com/eclipse/jetty.project/tree/jetty-9.4.x-4717-macos-high-cpu I'd be glad if you could build it and try it out, both to see if the problem goes away and to make sure we're not leaking file descriptors with partially-closed sockets. Thanks!
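One simple way to watch for leaked descriptors while testing the branch (a sketch; JENKINS_PID is an illustrative variable, not something from this thread):

```
# sample the open file descriptor count every 10s; a steady climb suggests a leak
while true; do lsof -p "$JENKINS_PID" | wc -l; sleep 10; done
```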
After pounding on this for several days now using your branch, I have not been able to reproduce the problem and I see no leaking file descriptors. |
@sbordet May I know the final workaround?
Note: Jetty 9.x is now at End of Community Support. The fix from @lorban was merged and is present in all currently released versions of Jetty 9 through Jetty 12.
OS: Mac OS X 10.14.6, x86_64/64 (4 cores)
Java: Java(TM) SE Runtime Environment, 1.8.0_241-b07
JVM: Java HotSpot(TM) 64-Bit Server VM, 25.241-b07
Description
We upgraded Jenkins from v2.187 to v2.219 and this issue started happening; we tried Jenkins v2.225 but the issue remains.
Our servers in production reported 100% CPU usage. The instance had been running for several days. The CPU spike occurred instantly and remained until the instance was rebooted.
Hundreds of Jetty (winstone) threads pile up and never close, all stuck in the method sun.nio.ch.FileDispatcherImpl.preClose0(Native Method):
Jetty (winstone)-66537
sun.nio.ch.FileDispatcherImpl.preClose0(Native Method)
sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:59)
sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:838)
java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234)
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
org.eclipse.jetty.io.ChannelEndPoint.doClose(ChannelEndPoint.java:197)
org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220)
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doClose(SslConnection.java:1366)
org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220)
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
org.eclipse.jetty.io.AbstractConnection.close(AbstractConnection.java:254)
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:543)
org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:398)
org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
In the normal state, a Jetty winstone thread looks like this:
Jetty (winstone)-14
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:304)
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
java.lang.Thread.run(Thread.java:748)
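For anyone trying to reproduce this, one quick way to quantify the pile-up described above (a sketch; JENKINS_PID is illustrative) is to count the threads parked in the native pre-close call across successive jstack dumps:

```
# count live threads stuck in preClose0
jstack "$JENKINS_PID" | grep -c 'FileDispatcherImpl.preClose0'
```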