100% CPU usage in Selector using Jetty on Windows #2205
Comments
This seems to reference Issue #1446 (using github syntax to trigger reference link) |
Agreed, it does seem to reference #1446. However, #1446 says the problem seemed to be corrected by jetty 9.4.6.v20170531. I have been running with 9.4.7.v20170914 and recently upgraded to 9.4.8.v20171121 in hopes that it would correct the problem, but it does not. So if the problem was corrected in 9.4.6, it looks like it returned in 9.4.7 and 9.4.8. |
I was able to get some debug logging turned on when the problem occurred. It looks like the same pattern is repeated constantly. Here's a snippet. Full logs are also available if you need them: 03/14/2018 18:04:58.782 DEBUG jetty WebSocketContainer@1018635868-36: Selector sun.nio.ch.WindowsSelectorImpl@142a7d96 woken up from select, 0/0 selected |
If you can build the jetty source, please try the current |
Thanks Joakim. I'll try to build the HEAD from source. Do you have an ETA on 9.4.9? |
Is the fix you're referring to for bug #2335? |
Yes, PR #2335 is the source of the fix. The 9.4.9 staged release should be available on oss.sonatype.org in a few hours. |
The Jetty 9.4.9 stage is available.
https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.9.v20180315 Staging Repository: https://oss.sonatype.org/content/repositories/jetty-1370/ |
Thanks Joakim.
I’ll give it a try.
Joe
In reply to Joakim Erdfelt's message of Thursday, March 15, 2018, 11:29 AM:
The Jetty 9.4.9 stage is available.
Note: this is not an official release, and only represents a potential release.
It will only be official once adequate testing has been performed and then promoted to general release.
https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.9.v20180315
Many many fixes, see above link for list.
Staging Repository: https://oss.sonatype.org/content/repositories/jetty-1370/
|
Converted over to 9.4.9. Will have to send update to our customer to determine if the 100% CPU problem is fixed. We were never able to recreate in-house. Assuming this fixes the problem, do you have any idea when 9.4.9 will be GA? |
Jetty 9.4.9.v20180320 has been released with a hack for the identified Windows JVM Selector NIO Bug. |
Finally got a chance to test 9.4.9 at client site and am still running into the same issue. Following is a snippet from the log that was taken while the 100% CPU was occurring: 03/30/2018 18:15:47.202 DEBUG jetty WebSocketContainer@1444944344-39: Selector sun.nio.ch.WindowsSelectorImpl@a0305fd woken up from select, 0/0 selected |
Do you have a testcase that can cause this? |
Unfortunately, no. So far, I have not been able to recreate this in-house. I have only seen it at one of our customers' sites, and only on Windows 2012 R2 machines. |
What version of Java? |
jdk1.8.0_162 currently but we also saw the problem with jdk1.8.0_131 |
Here is that log sorted by thread:
|
I cannot see how this is not a JVM bug. These are 4 idle selectors, each with no keys at all, so they should all definitely block in select until woken up. I guess conceivably we have a bug where we are waking up the selector when there is nothing to do? If we gave you a version with extra instrumentation, could you run that at your client's site to capture more info? However, I'm really dubious we will be able to fix this in the main line of the project. We already have some code to work around Windows bugs and I'm really loath to add more workarounds in such a core part of the code. In this case it looks like if there are zero keys then we'd need to wait on a separate mutex... but then we'd have a race, so we'd have to synchronize and we'd kill performance for the 99.999% of other users. At best I'm seeing a branch with a workaround that you'd have to build/update yourself. |
Signed-off-by: Greg Wilkins <gregw@webtide.com>
Hi Greg, I can't disagree with anything you're saying. Those selects should block until a socket is readable, the thread is interrupted, or the selector is woken via wakeup. Let me check with my client to see if we can run a version with extra instrumentation. I don't know if this is important or just a coincidence, but this only happens on 3 servers (our app is running on many). They are running Windows 2012 R2. And it always seems to happen 2 hours after our app starts and the websocket connection is made. This does not happen (or hasn't so far) on Windows 2012 servers. |
Hi Greg, I was looking at the log and I noticed threads 36, 37, 39 and 41 are idle and experiencing this problem. The log message generated by these threads says: Selector sun.nio.ch.WindowsSelectorImpl@3b6ce69c woken up from select, 0/0 selected. The 0/0 shows (# of keys in the selected-keys set)/(# of keys in the key set). I understand why the # of keys in the selected set is 0, but shouldn't the # of keys in the key set be at least 1? The only selector thread that is actually connected (42) seems to be functioning normally, i.e. "woken up from select, 1/1 selected". Is this all normal behavior? |
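To make the two numbers concrete, here is a minimal sketch using only the standard java.nio API. It is not Jetty's logging code; the class and method names are made up for illustration. It prints the same selected/registered pair for a selector with nothing registered and again after one listening channel is registered.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper, not part of Jetty.
class SelectorKeyCounts {
    // Formats the counts in the same order as the debug line: selected/registered.
    static String counts(Selector selector) {
        return selector.selectedKeys().size() + "/" + selector.keys().size();
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            // Nothing registered yet: prints 0/0
            System.out.println(counts(selector));

            try (ServerSocketChannel server = ServerSocketChannel.open()) {
                server.bind(new InetSocketAddress("localhost", 0)); // any free port
                server.configureBlocking(false);
                server.register(selector, SelectionKey.OP_ACCEPT);
                selector.selectNow(); // non-blocking poll, no client is connecting
                // One key registered, none ready: prints 0/1
                System.out.println(counts(selector));
            }
        }
    }
}
```

So a second number of 0 means the selector currently manages no channels at all, which is expected for a selector thread that has no connections assigned to it.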
That is why we think it's a JVM bug. Having 0/0 and not blocking in |
Thanks for responding so quickly. Shouldn't the 2nd 0 be > 0? Does that mean the set of sockets we're interested in is empty? I wonder what the behavior of select is when the key set is empty. Would it block or return immediately? |
Yes, the second number would be > 0 if any sockets were registered; a 0 there means there are no sockets managed by that selector. |
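For anyone who wants to check what their own JVM does with an empty key set, a small sketch like the following may help. It assumes nothing beyond the standard java.nio API, and the class name is hypothetical. On a JVM that behaves as documented, select(timeout) on a selector with no keys should stay blocked for roughly the full timeout (or until wakeup() is called from another thread) rather than return immediately.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical probe, not part of Jetty.
class EmptySelectorBlocks {
    // Returns how many milliseconds select(timeoutMillis) spent blocked
    // on a selector that has no registered keys.
    static long blockedMillis(long timeoutMillis) throws IOException {
        try (Selector selector = Selector.open()) {
            long start = System.nanoTime();
            int ready = selector.select(timeoutMillis);
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            System.out.println("select returned " + ready + " after ~" + elapsed + " ms");
            return elapsed;
        }
    }

    public static void main(String[] args) throws IOException {
        // An elapsed time close to 0 ms here would match the spinning seen in the logs.
        blockedMillis(500);
    }
}
```

Running this on one of the affected Windows 2012 R2 machines and on an unaffected one would be a cheap way to see whether the early return shows up even outside Jetty, though the problem in this thread only appears after about 2 hours, so a short probe may well miss it.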
Looks like the nightly snapshot deployment has the instrumentation that @gregw added in commit 426fb95 Jetty Snapshot Repository URL: https://oss.sonatype.org/content/repositories/jetty-snapshots/ Please use version |
Thanks Joakim. I'll get it deployed to my customer's site as soon as I get the go ahead. |
Hi @joakime, Yes, I can see "woken with none selected" in the debug logs. It's a Windows 2016 server. We have similar installations at many customer sites, but we see the issue only at this site. If it is in fact an OS or network driver issue, let me know if you have any pointers for pinpointing or debugging it so we can get a fix. -Bhawani |
I have no advice on upgrading your OS or network drivers. |
Thanks @joakime |
Hi @joakime, I agree that it is an underlying problem with the OS, but there is a workaround that Jetty could do. Netty has functionality that, when it detects this spinning situation, will rebuild the selector. This is behind a system property, which Jetty could also do. See https://github.com/netty/netty/blob/3.9/src/main/java/org/jboss/netty/channel/socket/nio/AbstractNioSelector.java#L129. |
@diffractious We've considered the Netty approach of rebuilding the selector several times, but you wind up losing all of the selector-specific attachments when you do that. That impacts all of the active connections attached to that selector. We had rebuilding in the Jetty layer for spinning for a short while, but it impacted ALL other non-buggy environments far too harshly, so it was removed. |
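For readers unfamiliar with the rebuild idea being discussed, here is a rough sketch of the general technique: count consecutive select returns that selected nothing and came back before the timeout, and once a threshold is hit, move the registrations to a fresh selector. This is neither Netty's nor Jetty's code. The class name, the threshold, and the choice to copy attachments across are all assumptions made for illustration, and the caveat above about the impact on active connections still applies.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical sketch, not taken from Netty or Jetty.
class SpinGuard {
    private final int threshold; // assumed limit on consecutive empty early returns
    private int emptyWakeups;

    SpinGuard(int threshold) {
        this.threshold = threshold;
    }

    // Record one return from select(timeoutMillis). A return is suspicious when
    // nothing was selected and the call came back before its timeout expired.
    // Returns true once the threshold is reached, i.e. a rebuild is suggested.
    boolean onSelectReturn(int selected, long elapsedMillis, long timeoutMillis) {
        if (selected > 0 || elapsedMillis >= timeoutMillis) {
            emptyWakeups = 0;
            return false;
        }
        if (++emptyWakeups >= threshold) {
            emptyWakeups = 0;
            return true;
        }
        return false;
    }

    // Move every valid registration to a fresh selector, carrying over the
    // interest ops and attachment, then close the old selector.
    // Intended to be called from the thread that owns the selector.
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue;
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            channel.register(fresh, ops, attachment);
        }
        old.close();
        return fresh;
    }
}
```

A real select loop would also need to ignore returns caused by a deliberate wakeup() (for example when tasks are submitted to the selector thread), otherwise normal traffic could trip the counter. Anything that holds a reference to the old SelectionKey objects would also need updating after a rebuild.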
@diffractious you can use a workaround ServerConnector to rebuild on spurious select 0. |
@joakime nice! Thank you! |
One other update here. We've received reports that running Jetty behind IIS and it's |
maybe this is the reason. https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6778476 |
I don't think a selector.select() in a hung state would result in extra CPU usage. |
thanks, I got it😁 |
I found a quick way to reproduce it. |
Hi @joakime, we tried the workaround and it seemed to work in our labs. But in production, it started breaking.
|
@bhawani1978 I don't see any problem with your stack trace. If you have 100% CPU, it's something else. |
@sbordet no high CPU but it just hangs and stops responding |
@bhawani1978 if you don't have 100% CPU please open a new issue, as this issue is not related to your problem. |
@sbordet I thought this would be the right place, as this is happening after applying the workaround provided by @joakime: https://github.com/jetty-project/selector-hack. If we remove this hack, we will most probably start getting the old issue again. |
The hack just works around a hardware/driver/OS issue on your machine. The new issue you reported is something else that is unrelated to this issue. |
Hello,
|
Try to gather as much information about their system. It usually comes down to a few things:
There is not enough / adequate testing for that selector-hack branch.
ServerConnector is not responsible for how the connection is handled. Jetty is 100% NIO based and has no support for blocking I/O concepts. To accomplish this you would need to rewrite vast swathes of the IO and Threading layers across all of Jetty. That is not a trivial task, and not one we have any desire to take on at this point in time.
Be aware that Oracle is pushing their Loom thread model agenda with that comment.
Note that we have single instances of Jetty handling over 200,000 active connections across a variety of protocols (http/1, http/2, websocket, etc) just fine, we start to hit network interface bandwidth limits way before we hit any kind of selector limits. |
On Linux. |
This issue has been automatically marked as stale because it has been a |
This issue has been closed due to it having no activity. |
I am running embedded jetty 9.4.8.v20171121. My app runs fine for a period of time, usually about 2 hours, with very little CPU usage. Then, 2 threads, both named WebSocketContainer@1861866092-, start to consume 100% CPU when idle. A stack trace at this point yields the following:
This seems very similar to: #1446
but I am running a newer version of jetty.
I can only get this problem to occur on servers running Windows 2012R2. I am running java jdk1.8.0_162 although I have also seen the problem with jdk1.8.0_131.
Does anyone have any ideas what could be causing this?
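Since Jetty is embedded here, one way to confirm from inside the process which threads are burning CPU is the standard ThreadMXBean API. The sketch below is only an illustration and the class name is made up. It lists the accumulated CPU time per live thread, so two samples taken a few seconds apart should show the WebSocketContainer@... threads growing much faster than the rest if they are the ones spinning.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper, not part of Jetty.
class HotThreads {
    // CPU time consumed so far by each live thread, in milliseconds,
    // keyed by "threadName#threadId". Empty if the JVM cannot measure it.
    static Map<String, Long> cpuMillisByThread() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new TreeMap<>();
        if (!threads.isThreadCpuTimeSupported()) {
            return result;
        }
        for (long id : threads.getAllThreadIds()) {
            long nanos = threads.getThreadCpuTime(id); // -1 if the thread died or measurement is disabled
            ThreadInfo info = threads.getThreadInfo(id);
            if (nanos >= 0 && info != null) {
                result.put(info.getThreadName() + "#" + id, nanos / 1_000_000);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        cpuMillisByThread().forEach((name, millis) ->
                System.out.println(millis + " ms  " + name));
    }
}
```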