Repeated "Could not process key for channel" IllegalStateException for unix sockets #4865
Comments
@mpuncel if you can reproduce easily, can you please enable DEBUG logging and attach the log file here?
Note that release 9.4.29, which is just about to come out, probably improves the failure behaviour in this case because of some other changes. On exception handling we used to close only if there was an endpoint attachment. Now, if there is no endpoint, we close the channel. Hopefully this will be enough to remove the key from the selector, so that once something has gone bad, it doesn't keep going bad. If it is not enough to avoid looping on that key, then this is a case where we should explicitly cancel the key.
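To illustrate the idea of closing the channel when a selected key has no attachment, here is a minimal sketch. It is not Jetty's actual selector code: the class `CloseOnMissingAttachment` and its methods are hypothetical, and a `java.nio.channels.Pipe` stands in for the jnr-unixsocket channel, so it only shows how a plain JDK selector behaves.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class CloseOnMissingAttachment {
    /** Hypothetical handler: closes the channel when the key has no attachment. */
    public static boolean handle(SelectionKey key) throws IOException {
        if (key.attachment() == null) {
            // Closing a channel cancels all of its selection keys.
            key.channel().close();
            return true;
        }
        return false;
    }

    /** Returns the number of keys reported by the selector after the channel was closed. */
    public static int selectionsAfterClose() throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                // Register without an attachment (assumption: this mimics the bad state).
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);
                // Make the source readable so the key gets selected.
                sink.write(ByteBuffer.wrap(new byte[] {1}));
                selector.select(1000);
                selector.selectedKeys().clear();
                handle(key);
                // The cancelled key is deregistered during this selection.
                return selector.selectNow();
            }
        }
    }
}
```

With a standard JDK selector, the key becomes invalid as soon as the channel is closed and is dropped on the next selection. Whether the JNR selector behaves the same way is exactly the open question here.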
Also, thinking about the root cause of this issue: if it were a race, where the key was being selected before the attachment was attached, then I would expect the attachment to eventually be set and the exception to stop. The fact that it doesn't stop suggests that it is not a race, but a lost attachment?
Yep, will do! I'll reply to this issue when I have them.
Yeah, it does seem that way, although it could be that enough keys get into a bad state that the exception spam itself prevents (or greatly slows) the keys from transitioning to a good state. When this happens, it usually happens for a lot of keys at once, and it seems to be related to the number of concurrent connections/requests that Jetty is processing.
I have some logs now and am trying to pare them down into something shareable. Do you have any suggestions for what we want to see? Should I just filter for lines that have the relevant key or channel ID logged in the exception message?
Here's a go at producing some logs. This is the first instance of the exception:
The exception happened 370 times
First timestamp is 2020-05-13T21:13:52,752, second is 2020-05-13T21:13:56,572, so it took 4 seconds to "recover". Under a production workload I've seen the same exception logged for over a minute (just found a 67 second one) so I think it can often take longer to recover, possibly due to how busy the server is. Before each exception I see this line, confirming the attachment is null
Eventually I see it get an attachment, suggesting it's a race after all
after that, I don't see the exception again for that same PollSelectionKey. Backing up a bit to track that UnixSocketEndpoint@5db2cc31, here are the first 10 lines where it appears
Let me know if there's anything else I should look for.
@mpuncel if you cannot redact the logs (but keep all the lines), then the approach is correct. We want to track every possible object related to that. The disappearance of the attachment, followed by its reappearance and final disappearance, does indeed seem related to an issue in JNR, as we don't update the attachment once we set it. Bear in mind that support for Unix pipes is highly experimental. FTR, Unix pipes support could arrive in Java 15+ (https://openjdk.java.net/jeps/380).
Thanks for the info! @gregw mentioned earlier that 9.4.29 might mitigate the issue. When will that release go out?
@mpuncel 9.4.29 and even 9.4.30 have been released. If either of them solved the problem, please close the issue; otherwise, please provide updated info.
Jetty version
9.4.28.v20200408
Java version
11
OS type/version
CentOS 7
Description
I'm not sure whether this is a Jetty issue or a jnr-unixsocket issue, but I'd appreciate any directional help I can get on figuring this one out.
Under load, we sometimes see exceptions like the following when using unix sockets (we've never seen this with a port)
This exception will be thrown over and over again for the same channel/key combination.
I'm guessing what's happening is select() keeps returning the same key, but that key doesn't have an attachment that is usable, and we keep revisiting the same key over and over in processSelected().
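A minimal sketch of that suspected behaviour, using only the JDK: a key that is registered without an attachment and whose channel is never drained stays ready, so every `select()` call returns it again. The class `NullAttachmentLoop` is hypothetical, and a `java.nio.channels.Pipe` stands in for the unix socket channel; this is not the Jetty or JNR code path.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class NullAttachmentLoop {
    /** Counts how often the same unattached key is selected over the given number of select() rounds. */
    public static int countRepeatedSelections(int rounds) throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                // No attachment is supplied, so key.attachment() stays null.
                source.register(selector, SelectionKey.OP_READ);
                // One pending byte keeps the source readable until someone reads it.
                sink.write(ByteBuffer.wrap(new byte[] {1}));

                int hits = 0;
                for (int i = 0; i < rounds; i++) {
                    selector.select(1000);
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.attachment() == null) {
                            // Nothing can consume the data without an endpoint,
                            // so the key will be ready again on the next round.
                            hits++;
                        }
                    }
                }
                return hits;
            }
        }
    }
}
```

Because readiness is level-triggered, the loop only ends when the data is read, the key is cancelled, or the channel is closed.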
I'd appreciate any pointers on how to track this down, thanks! I'm able to reproduce this reliably by sending a bunch of concurrent requests to an HTTP endpoint that sleeps for a few seconds.