NIO ByteBuffer corruption in embedded Jetty server #4828
I'm moderately sure there is not a race between onCompleted() and SendCallBack.process(). However, I do agree that it sounds plausible that a buffer was incorrectly recycled. We do have a LeakTrackingByteBufferPool that you could try. So I think we are down to code inspection and thought experiments. Can you tell us a lot more about the load on your server? Is it all HTTP/1.1? Do you have POSTs or GETs with content? If so, is it chunked or length-delimited? How big are your normal responses? Are they normally chunked? I see a method called … Lots of questions, sorry, and few answers at this stage.
Hi Greg, to answer your questions:
The LeakTrackingByteBufferPool looks promising; we'll check if we can integrate it.
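For reference, a minimal sketch of how such an integration might look in embedded Jetty 9.4. The connector arguments, class name, and port are illustrative, not taken from this thread:

```java
import org.eclipse.jetty.io.ArrayByteBufferPool;
import org.eclipse.jetty.io.ByteBufferPool;
import org.eclipse.jetty.io.LeakTrackingByteBufferPool;
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class LeakTrackingJetty
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();

        // Wrap the default pool; the wrapper records acquire/release pairs
        // and logs buffers that leak or are released without a matching acquire.
        ByteBufferPool pool = new LeakTrackingByteBufferPool(new ArrayByteBufferPool());

        // Hand the pool to the connector explicitly; -1 means "use defaults"
        // for the acceptor and selector counts.
        ServerConnector connector = new ServerConnector(
            server, null, null, pool, -1, -1, new HttpConnectionFactory());
        connector.setPort(8080);
        server.addConnector(connector);

        server.start();
        server.join();
    }
}
```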
Thanks for the extra info. I would consider removing the … In your logs, can you tell if the first request that was not handled correctly was a HEAD request? Or was there a lost HEAD request in close association? @sbordet I'm guessing that direct buffers are now implemented as mapped buffers?
@gregw no, direct buffers are direct buffers. Jetty produces … In both cases the … @chietti do you spawn threads in your application? Do you write to the response output stream from one of those spawned threads?
@sbordet Not in this context. The CommunicationServlet is fully synchronous: we parse the protobuf content, handle the messages, and write back the response, all in the same Jetty thread context. I have to admit, though, that I can only speak for the protobuf layer. The server also hosts a lot of Jersey-based REST APIs that I am not familiar with, as well as a lot of other servlets which I do not know.
@chietti you are using HTTP/1.1, so I don't think other requests on other sockets can mess with this one. Are you able to figure out from where the …?
FWIW, I get the same MappedByteBuffer lines in the stack trace when I run this on JDK 11.
@sbordet you mean why there is a DirectByteBuffer and not a HeapByteBuffer in use? |
As far as I understand the source code, it cannot be caused by the _header or _chunk members, since both are HeapByteBuffers (according to the HEADER_BUFFER_DIRECT and CHUNK_BUFFER_DIRECT constants). That leaves us with the _content member, which is the _aggregate member in HttpOutput, and that is a DirectByteBuffer (see HttpOutput.acquireBuffer()). I also verified that locally in the debugger.
I always forget this, so yeah, it's not a MappedByteBuffer: on the HotSpot JDK, DirectByteBuffer extends MappedByteBuffer, which is why those frames show up in stack traces from plain direct buffers.
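A small standalone snippet illustrating this class-hierarchy quirk (the class name is ours, nothing here is Jetty-specific):

```java
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

public class BufferKinds
{
    public static void main(String[] args)
    {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true

        // On HotSpot the concrete class of a direct buffer is
        // java.nio.DirectByteBuffer, which extends MappedByteBuffer.
        // So stack traces from direct-buffer operations show
        // MappedByteBuffer frames even though no file mapping exists.
        System.out.println(direct instanceof MappedByteBuffer); // true on HotSpot
        System.out.println(direct.getClass().getName());
    }
}
```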
Sorry, what did you verify exactly? That the buffer that throws is the aggregate buffer? Do you have a reproducible test case for us?
Sorry, that was a bit unclear: I verified that the _aggregate in HttpOutput is a DirectByteBuffer, which makes it the only candidate for me, since the other two involved buffers (_chunk and _header) are HeapByteBuffers for sure, as far as I understand the code here:
I had some time to look into the HttpOutput implementation, and I am wondering about the calls to releaseBuffer(). It is called from three places: completed(), onWriteComplete() and recycle(). In onWriteComplete() and recycle() the call happens inside the synchronized block, but in completed() it happens after the synchronized block. Is that intentional? If not, is it possible that these three methods are called from different threads? If so, there is a minimal window for releasing the buffer twice: if releaseBuffer() is interrupted right after the release to the pool but before the member is reset to null on line 631. A sketch of that window follows.
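To make the suspected window concrete, a minimal sketch with simplified names; this is a hypothetical distillation, not the actual Jetty source:

```java
import java.nio.ByteBuffer;

// Hypothetical pool interface standing in for Jetty's ByteBufferPool.
interface ByteBufferPool
{
    void release(ByteBuffer buffer);
}

class Output
{
    private final ByteBufferPool _pool;
    private ByteBuffer _aggregate; // pooled write buffer

    Output(ByteBufferPool pool)
    {
        _pool = pool;
    }

    // Unsafe shape: two threads can both pass the null check, both
    // release the same buffer, and the pool then holds it twice.
    void releaseBufferUnsafe()
    {
        if (_aggregate != null)
        {
            _pool.release(_aggregate); // window opens here...
            _aggregate = null;         // ...and only closes here
        }
    }

    // Safer shape: clear the field under a lock before releasing, so at
    // most one caller can ever hand the buffer back to the pool.
    void releaseBuffer()
    {
        ByteBuffer buffer;
        synchronized (this)
        {
            buffer = _aggregate;
            _aggregate = null;
        }
        if (buffer != null)
            _pool.release(buffer);
    }
}
```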
+ improve synchronization around releaseBuffer
+ improve synchronization around acquireBuffer
+ made acquireBuffer private

Signed-off-by: Greg Wilkins <gregw@webtide.com>
Hi @gregw, since we have no reproducer we cannot confirm that this fixes the bug. We will definitely keep an eye on this problem with our internal monitoring. |
I think you can close this ticket. We switched to 9.4.30 a long time ago and have not seen this problem since.
thanks! |
Jetty 9.4.26.v20200117
OpenJDK 64-Bit Server VM, Version: 11.0.5
Linux, Version: 4.14.114-83.126.amzn1.x86_64
On one of our server cluster nodes we observed a sudden occurrence of these exceptions while writing HTTP responses from our servlet on the embedded Jetty web server:
At the same time, the other servers of the cluster started to complain that they got messages which should never have reached them. Looking at the exception, it appears that multiple threads are writing HTTP responses into the same NIO ByteBuffer, which would explain why these responses leaked to targets they were not intended for.
I did some research into the HttpConnection class source code and how it handles the pooled NIO ByteBuffers.
While the ByteBufferPool implementation we use (the default ArrayByteBufferPool) and its Bucket look fine in terms of synchronization, I am wondering if it is possible that a ByteBuffer is released twice into the pool.
HttpConnection.onCompleted() contains this piece of code with a comment that caught my attention:
Is it possible that onCompleted() is called while HttpConnection$SendCallBack.process() is executed? For example, I have found calls to it via HttpChannel.onBadMessage().
HttpConnection$SendCallBack.process() contains this piece of code:
Since there is no synchronization, it is possible that either one is interrupted before setting the _chunk member to null or to a different value. So the chunk buffer could be released twice and remain in the pool twice, to be used by multiple threads at the same time until the server is restarted (we did restart, and the problem has not reoccurred since). The toy example below shows the consequence of such a double release.
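A self-contained toy pool, just to illustrate the failure mode; nothing here is Jetty code:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Releasing the same buffer twice queues the same instance twice,
// so two subsequent acquires hand one buffer to two writers.
public class DoubleReleaseDemo
{
    public static void main(String[] args)
    {
        Deque<ByteBuffer> pool = new ArrayDeque<>();
        ByteBuffer chunk = ByteBuffer.allocateDirect(1024);

        pool.push(chunk); // legitimate release
        pool.push(chunk); // erroneous second release of the same buffer

        ByteBuffer responseA = pool.pop();
        ByteBuffer responseB = pool.pop();

        // Two "independent" responses now share one backing buffer, so
        // bytes written for one response can leak into the other.
        System.out.println(responseA == responseB); // true
    }
}
```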
We assume it must be a rare fringe scenario, since we have never seen this kind of behavior in years of operation, but I do not know enough about the internal dynamics of Jetty to be sure whether a race condition between these two code pieces is technically possible (although the comment suggests it is).
Since the effect is rather devastating (we send messages to the wrong targets, causing all kinds of weird behavior), we would appreciate your insight into this problem.