x/net/http2: pool deadlock #32388
Comments
Change https://golang.org/cl/179938 mentions this issue:
Whelp....
I haven't reviewed CL 179938, but it seems a bit long/invasive. I wonder why the mutex needs to be held during that write. Rather than changing everything so CanTakeRequest can work without taking the lock, can we instead not hold the lock during the write and leave CanTakeRequest unmodified?
I tried that but because It may be that adding a lock just around the update of
There seems to be a mix of ways the locks are acquired, but it is always mu then wmu. I can see value in acquiring wmu under mu and then releasing if we are trying to guarantee some ordering.
The test only talks to one endpoint so the HTTP/2 code will try to make these requests over the same TCP connection; it's designed to validate that the first request doesn't block the second request due to a stalled write; in this situation, it should just create a new connection but as the You are correct in that
Change https://golang.org/cl/181457 mentions this issue:
Change https://golang.org/cl/224797 mentions this issue:
Change https://golang.org/cl/240338 mentions this issue:
@fraenkel Thanks so much for helping fix this bug! We've run into this quite a few times on some service-to-service calls under specific conditions.
HTTP/2 support in golang has many problematic corner cases where dead connections would be kept.
golang/go#32388 golang/go#39337 golang/go#39750
I suggest we disable HTTP/2 for now and enable it manually on the blackbox exporter.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@fraenkel Thanks for this patch (https://golang.org/cl/240338). I've manually applied it and the issue is gone. Can we have this merged so I can remove the manual patch?
Any update on this, @fraenkel? This patch (https://golang.org/cl/240338) has been working great for me in production for the last 2 months.
I believe this may be tangential to #42113, which was resolved in 1.15.3.
Given that the description of that bug says "will cause all subsequent write requests to fail blindly", I think this is very different, as it's a definite deadlock.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
Use the net/http client to make requests to an http2 supporting server.
What did you expect to see?
Requests should continue to succeed if a single connection becomes blocked in a write.
What did you see instead?
A single blocked connection prevented new http2 connections from being established due to the http2 pool requiring the connection lock when testing to see if an existing connection can handle new requests. This results in the entire application hanging indefinitely.
The following trace was created with an older Go version (1.9) from our production app, but I have confirmed that the issue is still reproducible in the latest version as detailed above, as the locking between connections and the pool hasn't changed.
This is similar to #23559; however, that bug is specifically about certain areas of the http2 connections not paying attention to timeouts, whereas this bug is caused by the interaction of individual connection locks and the pool lock.