net/http: apparent deadlock in TestTransportDialCancelRace #45211
I think this is a
One goroutine is blocked in The other goroutine is blocked in
Maybe unrelated, but we're actively debugging an issue (with Go 1.16.2) where some users get into a state where it seems
There's no evidence (panics in logs) that we hit
@bradfitz What architecture?
@neild What is supposed to happen there is that Looking at the netpoll code, it does seem possible that
@bradfitz That seems like a different problem to me. Any idea what is holding the write lock? Do we even need to hold a write lock during
@rsc, at least @ianlancetaylor, it's not obvious from the stack traces. Nothing is stuck in a system call doing a write, for instance. I can email them to you privately.
@ianlancetaylor I think the
@neild Thanks. Now I wonder whether
@bradfitz Sure, I can take a look, though I don't suppose I'll see anything. But something must be holding the lock somehow.
@bradfitz sent a stack trace offline. The stack trace showed a goroutine holding the lock that all the other goroutines are waiting for:
This means that the Normally How is this UDP socket created? Is it remotely plausible that the ephemeral ports were exhausted on this system?
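A simplified model of what that stack trace shows: one goroutine acquires a write lock and then parks waiting for the socket to become writable, so every other writer queues behind it. The sketch below uses a plain sync.Mutex and a channel as hypothetical stand-ins for the net package's internal fd lock and poller readiness; simulateStuckWriter is an illustrative name, not anything in the Go tree. It assumes Go 1.19+ for atomic.Int64.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// simulateStuckWriter models one goroutine holding a write lock while it
// waits for "writability", with `writers` more goroutines needing the same
// lock. It returns how many writes had completed before and after the
// socket was signaled writable. mu and writable are hypothetical stand-ins
// for the net package's internal fd lock and poller; they are not its types.
func simulateStuckWriter(writers int) (beforeReady, afterReady int64) {
	var mu sync.Mutex
	writable := make(chan struct{}) // closed when the "socket" becomes writable
	locked := make(chan struct{})   // closed once the first writer holds mu
	var completed atomic.Int64
	var wg sync.WaitGroup

	// The first writer takes the lock, then blocks waiting for writability
	// while still holding it.
	wg.Add(1)
	go func() {
		defer wg.Done()
		mu.Lock()
		close(locked)
		<-writable
		completed.Add(1)
		mu.Unlock()
	}()
	<-locked

	// Every later writer needs the same lock, so none can make progress.
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			completed.Add(1)
			mu.Unlock()
		}()
	}

	// Guaranteed zero: the only lock holder has not finished its write.
	beforeReady = completed.Load()

	// Once writable, the holder releases the lock and the queue drains.
	close(writable)
	wg.Wait()
	return beforeReady, completed.Load()
}

func main() {
	before, after := simulateStuckWriter(5)
	fmt.Println(before, after) // 0 6
}
```

Nothing in the model deadlocks on its own; it simply stops making progress for as long as writable is never closed, which matches the symptom of every writer piling up behind a single lock holder.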
I just recalled from the distant past (#18541) that Little Snitch running on macOS can cause EAGAIN in unexpected places. It is possible that the user those stack traces came from is running Little Snitch (or some other network filter extension). Possibly related: I just encountered some runtime poller failures on a new M1, and I also run Little Snitch. I will write them up for you tomorrow, or you can ask Brad for details if you want them sooner.
Also #18751. And FWIW,
@ianlancetaylor, forgot to tell you in the email: those stacks were from macOS, not Linux. (not super obvious, but some have e.g. The UDP socket is created from: ... which is effectively
That's quite likely, actually.
FWIW, the
@bradfitz @josharian If the UDP socket is being created using Unfortunately I'm not sure how to verify that that is the problem, other than observing that I don't see how it could be anything else. The goroutine is clearly blocking waiting for Even more unfortunately I'm not sure how to fix this. We could reduce the severity by changing
By the way, I think we're pretty clearly tracking two independent problems in this one issue.
It's not obvious to me whether this deadlock is a bug in the specific test function, the net/http package, or *httptest.Server in particular.
2021-03-24T14:20:32-747f426/linux-arm64-aws
CC @neild @bradfitz @empijei