Description
Go version
go version go1.23.1 linux/amd64
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/davide.depau/.cache/go-build'
GOENV='/home/davide.depau/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/davide.depau/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/davide.depau/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/davide.depau/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/davide.depau/Projects/unix-socket-ws-hang/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3677999444=/tmp/go-build -gno-record-gcc-switches'
What did you do?
I was writing a simple HTTP + WebSocket proxy over a Unix socket when I ran into this issue. Roughly one time in three, the server gets stuck while upgrading the connection to WebSocket. The culprit seems to be Hijack(), which calls abortPendingRead(), but the background read goroutine doesn't seem to receive the signal, so the hijack deadlocks.
I wrote a reproducer; the steps to reproduce are in the README: https://github.com/depau/golang-unix-websocket-deadlock
In short:
- Run the server
- Run the client over and over until the server deadlocks (on my machine this happens at least once every 5 tries)
Although the sample code uses gorilla websockets, the issue doesn't seem to be related to it. To demonstrate this, I also wrote a basic WebSocket implementation that you can enable on the server by passing the --no-gorilla command line argument.
What did you see happen?
This is a goroutine dump taken on the server while the issue occurs. Observe how the hijacking goroutine is stuck in abortPendingRead, while the backgroundRead goroutine remains stuck in the read syscall.
This looks similar to #19747, but it's not the same issue since my system time is up to date.
goroutine profile: total 6
1 @ 0x40dc09 0x471969 0x665633 0x477b41
# 0x471968 os/signal.signal_recv+0x28 /usr/lib/go/src/runtime/sigqueue.go:152
# 0x665632 os/signal.loop+0x12 /usr/lib/go/src/os/signal/signal_unix.go:23
1 @ 0x431151 0x46eafd 0x66ba71 0x66b8a5 0x6686cb 0x6804ce 0x477b41
# 0x66ba70 runtime/pprof.writeRuntimeProfile+0xb0 /usr/lib/go/src/runtime/pprof/pprof.go:793
# 0x66b8a4 runtime/pprof.writeGoroutine+0x44 /usr/lib/go/src/runtime/pprof/pprof.go:752
# 0x6686ca runtime/pprof.(*Profile).WriteTo+0x14a /usr/lib/go/src/runtime/pprof/pprof.go:374
# 0x6804cd main.dumpGoroutinesAfter+0x4d /home/davide.depau/Projects/unix-socket-ws-hang/cmd/server/main.go:99
1 @ 0x46fbce 0x4084bc 0x408092 0x680345 0x477b41
# 0x680344 main.main.func1+0xe4 /home/davide.depau/Projects/unix-socket-ws-hang/cmd/server/main.go:79
1 @ 0x46fbce 0x4345f7 0x46eec5 0x4cb127 0x4cca35 0x4cca23 0x56ee29 0x581e56 0x581270 0x649bec 0x68015d 0x43bc2b 0x477b41
# 0x46eec4 internal/poll.runtime_pollWait+0x84 /usr/lib/go/src/runtime/netpoll.go:351
# 0x4cb126 internal/poll.(*pollDesc).wait+0x26 /usr/lib/go/src/internal/poll/fd_poll_runtime.go:84
# 0x4cca34 internal/poll.(*pollDesc).waitRead+0x294 /usr/lib/go/src/internal/poll/fd_poll_runtime.go:89
# 0x4cca22 internal/poll.(*FD).Accept+0x282 /usr/lib/go/src/internal/poll/fd_unix.go:620
# 0x56ee28 net.(*netFD).accept+0x28 /usr/lib/go/src/net/fd_unix.go:172
# 0x581e55 net.(*UnixListener).accept+0x15 /usr/lib/go/src/net/unixsock_posix.go:172
# 0x58126f net.(*UnixListener).Accept+0x2f /usr/lib/go/src/net/unixsock.go:260
# 0x649beb net/http.(*Server).Serve+0x30b /usr/lib/go/src/net/http/server.go:3330
# 0x68015c main.main+0x2dc /home/davide.depau/Projects/unix-socket-ws-hang/cmd/server/main.go:90
# 0x43bc2a runtime.main+0x28a /usr/lib/go/src/runtime/proc.go:272
1 @ 0x46fbce 0x471199 0x471179 0x47f725 0x6404c5 0x63edef 0x646a77 0x63d314 0x680e92 0x6521ce 0x645890 0x477b41
# 0x471178 sync.runtime_notifyListWait+0x138 /usr/lib/go/src/runtime/sema.go:587
# 0x47f724 sync.(*Cond).Wait+0x84 /usr/lib/go/src/sync/cond.go:71
# 0x6404c4 net/http.(*connReader).abortPendingRead+0xa4 /usr/lib/go/src/net/http/server.go:738
# 0x63edee net/http.(*conn).hijackLocked+0x2e /usr/lib/go/src/net/http/server.go:321
# 0x646a76 net/http.(*response).Hijack+0xd6 /usr/lib/go/src/net/http/server.go:2170
# 0x63d313 net/http.(*ResponseController).Hijack+0x193 /usr/lib/go/src/net/http/responsecontroller.go:71
# 0x680e91 main.(*server).ServeHTTP+0x991 /home/davide.depau/Projects/unix-socket-ws-hang/cmd/server/main.go:170
# 0x6521cd net/http.serverHandler.ServeHTTP+0x8d /usr/lib/go/src/net/http/server.go:3210
# 0x64588f net/http.(*conn).serve+0x5cf /usr/lib/go/src/net/http/server.go:2092
1 @ 0x489c05 0x488778 0x4cbe6e 0x4cbe56 0x4cbcf1 0x56dc65 0x577505 0x6402d7 0x477b41
# 0x489c04 syscall.Syscall+0x24 /usr/lib/go/src/syscall/syscall_linux.go:73
# 0x488777 syscall.read+0x37 /usr/lib/go/src/syscall/zsyscall_linux_amd64.go:736
# 0x4cbe6d syscall.Read+0x2ad /usr/lib/go/src/syscall/syscall_unix.go:183
# 0x4cbe55 internal/poll.ignoringEINTRIO+0x295 /usr/lib/go/src/internal/poll/fd_unix.go:745
# 0x4cbcf0 internal/poll.(*FD).Read+0x130 /usr/lib/go/src/internal/poll/fd_unix.go:161
# 0x56dc64 net.(*netFD).Read+0x24 /usr/lib/go/src/net/fd_posix.go:55
# 0x577504 net.(*conn).Read+0x44 /usr/lib/go/src/net/net.go:189
# 0x6402d6 net/http.(*connReader).backgroundRead+0x36 /usr/lib/go/src/net/http/server.go:690
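A dump in this format can be obtained from the goroutine profile in runtime/pprof with debug level 1. The sketch below is an assumption about how such a dump might be produced; the helper names goroutineDump and dumpGoroutinesAfter are illustrative, and the body is not taken from the reproducer's source.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

// goroutineDump returns the goroutine profile as text. With debug=1 the
// output groups goroutines by identical stack and symbolizes each frame.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// dumpGoroutinesAfter (hypothetical) waits for d and then prints the dump to
// stderr. In a server it would typically be started with the go keyword.
func dumpGoroutinesAfter(d time.Duration) {
	time.Sleep(d)
	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
		return
	}
	fmt.Fprint(os.Stderr, dump)
}

func main() {
	dumpGoroutinesAfter(100 * time.Millisecond)
}
```

Taking the dump a few seconds after the client connects gives a stuck Hijack() time to show up as a goroutine parked in sync.(*Cond).Wait.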
What did you expect to see?
I expected it to not deadlock :)