Description
What version of Go are you using (go version)?
go version go1.19.1 linux/arm
Does this issue reproduce with the latest release?
Unknown. I've been having this issue since I started working on this project roughly 24 months ago. Same behavior on all go versions.
What operating system and processor architecture are you using (go env)?
Raspberry Pi OS v10.11 on armv7l
go env Output
GO111MODULE=""
GOARCH="arm"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go_path/pkg/mod"
GOOS="linux"
GOPATH="/root/go_path"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/go/pkg/tool/linux_arm"
GOVCS=""
GOVERSION="go1.19.1"
GCCGO="gccgo"
GOARM="6"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2944008957=/tmp/go-build -gno-record-gcc-switches"
What did you do?
I'm using Go to build a set of services on the Raspberry Pi platform (Pi 4, CM4). Several of the higher duty-cycle services occasionally "run away", consuming nearly 400% (100% x 4 cores) of available CPU. This usually happens after several hours or even a few days of running. When it happens, all of the goroutines in the application stop running and remain in "runtime.gopark". (See the stack trace below.)
The program most prone to this behavior is a network broadcaster. It subscribes to and receives data from a set of Redis pubsub channels and forwards that data to clients over UDP. The typical throughput is relatively low - only about 34 KB/sec, and it never gets above 38 KB/sec. It is run at a higher priority level, but it is not run as "realtime".
I've added a watchdog goroutine to try to detect the runaway condition and terminate the service, but the watchdog timer never fires.
I tried pulling a profile using pprof, but when the problem happens the HTTP listener fails to respond.
I've tried tracing with strace, but it doesn't return any data.
I've monitored open files with lsof and it does not appear to be leaking handles - open file count remains constant at 369.
I kicked it around with the Redigo team, and from what we can tell it's not something going on in Redigo - all the pubsub listeners simply get stuck in their waiting state.
What did you expect to see?
The application runs at its normal modest CPU load (5% - 8%).
What did you see instead?
The application sucks down as much CPU power as possible (395%).
Stack Trace
Too long to embed in this issue. Here it is as a gist:
https://gist.github.com/ssokol/b168de8b4546efd9b43a9d6af8538de9