runtime: occasional hard lockup / 100% CPU usage in go applications #56424

@ssokol

What version of Go are you using (go version)?

go version go1.19.1 linux/arm

Does this issue reproduce with the latest release?

Unknown. I've been having this issue since I started working on this project roughly 24 months ago, and the behavior has been the same on every Go version I've used since then.

What operating system and processor architecture are you using (go env)?

Raspberry Pi OS v10.11 on armv7l

go env Output
GO111MODULE=""
GOARCH="arm"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go_path/pkg/mod"
GOOS="linux"
GOPATH="/root/go_path"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/go/pkg/tool/linux_arm"
GOVCS=""
GOVERSION="go1.19.1"
GCCGO="gccgo"
GOARM="6"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2944008957=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I'm using Go to build a set of services on the Raspberry Pi platform (Pi 4, CM4). Several of the higher duty-cycle services occasionally "run away", consuming close to 400% (100% x 4 cores) of available CPU. This usually happens after several hours or even a few days of running. When this happens, all of the goroutines in the application stop running and remain parked in "runtime.gopark". (See the stack trace below.)

The program most prone to this behavior is a network broadcaster. It subscribes to and receives data from a set of Redis pubsub channels and forwards that data to clients over UDP. The typical throughput is relatively low: only about 34 KB/sec, and it never gets above 38 KB/sec. It is run at a higher priority level, but it is not run as "realtime".
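For readers unfamiliar with this kind of service, the forwarding half might be sketched roughly as below. This is not the reporter's code: `forward` and `loopbackDemo` are hypothetical names, and a plain Go channel stands in for the Redis pubsub feed so the example does not depend on any Redis client API.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// forward writes every payload received on msgs to each client address.
// It returns when msgs is closed or on the first write error.
func forward(conn net.PacketConn, clients []net.Addr, msgs <-chan []byte) error {
	for payload := range msgs {
		for _, c := range clients {
			if _, err := conn.WriteTo(payload, c); err != nil {
				return err
			}
		}
	}
	return nil
}

// loopbackDemo sends one message through forward to a local UDP receiver
// and returns what the receiver read. (Hypothetical helper for illustration.)
func loopbackDemo() (string, error) {
	recv, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer recv.Close()

	send, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer send.Close()

	// Assumption: a buffered channel stands in for the Redis pubsub feed.
	msgs := make(chan []byte, 1)
	msgs <- []byte("hello")
	close(msgs)

	if err := forward(send, []net.Addr{recv.LocalAddr()}, msgs); err != nil {
		return "", err
	}

	// Bound the read so the demo cannot block forever if the packet is lost.
	if err := recv.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}
	buf := make([]byte, 1500)
	n, _, err := recv.ReadFrom(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := loopbackDemo()
	fmt.Println(got, err)
}
```

In a real service the channel would be fed by the pubsub receive loop, and a write error would more likely be logged per client than abort the whole loop.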

I've added a watchdog goroutine to try to detect the runaway condition and terminate the service, but the watchdog timer never fires.
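A heartbeat-style watchdog of the kind described might look like the sketch below. The `Watchdog` type and the `firesWithoutBeats` helper are hypothetical and are not taken from the reporter's service; the sketch only illustrates the general pattern.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Watchdog is a hypothetical in-process watchdog: workers call Beat, and a
// background goroutine calls onStall if no beat arrives within timeout.
type Watchdog struct {
	// lastBeat is kept as the first field so that it is 64-bit aligned,
	// which sync/atomic requires on 32-bit platforms such as linux/arm.
	lastBeat int64 // UnixNano of the most recent heartbeat
	timeout  time.Duration
	onStall  func()
	stop     chan struct{}
}

// NewWatchdog starts the monitoring goroutine and returns the watchdog.
func NewWatchdog(timeout time.Duration, onStall func()) *Watchdog {
	w := &Watchdog{timeout: timeout, onStall: onStall, stop: make(chan struct{})}
	w.Beat()
	go w.run()
	return w
}

// Beat records that the caller is still making progress.
func (w *Watchdog) Beat() { atomic.StoreInt64(&w.lastBeat, time.Now().UnixNano()) }

// Stop ends monitoring. It must be called at most once.
func (w *Watchdog) Stop() { close(w.stop) }

func (w *Watchdog) run() {
	// Check several times per timeout period so a stall is noticed promptly.
	ticker := time.NewTicker(w.timeout / 4)
	defer ticker.Stop()
	for {
		select {
		case <-w.stop:
			return
		case <-ticker.C:
			last := time.Unix(0, atomic.LoadInt64(&w.lastBeat))
			if time.Since(last) > w.timeout {
				w.onStall()
				return
			}
		}
	}
}

// firesWithoutBeats reports whether the watchdog fires within waitMs
// milliseconds when nothing calls Beat. (Hypothetical demo helper.)
func firesWithoutBeats(timeoutMs, waitMs int) bool {
	fired := make(chan struct{})
	w := NewWatchdog(time.Duration(timeoutMs)*time.Millisecond, func() { close(fired) })
	defer w.Stop()
	select {
	case <-fired:
		return true
	case <-time.After(time.Duration(waitMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("fired:", firesWithoutBeats(50, 1000))
}
```

A watchdog like this is itself a goroutine driven by a runtime timer, so it can only fire if the Go scheduler is still running goroutines. That would be consistent with it never firing when every goroutine in the process has stopped; a supervisor outside the process does not share that dependency.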

I tried pulling a profile using pprof, but when the problem happens the HTTP listener fails to respond.
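For context, a pprof HTTP listener is commonly wired up along the lines below. This is a minimal sketch under assumptions: `startPprof` and `goroutineDump` are hypothetical helper names, and the address is a loopback placeholder, not the reporter's configuration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
	"time"
)

// startPprof serves the pprof handlers on addr in a background goroutine
// and returns the address actually bound (useful when addr uses port 0).
func startPprof(addr string) (string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", err
	}
	go func() {
		// A nil handler means http.DefaultServeMux, where pprof registered itself.
		if err := http.Serve(ln, nil); err != nil {
			log.Printf("pprof listener stopped: %v", err)
		}
	}()
	return ln.Addr().String(), nil
}

// goroutineDump fetches a full goroutine stack dump from a pprof listener.
// The client timeout keeps the caller from hanging if the server is wedged.
func goroutineDump(addr string, timeoutMs int) (string, error) {
	client := &http.Client{Timeout: time.Duration(timeoutMs) * time.Millisecond}
	resp, err := client.Get("http://" + addr + "/debug/pprof/goroutine?debug=2")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	addr, err := startPprof("127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	dump, err := goroutineDump(addr, 5000)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("fetched %d bytes of goroutine stacks from %s\n", len(dump), addr)
}
```

Because the listener's accept loop and handlers are ordinary goroutines, an endpoint like this cannot answer once no goroutine in the process is being scheduled, which matches the behavior described above.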

I've tried tracing with strace but it doesn't return any data.

I've monitored open files with lsof and it does not appear to be leaking handles - open file count remains constant at 369.
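As a complement to lsof, a process can also report its own descriptor count. The sketch below assumes Linux with /proc mounted, and `openFDCount` is a hypothetical helper. Its number will generally differ from an lsof line count, since lsof also lists entries such as memory-mapped files and the working directory, so only the trend over time is comparable.

```go
package main

import (
	"fmt"
	"os"
)

// openFDCount returns the number of entries in /proc/self/fd, which is the
// number of file descriptors open in this process. The count includes the
// descriptor used to read the directory itself. Linux only.
func openFDCount() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	before, err := openFDCount()
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Open one extra file so the change in the count is visible.
	f, err := os.Open(os.DevNull)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer f.Close()

	after, err := openFDCount()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("open descriptors: before=%d after=%d\n", before, after)
}
```

Logging this value periodically from a running service gives a history that survives up to the moment of the lockup, without needing to attach an external tool.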

I kicked it around with the Redigo team, and from what we can tell it's not something going on in Redigo - all the pubsub listeners simply get stuck in their waiting state.

What did you expect to see?

The application runs at its normal modest CPU load (5% - 8%)

What did you see instead?

The application sucks down as much CPU power as possible (395%)

Stack Trace

Too long to embed in this issue. Here it is as a gist:

https://gist.github.com/ssokol/b168de8b4546efd9b43a9d6af8538de9

Metadata

Labels

NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
arch-arm: Issues solely affecting the 32-bit arm architecture.
compiler/runtime: Issues related to the Go compiler and/or runtime.
