runtime/pprof: StopCPUProfile occasionally stuck with 100% CPU and process hang #52912

@breezewish

What version of Go are you using (go version)?

go1.18.0

Does this issue reproduce with the latest release?

Not sure.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/breezewish/Library/Caches/go-build"
GOENV="/Users/breezewish/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/breezewish/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/breezewish/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/homebrew/Cellar/go/1.18.1/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/homebrew/Cellar/go/1.18.1/libexec/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.18.1"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/6m/phy2_frd0vd1hb7g1py32qz40000gn/T/go-build2219610054=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.18.1 darwin/arm64
GOROOT/bin/go tool compile -V: compile version go1.18.1
uname -v: Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000
ProductName:	macOS
ProductVersion:	12.3.1
BuildVersion:	21E258
lldb --version: lldb-1316.0.9.41
Apple Swift version 5.6 (swiftlang-5.6.0.323.62 clang-1316.0.20.8)

What did you do?

tidb-server has a feature that repeatedly runs the pprof CPU profiler in 1-second windows (StartCPUProfile -> wait 1 sec -> StopCPUProfile -> StartCPUProfile -> wait 1 sec -> ...).

Recently, on my M1 Mac (macOS) with this feature enabled, I observed that the tidb-server process hung at 100% CPU (1 core) and could not process any requests.

According to the CPU profiling data provided by Instruments, it looks like StopCPUProfile was looping indefinitely at this line:

for !atomic.Cas(&prof.signalLock, 0, 1) {

[image: screenshot of the stack reported by Instruments]
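
To illustrate why a loop of this shape can pin one core, here is a small user-space sketch of a compare-and-swap spin lock. It is not the runtime's implementation: the `spinLock` type, the `tryLock` and `unlock` methods, and the spin bound are hypothetical, and `sync/atomic` stands in for the runtime-internal `atomic.Cas`. The point is only that the loop can never exit if the lock word is never reset to 0.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// spinLock is a hypothetical stand-in for a CAS-guarded lock word
// such as prof.signalLock (0 = free, 1 = held).
type spinLock struct{ state uint32 }

// tryLock spins at most maxSpins times trying to swap 0 -> 1.
// The bound exists only so the example terminates; an unbounded
// version would spin forever if the holder never unlocks.
func (l *spinLock) tryLock(maxSpins int) bool {
	for i := 0; i < maxSpins; i++ {
		if atomic.CompareAndSwapUint32(&l.state, 0, 1) {
			return true
		}
		runtime.Gosched() // yield between attempts
	}
	return false
}

// unlock releases the lock by resetting the word to 0.
func (l *spinLock) unlock() { atomic.StoreUint32(&l.state, 0) }

func main() {
	var l spinLock
	fmt.Println(l.tryLock(1000)) // true: lock was free
	fmt.Println(l.tryLock(1000)) // false: still held, every CAS fails
	l.unlock()
	fmt.Println(l.tryLock(1000)) // true: free again after unlock
}
```

The second call shows the failure mode in miniature: while the word stays at 1, every attempt fails, and without the bound the caller would burn CPU indefinitely. Whether that is what happened here, and why the lock would stay held, is not established by this report.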

I don't know how to reproduce this issue reliably. I hope the stack provided by Instruments helps.

What did you expect to see?

StopCPUProfile should not cause the process to hang.

What did you see instead?

The process hung.

Labels

FrozenDueToAge
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.
compiler/runtime: Issues related to the Go compiler and/or runtime.
