What version of Go are you using (go version)?
$ go version
go version go1.15.3 linux/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/niaow/.cache/go-build"
GOENV="/home/niaow/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/niaow/go/pkg/mod"
GONOPROXY="github.com/molecula"
GONOSUMDB="github.com/molecula"
GOOS="linux"
GOPATH="/home/niaow/go"
GOPRIVATE="github.com/molecula"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build497931594=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Passed a single context with cancellation to a bunch of goroutines.
These goroutines had a cold-path compute task, interlaced with calls to ctx.Err() to detect cancellation.
The loop looks something like:
	var out []Thing
	for iterator.Next() {
		if err := ctx.Err(); err != nil {
			// Caller doesn't need a result anymore.
			return nil, err
		}
		// Fetch thing from iterator, apply some filtering functions, and append it to out.
	}
What did you expect to see?
A bit of a slowdown from the context check maybe?
What did you see instead?
Slightly over 50% of the CPU time was spent in runtime.findrunnable. The cancelCtx struct in the context package guards its state with a sync.Mutex, and due to extreme lock contention (64 CPU threads calling Err() in a tight loop), this was triggering lockSlow. From poking at pprof, it appears that about 86% of CPU time was spent in functions related to this lock acquire.
I was able to work around this by adding a counter and checking it less frequently. However, I do not think that this is an intended performance degradation path. Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.