Description
Go version
go version go1.24.4 linux/arm64
Output of go env in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1130418166=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go1.24'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go1.24/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.24.4'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
I benchmarked the code below on an AWS EC2 c7g.4xlarge instance, comparing the default Go 1.24 build against Go 1.24 built with GOEXPERIMENT=nospinbitmutex. Note that although the instance has 16 CPUs, I tested with -test.cpu values ranging from 1 to 32.
package main

import (
	"sync"
	"testing"
)

// Run with: go test -test.run='^$' -test.bench=Part2LockContentionUnpredictable -test.cpu="$(seq 1 32 | tr '\n' ',')" -test.count=10
func BenchmarkPart2LockContentionUnpredictable(b *testing.B) {
	const requestsPerGoroutine = 10
	var globalMutex sync.Mutex
	var sharedCounter = 0
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			// Each benchmark iteration issues 10 sequential "requests".
			for i := 0; i < requestsPerGoroutine; i++ {
				respChan := make(chan int, 1)
				go processPart2LockContentionUnpredictable(i, respChan, &globalMutex, &sharedCounter)
				<-respChan
			}
		}
	})
}

// processPart2LockContentionUnpredictable fans out 16 goroutines that all
// contend on globalMutex, then sends the sum of their results on respChan.
func processPart2LockContentionUnpredictable(reqID int, respChan chan int, globalMutex *sync.Mutex, sharedCounter *int) {
	results := make(chan int, 16)
	for i := 0; i < 16; i++ {
		go func(id int) {
			globalMutex.Lock()
			*sharedCounter++
			// Some workload, performed while holding the lock.
			baseWork := 50
			variance := (id*13 + reqID*7) % 199951
			work := baseWork + variance
			sum := id + reqID
			for j := 0; j < work; j++ {
				sum += j
			}
			globalMutex.Unlock()
			results <- sum
		}(i)
	}
	total := 0
	for i := 0; i < 16; i++ {
		total += <-results
	}
	respChan <- total
}
What did you see happen?
Benchstat 1.24 Run 1: Performance comparison between Go 1.24 default configuration and Go 1.24 with GOEXPERIMENT=nospinbitmutex
/root/go/bin/benchstat BenchmarkPart2LockContentionUnpredictable_go1.24_124_nospinbit.txt BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt
goos: linux
goarch: arm64
pkg: chan-contended-1.24
│ BenchmarkPart2LockContentionUnpredictable_go1.24_124_nospinbit.txt │ BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt │
│ sec/op │ sec/op vs base │
Part2LockContentionUnpredictable 115.5µ ± 0% 115.8µ ± 0% ~ (p=0.218 n=10)
Part2LockContentionUnpredictable-2 92.59µ ± 2% 85.16µ ± 1% -8.03% (p=0.000 n=10)
Part2LockContentionUnpredictable-3 70.09µ ± 1% 65.15µ ± 1% -7.04% (p=0.000 n=10)
Part2LockContentionUnpredictable-4 60.07µ ± 1% 58.49µ ± 1% -2.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-5 55.98µ ± 1% 53.59µ ± 3% -4.27% (p=0.000 n=10)
Part2LockContentionUnpredictable-6 52.18µ ± 0% 55.11µ ± 1% +5.61% (p=0.002 n=10)
Part2LockContentionUnpredictable-7 52.25µ ± 1% 53.62µ ± 11% +2.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-8 54.72µ ± 0% 63.78µ ± 11% +16.56% (p=0.000 n=10)
Part2LockContentionUnpredictable-9 58.87µ ± 1% 64.30µ ± 6% +9.22% (p=0.000 n=10)
Part2LockContentionUnpredictable-10 63.40µ ± 1% 67.22µ ± 5% +6.03% (p=0.000 n=10)
Part2LockContentionUnpredictable-11 65.92µ ± 1% 72.38µ ± 8% +9.81% (p=0.001 n=10)
Part2LockContentionUnpredictable-12 67.27µ ± 1% 73.67µ ± 8% +9.51% (p=0.003 n=10)
Part2LockContentionUnpredictable-13 67.89µ ± 0% 74.62µ ± 8% +9.91% (p=0.000 n=10)
Part2LockContentionUnpredictable-14 68.32µ ± 0% 75.33µ ± 9% +10.26% (p=0.000 n=10)
Part2LockContentionUnpredictable-15 68.50µ ± 0% 75.92µ ± 9% +10.84% (p=0.000 n=10)
Part2LockContentionUnpredictable-16 68.90µ ± 1% 70.17µ ± 9% +1.84% (p=0.000 n=10)
Part2LockContentionUnpredictable-17 68.64µ ± 0% 70.44µ ± 9% +2.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-18 68.80µ ± 0% 70.34µ ± 9% +2.24% (p=0.000 n=10)
Part2LockContentionUnpredictable-19 68.69µ ± 0% 77.28µ ± 9% +12.51% (p=0.000 n=10)
Part2LockContentionUnpredictable-20 68.88µ ± 0% 76.92µ ± 8% +11.67% (p=0.000 n=10)
Part2LockContentionUnpredictable-21 68.89µ ± 0% 71.08µ ± 9% +3.18% (p=0.000 n=10)
Part2LockContentionUnpredictable-22 68.89µ ± 0% 71.78µ ± 9% +4.19% (p=0.000 n=10)
Part2LockContentionUnpredictable-23 69.11µ ± 1% 78.10µ ± 8% +13.01% (p=0.000 n=10)
Part2LockContentionUnpredictable-24 69.27µ ± 1% 71.80µ ± 10% +3.66% (p=0.000 n=10)
Part2LockContentionUnpredictable-25 69.06µ ± 1% 78.62µ ± 9% +13.83% (p=0.000 n=10)
Part2LockContentionUnpredictable-26 69.33µ ± 1% 79.37µ ± 9% +14.47% (p=0.000 n=10)
Part2LockContentionUnpredictable-27 69.48µ ± 0% 72.49µ ± 10% +4.32% (p=0.000 n=10)
Part2LockContentionUnpredictable-28 69.51µ ± 0% 73.00µ ± 9% +5.02% (p=0.000 n=10)
Part2LockContentionUnpredictable-29 69.52µ ± 0% 72.85µ ± 10% +4.78% (p=0.000 n=10)
Part2LockContentionUnpredictable-30 69.74µ ± 0% 72.81µ ± 11% +4.40% (p=0.000 n=10)
Part2LockContentionUnpredictable-31 69.90µ ± 0% 73.13µ ± 11% +4.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-32 69.88µ ± 0% 80.91µ ± 10% +15.79% (p=0.000 n=10)
geomean 67.70µ 71.61µ +5.78%
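To make the direction of the "vs base" column explicit: in this table the base (left) column is the nospinbitmutex build, so a positive percentage means the default Go 1.24 build took longer per operation. The sketch below recomputes one row, assuming the column is simply the relative change between the two medians. The helper name percentDelta is hypothetical, and because the inputs are the rounded medians printed in the table, the last digit can differ slightly from what benchstat prints.

```go
package main

import "fmt"

// percentDelta returns the relative change of next versus base, in percent.
// A positive value means next took longer (sec/op went up).
// This is an illustrative helper, not part of benchstat.
func percentDelta(base, next float64) float64 {
	return (next - base) / base * 100
}

func main() {
	// Rounded medians (in microseconds) copied from the -8 row of the table:
	// base is the nospinbitmutex build, next is the default Go 1.24 build.
	base, next := 54.72, 63.78
	fmt.Printf("vs base: %+.2f%%\n", percentDelta(base, next))
}
```

Read this way, the rows from -6 upward are regressions for the default build relative to nospinbitmutex, while the rows from -2 to -5 are improvements.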
Benchstat Run 2: Performance comparison between Go 1.23 and the Go 1.24 default configuration
/root/go/bin/benchstat BenchmarkPart2LockContentionUnpredictable_go1.23_123.txt BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt
goos: linux
goarch: arm64
pkg: chan-contended-1.24
│ BenchmarkPart2LockContentionUnpredictable_go1.23_123.txt │ BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt │
│ sec/op │ sec/op vs base │
Part2LockContentionUnpredictable 117.9µ ± 0% 115.8µ ± 0% -1.70% (p=0.000 n=10)
Part2LockContentionUnpredictable-2 89.26µ ± 1% 89.86µ ± 2% ~ (p=0.280 n=10)
Part2LockContentionUnpredictable-3 67.46µ ± 2% 67.01µ ± 1% ~ (p=0.579 n=10)
Part2LockContentionUnpredictable-4 60.64µ ± 1% 59.93µ ± 1% -1.18% (p=0.000 n=10)
Part2LockContentionUnpredictable-5 55.17µ ± 1% 54.48µ ± 1% -1.24% (p=0.001 n=10)
Part2LockContentionUnpredictable-6 52.51µ ± 0% 52.49µ ± 1% ~ (p=0.739 n=10)
Part2LockContentionUnpredictable-7 52.78µ ± 0% 54.53µ ± 3% +3.33% (p=0.001 n=10)
Part2LockContentionUnpredictable-8 55.29µ ± 0% 58.52µ ± 5% +5.85% (p=0.000 n=10)
Part2LockContentionUnpredictable-9 59.62µ ± 1% 61.38µ ± 4% ~ (p=0.700 n=10)
Part2LockContentionUnpredictable-10 64.47µ ± 1% 66.37µ ± 6% ~ (p=0.481 n=10)
Part2LockContentionUnpredictable-11 67.47µ ± 1% 64.39µ ± 1% -4.56% (p=0.002 n=10)
Part2LockContentionUnpredictable-12 68.70µ ± 1% 69.77µ ± 6% ~ (p=0.481 n=10)
Part2LockContentionUnpredictable-13 69.62µ ± 0% 70.46µ ± 5% ~ (p=0.137 n=10)
Part2LockContentionUnpredictable-14 69.55µ ± 0% 67.20µ ± 6% ~ (p=0.143 n=10)
Part2LockContentionUnpredictable-15 70.09µ ± 0% 69.79µ ± 4% ~ (p=1.000 n=10)
Part2LockContentionUnpredictable-16 70.23µ ± 1% 72.21µ ± 5% +2.82% (p=0.023 n=10)
Part2LockContentionUnpredictable-17 69.91µ ± 1% 68.81µ ± 5% ~ (p=0.143 n=10)
Part2LockContentionUnpredictable-18 69.78µ ± 0% 72.72µ ± 5% +4.21% (p=0.023 n=10)
Part2LockContentionUnpredictable-19 69.87µ ± 1% 71.34µ ± 3% ~ (p=0.436 n=10)
Part2LockContentionUnpredictable-20 70.07µ ± 1% 73.25µ ± 5% ~ (p=0.105 n=10)
Part2LockContentionUnpredictable-21 69.81µ ± 1% 73.55µ ± 5% +5.36% (p=0.001 n=10)
Part2LockContentionUnpredictable-22 70.32µ ± 0% 70.29µ ± 1% ~ (p=0.684 n=10)
Part2LockContentionUnpredictable-23 69.80µ ± 0% 74.03µ ± 5% +6.06% (p=0.000 n=10)
Part2LockContentionUnpredictable-24 70.10µ ± 0% 74.16µ ± 5% +5.79% (p=0.001 n=10)
Part2LockContentionUnpredictable-25 70.08µ ± 0% 72.49µ ± 3% +3.44% (p=0.000 n=10)
Part2LockContentionUnpredictable-26 70.22µ ± 1% 72.77µ ± 3% +3.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-27 70.28µ ± 1% 73.28µ ± 3% +4.27% (p=0.000 n=10)
Part2LockContentionUnpredictable-28 70.50µ ± 0% 71.38µ ± 5% +1.25% (p=0.002 n=10)
Part2LockContentionUnpredictable-29 70.59µ ± 1% 75.34µ ± 5% +6.73% (p=0.000 n=10)
Part2LockContentionUnpredictable-30 70.93µ ± 1% 71.79µ ± 5% +1.22% (p=0.000 n=10)
Part2LockContentionUnpredictable-31 70.78µ ± 0% 73.85µ ± 3% +4.35% (p=0.000 n=10)
Part2LockContentionUnpredictable-32 70.85µ ± 0% 75.69µ ± 4% +6.82% (p=0.000 n=10)
geomean 68.46µ 69.85µ +2.03%
The default spinbit mutex shows high variance (a reported spread of up to ±11% in Run 1, versus at most ±2% with nospinbitmutex), indicating unpredictable performance.
What did you expect to see?
I expected similar performance between spinbitmutex (new in Go 1.24) and nospinbitmutex (the behavior before Go 1.24, available in Go 1.24 with GOEXPERIMENT=nospinbitmutex, and removed in Go 1.25). However, spinbit exhibits higher variance and unpredictable latency patterns.
I reproduced BenchmarkChanContended on my arm64 test environment, which yielded comparable results to the amd64 benchmark included in this. This validates the improvements shown in that particular benchmark case.
/root/go/bin/benchstat BenchmarkChanContended_go1.24_124_nospinbit.txt BenchmarkChanContended_go1.24_124.txt
goos: linux
goarch: arm64
pkg: chan-contended-1.24
│ BenchmarkChanContended_go1.24_124_nospinbit.txt │ BenchmarkChanContended_go1.24_124.txt │
│ sec/op │ sec/op vs base │
ChanContended 6.317µ ± 0% 6.481µ ± 0% +2.60% (p=0.000 n=10)
ChanContended-2 13.44µ ± 5% 18.34µ ± 2% +36.43% (p=0.000 n=10)
ChanContended-3 16.99µ ± 3% 14.15µ ± 4% -16.70% (p=0.000 n=10)
ChanContended-4 22.11µ ± 2% 17.51µ ± 1% -20.78% (p=0.000 n=10)
ChanContended-5 26.79µ ± 11% 17.03µ ± 1% -36.43% (p=0.000 n=10)
ChanContended-6 31.70µ ± 4% 17.85µ ± 1% -43.70% (p=0.000 n=10)
ChanContended-7 34.40µ ± 11% 17.98µ ± 1% -47.72% (p=0.000 n=10)
ChanContended-8 41.35µ ± 2% 18.41µ ± 1% -55.47% (p=0.000 n=10)
ChanContended-9 38.51µ ± 18% 18.21µ ± 1% -52.71% (p=0.000 n=10)
ChanContended-10 43.31µ ± 22% 18.14µ ± 1% -58.10% (p=0.000 n=10)
ChanContended-11 43.41µ ± 2% 17.97µ ± 1% -58.61% (p=0.000 n=10)
ChanContended-12 44.10µ ± 16% 17.80µ ± 2% -59.63% (p=0.000 n=10)
ChanContended-13 45.99µ ± 24% 18.00µ ± 0% -60.85% (p=0.000 n=10)
ChanContended-14 46.22µ ± 15% 18.07µ ± 1% -60.91% (p=0.000 n=10)
ChanContended-15 48.10µ ± 11% 18.07µ ± 0% -62.43% (p=0.000 n=10)
ChanContended-16 45.97µ ± 5% 17.87µ ± 1% -61.12% (p=0.000 n=10)
ChanContended-17 46.88µ ± 7% 17.60µ ± 1% -62.45% (p=0.000 n=10)
ChanContended-18 48.94µ ± 14% 17.14µ ± 2% -64.99% (p=0.000 n=10)
ChanContended-19 44.91µ ± 12% 17.06µ ± 1% -62.01% (p=0.000 n=10)
ChanContended-20 44.43µ ± 3% 16.96µ ± 2% -61.83% (p=0.000 n=10)
ChanContended-21 43.35µ ± 0% 17.01µ ± 2% -60.76% (p=0.000 n=10)
ChanContended-22 43.42µ ± 9% 16.53µ ± 2% -61.94% (p=0.000 n=10)
ChanContended-23 43.26µ ± 19% 16.78µ ± 1% -61.21% (p=0.000 n=10)
ChanContended-24 42.91µ ± 3% 16.64µ ± 2% -61.23% (p=0.000 n=10)
ChanContended-25 42.84µ ± 8% 16.61µ ± 2% -61.23% (p=0.000 n=10)
ChanContended-26 38.40µ ± 32% 16.84µ ± 2% -56.16% (p=0.000 n=10)
ChanContended-27 37.61µ ± 27% 16.89µ ± 2% -55.10% (p=0.000 n=10)
ChanContended-28 51.70µ ± 8% 16.79µ ± 2% -67.52% (p=0.000 n=10)
ChanContended-29 48.91µ ± 11% 16.66µ ± 2% -65.94% (p=0.000 n=10)
ChanContended-30 45.40µ ± 10% 16.65µ ± 1% -63.32% (p=0.000 n=10)
ChanContended-31 45.36µ ± 7% 16.89µ ± 3% -62.77% (p=0.000 n=10)
ChanContended-32 45.12µ ± 10% 16.71µ ± 3% -62.96% (p=0.000 n=10)
geomean 36.86µ 16.72µ -54.63%
However, under high-contention scenarios that better reflect production workloads, no performance benefit was observed (see the benchmark in the "What did you do?" section). Our service experienced degradation in P99 Max Latency, particularly affecting goroutines performing network operations such as database queries, cache requests, and external service calls.
Beyond the performance degradation, this is also related to this issue regarding the removal of the nospinbitmutex GOEXPERIMENT. That removal is hindering our transition to Go 1.25, because we currently rely on the experiment as a workaround.