runtime: high variance and unpredictable latency with spinbit mutex #75261

@itsalvinchris

Go version

go version go1.24.4 linux/arm64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1130418166=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go1.24'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go1.24/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.24.4'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I conducted benchmarks using this code on Go 1.24, comparing the default build against Go 1.24 with GOEXPERIMENT=nospinbitmutex on AWS EC2 c7g.4xlarge. Note that although the instance has 16 CPUs, I tested with -test.cpu values ranging from 1-32.

package main

import (
	"sync"
	"testing"
)

// Run with: go test . -test.run='^$' -test.bench=Part2LockContentionUnpredictable -test.cpu="$(seq 1 32 | tr '\n' ',')" -test.count=10
func BenchmarkPart2LockContentionUnpredictable(b *testing.B) {
	const requestsPerGoroutine = 10

	var globalMutex sync.Mutex
	var sharedCounter = 0

	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			for i := 0; i < requestsPerGoroutine; i++ {
				respChan := make(chan int, 1)
				go processPart2LockContentionUnpredictable(i, respChan, &globalMutex, &sharedCounter)
				<-respChan
			}
		}
	})
}

func processPart2LockContentionUnpredictable(reqID int, respChan chan int, globalMutex *sync.Mutex, sharedCounter *int) {
	results := make(chan int, 16)
	for i := 0; i < 16; i++ {
		go func(id int) {
			globalMutex.Lock()
			*sharedCounter++

			// Some Workload
			baseWork := 50
			variance := (id*13 + reqID*7) % 199951
			work := baseWork + variance

			sum := id + reqID
			for j := 0; j < work; j++ {
				sum += j
			}

			globalMutex.Unlock()
			results <- sum
		}(i)
	}
	total := 0
	for i := 0; i < 16; i++ {
		total += <-results
	}

	respChan <- total
}
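
For readers sizing the critical section, the following is a small sketch (not part of the original benchmark) that enumerates the workload formula above for every (id, reqID) pair the benchmark can generate. The helper name workRange is hypothetical. It shows that, with 16 workers and 10 requests, the modulus 199951 never takes effect and the loop count inside the lock stays between 50 and 308 iterations.

```go
package main

import "fmt"

// workRange (hypothetical helper) replays the issue's workload formula for
// every (id, reqID) pair and returns the smallest and largest loop counts.
func workRange(workers, requests int) (lo, hi int) {
	const baseWork = 50
	lo, hi = -1, -1
	for reqID := 0; reqID < requests; reqID++ {
		for id := 0; id < workers; id++ {
			// Same expression as in the benchmark's critical section.
			variance := (id*13 + reqID*7) % 199951
			work := baseWork + variance
			if lo == -1 || work < lo {
				lo = work
			}
			if work > hi {
				hi = work
			}
		}
	}
	return lo, hi
}

func main() {
	lo, hi := workRange(16, 10)
	// Prints: work iterations per critical section: min=50 max=308
	fmt.Printf("work iterations per critical section: min=%d max=%d\n", lo, hi)
}
```

The largest value comes from id=15, reqID=9: 15*13 + 9*7 = 258, plus the base of 50.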

What did you see happen?

Benchstat 1.24 Run 1: Performance comparison between Go 1.24 default configuration and Go 1.24 with GOEXPERIMENT=nospinbitmutex

/root/go/bin/benchstat  BenchmarkPart2LockContentionUnpredictable_go1.24_124_nospinbit.txt BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt
goos: linux
goarch: arm64
pkg: chan-contended-1.24
                                    │ BenchmarkPart2LockContentionUnpredictable_go1.24_124_nospinbit.txt │ BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt │
                                    │                               sec/op                               │              sec/op               vs base                │
Part2LockContentionUnpredictable                                                             115.5µ ± 0%                       115.8µ ±  0%        ~ (p=0.218 n=10)
Part2LockContentionUnpredictable-2                                                           92.59µ ± 2%                       85.16µ ±  1%   -8.03% (p=0.000 n=10)
Part2LockContentionUnpredictable-3                                                           70.09µ ± 1%                       65.15µ ±  1%   -7.04% (p=0.000 n=10)
Part2LockContentionUnpredictable-4                                                           60.07µ ± 1%                       58.49µ ±  1%   -2.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-5                                                           55.98µ ± 1%                       53.59µ ±  3%   -4.27% (p=0.000 n=10)
Part2LockContentionUnpredictable-6                                                           52.18µ ± 0%                       55.11µ ±  1%   +5.61% (p=0.002 n=10)
Part2LockContentionUnpredictable-7                                                           52.25µ ± 1%                       53.62µ ± 11%   +2.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-8                                                           54.72µ ± 0%                       63.78µ ± 11%  +16.56% (p=0.000 n=10)
Part2LockContentionUnpredictable-9                                                           58.87µ ± 1%                       64.30µ ±  6%   +9.22% (p=0.000 n=10)
Part2LockContentionUnpredictable-10                                                          63.40µ ± 1%                       67.22µ ±  5%   +6.03% (p=0.000 n=10)
Part2LockContentionUnpredictable-11                                                          65.92µ ± 1%                       72.38µ ±  8%   +9.81% (p=0.001 n=10)
Part2LockContentionUnpredictable-12                                                          67.27µ ± 1%                       73.67µ ±  8%   +9.51% (p=0.003 n=10)
Part2LockContentionUnpredictable-13                                                          67.89µ ± 0%                       74.62µ ±  8%   +9.91% (p=0.000 n=10)
Part2LockContentionUnpredictable-14                                                          68.32µ ± 0%                       75.33µ ±  9%  +10.26% (p=0.000 n=10)
Part2LockContentionUnpredictable-15                                                          68.50µ ± 0%                       75.92µ ±  9%  +10.84% (p=0.000 n=10)
Part2LockContentionUnpredictable-16                                                          68.90µ ± 1%                       70.17µ ±  9%   +1.84% (p=0.000 n=10)
Part2LockContentionUnpredictable-17                                                          68.64µ ± 0%                       70.44µ ±  9%   +2.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-18                                                          68.80µ ± 0%                       70.34µ ±  9%   +2.24% (p=0.000 n=10)
Part2LockContentionUnpredictable-19                                                          68.69µ ± 0%                       77.28µ ±  9%  +12.51% (p=0.000 n=10)
Part2LockContentionUnpredictable-20                                                          68.88µ ± 0%                       76.92µ ±  8%  +11.67% (p=0.000 n=10)
Part2LockContentionUnpredictable-21                                                          68.89µ ± 0%                       71.08µ ±  9%   +3.18% (p=0.000 n=10)
Part2LockContentionUnpredictable-22                                                          68.89µ ± 0%                       71.78µ ±  9%   +4.19% (p=0.000 n=10)
Part2LockContentionUnpredictable-23                                                          69.11µ ± 1%                       78.10µ ±  8%  +13.01% (p=0.000 n=10)
Part2LockContentionUnpredictable-24                                                          69.27µ ± 1%                       71.80µ ± 10%   +3.66% (p=0.000 n=10)
Part2LockContentionUnpredictable-25                                                          69.06µ ± 1%                       78.62µ ±  9%  +13.83% (p=0.000 n=10)
Part2LockContentionUnpredictable-26                                                          69.33µ ± 1%                       79.37µ ±  9%  +14.47% (p=0.000 n=10)
Part2LockContentionUnpredictable-27                                                          69.48µ ± 0%                       72.49µ ± 10%   +4.32% (p=0.000 n=10)
Part2LockContentionUnpredictable-28                                                          69.51µ ± 0%                       73.00µ ±  9%   +5.02% (p=0.000 n=10)
Part2LockContentionUnpredictable-29                                                          69.52µ ± 0%                       72.85µ ± 10%   +4.78% (p=0.000 n=10)
Part2LockContentionUnpredictable-30                                                          69.74µ ± 0%                       72.81µ ± 11%   +4.40% (p=0.000 n=10)
Part2LockContentionUnpredictable-31                                                          69.90µ ± 0%                       73.13µ ± 11%   +4.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-32                                                          69.88µ ± 0%                       80.91µ ± 10%  +15.79% (p=0.000 n=10)
geomean                                                                                      67.70µ                            71.61µ         +5.78%

Benchstat 1.24 Run 2: Performance comparison between Go 1.23 and the Go 1.24 default configuration

/root/go/bin/benchstat BenchmarkPart2LockContentionUnpredictable_go1.23_123.txt BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt
goos: linux
goarch: arm64
pkg: chan-contended-1.24
                                    │ BenchmarkPart2LockContentionUnpredictable_go1.23_123.txt │ BenchmarkPart2LockContentionUnpredictable_go1.24_124.txt │
                                    │                          sec/op                          │              sec/op                vs base               │
Part2LockContentionUnpredictable                                                   117.9µ ± 0%                         115.8µ ± 0%  -1.70% (p=0.000 n=10)
Part2LockContentionUnpredictable-2                                                 89.26µ ± 1%                         89.86µ ± 2%       ~ (p=0.280 n=10)
Part2LockContentionUnpredictable-3                                                 67.46µ ± 2%                         67.01µ ± 1%       ~ (p=0.579 n=10)
Part2LockContentionUnpredictable-4                                                 60.64µ ± 1%                         59.93µ ± 1%  -1.18% (p=0.000 n=10)
Part2LockContentionUnpredictable-5                                                 55.17µ ± 1%                         54.48µ ± 1%  -1.24% (p=0.001 n=10)
Part2LockContentionUnpredictable-6                                                 52.51µ ± 0%                         52.49µ ± 1%       ~ (p=0.739 n=10)
Part2LockContentionUnpredictable-7                                                 52.78µ ± 0%                         54.53µ ± 3%  +3.33% (p=0.001 n=10)
Part2LockContentionUnpredictable-8                                                 55.29µ ± 0%                         58.52µ ± 5%  +5.85% (p=0.000 n=10)
Part2LockContentionUnpredictable-9                                                 59.62µ ± 1%                         61.38µ ± 4%       ~ (p=0.700 n=10)
Part2LockContentionUnpredictable-10                                                64.47µ ± 1%                         66.37µ ± 6%       ~ (p=0.481 n=10)
Part2LockContentionUnpredictable-11                                                67.47µ ± 1%                         64.39µ ± 1%  -4.56% (p=0.002 n=10)
Part2LockContentionUnpredictable-12                                                68.70µ ± 1%                         69.77µ ± 6%       ~ (p=0.481 n=10)
Part2LockContentionUnpredictable-13                                                69.62µ ± 0%                         70.46µ ± 5%       ~ (p=0.137 n=10)
Part2LockContentionUnpredictable-14                                                69.55µ ± 0%                         67.20µ ± 6%       ~ (p=0.143 n=10)
Part2LockContentionUnpredictable-15                                                70.09µ ± 0%                         69.79µ ± 4%       ~ (p=1.000 n=10)
Part2LockContentionUnpredictable-16                                                70.23µ ± 1%                         72.21µ ± 5%  +2.82% (p=0.023 n=10)
Part2LockContentionUnpredictable-17                                                69.91µ ± 1%                         68.81µ ± 5%       ~ (p=0.143 n=10)
Part2LockContentionUnpredictable-18                                                69.78µ ± 0%                         72.72µ ± 5%  +4.21% (p=0.023 n=10)
Part2LockContentionUnpredictable-19                                                69.87µ ± 1%                         71.34µ ± 3%       ~ (p=0.436 n=10)
Part2LockContentionUnpredictable-20                                                70.07µ ± 1%                         73.25µ ± 5%       ~ (p=0.105 n=10)
Part2LockContentionUnpredictable-21                                                69.81µ ± 1%                         73.55µ ± 5%  +5.36% (p=0.001 n=10)
Part2LockContentionUnpredictable-22                                                70.32µ ± 0%                         70.29µ ± 1%       ~ (p=0.684 n=10)
Part2LockContentionUnpredictable-23                                                69.80µ ± 0%                         74.03µ ± 5%  +6.06% (p=0.000 n=10)
Part2LockContentionUnpredictable-24                                                70.10µ ± 0%                         74.16µ ± 5%  +5.79% (p=0.001 n=10)
Part2LockContentionUnpredictable-25                                                70.08µ ± 0%                         72.49µ ± 3%  +3.44% (p=0.000 n=10)
Part2LockContentionUnpredictable-26                                                70.22µ ± 1%                         72.77µ ± 3%  +3.63% (p=0.000 n=10)
Part2LockContentionUnpredictable-27                                                70.28µ ± 1%                         73.28µ ± 3%  +4.27% (p=0.000 n=10)
Part2LockContentionUnpredictable-28                                                70.50µ ± 0%                         71.38µ ± 5%  +1.25% (p=0.002 n=10)
Part2LockContentionUnpredictable-29                                                70.59µ ± 1%                         75.34µ ± 5%  +6.73% (p=0.000 n=10)
Part2LockContentionUnpredictable-30                                                70.93µ ± 1%                         71.79µ ± 5%  +1.22% (p=0.000 n=10)
Part2LockContentionUnpredictable-31                                                70.78µ ± 0%                         73.85µ ± 3%  +4.35% (p=0.000 n=10)
Part2LockContentionUnpredictable-32                                                70.85µ ± 0%                         75.69µ ± 4%  +6.82% (p=0.000 n=10)
geomean                                                                            68.46µ                              69.85µ       +2.03%

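The spinbit mutex shows high run-to-run variance, indicating unpredictable performance: in Run 1, from -test.cpu=7 upward, its ± column ranges from 5% to 11%, versus 0% to 2% for nospinbitmutex.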

What did you expect to see?

I expected similar performance between the spinbit mutex (new in Go 1.24) and nospinbitmutex (the default before Go 1.24, available in Go 1.24 with GOEXPERIMENT=nospinbitmutex, and removed in Go 1.25). However, spinbit exhibits higher variance and unpredictable latency patterns.

I reproduced BenchmarkChanContended on my arm64 test environment, which yielded comparable results to the amd64 benchmark included in this. This validates the improvements shown in that particular benchmark case.

/root/go/bin/benchstat BenchmarkChanContended_go1.24_124_nospinbit.txt BenchmarkChanContended_go1.24_124.txt 
goos: linux
goarch: arm64
pkg: chan-contended-1.24
                 │ BenchmarkChanContended_go1.24_124_nospinbit.txt │ BenchmarkChanContended_go1.24_124.txt │
                 │                     sec/op                      │    sec/op      vs base                │
ChanContended                                         6.317µ ±  0%     6.481µ ± 0%   +2.60% (p=0.000 n=10)
ChanContended-2                                       13.44µ ±  5%     18.34µ ± 2%  +36.43% (p=0.000 n=10)
ChanContended-3                                       16.99µ ±  3%     14.15µ ± 4%  -16.70% (p=0.000 n=10)
ChanContended-4                                       22.11µ ±  2%     17.51µ ± 1%  -20.78% (p=0.000 n=10)
ChanContended-5                                       26.79µ ± 11%     17.03µ ± 1%  -36.43% (p=0.000 n=10)
ChanContended-6                                       31.70µ ±  4%     17.85µ ± 1%  -43.70% (p=0.000 n=10)
ChanContended-7                                       34.40µ ± 11%     17.98µ ± 1%  -47.72% (p=0.000 n=10)
ChanContended-8                                       41.35µ ±  2%     18.41µ ± 1%  -55.47% (p=0.000 n=10)
ChanContended-9                                       38.51µ ± 18%     18.21µ ± 1%  -52.71% (p=0.000 n=10)
ChanContended-10                                      43.31µ ± 22%     18.14µ ± 1%  -58.10% (p=0.000 n=10)
ChanContended-11                                      43.41µ ±  2%     17.97µ ± 1%  -58.61% (p=0.000 n=10)
ChanContended-12                                      44.10µ ± 16%     17.80µ ± 2%  -59.63% (p=0.000 n=10)
ChanContended-13                                      45.99µ ± 24%     18.00µ ± 0%  -60.85% (p=0.000 n=10)
ChanContended-14                                      46.22µ ± 15%     18.07µ ± 1%  -60.91% (p=0.000 n=10)
ChanContended-15                                      48.10µ ± 11%     18.07µ ± 0%  -62.43% (p=0.000 n=10)
ChanContended-16                                      45.97µ ±  5%     17.87µ ± 1%  -61.12% (p=0.000 n=10)
ChanContended-17                                      46.88µ ±  7%     17.60µ ± 1%  -62.45% (p=0.000 n=10)
ChanContended-18                                      48.94µ ± 14%     17.14µ ± 2%  -64.99% (p=0.000 n=10)
ChanContended-19                                      44.91µ ± 12%     17.06µ ± 1%  -62.01% (p=0.000 n=10)
ChanContended-20                                      44.43µ ±  3%     16.96µ ± 2%  -61.83% (p=0.000 n=10)
ChanContended-21                                      43.35µ ±  0%     17.01µ ± 2%  -60.76% (p=0.000 n=10)
ChanContended-22                                      43.42µ ±  9%     16.53µ ± 2%  -61.94% (p=0.000 n=10)
ChanContended-23                                      43.26µ ± 19%     16.78µ ± 1%  -61.21% (p=0.000 n=10)
ChanContended-24                                      42.91µ ±  3%     16.64µ ± 2%  -61.23% (p=0.000 n=10)
ChanContended-25                                      42.84µ ±  8%     16.61µ ± 2%  -61.23% (p=0.000 n=10)
ChanContended-26                                      38.40µ ± 32%     16.84µ ± 2%  -56.16% (p=0.000 n=10)
ChanContended-27                                      37.61µ ± 27%     16.89µ ± 2%  -55.10% (p=0.000 n=10)
ChanContended-28                                      51.70µ ±  8%     16.79µ ± 2%  -67.52% (p=0.000 n=10)
ChanContended-29                                      48.91µ ± 11%     16.66µ ± 2%  -65.94% (p=0.000 n=10)
ChanContended-30                                      45.40µ ± 10%     16.65µ ± 1%  -63.32% (p=0.000 n=10)
ChanContended-31                                      45.36µ ±  7%     16.89µ ± 3%  -62.77% (p=0.000 n=10)
ChanContended-32                                      45.12µ ± 10%     16.71µ ± 3%  -62.96% (p=0.000 n=10)
geomean                                               36.86µ           16.72µ       -54.63%

However, under high-contention scenarios that better reflect production workloads, no performance benefit was observed (see the benchmark in the "What did you do?" section). Our service experienced degradation in P99 Max Latency, particularly affecting goroutines performing network operations such as database queries, cache requests, and external service calls.

Beyond the performance degradation, this is also related to this issue regarding the removal of the nospinbitmutex GOEXPERIMENT. That removal is hindering our transition to Go 1.25, because we rely on the experiment as a workaround.
