Go version
go version go1.26.0 linux/amd64
Output of go env in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN='/home/caleb/bin'
GOCACHE='/home/caleb/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/caleb/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build583110936=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/caleb/p/misc/go.mod'
GOMODCACHE='/home/caleb/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/caleb/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/caleb/3p/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='on'
GOTELEMETRYDIR='/home/caleb/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/caleb/3p/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.26.0'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
I ran this test benchmark with go test -bench .:
package bloop

import (
	"testing"
)

func BenchmarkBaseline(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = expensive()
	}
}

func BenchmarkSink(b *testing.B) {
	var x float64
	for i := 0; i < b.N; i++ {
		x = expensive()
	}
	floatSink = x
}

func BenchmarkBLoop(b *testing.B) {
	for b.Loop() {
		expensive()
	}
}

func BenchmarkBLoopAssign(b *testing.B) {
	for b.Loop() {
		_ = expensive()
	}
}

var floatSink float64

func expensive() float64 {
	x := 1.0
	for i := range int(1e6) {
		x *= float64(i) * float64(i+1) * float64(i+2) * float64(i+3)
	}
	return x
}
What did you see happen?
In Go 1.26.0, I got results along these lines:
BenchmarkBaseline-22       10000    106871 ns/op
BenchmarkSink-22            1336    919937 ns/op
BenchmarkBLoop-22           1300    890339 ns/op
BenchmarkBLoopAssign-22    10927    109716 ns/op
That is, there seem to be two regimes, where the slow one is about 9x slower than the fast one. (If I look at the disassembly, the fast ones do have the 1e6 loop, but the body of the loop is a no-op instead of all the expensive float ops.)
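To put a number on the gap between the two regimes, here is a small sketch that divides the ns/op figures from the Go 1.26.0 run above. The helper name slowdown is made up for this illustration; the inputs are copied from the results as reported.

```go
package main

import "fmt"

// slowdown reports how many times slower slowNs is than fastNs.
// (Hypothetical helper, introduced only for this illustration.)
func slowdown(slowNs, fastNs float64) float64 {
	return slowNs / fastNs
}

func main() {
	// ns/op figures copied from the Go 1.26.0 results above.
	const (
		baseline    = 106871.0
		sink        = 919937.0
		bloop       = 890339.0
		bloopAssign = 109716.0
	)
	fmt.Printf("Sink / Baseline:     %.1fx\n", slowdown(sink, baseline))
	fmt.Printf("BLoop / BLoopAssign: %.1fx\n", slowdown(bloop, bloopAssign))
}
```

Both ratios come out between 8x and 9x, so the split into a fast and a slow regime is well outside run-to-run noise.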
The baseline version, which uses a b.N loop, was fast (that is, it skipped the expensive parts). That's expected.
Using a "sink" variable, which is a typical pre-b.Loop workaround, doesn't skip the work.
Using b.Loop and calling the function simply as expensive() also doesn't skip the work.
If we use b.Loop and call the function as _ = expensive() then the benchmark is fast (it skips the expensive stuff).
What did you expect to see?
I expected that writing _ = f() would have the same treatment as f() as far as the b.Loop/compiler interaction went.
Here's what the doc says:
Within the body of a "for b.Loop() { ... }" loop, arguments to and results
from function calls and assigned variables within the loop are kept alive,
preventing the compiler from fully optimizing away the loop body. Currently,
this is implemented as a compiler transformation that wraps such variables
with a runtime.KeepAlive intrinsic call. This applies only to statements
syntactically between the curly braces of the loop, and the loop condition
must be written exactly as "b.Loop()".
I think that however you read that, the result of expensive() in the line _ = expensive() should be part of the "function calls and assigned variables within the loop", and should be kept alive.
And regardless of the docs, _ = f() is a very natural way of writing a function call within a benchmark where the function produces a result that you would normally use in any non-benchmark context. So it shouldn't have this sharp edge.
This is a Go 1.26 regression. In Go 1.25.6, I see (for example):
BenchmarkBaseline-22       11143    108954 ns/op
BenchmarkSink-22            1282    880942 ns/op
BenchmarkBLoop-22           1354    877726 ns/op
BenchmarkBLoopAssign-22     1366    880257 ns/op
Fun side note: this pattern doesn't exhibit the bug:
/cc bloop gang: @JunyangShao @cherrymui @thepudds