Discovered a regression in sec/op of 2.74% for benchmark Invocation/interpreter/random_mat_mul_size_20-88 at 2a93576.
This was "internal/buildcfg: enable SizeSpecializedMalloc by default"; a performance regression seems unpossible, yet here we are.
TL;DR: while writing the repro instructions I ran the benchmark, and at least on my laptop it got faster, not slower.
But for posterity...
It's a bent benchmark. To reproduce:
# choose directories as appropriate
export BASELINE=/tmp/baseline
export TEST=/tmp/test
go install golang.org/x/benchmarks/cmd/bent@latest
go install golang.org/x/perf/cmd/benchstat@latest
git clone https://go.googlesource.com/go $BASELINE
git clone https://go.googlesource.com/go $TEST
(cd $BASELINE/src; git fetch; git checkout 2a93576965^ ; ./make.bash )
(cd $TEST/src; git fetch; git checkout 2a93576965 ; ./make.bash )
mkdir foo; cd foo
bent -I # <- capital letter "i" , as in GH I JKL
# You need a configurations.toml
# unquoted heredoc, so the shell expands $BASELINE and $TEST into the file
cat > configurations.toml <<EOF
[[Configurations]]
Name = "Baseline"
Root = "$BASELINE"
[[Configurations]]
Name = "Test"
Root = "$TEST"
EOF
# run 25 iterations of the wazero benchmark, randomly linked
bent -b wazero -R=25
# look at the benchmark results
cd bench
alias bs='benchstat -col toolchain -ignore pkg,shortname'
# you want the last two stdout files; this is a real example run
# and, uh, looks like it did not reproduce -- the new version is faster, at least on an Apple laptop.
bs 20260527T190219.Baseline.stdout 20260527T190219.Test.stdout
goos: darwin
goarch: arm64
cpu: Apple M4
                                                      │  Baseline   │                Test                │
                                                      │   sec/op    │   sec/op     vs base               │
Invocation/interpreter/fib_for_20-10                    1.148m ± 0%   1.056m ± 1%  -8.04% (p=0.000 n=25)
Invocation/interpreter/string_manipulation_size_50-10   479.4µ ± 0%   463.6µ ± 0%  -3.28% (p=0.000 n=25)
Invocation/interpreter/random_mat_mul_size_20-10        3.485m ± 0%   3.464m ± 1%  -0.60% (p=0.006 n=25)
Compilation/with_extern_cache-10                        142.9µ ± 0%   142.8µ ± 0%       ~ (p=0.430 n=25)
Compilation/without_extern_cache-10                     7.393m ± 1%   7.396m ± 1%       ~ (p=0.893 n=25)
geomean                                                 1.152m        1.124m       -2.44%
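If you would rather not copy the timestamped file names by hand, here is a minimal sketch for picking the newest result file per configuration. It assumes the result files are named `<timestamp>.<ConfigName>.stdout`, as in the run above, so that the lexicographically last name is the most recent. The `latest` helper is hypothetical and not part of bent or benchstat.

```shell
# Hypothetical helper: print the newest *.<ConfigName>.stdout file in the
# current directory. Assumes a sortable timestamp prefix such as
# 20260527T190219, so plain lexicographic sort order is chronological order.
latest() {
    ls -1 -- *."$1".stdout 2>/dev/null | sort | tail -n 1
}

# Usage, from inside the bench directory:
#   benchstat -col toolchain -ignore pkg,shortname "$(latest Baseline)" "$(latest Test)"
```

If no file matches a configuration name, `latest` prints nothing, so check for an empty result before handing it to benchstat.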