Background
The first version of b.Loop as implemented in Go 1.24 stopped the inlining of functions called within the b.Loop body.
Shortly after 1.24 was released, I filed #73137, which suggested that this strategy should be improved. One reason is that it caused heap allocations under a b.Loop benchmark that would not occur in normal usage or in an older b.N style benchmark, especially in cases where inlining would allow something to be stack allocated that would otherwise be heap allocated.
Austin (comment) and others agreed it made sense to improve the implementation.
Go 1.26 did change the implementation, and as a result the issue I had filed is now closed.
Problem
The 1.26 implementation no longer prevents inlining.
However, from what I can tell, the 1.26 implementation still causes allocations under b.Loop that do not occur under an older style b.N benchmark. I suspect that, in the common case, scenarios that triggered extra allocations under b.Loop in 1.24/1.25 still trigger extra allocations in go1.26rc2.
I have a CL at https://go.dev/cl/738822 that makes a small change to the Go 1.26 implementation that I think resolves this issue.
Additional details
Inlining most of my comment from a few days ago in #73137 (comment), which includes a sample benchmark that illustrates the problem:
I wanted to understand the new 1.26 b.Loop behavior better, so I poked around a bit at how it's implemented.
One thing I noticed is that, with the new behavior, b.Loop still seems to cause extra allocations compared to the older b.N style benchmarks, including in the example I built for this issue here -- in particular, my "b.Loop-basic" benchmark from the playground link in the opening comment above.
The function being benchmarked can now be inlined in Go 1.26, which is an improvement over the Go 1.24/1.25 behavior, but the undesirable allocation still happens with b.Loop in go1.26rc2 (whereas the allocation does not happen in an otherwise equivalent b.N benchmark).
I suspect that is due to the way the 1.26 b.Loop compiler changes handle the temporary variables they create. It looks like the autotmp variables are declared by the compiler outside the loop body, which escape analysis then treats as an escaping value, resulting in a heap allocation.
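For readers without the playground link handy, the comparison described above can be sketched as a pair of otherwise equivalent benchmarks. This is a hypothetical stand-in, not the "b.Loop-basic" benchmark itself: point, newPoint, benchN, and benchLoop are names invented for illustration, and it uses testing.Benchmark so it runs as an ordinary program. It requires Go 1.24 or later for b.Loop, and whether the b.Loop variant shows the extra allocation will depend on the toolchain version.

```go
package main

import (
	"fmt"
	"testing"
)

// point and newPoint are hypothetical stand-ins, not the code from the
// playground link in #73137.
type point struct{ x, y int }

// newPoint returns a pointer; if the call is inlined and the result does
// not escape, the point can live on the caller's stack.
func newPoint(x, y int) *point { return &point{x, y} }

// sink keeps the computed values observable.
var sink int

// benchN is the older b.N style benchmark.
func benchN(b *testing.B) {
	for i := 0; i < b.N; i++ {
		p := newPoint(i, 1)
		sink += p.x + p.y
	}
}

// benchLoop is the otherwise equivalent b.Loop style benchmark
// (requires Go 1.24 or later).
func benchLoop(b *testing.B) {
	i := 0
	for b.Loop() {
		p := newPoint(i, 1)
		sink += p.x + p.y
		i++
	}
}

func main() {
	benchmarks := []struct {
		name string
		fn   func(*testing.B)
	}{
		{"b.N", benchN},
		{"b.Loop", benchLoop},
	}
	for _, bm := range benchmarks {
		r := testing.Benchmark(bm.fn)
		fmt.Printf("%-7s %d allocs/op\n", bm.name, r.AllocsPerOp())
	}
}
```

Comparing the two allocs/op figures across toolchains (for example 1.25, go1.26rc2, and a build with the CL applied) is one way to check whether the difference between the styles is still present.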
I sent WIP https://go.dev/cl/738822 with a candidate fix.