proposal: testing: add Keep, to force evaluation in benchmarks #61179
Comments
Bikeshedding the name aside, it's simple and sufficient. It doesn't prevent us from working on a different API like the proposed Iterate.
If we planned to do Iterate, we might not want to also do Keep. That said, I think the drawbacks listed above for Iterate are quite serious and we should simply not do it.
FWIW, I'm planning to file another proposal for an API that covers just the looping aspect of Iterate and would complement Keep.
Doing …
This proposal has been added to the active column of the proposals project.
Note that that is also possible with the API proposed in #48768 by closing over the intentionally-constant values:

```go
func BenchmarkFibConstant10(b *testing.B) {
	b.Iterate(func() int {
		return Fib(10)
	})
}
```

It seems to me that this proposal and #48768 are equally expressive, and the key difference is just whether constant-propagation is opt-in (#48768) or opt-out (this proposal).
As I have repeatedly stated on #48768, I believe that there are several viable ways to overcome that overhead. I am becoming somewhat frustrated that #48768 (comment) in particular seems to have been ignored. I may not be on the Go compiler team, but I am well acquainted with compiler optimization techniques, and so far nobody has explained why those techniques would not apply in this case.
While that is true, any type mismatch errors would be diagnosed immediately if the benchmark is ever actually run, and a similar lack of type safety was not seen as a significant barrier for the closely-related fuzz testing API (#44551).
@bcmills, my experience over >25 frustrating years of trying to benchmark things is that, in general, attempting to subtract out per-loop overhead sounds good in theory, but in practice that overhead can and often does include various random noise. And the more overhead there is, the louder the noise. This means if you are trying to benchmark a very short operation, then subtracting out a (different) reflect.Call measurement is very likely to destroy the measurement, perhaps even making it negative. The best approach we have for getting the most reliable numbers we can is to introduce as little overhead as possible to begin with. For the trivial loop `for i := 0; i < b.N; i++`, we just ignore the overhead of the `i++`, `i < N` entirely and include it as part of the thing being measured. This turns out to be far more accurate than trying to subtract it out.
From what I can tell, this would require N+1 calls to …
The main place where testing.Keep is needed is around the overall result. I write code to work around that all the time.
I see now that you also mentioned making b.Iterate a compiler intrinsic. I suppose that is possible, but it seems very special-case. At that point it's basically a back-door language change, since either you can't do …
I expect that that will become more common as the compiler gets better at inlining. That said, it is also more straightforward to work around (without new API) today, such as by alternating among multiple entries in a slice of inputs.
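For concreteness, a minimal sketch of that slice-of-inputs workaround, reusing the thread's Fib example:

```go
func BenchmarkFibVaried(b *testing.B) {
	// Alternating among several inputs keeps the compiler from
	// constant-folding the argument into an inlined copy of Fib.
	inputs := []int{9, 10, 11}
	for i := 0; i < b.N; i++ {
		Fib(inputs[i%len(inputs)])
	}
}
```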
I agree that subtracting out the overhead from a naive implementation based on … is not viable.

I think probably the most promising approach is an implementation that sets up a stack frame with arguments and then repeatedly invokes the function starting from that frame. It isn't obvious to me whether the …
The stack frame implementation would not be able to set up the arguments just once. It would have to set them up on every iteration, since in general a function can overwrite its arguments, and many do. reflect.Caller would amortize the allocation but not the setup.
All good points, @bcmills.
I'm not sure if you're referring to "line noise" here (which, I agree, this does introduce a fair amount of line noise) or measurement noise. For the latter, a naive implementation of …
Another possible option is that we make sure …

I'm not that concerned about people capturing …

We already do some code generation for tests. Is there anything we could code-generate to help with this? We don't rewrite any test code right now, so this might require pushing that too far.
Not to mention, I would expect most or all of the arguments to be passed in registers. We would certainly have to re-initialize those.
What I had in mind is something like two functions: a (somewhat expensive) setup hook that checks types and copies function arguments from a slice into a more compact argument block, and a (cheap) “invoke” hook that initializes the call stack frame, copies arguments into registers, and calls the function. The argument block might look something like:
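A minimal sketch of what such an argument block might look like (all names here are hypothetical, not from the proposal):

```go
import "unsafe"

// argBlock is a hypothetical compact argument representation: built once
// by the (somewhat expensive) setup hook, then replayed cheaply by the
// invoke hook on every iteration.
type argBlock struct {
	fn    unsafe.Pointer // code pointer for the function under test
	words []uintptr      // argument words laid out in ABI order
	nReg  int            // count of leading words passed in registers
}
```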
The implementation of Iterate could then be something like:

```go
func (b *B) Iterate(f any, args ...any) {
	call := reflectlite.NewCall(f, args...)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		call.Invoke()
	}
}
```

where `reflectlite.NewCall` and `call.Invoke` correspond to the setup and invoke hooks described above. That seems like it might be easier than teaching the compiler to inline through …
As a developer I would much prefer the compiler be taught not to optimize away calls within a _test.go file instead of me having to remember to write a bunch of wrapper calls. I didn't see that listed in the alternatives, so my apologies if that has been proposed previously.
So I naively want the compiler not to optimize away things in a benchmark... but also some amount of the optimization happening would in fact be part of what the compiler would do running the code in reality, and thus, part of what I want to benchmark. The trick is distinguishing between optimizing-away the benchmark and optimizing-away some of the work inside the benchmark, which would also be optimized-away outside of the benchmark.
Another alternative name for Keep: … This function is not only useful for testing, but also for non-testing code.
I believe that another problem with … is the following:
The existing ABI is such that if a function that returns a pointer to a new object is not inlined into the caller, the object to which it points must be heap-allocated.
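A small illustration of that constraint; `//go:noinline` stands in for "not inlined" here, and whether the allocation is actually elided depends on the compiler's escape analysis:

```go
type point struct{ x, y int }

// When newPoint is inlined, escape analysis can usually keep the point
// on the caller's stack. With inlining disabled, the callee cannot see
// how the pointer is used, so the object must be heap-allocated.
//
//go:noinline
func newPoint(x, y int) *point {
	return &point{x, y} // escapes to the heap when not inlined
}
```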
I am still concerned about the overhead of reflect in Iterate. We can't subtract it, and that means we can't reliably measure very short functions - which are the ones most likely to be affected by throwing away parts of the computation. The compiler is going to be involved no matter what. What if it's more involved? Specifically, suppose we have a function that just does looping and takes a func(), like … or maybe …, and the compiler would recognize testing.B.Loop and apply Keep to every function and every argument in every call in that closure.

We could still provide Keep separately and explain in the docs for Loop what the compiler is doing, in terms of Keep. This would end up being like b.Iterate but (a) you get to write the actual calls, and (b) there is no reflect. On the other hand, the compiler does more work. But this is the Go compiler and the Go testing package; the compiler already knows about sync/atomic and math and other packages. For that matter we could also recognize `for i := 0; i < b.N; i++ { ... }` and do the same to that loop body (it might still help to have something like Iterate or Loop though).
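A minimal sketch of that closure-taking shape (the signature is inferred from the description above, not quoted from the thread):

```go
// Hypothetical sketch: Loop runs body b.N times, and the compiler would
// recognize calls to testing.B.Loop and apply Keep to every function,
// argument, and result inside the closure.
func (b *B) Loop(body func()) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		body()
	}
}
```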
I've just filed #61515, which I consider closely related to and complementary to this proposal and the …
To quote #61515: …
To be explicit, does that proposal include this compiler special-casing? Or is that proposal only for the API change and then in the future we will take a separate decision regarding special compilation of Loop? You say that that proposal is complementary to this Keep proposal, but this specific change seems non-orthogonal. If we decide to do the compiler special-casing, that seems like it should bear on our decision about whether to expose Keep to the user at all.
We are considering Keep and Loop-with-implicit-Keep together. If we do the special casing, then we basically have to expose Keep too, …
To summarize the current state, the idea is to have Keep(x) return x but "hide" it from the compiler and disable throwing it away, so you can use Keep(f(Keep(x))) to both make sure f's result calculation is not optimized away and to keep the compiler from specializing an inlined copy of f to handle just x.

Then, over on #61515, we have a proposal to define b.Loop() that returns bool and is used like `for b.Loop() {` instead of `for i := 0; i < b.N; i++ {`. The nice thing about b.Loop is that the testing package can run code inside b.Loop to time groups of iterations separately, so that for example b.Loop could return true 10 times and see how long those iterations took, and then return true 100 more times and see how long those took, all without breaking the loop. This would remove the need to call a benchmark function more than once, and it would remove the need for b.ResetTimer - the only timing would be while the for loop is running. Setup and teardown would automatically not be counted.

And then on top of that, the compiler would recognize a for loop around b.Loop() and edit any calls inside the { ... } loop body to insert Keep around the result of the call and each argument. With all that, a working, accurate benchmark for, say, unicode.IsSpace, would be:
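A plausible version of that benchmark, following the description above ('x' is a placeholder argument):

```go
func BenchmarkIsSpace(b *testing.B) {
	for b.Loop() {
		unicode.IsSpace('x')
	}
}
```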
This would be rewritten by the compiler to:
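Following the Keep(f(Keep(x))) pattern described above, the rewritten form would presumably look something like:

```go
func BenchmarkIsSpace(b *testing.B) {
	for b.Loop() {
		// Keep wrapped around each call result and each argument.
		testing.Keep(unicode.IsSpace(testing.Keep('x')))
	}
}
```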
When users learn the pattern of using b.Loop, their benchmarks are easier to write and report real numbers. It might be better to rename Keep to Use too, but for clarity I've written this comment with Keep.
So, if we wanted to benchmark inlining of unicode.IsSpace with an intentionally-constant argument, would it be:

```go
func BenchmarkIsSpaceInlinedConstant(b *testing.B) {
	setup() // no setup really needed here but in general...
	// Enable inlining by defining this outside of the b.Loop body.
	xIsSpace := func() bool {
		return unicode.IsSpace('x')
	}
	for b.Loop() {
		// Benchmark the call with the 'x' argument inlined.
		xIsSpace()
	}
	teardown() // same...
}
```

?
I'm still not real fond of the “…”
I believe your example is right. It's awkward, but I think the only way to make it non-awkward is what we have today. Given that I'm pretty sure intentional constant propagation in benchmarks is extremely rare, it seems like the right balance to make the common intent (no constant propagation) the easy default, at the expense of making the rare intent awkward.
I'm a bit squeamish about this, too. But, it seems to me that there are the users who think about unintended optimization in benchmarks, and the users who don't. With a little compiler magic, we can just solve this problem for the users who don't think about it. And for the users who do think about it, hopefully they can also learn about the deoptimization effect of b.Loop. Also, there's no harm in continuing to do the sorts of "manual" deoptimization that people do today. My main concern is that refactoring of code within a b.Loop could have surprising effects on the result of a benchmark. I think attaching it to b.Loop is a lot more robust to refactoring than, say, deoptimizing the body of Benchmark functions, though.
It seems like the choice generally is between "thing people forget to use" and "thing that is kind of magic". Given that choice it seems like we should prefer the second. Or at least I prefer the second, because it will mean that benchmarks are more reliable. There is no perfect solution here. We have to pick one of those two choices. I agree with the example above but I would have written:
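Judging from the snippet quoted in the reply below, the version in question was an immediately-invoked function literal inside the loop:

```go
for b.Loop() {
	func() {
		unicode.IsSpace('x')
	}()
}
```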
I suspect that will become a pattern, and it seems fine.
I don't really understand why

```go
for b.Loop() {
	func() {
		unicode.IsSpace('x')
	}()
}
```

would work to enable the optimization — it is still lexically within the loop body. If it does work, then I'm not sure I can clearly describe the region within which optimizations are disabled. 😅
Good point @bcmills. We need a precise definition of what gets Keep added inside the loop body. It sounds like maybe we want "Keep is applied around all function results and all function arguments appearing anywhere lexically inside the loop body", so that F(10) becomes Keep(F(Keep(10))). So my "idiom" would not in fact become an idiom, or at least it would not do anything useful. An alternative would be to apply it around all expressions, so that F(10) becomes Keep(Keep(F)(Keep(10))), but we probably don't want that, because often we do want F itself to be inlined in the benchmark if it would be inlined at the call site.
Do I have that right? Do people agree with this?
This seems reasonable to me. (That said, while I think it's important to agree on a definition we can implement and communicate, I don't feel like the exact details matter that much. I think any reasonable definition will work for the vast majority of code, and in the unusual cases where the details matter, users can follow whatever definition we provide.)
Wouldn't it be simpler to just disable the appropriate optimizations when compiling benchmark functions (not including the functions they call)? I think most users would accept worse emitted code for these funcs knowing that it was for better benchmark accuracy. Then we wouldn't need a special stdlib function with documentation explaining its purpose, etc. If you need better emitted code for some parts of the benchmark for some reason, then wrap it in a helper function, which is optimized like normal:

```go
func BenchmarkFoo(b *testing.B) {
	lotsOfSetupWork(b)
	for n := 0; n < b.N; n++ {
		Fib(10)
	}
	lotsOfTeardownWork(b)
}
```
@willfaught, we've leaned away from that for two reasons: …
Based on the discussion above, this proposal seems like a likely accept. |
@rsc I'm surprised and disappointed to see this proposal being accepted. On many other proposals, it has been argued that we should prefer making the compiler smarter to fix the problem for everyone retroactively, instead of adding new functions people have to go rewrite all their code to use (e.g., …).
@aclements I don't follow. What does it mean for the loop to be factored out of the benchmark func? Which part of the benchmark func do you mean by "actual body of the benchmark"?
Go benchmarks don't have a history that the Go tool compares results against (which would be a great feature), so I don't see the issue. We don't worry about compiler improvements throwing off benchmark results, as far as I know. And as I pointed out, there's a workaround for getting back optimizations for setup/teardown code.
I'm surprised as well. This doesn't seem very Go-like. Is there a precedent of taking this special-function-wrapped-around-value optimization approach before in Go? A function approach seems more consistent with how Go does things:

```go
func BenchmarkFib10(b *testing.B) {
	b.Do(func() { Fib(10) })
}
```

where the compiler is free to disable the appropriate optimizations inside the func literal for b.Do.
@willfaught FWIW I have a ton of code that has complex benchmarks like:
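A sketch of the shape, reconstructed from the b.Loop version of this example quoted later in the thread (struct fields and parameters are placeholders):

```go
func BenchmarkFoo(b *testing.B) {
	for _, bb := range []struct {
		name string
		/* lots of testing parameters */
	}{
		{ /* test case 1 */ },
		// ...
	} {
		// lots of setup code
		b.Run(bb.name, func(b *testing.B) {
			benchFoo(b, bb.x, bb.y, some, other, params)
		})
	}
}

func benchFoo(b *testing.B, x, y, z int) {
	// ...
	for i := 0; i < b.N; i++ {
		Foo(x, y, z)
	}
}
```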
IOW, my experience lines up with what @aclements said: it's common that the …
Your … If so, then I guess I'd point you at @rsc's comment when I asked about this: …
I mean that the `for i := 0; i < b.N; i++` loop is often in a function separate from the Benchmark function itself, e.g.:

```go
func Benchmark(b *testing.B) {
	b.Run("1", func(b *testing.B) { f(b, 1) })
	b.Run("2", func(b *testing.B) { f(b, 2) })
}

func f(b *testing.B, arg int) {
	for i := 0; i < b.N; i++ {
		// .. do something ..
	}
}
```

Factoring the …

Of course, it's also possible that the …

Another option would be that the compiler recognizes a loop over …

@rsc is planning to gather some data on how often the …

(Haha, looks like @cespare beat me to this point by 2 minutes. 😄)
I believe comparing across time is one of the most common uses of benchstat. We regularly get reports of things that have slowed down from one release to another. On the Go team, we certainly do this all the time with https://perf.golang.org/dashboard/.
There isn't precedent in Go for any of these approaches. I could see an argument for not doing any implicit Keep/benchmark deoptimization because any such approach is too implicit, but I think that's not the argument you're making.
This is what I originally proposed in #61515. However, it's harder to eliminate the overhead of that for very short benchmarks (certainly not impossible, but it requires more inlining and more complex inlining). This also opens the possibility of passing something that isn't just a function literal, in which case we definitely wouldn't be able to deoptimize the loop body.
It is somewhat weird that the …
The point of …
What is the effect of calling Keep outside of a test file?
It should be the same whether it is in a test file or not.
So it is a general-purpose function in syntax/semantics, but a testing-specific function subjectively. Not a big problem though.
@cespare @aclements Thanks for the explanations.
I agree it's the same as #61515, although the Loop in this proposal seems to be different, not taking a function, and returning a boolean. Regarding that change, can this proposal be updated to include that? It's difficult to track the current state of the proposal by piecing together all the comments.
Yes, assuming you mean the Loop from #61515, and not the Loop here, as explained just above.
Why do we need to expose Keep to explain what Loop is doing? I don't see why we can't explain what Loop does in the same way, e.g. "All function values, all arguments, and all function results are forced to be evaluated etc etc etc..." Why do users need this general power? What if we limit the disabled optimizations to just func literals that are assigned to a new package testing type func Benchmark1(b *testing.B) {
b.Loop(func() { Fib(10) }) // Not optimized
}
func Benchmark2(b *testing.B) {
var notOptimized testing.BenchFunc = func() { Fib(10) }
var optimized func() = func() { Fib(10) }
b.Loop(notOptimized)
b.Loop(testing.BenchFunc(optimized))
} work as expected. @cespare's example would be func BenchmarkFoo(b *testing.B) {
for _, bb := range []struct{
name string
/* lots of testing parameters */
} {
{ /* test case 1 */ },
// ...
} {
// lots of setup code
b.Run(bb.name, func(b *testing.B) {
benchFoo(b, bb.x, bb.y, some, other, params)
})
}
}
func benchFoo(b *testing.B, x, y, z int) {
// ...
b.Loop(func() {
Foo(x, y, z)
}
} |
The #61515 proposal does include a pointer to the latest version in the top post. We tend not to do significant rewrites of the top post in a proposal because then it makes it hard to follow the conversation that follows it, and instead add updates to it linking to the comment explaining the latest version. There's no really ideal way to do this. It may be that the way I wrote the update to #61515 wasn't clear enough, so I've tried to rewrite it.
You're right that we can explain how Loop deoptimizes without exposing Keep. However, not exposing Keep limits refactoring opportunities, and also makes it impossible to write examples that allow partial optimization like in @bcmills' comment. Granted, we expect both of these situations to be rare.
This seems strictly more complicated to me. Earlier you argued that "A function approach seems more consistent with how Go does things", but I'm not sure I agree with that. Go APIs tend not to reach for closures when simpler and more direct constructs will do. For example, …
Oops, I guess his example doesn't technically show partial optimization since there's only one argument to the function under test. Partial optimization would mix one (or more) argument passed in the …
@aclements Adding an hr divider at the end, followed by an "Edit: Changed to [...], see these comments [...]" line(s) would be sufficient. This is comparable to the practice of reserving the first comment for FAQs and proposal updates that I've seen the Go team use elsewhere recently, which worked well.
Can you demonstrate an example using Keep?
The auto-Keep during b.Loop could be applied the same in any loop that counts from 0 to b.N where b has type *testing.B. Then b.Loop is less special - the compiler just applies it to both kinds of benchmark loops.
If we make Keep auto-apply inside b.N loops, then b.Loop is no longer special, and converting a b.N loop to a b.Loop loop is not a performance change at all, so it seems like we should make Keep auto-apply inside b.N loops. That will also fix a very large number of Go microbenchmarks, although it will look like it slowed them down. That leaves the question of whether to auto-Keep at all. If we don't, we leave a big mistake that users, even experienced ones, will make over and over. The compiler can do the right thing for us here. Maybe it would help if someone could try a compiler implementation and see how difficult this is.
Benchmarks frequently need to prevent certain compiler optimizations that may optimize away parts of the code the programmer intends to benchmark. Usually, this comes up in two situations where the benchmark use of an API is slightly artificial compared to a “real” use of the API. The following example comes from @davecheney's 2013 blog post, How to write benchmarks in Go, and demonstrates both issues:
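The example in question is the familiar recursive-Fib benchmark, along these lines:

```go
func BenchmarkFib10(b *testing.B) {
	// run the Fib function b.N times
	for n := 0; n < b.N; n++ {
		Fib(10)
	}
}
```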
1. Most commonly, the result of the function under test is not used because we only care about its timing. In the example, since `Fib` is a pure function, the compiler could optimize away the call completely. Indeed, in “real” code, the compiler would often be expected to do exactly this. But in benchmark code, we’re interested only in the side-effect of the function’s timing, which this optimization would destroy.

2. An argument to the function under test may be unintentionally constant-folded into the function. In the example, even if we addressed the first issue, the compiler may compute Fib(10) entirely at compile time, again destroying the benchmark. This is more subtle because sometimes the intent is to benchmark a function with a particular constant-valued argument, and sometimes the constant argument is simply a placeholder.
There are ways around both of these, but they are difficult to use and tend to introduce overhead into the benchmark loop. For example, a common workaround is to add the result of the call to an accumulator. However, there’s not always a convenient accumulator type, this introduces some overhead into the loop, and the benchmark must then somehow ensure the accumulator itself doesn’t get optimized away.
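For concreteness, a sketch of that accumulator workaround (`sink` here is a hypothetical package-level variable used to publish the accumulator):

```go
// sink is package-level so the compiler cannot prove it unused.
var sink int

func BenchmarkFib10(b *testing.B) {
	var acc int
	for n := 0; n < b.N; n++ {
		acc += Fib(10) // the result feeds the accumulator, so the call survives
	}
	sink = acc // keep the accumulator itself from being optimized away
}
```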
In both cases, these optimizations can be partial, where part of the function under test is optimized away and part isn’t, as demonstrated in @eliben’s example. This is particularly subtle because it leads to timings that are incorrect but also not obviously wrong.
Proposal
I propose we add the following function to the testing package:
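The signature under discussion is presumably along these lines (reconstructed from the surrounding text, not quoted):

```go
// Keep returns its argument, while preventing the compiler from
// assuming anything about its value or from proving the result unused.
func Keep[T any](v T) T
```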
(This proposal is an expanded and tweaked version of @randall77’s comment.)
The `Keep` function can be used on the result of a function under test, on arguments, or even on the function itself. Using `Keep`, the corrected version of the example would be:
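Following the Keep(f(Keep(x))) pattern, the corrected benchmark would presumably be:

```go
func BenchmarkFib10(b *testing.B) {
	for n := 0; n < b.N; n++ {
		testing.Keep(Fib(testing.Keep(10)))
	}
}
```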
(Or `testing.Keep(Fib)(10)`, but this is subtle enough that I don’t think we should recommend this usage.)

Unlike various other solutions, `Keep` also lets the benchmark author choose whether to treat an argument as constant or not, making it possible to benchmark expected constant folding.

Alternatives
- `Keep` may not be the best name. This is essentially equivalent to Rust’s `black_box`, and we could call it `testing.BlackBox`. Other options include `Opaque`, `NoOpt`, `Used`, and `Sink`.
- testing: document best practices for avoiding compiler optimizations in benchmarks #27400 asks for documentation of best practices for avoiding unwanted optimization. While we could document workarounds, the basic problem is Go doesn’t currently have a good way to write benchmarks that run afoul of compiler optimizations.
- proposal: testing: a less error-prone API for benchmark iteration #48768 proposes `testing.Iterate`, which forces evaluation of all arguments and results of a function, in addition to abstracting away the b.N loop, which is another common benchmarking mistake. However, its heavy use of reflection would be difficult to make zero or even low overhead, and it lacks static type-safety. It also seems likely that users would often just pass a `func()` with the body of the benchmark, negating its benefits for argument and result evaluation.
- `runtime.KeepAlive` can be used to force evaluation of the result of a function under test. However, this isn’t the intended use and it’s not clear how this might interact with future optimizations to `KeepAlive`. It also can’t be used for arguments because it doesn’t return anything. @cespare has some arguments against `KeepAlive` in this comment.