
proposal: testing: B.Execute to run a subbenchmark and return results #59300

@ayang64

Description

In general, benchmarks in isolation aren't very useful. To compare benchmarks we need an external tool (like benchstat). It'd be nice if we could do comparative benchmarking from within the go test tool itself.

When refactoring code, it is often useful to keep an old version of a function or method around to test for regressions or changes in functionality.

Likewise, when refactoring for performance, we should test for functional regressions AND performance regressions. That is, we should ensure that a change actually represents a performance improvement and that it remains one over time.

For example, let's say we have a function fib() that we think we've sped up. I believe a reasonable thing to do would be to create a test that compares our new version of fib() with our old version. That might entail something like:

// in fib_test.go

func oldFib(n int) int {
  // insert fib() implementation here...
}

// imagine our "new" fib() function has replaced the old implementation of fib() -- presumably in fib.go


// TestFibRegression ensures that our new optimized fib is bug-for-bug compatible with the previous version.  
func TestFibRegression(t *testing.T) {
  for i := 0; i < 50; i++ {
    if got, expected := fib(i), oldFib(i); got != expected {
      t.Fatalf("regression: fib(%d) returned %d; oldFib(%[1]d) returned %[3]d", i, got, expected)
    }
  }
}
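
For concreteness, here's one way the two implementations might look. This is purely illustrative -- assume the old version was a memoized recursive implementation and the new one is an iterative rewrite:

// in fib.go -- the "new", optimized, iterative implementation.
func fib(n int) int {
  a, b := 0, 1
  for i := 0; i < n; i++ {
    a, b = b, a+b
  }
  return a
}

// in fib_test.go -- the retained "old" implementation: recursion plus a
// memoization map, correct but slower than the iterative rewrite.
func oldFib(n int) int {
  memo := map[int]int{}
  var rec func(int) int
  rec = func(k int) int {
    if k < 2 {
      return k
    }
    if v, ok := memo[k]; ok {
      return v
    }
    memo[k] = rec(k-1) + rec(k-2)
    return memo[k]
  }
  return rec(n)
}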

So maybe that's a reasonable way to ensure that our new fib() is a valid replacement for the old one. But how do we ensure that our new fib() is always faster than the old?

How about this:

// BenchmarkFib benchmarks and compares our optimized and unoptimized fib implementation.
// If, somehow, the optimized version becomes slower, this benchmark should fail.
func BenchmarkFib(b *testing.B) {
  funcs := map[string]func(int) int {
    "oldFib":  oldFib,
    "fib": fib,
  }

  r := map[string]testing.BenchmarkResult{}

  for name, f := range funcs {
    r[name] = b.Execute(func(b *testing.B) {
      for i := 0; i < b.N; i++ {
        f(100) // whatever -- I know, inlining etc.
      }
    })
  }

  if newns, oldns := r["fib"].NsPerOp(), r["oldFib"].NsPerOp(); oldns > 0 && newns > oldns {
    b.Fatalf("regression: new fib is %d ns/op; should be less than old fib at %d ns/op", newns, oldns)
  }
}

The functions can be compared on the same machine and at as near the same time as possible. oldFib can be excluded from the benchmark if necessary by skipping sub-benchmarks whose names start with old, and we get an easy way to be alerted if the optimization somehow regresses (say, because of a new compiler optimization or standard library change).

The goal is to allow comparative benchmarks without external tools.

I propose this instead of using testing.Benchmark() unless testing.Benchmark() can be called from... an actual benchmark in a meaningful way.
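
For reference, the closest approximation today is calling testing.Benchmark() from an ordinary Test function. A rough sketch (TestFibPerformance is just an illustrative name); note that this runs the measurement as part of the regular test run, whether or not -bench was requested:

func TestFibPerformance(t *testing.T) {
  newRes := testing.Benchmark(func(b *testing.B) {
    for i := 0; i < b.N; i++ {
      fib(100)
    }
  })

  oldRes := testing.Benchmark(func(b *testing.B) {
    for i := 0; i < b.N; i++ {
      oldFib(100)
    }
  })

  if oldRes.NsPerOp() > 0 && newRes.NsPerOp() > oldRes.NsPerOp() {
    t.Fatalf("regression: new fib is %d ns/op; old fib is %d ns/op", newRes.NsPerOp(), oldRes.NsPerOp())
  }
}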

Otherwise, I don't think we should have to run benchmarks from a Test function for the same reason we generally don't want to run benchmarks when we test.

Maybe additional methods could be used to compute and output differences in a way similar to benchstat.
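
Purely as a sketch of that idea (nothing below exists -- Compare and the bench* closures are placeholders), it might look something like:

  oldRes := b.Execute(benchOldFib)
  newRes := b.Execute(benchFib)

  // Hypothetical: percent change in ns/op, like benchstat's delta column.
  delta := newRes.Compare(oldRes)
  b.Logf("fib vs oldFib: %+.1f%% ns/op", delta)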
