
proposal: testing: B.Execute to run a subbenchmark and return results #59300

@ayang64

Description

In general, benchmarks in isolation aren't very useful. To compare benchmarks we need an external tool (like benchstat). It'd be nice if we could do comparative benchmarking from within the go test tool itself.

When refactoring code, it is often useful to keep an old version of a function or method around to test for regressions or changes in functionality.

Likewise, when refactoring for performance, we should test for functional regressions AND performance regressions. That is, we should ensure that a change actually represents a performance improvement and that it remains one over time.

For example, let's say we have a function fib() that we think we've sped up. I believe a reasonable thing to do would be to create a test that compares our new version of fib() with our old version. That might entail something like:

// in fib_test.go

func oldFib(n int) int {
  // insert fib() implementation here...
}

// imagine our "new" fib() function has replaced the old implementation of fib() -- presumably in fib.go


// TestFibRegression ensures that our new optimized fib is bug-for-bug compatible with the previous version.  
func TestFibRegression(t *testing.T) {
  for i := 0; i < 50; i++ {
    if got, expected := fib(i), oldFib(i); got != expected {
      t.Fatalf("regression: fib(%d) returned %d; oldFib(%[1]d) returned %[3]d", i, got, expected)
    }
  }
}
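
For concreteness, here's one way the two implementations might look. This is purely illustrative -- assume the old version was a memoized recursive implementation and the new one is an iterative rewrite:

// in fib.go -- the "new", optimized, iterative implementation.
func fib(n int) int {
  a, b := 0, 1
  for i := 0; i < n; i++ {
    a, b = b, a+b
  }
  return a
}

// in fib_test.go -- the retained "old" implementation: recursion plus a
// memoization map, correct but slower than the iterative rewrite.
func oldFib(n int) int {
  memo := map[int]int{}
  var rec func(int) int
  rec = func(k int) int {
    if k < 2 {
      return k
    }
    if v, ok := memo[k]; ok {
      return v
    }
    memo[k] = rec(k-1) + rec(k-2)
    return memo[k]
  }
  return rec(n)
}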

So maybe that's a reasonable way to ensure that our new fib() is a valid replacement for the old one. But how do we ensure that our new fib() is always faster than the old?

How about this:

// BenchmarkFib benchmarks and compares our optimized and unoptimized fib implementation.
// If, somehow, the optimized version becomes slower, this benchmark should fail.
func BenchmarkFib(b *testing.B) {
  funcs := map[string]func(int) int {
    "oldFib":  oldFib,
    "fib": fib,
  }

  r := map[string]testing.BenchmarkResult{}

  for name, f := range funcs {
    r[name] = b.Execute(func(b *testing.B) {
      for i := 0; i < b.N; i++ {
        f(100) // whatever -- I know, inlining etc.
      }
    })
  }

  if newns, oldns := r["fib"].NsPerOp(), r["oldFib"].NsPerOp(); oldns > 0 && newns > oldns {
    b.Fatalf("regression: new fib is %d ns/op; should be less than old fib at %d ns/op", newns, oldns)
  }
}

The functions can be compared on the same machine and at as near the same time as possible. oldFib can be excluded from the benchmark if necessary by skipping sub-benchmarks whose names start with old, and we get an easy way to be alerted if the optimization somehow regresses (say, because of a new compiler optimization or standard library change).

The goal is to allow comparative benchmarks without external tools.

I propose this instead of using testing.Benchmark() unless testing.Benchmark() can be called from... an actual benchmark in a meaningful way.
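
For reference, the closest approximation today is calling testing.Benchmark() from an ordinary Test function. A rough sketch (TestFibPerformance is just an illustrative name); note that this runs the measurement as part of the regular test run, whether or not -bench was requested:

func TestFibPerformance(t *testing.T) {
  newRes := testing.Benchmark(func(b *testing.B) {
    for i := 0; i < b.N; i++ {
      fib(100)
    }
  })

  oldRes := testing.Benchmark(func(b *testing.B) {
    for i := 0; i < b.N; i++ {
      oldFib(100)
    }
  })

  if oldRes.NsPerOp() > 0 && newRes.NsPerOp() > oldRes.NsPerOp() {
    t.Fatalf("regression: new fib is %d ns/op; old fib is %d ns/op", newRes.NsPerOp(), oldRes.NsPerOp())
  }
}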

Otherwise, I don't think we should have to run benchmarks from a Test function for the same reason we generally don't want to run benchmarks when we test.

Maybe additional methods could be used to compute and output differences in a way similar to benchstat.
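
Purely as a sketch of that idea (nothing below exists -- Compare and the bench* closures are placeholders), it might look something like:

  oldRes := b.Execute(benchOldFib)
  newRes := b.Execute(benchFib)

  // Hypothetical: percent change in ns/op, like benchstat's delta column.
  delta := newRes.Compare(oldRes)
  b.Logf("fib vs oldFib: %+.1f%% ns/op", delta)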
