What steps will reproduce the problem?
1. Put http://play.golang.org/p/-5mfEhIhC0 (a subset of the math/cmplx benchmarks) in
its own package.
2. Run 'go test -bench=Conj .'
3. Comment out BenchmarkCosh.
4. Run 'go test -bench=Conj .'
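For readers without access to the playground link, the steps above can be sketched roughly as follows. This is an assumption modeled on the style of the math/cmplx benchmarks, not the exact contents of the linked file: the file name, package name, and input value are illustrative, and the third benchmark mentioned below is omitted because the report does not name it.

```go
// conj_test.go (hypothetical file name; any *_test.go file works).
// The package name is arbitrary; it only needs its own directory.
package main

import (
	"math/cmplx"
	"testing"
)

// BenchmarkConj times a call so cheap that loop and call overhead
// make up most of the measured time.
func BenchmarkConj(b *testing.B) {
	for i := 0; i < b.N; i++ {
		cmplx.Conj(complex(2.5, 3.5))
	}
}

// BenchmarkCosh is the unrelated benchmark that step 3 comments out.
// It is never selected by -bench=Conj, but it is still compiled into
// the test binary, so it can shift where other code is placed.
func BenchmarkCosh(b *testing.B) {
	for i := 0; i < b.N; i++ {
		cmplx.Cosh(complex(2.5, 3.5))
	}
}
```

Run 'go test -bench=Conj .' once with this file as written and once with BenchmarkCosh commented out, then compare the ns/op reported for BenchmarkConj.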
What is the expected output? What do you see instead?
Adding/removing an unrelated benchmark should not impact benchmark results.
With all three benchmarks:
BenchmarkConj 2000000000 1.08 ns/op
With only two benchmarks:
BenchmarkConj 2000000000 0.81 ns/op
These measurements are very consistent.
Please use labels and text to provide additional information.
The exact reproduction instructions depend on the version. The instructions above are for:
go version devel +acf346c00e56 Fri Apr 25 06:44:51 2014 -0700 darwin/amd64
Can you see if the generated code for the benchmark is the same in both cases? It's
possible that what you are seeing is the effect of crossing instruction cache lines in a
tight loop. If that is indeed the case there isn't much the compiler or the testing
framework can do, except serve as a cautionary tale for authors of microbenchmarks.
On the other hand, if the code changes, then there is probably a bug.
In that case I don't see what can be done on the Go side. I'd be happy to hear
suggestions, but basically this is a microbenchmark of trivial code for which the loop
and function call overhead dominate. It is presumably sensitive to the exact code
Unrolling 8x here causes the delta to shrink. Instead of 1.08 to 0.81 (-25%), it is 0.81
to 0.74 (about -9%).
It's hard to be sure that this is attributable directly to the unrolling and not to new
code alignment, but it does appear to help. And unrolling is good for reducing benchmark
The question then is whether it's worth the ugliness to do by hand, in the short term.
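For illustration, a hand-unrolled benchmark might look like the sketch below. The thread does not say exactly where the 8x unrolling was applied, so this assumes it is done in the benchmark body; conjUnrolled8 and BenchmarkConjUnrolled are hypothetical names, and the input value is arbitrary.

```go
package main

import (
	"math/cmplx"
	"testing"
)

// conjUnrolled8 calls cmplx.Conj n times, eight calls per iteration of
// the main loop, and returns the number of calls made. Unrolling spreads
// the loop counter and branch cost over eight calls instead of one.
func conjUnrolled8(n int) int {
	z := complex(2.5, 3.5)
	i := 0
	for ; i+8 <= n; i += 8 {
		cmplx.Conj(z)
		cmplx.Conj(z)
		cmplx.Conj(z)
		cmplx.Conj(z)
		cmplx.Conj(z)
		cmplx.Conj(z)
		cmplx.Conj(z)
		cmplx.Conj(z)
	}
	// Remainder loop for the case where n is not a multiple of 8.
	for ; i < n; i++ {
		cmplx.Conj(z)
	}
	return i
}

// BenchmarkConjUnrolled still performs exactly b.N calls, so the
// reported ns/op stays comparable to the plain loop.
func BenchmarkConjUnrolled(b *testing.B) {
	conjUnrolled8(b.N)
}
```

The remainder loop is the main source of the "ugliness": every benchmark unrolled this way needs both loops to keep the call count equal to b.N.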