testing: parallel benchmark results are poorly documented #31884
The way parallel benchmark results are reported is easy to misinterpret, and the correct way to interpret them is not documented. Typically this makes perfectly scalable benchmarks appear to perform much worse at low parallelism, and makes benchmarks that do not scale at all appear to be doing fine. For example, this came up in a comment on #31820.
Specifically, the reported "ns/op" is not CPU-ns/op; it's wall-ns/op. For example, suppose each op takes exactly 100 ns, regardless of parallelism. If the single-threaded benchmark runs for 1 s, it will execute 10,000,000 ops, so ns/op = 1 s / 10,000,000 ops = 100 ns/op. But if the same benchmark runs 4-way parallel for 1 s, it will execute 40,000,000 ops, so ns/op = 1 s / 40,000,000 ops = 25 ns/op. (I really wish it didn't work this way...)
Interpreting the results of CPU-bound parallel benchmarks is further complicated by hyper-threading (though this isn't the fault of the testing package).
I don't think we can change the reported ns/op at this point. We could perhaps introduce a new metric for parallel benchmarks. At the very least, we should document this.
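One possible stopgap that benchmark authors can use today is to report an extra metric with `b.ReportMetric` (Go 1.13+). The sketch below assumes that each of the GOMAXPROCS goroutines started by `b.RunParallel` stays busy for the entire run, so wall time × GOMAXPROCS / b.N approximates CPU time per op. The metric name `cpu-ns/op`, the function name, and the placeholder loop body are all hypothetical. It calls `testing.Benchmark` so it can run as a plain program outside `go test`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"testing"
	"time"
)

// benchmarkWithCPUMetric (hypothetical) runs a parallel benchmark and, next
// to the standard wall-clock ns/op, reports an estimated CPU cost per op
// under the hypothetical unit "cpu-ns/op".
func benchmarkWithCPUMetric(b *testing.B) {
	var sink uint64
	start := time.Now()
	b.RunParallel(func(pb *testing.PB) {
		var local uint64
		for pb.Next() {
			local++ // placeholder for the real per-op work
		}
		atomic.AddUint64(&sink, local) // publish so the work is observable
	})
	elapsed := time.Since(start)

	// Assumption: all GOMAXPROCS goroutines were busy for the whole run.
	workers := runtime.GOMAXPROCS(0)
	cpuNs := float64(elapsed.Nanoseconds()) * float64(workers) / float64(b.N)
	b.ReportMetric(cpuNs, "cpu-ns/op")
}

func main() {
	r := testing.Benchmark(benchmarkWithCPUMetric)
	fmt.Printf("N=%d wall ns/op=%d cpu-ns/op=%.2f\n",
		r.N, r.NsPerOp(), r.Extra["cpu-ns/op"])
}
```

Because the estimate multiplies by GOMAXPROCS, it overstates CPU time whenever goroutines sit idle, and on hyper-threaded machines a logical CPU is not a full core, so it is only a rough guide.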
When many UUIDs are being generated concurrently, contention on the atomic counter can slow things down. There might be ways to speed this up, but for now, just add a parallel benchmark so we can measure the baseline.

Initial results with `go test -bench . -cpu 1,2` on my machine (two physical cores):

    BenchmarkContended      56413502    18.3 ns/op
    BenchmarkContended-2    82533951    33.5 ns/op

Note that the 33.5 ns/op figure is worse than it appears, because that's wall ns/op, not CPU-ns/op (see golang/go#31884). With two cores busy, the CPU time is actually 2 × 33.5 = 67.0 ns/op when contended.
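The correction in the commit message is a multiplication by the `-cpu` value, since `-cpu` sets GOMAXPROCS. A minimal sketch, assuming BenchmarkContended uses `b.RunParallel` and that every goroutine was busy for the whole run; `cpuNsPerOp` is a hypothetical helper, and the result is an estimate rather than a measurement.

```go
package main

import "fmt"

// cpuNsPerOp (hypothetical helper) converts the wall-clock ns/op reported by
// `go test -bench` into an estimated CPU time per op, assuming `procs`
// goroutines were kept busy for the entire run.
func cpuNsPerOp(wallNsPerOp float64, procs int) float64 {
	return wallNsPerOp * float64(procs)
}

func main() {
	// Figures from the BenchmarkContended results in the commit message.
	fmt.Printf("-cpu 1: %.1f CPU-ns/op\n", cpuNsPerOp(18.3, 1))
	fmt.Printf("-cpu 2: %.1f CPU-ns/op\n", cpuNsPerOp(33.5, 2))
}
```

This prints 18.3 and 67.0 CPU-ns/op, so the contended case costs roughly 3.7 times as much CPU per op as the uncontended one, rather than the roughly 1.8 times suggested by the raw ns/op figures.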