The way parallel benchmark results are reported is easy to misinterpret, and the correct way to interpret them is not documented. Typically this makes perfectly-scalable benchmarks appear to perform much worse at low parallelism, and makes not-at-all-scalable benchmarks appear to be doing fine. For example, this came up in #31820 (comment)
Specifically, the "ns/op" reported is not CPU-ns/op, it's wall-ns/op. For example, suppose each op takes exactly 100 ns, regardless of parallelism. If the single-threaded benchmark runs for 1 sec, it will execute 10,000,000 ops, so ns/op = 1s/10,000,000 ops = 100 ns/op. But if the same benchmark runs 4-way parallel for 1 sec, it will execute 40,000,000 ops, so ns/op = 1s/40,000,000 = 25 ns/op. (I really wish it didn't work this way...)
Interpreting the results of CPU-bound parallel benchmarks is further complicated by hyper-threading (though this isn't the fault of the testing package).
I don't think we can change the reported ns/op at this point. We could perhaps introduce a new metric for parallel benchmarks. At the very least, we should document this.
The text was updated successfully, but these errors were encountered:
When many UUIDs are being generated concurrently,
contention on the atomic counter can slow things down.
There might be ways to speed this up, but for now, just
add a parallel benchmark so we can measure the baseline.
Initial results with `go test -bench . -cpu 1,2` on my machine (two physical cores):
BenchmarkContended 56413502 18.3 ns/op
BenchmarkContended-2 82533951 33.5 ns/op
Note that the 33.5ns/op measure is worse than it appears, because
that's wall ns/op, not cpu-ns/op (see golang/go#31884), so the time is actually 67.0ns/op when contended.