Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: parallel benchmark results are poorly documented #31884

aclements opened this issue May 7, 2019 · 1 comment

testing: parallel benchmark results are poorly documented #31884

aclements opened this issue May 7, 2019 · 1 comment
Documentation NeedsFix


Copy link

@aclements aclements commented May 7, 2019

The way parallel benchmark results are reported is easy to misinterpret, and the correct way to interpret them is not documented. Typically this makes perfectly-scalable benchmarks appear to perform much worse at low parallelism, and makes not-at-all-scalable benchmarks appear to be doing fine. For example, this came up in #31820 (comment)

Specifically, the "ns/op" reported is not CPU-ns/op, it's wall-ns/op. For example, suppose each op takes exactly 100 ns, regardless of parallelism. If the single-threaded benchmark runs for 1 sec, it will execute 10,000,000 ops, so ns/op = 1s/10,000,000 ops = 100 ns/op. But if the same benchmark runs 4-way parallel for 1 sec, it will execute 40,000,000 ops, so ns/op = 1s/40,000,000 = 25 ns/op. (I really wish it didn't work this way...)

Interpreting the results of CPU-bound parallel benchmarks is further complicated by hyper-threading (though this isn't the fault of the testing package).

I don't think we can change the reported ns/op at this point. We could perhaps introduce a new metric for parallel benchmarks. At the very least, we should document this.

@aclements aclements added this to the Go1.13 milestone May 7, 2019
@andybons andybons added the NeedsFix label May 8, 2019
rogpeppe added a commit to rogpeppe/fastuuid that referenced this issue May 9, 2019
When many UUIDs are being generated concurrently,
contention on the atomic counter can slow things down.

There might be ways to speed this up, but for now, just
add a parallel benchmark so we can measure the baseline.

Initial results with `go test -bench  . -cpu 1,2` on my machine (two physical cores):

	BenchmarkContended     	56413502	        18.3 ns/op
	BenchmarkContended-2   	82533951	        33.5 ns/op

Note that the 33.5ns/op measure is worse than it appears, because
that's wall ns/op, not cpu-ns/op (see golang/go#31884), so the time is actually 67.0ns/op when contended.
@andybons andybons removed this from the Go1.13 milestone Jul 8, 2019
@andybons andybons added this to the Go1.14 milestone Jul 8, 2019
@rsc rsc removed this from the Go1.14 milestone Oct 9, 2019
@rsc rsc added this to the Backlog milestone Oct 9, 2019
Copy link

@rpickz rpickz commented Mar 15, 2021

Looking at improving this documentation now.

Considering where best to document - thinking this detail would be best added in a little of the documentation for testing.B.RunParallel and within the summary of the testing package as a note.

Would anyone recommend any additional areas to document this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Documentation NeedsFix
None yet

No branches or pull requests

4 participants