testing: collect performance counters for benchmarks #21295
Comments
Discussed with @aclements. Open an issue for tracking.
@martisch and I also discussed this recently. Having cache misses and branch mispredictions would be super helpful.
I think this would be great. Probably we would want to build this on #16110. Probably we would want to put this behind a flag to
What is the proposal to evaluate here? Everyone (including me) seems to think this is a good idea but there are no actual details.
Here's a more concrete proposal as a starting point. Add a On Linux, this would be built on the kernel's Questions:
I want a retired instruction count. That's even more stable than cycles (although not as important).
Sure,
Please do give some thought to having a few -counters= names that work portably across architectures. If there's a cycle count for x86 and one for arm it would be nice if there is a single name that enables either one, for example. (I'm assuming this is a comma-separated list and that unknown things are just ignored, or something like that.)
FWIW, Linux perf has a list of portable names already, printed by For reference, here are the "Hardware events": branch-instructions (alias branches), branch-misses, bus-cycles, cache-misses, cache-references, cpu-cycles (alias cycles), instructions, ref-cycles.
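As a rough illustration of the two comments above, here is a minimal Go sketch of how a comma-separated counter list could be resolved against the portable Linux perf names, with aliases folded together and unknown names ignored. The `parseCounters` function and the `portableEvents` table are hypothetical names invented for this sketch; nothing here is part of the `testing` package or of any agreed design.

```go
package main

import (
	"fmt"
	"strings"
)

// portableEvents maps every accepted spelling to a canonical name.
// The names are the Linux perf "Hardware events" quoted in the thread;
// the table itself is an assumption made for this sketch.
var portableEvents = map[string]string{
	"branch-instructions": "branch-instructions",
	"branches":            "branch-instructions", // alias
	"branch-misses":       "branch-misses",
	"bus-cycles":          "bus-cycles",
	"cache-misses":        "cache-misses",
	"cache-references":    "cache-references",
	"cpu-cycles":          "cpu-cycles",
	"cycles":              "cpu-cycles", // alias
	"instructions":        "instructions",
	"ref-cycles":          "ref-cycles",
}

// parseCounters (hypothetical) splits a comma-separated list, resolves
// aliases to canonical names, and silently drops unknown names and
// duplicates while preserving first-seen order.
func parseCounters(list string) []string {
	var out []string
	seen := make(map[string]bool)
	for _, name := range strings.Split(list, ",") {
		canon, ok := portableEvents[strings.TrimSpace(name)]
		if !ok || seen[canon] {
			continue
		}
		seen[canon] = true
		out = append(out, canon)
	}
	return out
}

func main() {
	// "bogus" is ignored; "cycles" and "cpu-cycles" collapse to one entry.
	fmt.Println(parseCounters("cycles,instructions,bogus,cpu-cycles"))
	// prints: [cpu-cycles instructions]
}
```

Ignoring unknown names keeps one command line usable across architectures, at the cost of hiding typos; a real implementation might prefer to warn instead.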
Has there been any progress on this one?
I recently wrote https://pkg.go.dev/github.com/aclements/go-perfevent/perfbench to do this as a library. Of course, you have to (slightly) modify each benchmark to enable this, which is much less ergonomic than a flag supported everywhere. The other problem is that, to do this properly, you need runtime help. In Linux perf, at least, performance counters are per OS thread. In my package, I worked around this by calling LockOSThread, but that can have other performance effects and is fragile if a benchmark starts another goroutine. I see two viable options:
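For context, the LockOSThread workaround described in the comment above can be sketched roughly as follows. This is not the perfbench implementation: `measureLocked` is a hypothetical helper, and the counter is a fake in-memory value standing in for a real per-thread hardware counter, so the example runs anywhere.

```go
package main

import (
	"fmt"
	"runtime"
)

// measureLocked (hypothetical) pins the calling goroutine to its OS thread
// for the duration of fn, so that a per-thread counter read before and
// after fn refers to the same thread. read stands in for whatever reads
// the hardware counter.
func measureLocked(read func() uint64, fn func()) uint64 {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	start := read()
	fn()
	return read() - start
}

func main() {
	// Fake counter for illustration only; a real one would come from the
	// operating system's performance counter interface.
	var ticks uint64
	read := func() uint64 { return ticks }

	delta := measureLocked(read, func() { ticks += 42 })
	fmt.Println(delta) // prints: 42
}
```

The fragility mentioned in the comment shows up here directly: any goroutine that fn starts may run on a different OS thread, so its work would not be reflected in a real per-thread counter.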
Performance counters may be helpful for benchmarking:
It may be hard to do this portably, but supporting it only on platforms where performance counters are available would still be nice.
@aclements