testing: collect performance counters for benchmarks #21295

cherrymui · 2017-08-03T19:20:25Z

Performance counters may be helpful for benchmarking:

It can provide more information, such as cache misses, branch mispredictions, etc.
The number of cycles may be more stable than wall-clock time.

It may be hard to do this portably, but supporting it only on platforms where it is available would still be nice.

@aclements

cherrymui · 2017-08-03T19:23:05Z

Discussed with @aclements. Opening an issue for tracking.

josharian · 2017-08-03T20:59:42Z

@martisch and I also discussed this recently. Having cache misses and branch mispredictions would be super helpful.

aclements · 2017-08-04T17:38:24Z

I think this would be great. We would probably want to build this on #16110, and put it behind a go test flag, since it will increase the output's chattiness (then it could also fail if the flag was specified but couldn't be supported).

rsc · 2017-08-14T20:17:26Z

What is the proposal to evaluate here? Everyone (including me) seems to think this is a good idea but there are no actual details.

aclements · 2017-08-14T20:36:28Z

Here's a more concrete proposal as a starting point.

Add a -test.benchperf flag to the flags exported by the standard testing package. When this flag is specified, benchmarks report additional per-operation metrics based on performance counters exposed by the hardware performance monitoring unit. If this flag is passed but the platform does not support hardware performance counters, it is a no-op. Exact counters would depend on the platform, but a good set to collect on x86 is: cycles, LLC-misses, branch-misses. These would be reported as new metrics cycles/op, etc.

On Linux, this would be built on the kernel's perf_event_open API, which takes an event description and returns an FD from which the event counter's current value can be read at any time.

Questions:

I don't like -test.benchperf. What should it be called?
Should the flag accept the names of counters to collect? At least on Linux there's a standard way of naming events and a way to enumerate supported events (see perf list).

randall77 · 2017-08-14T21:06:41Z

-test.counters

I want a retired instruction count. That's even more stable than cycles (although not as important).

rsc · 2017-10-09T20:40:10Z

Sure, -test.counters seems fine (or something else if you decide that's not accurate enough). Marking proposal accepted.

rsc · 2017-10-09T20:41:22Z

Please do give some thought to having a few -counters= names that work portably across architectures. If there's a cycle count for x86 and one for arm it would be nice if there is a single name that enables either one, for example. (I'm assuming this is a comma-separated list and that unknown things are just ignored, or something like that.)

aclements · 2017-10-10T15:52:16Z

FWIW, Linux perf has a list of portable names already, printed by perf list (specifically the "Hardware event" and "Hardware cache event" categories). I'm pretty sure the "Hardware events" are supported basically everywhere. I'm less sure about the "Hardware cache events". These event types are baked into the perf ABI.

For reference, here are the "Hardware events": branch-instructions (alias branches), branch-misses, bus-cycles, cache-misses, cache-references, cpu-cycles (alias cycles), instructions, ref-cycles.
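If -test.counters took a comma-separated list as assumed above, one possible interpretation is sketched below: accept the perf "Hardware event" names just listed, fold the aliases (branches, cycles) to their canonical names, and drop duplicates. parseCounters and hardwareEvents are hypothetical names, not part of any existing package. Unknown names are returned separately rather than silently discarded, so a caller could pick either behavior discussed in this thread: ignore them, or fail when a requested counter can't be supported.

```go
package main

import (
	"fmt"
	"strings"
)

// hardwareEvents maps each perf "Hardware event" name, including the
// aliases perf list prints, to its canonical name.
var hardwareEvents = map[string]string{
	"branch-instructions": "branch-instructions",
	"branches":            "branch-instructions",
	"branch-misses":       "branch-misses",
	"bus-cycles":          "bus-cycles",
	"cache-misses":        "cache-misses",
	"cache-references":    "cache-references",
	"cpu-cycles":          "cpu-cycles",
	"cycles":              "cpu-cycles",
	"instructions":        "instructions",
	"ref-cycles":          "ref-cycles",
}

// parseCounters returns the canonical, de-duplicated event names in s,
// in order of first appearance, plus any names it did not recognize.
func parseCounters(s string) (events, unknown []string) {
	seen := make(map[string]bool)
	for _, name := range strings.Split(s, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		canon, ok := hardwareEvents[name]
		if !ok {
			unknown = append(unknown, name)
			continue
		}
		if !seen[canon] {
			seen[canon] = true
			events = append(events, canon)
		}
	}
	return events, unknown
}

func main() {
	events, unknown := parseCounters("cycles, branch-misses,cpu-cycles,bogus")
	fmt.Println(events, unknown) // [cpu-cycles branch-misses] [bogus]
}
```

A per-architecture table could then map each canonical name to whatever the platform's PMU calls that event, which is one way to get the portable names asked for above.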

clausecker · 2020-09-21T14:18:29Z

Has there been any progress on this one?

aclements · 2024-08-15T14:19:17Z

I recently wrote https://pkg.go.dev/github.com/aclements/go-perfevent/perfbench to do this as a library. Of course, you have to (slightly) modify each benchmark to enable this, which is much less ergonomic than a flag supported everywhere.
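For a sense of what modifying a benchmark involves, here is a sketch that does not use perfbench (its API is not reproduced here). Instead it uses the standard testing.B.ReportMetric (Go 1.13+) to report a per-op instruction count from a raw perf_event_open(2) counter. measureInstructions and BenchmarkSum are hypothetical. It assumes Linux on a little-endian architecture where syscall.SYS_PERF_EVENT_OPEN is defined, and declares only the 64-byte PERF_ATTR_SIZE_VER0 prefix of struct perf_event_attr. It pins the goroutine with LockOSThread, so it counts only work done on the benchmark goroutine's OS thread.

```go
//go:build linux

package main

import (
	"fmt"
	"runtime"
	"syscall"
	"testing"
	"unsafe"
)

// perfEventAttr is the 64-byte PERF_ATTR_SIZE_VER0 prefix of the
// kernel's struct perf_event_attr.
type perfEventAttr struct {
	Type, Size                       uint32
	Config, SamplePeriod, SampleType uint64
	ReadFormat, Flags                uint64
	WakeupEvents, BPType             uint32
	Config1                          uint64
}

// openInstructionCounter opens an enabled counter of user-space
// instructions retired by the calling OS thread.
func openInstructionCounter() (int, error) {
	attr := perfEventAttr{
		Type:   0,           // PERF_TYPE_HARDWARE
		Config: 1,           // PERF_COUNT_HW_INSTRUCTIONS
		Flags:  1<<5 | 1<<6, // exclude_kernel | exclude_hv (little-endian layout)
	}
	attr.Size = uint32(unsafe.Sizeof(attr))
	// pid=0 (calling thread), cpu=-1 (any CPU), group_fd=-1 (no group).
	fd, _, errno := syscall.Syscall6(syscall.SYS_PERF_EVENT_OPEN,
		uintptr(unsafe.Pointer(&attr)), 0, ^uintptr(0), ^uintptr(0),
		8 /* PERF_FLAG_FD_CLOEXEC */, 0)
	if errno != 0 {
		return -1, errno
	}
	return int(fd), nil
}

// readCounter reads the counter's current 64-bit value.
func readCounter(fd int) (uint64, error) {
	var v uint64
	n, err := syscall.Read(fd, (*[8]byte)(unsafe.Pointer(&v))[:])
	if err != nil {
		return 0, err
	}
	if n != 8 {
		return 0, fmt.Errorf("short counter read: %d bytes", n)
	}
	return v, nil
}

// measureInstructions runs loop, which should perform b.N iterations,
// and reports user-space instructions/op when a counter is available.
func measureInstructions(b *testing.B, loop func()) {
	runtime.LockOSThread() // the counter is per OS thread
	defer runtime.UnlockOSThread()
	fd, err := openInstructionCounter()
	if err != nil {
		b.Logf("instruction counter unavailable: %v", err)
		loop()
		return
	}
	defer syscall.Close(fd)
	b.ResetTimer() // exclude counter setup from ns/op
	start, err1 := readCounter(fd)
	loop()
	end, err2 := readCounter(fd)
	if err1 == nil && err2 == nil {
		b.ReportMetric(float64(end-start)/float64(b.N), "instructions/op")
	}
}

var sink int

// BenchmarkSum is a hypothetical benchmark modified to use the helper.
func BenchmarkSum(b *testing.B) {
	measureInstructions(b, func() {
		sum := 0
		for i := 0; i < b.N; i++ {
			sum += i
		}
		sink = sum
	})
}

func main() {
	r := testing.Benchmark(BenchmarkSum)
	fmt.Printf("%d iterations, %v instructions/op\n", r.N, r.Extra["instructions/op"])
}
```

testing.Benchmark is used only so the sketch runs as a standalone program; in a _test.go file, go test -bench=Sum would print the instructions/op metric on the benchmark's result line. The helper would miss any work the loop hands off to other goroutines, which is the fragility described next.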

The other problem is that, to do this properly, you need runtime help. In Linux perf, at least, performance counters are per OS thread. In my package, I worked around this by calling LockOSThread, but that can have other performance effects and is fragile if a benchmark starts another goroutine. I see two viable options:

Monitor the whole process. This requires a reliable way to enumerate all threads and catch when new threads are created, but is otherwise fairly straightforward. However, this may catch other unrelated work going on in the process, and I'm not sure if that's a feature or a bug.
Monitor goroutines, with counter inheritance. I don't particularly like this option. It's hard to implement because the runtime would have to switch out the perf context on goroutine switches, which is also potentially expensive. Also it depends on goroutine inheritance, which makes it fragile in the way that inheritance always is. (Maybe we could tie it to pprof labels, but that's even more complexity and people rarely use those unless they have some reason to.)

gopherbot added this to the Proposal milestone Aug 3, 2017

gopherbot added the Proposal label Aug 3, 2017

rsc changed the title from "proposal: testing: collect performance counters for benchmarks" to "testing: collect performance counters for benchmarks" Oct 9, 2017

rsc added the Proposal-Accepted label Oct 9, 2017

josharian mentioned this issue Apr 18, 2018: testing: show rusage statistics for benchmarks #24905 (Open)

aclements mentioned this issue Apr 18, 2018: x/sys/linux/perf: add package for Linux perf tracing #24918 (Open)

aclements mentioned this issue May 7, 2018: testing: add -benchtime=100x (x suffix for exact count) #24735 (Closed)

aclements mentioned this issue Jul 23, 2018: proposal: testing: add B method for adding stats #26037 (Closed)

josharian mentioned this issue Jan 30, 2020: proposal: runtime/pprof: add PMU-based profiles #36821 (Open)

jmacd mentioned this issue Sep 21, 2022: Proposal: GitHub self-hosted runners for specific use cases open-telemetry/community#1162 (Closed)
