testing: collect performance counters for benchmarks #21295

Open
cherrymui opened this Issue Aug 3, 2017 · 9 comments

@cherrymui
Contributor

cherrymui commented Aug 3, 2017

Performance counters may be helpful for benchmarking:

  • They can provide more information, such as cache misses and branch mispredictions.
  • The cycle count may be more stable than wall-clock time.

It may be hard to do this in a portable way, but doing it only on the platforms where it is available would still be nice.

@aclements

@gopherbot gopherbot added this to the Proposal milestone Aug 3, 2017

@gopherbot gopherbot added the Proposal label Aug 3, 2017

@cherrymui

Contributor

cherrymui commented Aug 3, 2017

Discussed with @aclements. Opening an issue for tracking.

@josharian

Contributor

josharian commented Aug 3, 2017

@martisch and I also discussed this recently. Having cache misses and branch mispredictions would be super helpful.

@aclements

Member

aclements commented Aug 4, 2017

I think this would be great. We would probably want to build this on #16110, and to put it behind a flag to go test, since it will increase the output's chattiness (then it could also fail if the flag was specified but not supported).

@rsc

Contributor

rsc commented Aug 14, 2017

What is the proposal to evaluate here? Everyone (including me) seems to think this is a good idea but there are no actual details.

@aclements

Member

aclements commented Aug 14, 2017

Here's a more concrete proposal as a starting point.

Add a -test.benchperf flag to the flags exported by the standard testing package. When this flag is specified, benchmarks report additional per-operation metrics based on performance counters exposed by the hardware performance monitoring unit. If this flag is passed but the platform does not support hardware performance counters, it is a no-op. Exact counters would depend on the platform, but a good set to collect on x86 is: cycles, LLC-misses, branch-misses. These would be reported as new metrics cycles/op, etc.

On Linux, this would be built on the kernel's perf_event_open API, which takes an event description and returns an FD from which the event counter's current value can be read at any time.

Questions:

  • I don't like -test.benchperf. What should it be called?
  • Should the flag accept the names of counters to collect? At least on Linux there's a standard way of naming events and a way to enumerate supported events (see perf list).
@randall77

Contributor

randall77 commented Aug 14, 2017

-test.counters

I want a retired instruction count. That's even more stable than cycles (although not as important).

@rsc

Contributor

rsc commented Oct 9, 2017

Sure, -test.counters seems fine (or something else if you decide that's not accurate enough). Marking proposal accepted.

@rsc rsc changed the title from proposal: testing: collect performance counters for benchmarks to testing: collect performance counters for benchmarks Oct 9, 2017

@rsc

Contributor

rsc commented Oct 9, 2017

Please do give some thought to having a few -counters= names that work portably across architectures. If there's a cycle count for x86 and one for arm it would be nice if there is a single name that enables either one, for example. (I'm assuming this is a comma-separated list and that unknown things are just ignored, or something like that.)

@aclements

Member

aclements commented Oct 10, 2017

FWIW, Linux perf has a list of portable names already, printed by perf list (specifically the "Hardware event" and "Hardware cache event" categories). I'm pretty sure the "Hardware events" are supported basically everywhere. I'm less sure about the "Hardware cache events". These event types are baked into the perf ABI.

For reference, here are the "Hardware events": branch-instructions (alias branches), branch-misses, bus-cycles, cache-misses, cache-references, cpu-cycles (alias cycles), instructions, ref-cycles.
