PMU: Performance Monitoring Unit counter support #597

Merged
merged 11 commits into from Sep 4, 2015

lukego (Member) commented Aug 14, 2015

Make it easy to track detailed CPU performance counters.

Modern CPUs can track and report on hundreds of different events. This can be extremely useful for low-level performance analysis. Off-the-shelf tools such as pmu-tools can already access these counters, but this branch creates a "native" LuaJIT interface that makes it easy to track specific pieces of code (e.g. apps) separately.

This was originally planned as an extension to LuaJIT and described in SnabbCo/luajit#2. Now I think it makes more sense to add it to Snabb Switch and keep it separate from the jit.p profiler. This interface would simply take the delta of counter values between the start and end of a piece of code, as one does when measuring wall-clock time, instead of taking samples and matching them up to individual instructions.
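
The delta approach can be sketched in plain Lua as follows. This is a minimal illustration and not the code from this branch: `read_counter` and `tick` are hypothetical stand-ins that simulate a hardware event counter with an ordinary variable.

```lua
-- Simulated event counter. A real implementation would read a hardware
-- counter instead; these helpers are assumptions for illustration only.
local fake_events = 0
local function tick(n) fake_events = fake_events + n end
local function read_counter() return fake_events end

-- Run f and return how far the counter advanced while it ran.
local function count_delta(f)
   local before = read_counter()
   f()
   return read_counter() - before
end

tick(500)  -- events that happen before the region are not attributed to it
local delta = count_delta(function() tick(42) end)
print("events in region:", delta)
```

Only the events that occur between the two reads are attributed to the measured function, just as with a wall-clock timer.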

The detailed performance counter definitions could however also be used to enhance the LuaJIT performance event profiling support in SnabbCo/luajit#6.

The basic integration I envision is that you could say which performance counters you are interested in and the engine would report them separately for each app. For an example of using performance counters see lukego/blog#6.

There are existing C libraries that we could use (libpfm and jevents) but if possible I would prefer to accomplish this with a small amount of our own code rather than adding these as dependencies.

Add a script that can automatically generate Lua definitions for the
hundreds of specific Performance Monitoring Unit events available in
each individual Intel CPU.

Returns a Lua table defining all of the known performance counters for
all Intel CPUs.

RDPMC is "Read Performance-Monitoring Counters".

The disassembler already recognizes this instruction.

Reading performance monitoring counters requires assembler code:

- CPUID to look up the types of counter that exist for the current CPU.

- CPUID to detect how many counters can be selected simultaneously.

- RDPMC instruction to read a counter value.

This is implemented using a ".dasl" source file that uses the new
Lua-based dynasm support (snabbco#575).
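
For the counter-detection step, here is a hedged sketch of decoding the EAX value returned by CPUID leaf 0x0A (architectural performance monitoring), using the field layout from the Intel SDM. The function name and the sample EAX value are illustrative assumptions, not taken from this branch; obtaining the register value itself still needs assembler code as described above.

```lua
-- Decode EAX from CPUID leaf 0x0A (architectural performance monitoring).
-- Plain arithmetic is used instead of a bit library so that the sketch
-- runs on any Lua version.
local function decode_cpuid_0a_eax(eax)
   local function byte(n)   -- extract byte n (0 = least significant)
      return math.floor(eax / 2^(8 * n)) % 256
   end
   return {
      version          = byte(0),  -- bits 7:0   version ID
      gp_counters      = byte(1),  -- bits 15:8  general-purpose counters per logical CPU
      gp_counter_width = byte(2),  -- bits 23:16 bit width of those counters
      ebx_vector_len   = byte(3),  -- bits 31:24 length of the EBX event bit vector
   }
end

-- Illustrative value only (not read from a real CPU).
local info = decode_cpuid_0a_eax(0x07300403)
print(info.version, info.gp_counters, info.gp_counter_width, info.ebx_vector_len)
```

The `gp_counters` field is what limits how many events can be selected simultaneously without multiplexing.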

lukego commented Aug 17, 2015

I have been geeking out deeply on CPU performance counters. The way this branch is headed is to bypass the kernel and use raw CPU instructions to precisely track counters during a short piece of code execution, e.g. the push() method of an app, which might execute in around one thousand cycles.

This would then be used much like wall-clock time, for computing the number of events (e.g. cache misses, branch misses) and interesting ratios (e.g. instructions per cycle, cycles per packet, packets per cache miss). It could be that we hook this deeply into the breathe() loop to compute per-app metrics, or it could be that we use it manually as an optimization tool.
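
As a rough sketch of how such ratios fall out of raw totals, consider the following plain Lua. The numbers are made up and the field names are assumptions for illustration, not the format used by this branch.

```lua
-- Counter totals for one measured region (made-up numbers, assumed names).
local sample = {
   instructions = 45000,
   cycles       = 30000,
   cache_misses = 50,
   packets      = 100,
}

-- Derive the ratios mentioned above from the raw totals.
local function ratios(c)
   return {
      instructions_per_cycle = c.instructions / c.cycles,
      cycles_per_packet      = c.cycles / c.packets,
      packets_per_cache_miss = c.packets / c.cache_misses,
   }
end

local r = ratios(sample)
print(string.format("IPC %.2f, cycles/packet %.1f, packets/cache miss %.1f",
                    r.instructions_per_cycle, r.cycles_per_packet,
                    r.packets_per_cache_miss))
```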

I see this as a separate feature from the whole software toolchain around performance counters: Linux kernel support including multiplexing many logical counters onto the hardware slots available, perf tools building various analysis tools on top of the kernel perf_event_open(2) interface, and the pmu-tools extension layer on top of perf. That stuff is extremely valuable for system-wide analysis and it would be awesome to make LuaJIT play better with that. However, for self-profiling software I found the raw hardware interface more appealing (taking a leaf from Agner Fog's book).

The missing code right now is a lib.pmu module to tie it all together and let you execute a function with a named list of performance events being counted for the duration. Not finished yet...

The PMU API is now documented and fully implemented. The selftest
method is rebuilt on the API.

lukego commented Aug 18, 2015

Phew! The whole PMU API is in place and seems to be working. It is really easy to use.

Trivial example:

$ sudo taskset -c 0 ./snabb snsh -i
Snabb> pmu = require("lib.pmu")
Snabb> pmu.profile(function() for i = 0, 10000 do end end)
EVENT                                   TOTAL
instructions                           45,702
cycles                                 45,130
ref-cycles                             90,264

measure(f) is a new convenience function that returns the counters in
a table.

measure() and profile() now both return the function return value as
their first value.

to_table() now converts counters to Lua numbers from uint64_t.
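
To illustrate the return convention described in these commits, here is a small sketch of a measure-style wrapper. It is not the lib.pmu implementation: `read_counters` is a hypothetical stub that returns made-up values, and only a single return value of the measured function is forwarded.

```lua
-- Hypothetical stub standing in for a snapshot of the hardware counters.
local reads = 0
local function read_counters()
   reads = reads + 1
   return { cycles = 1000 * reads, instructions = 1500 * reads }
end

-- Run f, then return f's result first and the counter deltas second,
-- with the deltas converted to plain Lua numbers.
local function measure_sketch(f)
   local before = read_counters()
   local result = f()
   local after = read_counters()
   local deltas = {}
   for name, value in pairs(after) do
      deltas[name] = tonumber(value - before[name])
   end
   return result, deltas
end

local result, counters = measure_sketch(function() return "done" end)
print(result, counters.cycles, counters.instructions)
```

Returning the function's own result first lets a call be wrapped for measurement without otherwise changing the surrounding code.
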
lukego added a commit to lukego/snabb that referenced this pull request Aug 19, 2015
@eugeneia eugeneia merged commit 9d9336e into snabbco:master Sep 4, 2015
@lukego lukego changed the title PMU: Performance Monitoring Unit counter support [WIP] PMU: Performance Monitoring Unit counter support Sep 13, 2015
@lukego lukego deleted the pmu branch February 24, 2016 12:45
benagricola pushed a commit to benagricola/snabb that referenced this pull request Nov 28, 2016