PMU: Performance Monitoring Unit counter support #597
Add a script that can automatically generate Lua definitions for the hundreds of specific Performance Monitoring Unit events available in each individual Intel CPU.
Returns a Lua table defining all of the known performance counters for all Intel CPUs.
RDPMC is "Read Performance-Monitoring Counters". The disassembler already recognizes this instruction.
Reading performance monitoring counters requires assembler code:
- CPUID to look up the types of counter that exist for the current CPU.
- CPUID to detect how many counters can be selected simultaneously.
- RDPMC instruction to read a counter value.

This is implemented using a ".dasl" source file that uses the new Lua-based dynasm support (snabbco#575).
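For readers unfamiliar with the instructions named above, here is a rough C sketch of the same steps. It is not the ".dasl" code from this branch. It assumes GCC or Clang (for `<cpuid.h>` and inline assembly) and uses the architectural performance monitoring leaf, CPUID 0x0A. The names `pmu_info`, `decode_pmu_leaf`, `query_pmu_info` and `read_pmc` are made up for illustration.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Selected fields of CPUID leaf 0x0A (hypothetical struct, for illustration). */
struct pmu_info {
    unsigned version;        /* EAX[7:0]:   PMU version, 0 means not supported */
    unsigned gp_counters;    /* EAX[15:8]:  general-purpose counters per logical CPU */
    unsigned gp_width;       /* EAX[23:16]: bit width of those counters */
    unsigned fixed_counters; /* EDX[4:0]:   fixed-function counters (version > 1) */
};

/* Pure decoding step, kept separate so it can be checked without real hardware. */
struct pmu_info decode_pmu_leaf(uint32_t eax, uint32_t edx)
{
    struct pmu_info info;
    info.version        = eax & 0xff;
    info.gp_counters    = (eax >> 8) & 0xff;
    info.gp_width       = (eax >> 16) & 0xff;
    info.fixed_counters = edx & 0x1f;
    return info;
}

/* Returns 1 and fills *out if the CPU reports a PMU, otherwise returns 0. */
int query_pmu_info(struct pmu_info *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x0a, &eax, &ebx, &ecx, &edx))
        return 0;            /* leaf 0x0A is above the maximum supported leaf */
    *out = decode_pmu_leaf(eax, edx);
    return out->version > 0;
#else
    (void)out;
    return 0;                /* not an x86 CPU */
#endif
}

#if defined(__x86_64__) || defined(__i386__)
/* RDPMC: ECX selects the counter, the value comes back in EDX:EAX.
 * This faults unless the OS has enabled user-mode RDPMC (CR4.PCE),
 * so the sketch only defines it and never calls it. */
static inline uint64_t read_pmc(uint32_t index)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(index));
    return ((uint64_t)hi << 32) | lo;
}
#endif
```

Calling `query_pmu_info(&info)` and then inspecting `info.gp_counters` gives the number of events that can be selected at once. Programming the event selectors is a separate step that this sketch does not cover.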
I have been geeking out deeply on CPU performance counters. The way this branch is headed is to bypass the kernel and use raw CPU instructions to precisely track counters during a short piece of code execution e.g. the This would then be used much like wall-clock time for computing the number of events (e.g. cache misses, branch misses) and interesting ratios (e.g. instructions per cycle, cycles per packet, packets per cache miss). Could be that we hook this deeply into the I see this as a separate feature to the whole software toolchain around performance counters: Linux kernel support including multiplexing many logical counters onto the hardware slots available, The missing code right now is a |
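To make the "like wall-clock time" idea concrete, here is a small C sketch of turning start and end counter readings into event counts and the ratios mentioned above. Everything in it is illustrative: the function names are hypothetical, and masking the difference to the counter width (to tolerate a counter wrapping during the measurement) is my assumption, not something this branch is stated to do.

```c
#include <stdint.h>

/* Events counted between two readings of a counter that is `width` bits wide.
 * Masking the difference keeps the result correct across one wraparound. */
uint64_t counter_delta(uint64_t start, uint64_t end, unsigned width)
{
    uint64_t mask = (width >= 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return (end - start) & mask;
}

/* Ratio of two event counts, e.g. instructions per cycle.
 * Returns 0.0 when the denominator is zero instead of dividing by zero. */
double event_ratio(uint64_t numerator, uint64_t denominator)
{
    if (denominator == 0)
        return 0.0;
    return (double)numerator / (double)denominator;
}
```

For example, with deltas for cycles and instructions plus a packet count from the application, `event_ratio(instructions, cycles)` gives instructions per cycle and `event_ratio(cycles, packets)` gives cycles per packet.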
The PMU API is now documented and fully implemented. The selftest method is rebuilt on the API.
Phew! The whole PMU API is in place and seems to be working. It is really easy to use. Trivial example:
measure(f) is a new convenience function that returns the counters in a table. measure() and profile() now both return the function return value as their first value. to_table() now converts counters to Lua numbers from uint64_t.
Add "snabb ps"
Make it easy to track detailed CPU performance counters.
Modern CPUs can track and report on hundreds of different events. This can be extremely useful for low-level performance analysis. There are off-the-shelf tools such as pmu-tools available to access these counters but this branch is creating a "native" LuaJIT interface to make it easy to separately track specific pieces of code (e.g. apps).
Originally planned as an extension to LuaJIT and described in SnabbCo/luajit#2. Now I think it makes more sense to add it to Snabb Switch and make it separate to the jit.p profiler. This interface would simply take the delta of counter values between the start and end of a piece of code, as one does when measuring wall-clock time, instead of taking samples and matching them up to individual instructions.
The detailed performance counter definitions could however also be used to enhance the LuaJIT performance event profiling support in SnabbCo/luajit#6.
The basic integration I envision is that you could say which performance counters you are interested in and the engine would report them separately for each app. For an example of using performance counters see lukego/blog#6.
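One way the per-app reporting could work, sketched in C under my own assumptions (this is not the engine code, and `app_accounting`, `accounting_init` and `accounting_switch` are hypothetical names): whenever the engine moves from one app to the next, the events counted since the previous switch are charged to the app that was running. The counter reading is passed in as a plain number so the bookkeeping stays independent of how the counter is read.

```c
#include <stdint.h>

#define MAX_APPS 8   /* arbitrary limit for this sketch */

struct app_accounting {
    uint64_t total[MAX_APPS]; /* events charged to each app so far */
    int      current;         /* index of the running app, or -1 for none */
    uint64_t last;            /* counter value at the most recent switch */
};

void accounting_init(struct app_accounting *a, uint64_t now)
{
    for (int i = 0; i < MAX_APPS; i++)
        a->total[i] = 0;
    a->current = -1;
    a->last = now;
}

/* Charge the events since the last switch to the app that was running,
 * then make `app` the running one. Pass -1 to stop accounting. */
void accounting_switch(struct app_accounting *a, int app, uint64_t now)
{
    if (a->current >= 0 && a->current < MAX_APPS)
        a->total[a->current] += now - a->last;
    a->current = app;
    a->last = now;
}
```

With one such structure per selected event, the totals can be reported per app at the end of a run. Events that occur while no app is selected (`current == -1`) are not charged to anyone.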
There are existing C libraries that we could use (libpfm and jevents) but if possible I would prefer to accomplish this with a small amount of our own code rather than adding these as dependencies.