Will there be a README or other documentation? #1

travisdowns · 2016-10-09T17:18:05Z

It would be awesome to have a README or documentation on this tool. A lot of what you've described in this answer could simply be copied over.

Are you willing to answer questions about the tool? What's the best forum for it? Issues here on GitHub? Questions on Stack Overflow? Somewhere else?

obilaniu · 2016-10-09T17:33:48Z

@travisdowns Yessir, I'll get down to writing it. This GitHub repo would probably be the best place to discuss it, either in the README or on the Wiki page. I don't want to just copy over what I wrote there, since there I was explaining why the counters must have been set to count in OS mode, but I'll certainly draw inspiration from it.

obilaniu · 2016-10-09T20:31:09Z

@travisdowns There's the beginnings of a README.md in the repo now, though much remains unsaid, especially about the kernel code.

travisdowns · 2016-10-10T01:42:10Z

Awesome, reading it now.

What's the approximate cost of the PFCSTART/PFCEND calls? Do they make a kernel transition, or does the LKM enable user-space setting and reading of the PMC counters?

How does this compare to Agner Fog's testp program (http://www.agner.org/optimize/#testp)?

How does this compare to PAPI?

I'm actually looking for a lightweight way to time smallish sections of code. My current approach is to use Linux perf, but it doesn't have an API (you could, in principle, use the underlying perf_events syscalls, but I haven't looked into how hard that would actually be). It seems like libpfc could fill that role.
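For reference, the perf_events route mentioned above goes through the perf_event_open(2) system call, which glibc does not wrap. The following is a minimal sketch, not code from libpfc or perf, that counts user-mode retired instructions around one function call; `count_user_instructions` and `work` are hypothetical names. Unlike rdpmc-based macros, every reset/enable/disable/read here is a system call, and opening the event can fail depending on /proc/sys/kernel/perf_event_paranoid or container policy, so the sketch returns -1 in that case.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so invoke it via syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: count user-mode retired instructions while fn runs.
 * Returns -1 if the event cannot be opened or read. */
static long long count_user_instructions(void (*fn)(void))
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* created stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user mode only */
    attr.exclude_hv = 1;

    /* pid 0 = calling thread, cpu -1 = whichever CPU it runs on */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();                                   /* region being measured */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    ssize_t got = read(fd, &count, sizeof count);
    close(fd);
    return got == (ssize_t)sizeof count ? (long long)count : -1;
}

/* Hypothetical workload used only to exercise the helper. */
static void work(void)
{
    volatile unsigned sum = 0;
    for (unsigned i = 0; i < 1000; i++)
        sum += i;
}
```

Call it as `count_user_instructions(work)`. Some user-mode instructions from the ioctl wrappers inside the enabled window still land in the count, which is the kind of fixed overhead the libpfc discussion below is concerned with.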

obilaniu · 2016-10-10T03:03:38Z

@travisdowns They are defined here. pfcRemoveBias() automatically computes their cost for the current counter configuration. In particular, both sequences cost exactly the same (assuming add and sub with memory operands cost the same): 37 instructions, ~240 unhalted core cycles and 0 branches each (at least on my systems). There is no other overhead, and no system calls. In my experience, n pairs of PFCSTART() and PFCEND() followed by pfcRemoveBias(, n) produce essentially exact counts; for instance, if they sandwich no code, they reliably measure about 0 on all metrics.
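The bias-removal arithmetic described above can be sketched as follows. This illustrates the idea only, assuming totals accumulated over n start/end pairs and a separately measured cost of one empty pair; `remove_bias_and_average` is a hypothetical helper, not pfcRemoveBias()'s actual signature or implementation.

```c
#include <stdint.h>

#define NCOUNTERS 7  /* 3 fixed + 4 general-purpose, as described in this thread */

/* Hypothetical helper: subtract n times the per-pair overhead from each
 * accumulated total, then divide by n to get a per-execution average.
 * Signed types let an empty region that measures slightly below zero
 * show up as a small negative number instead of wrapping around. */
static void remove_bias_and_average(const int64_t totals[NCOUNTERS],
                                    const int64_t pair_overhead[NCOUNTERS],
                                    int64_t n, double avg[NCOUNTERS])
{
    for (int i = 0; i < NCOUNTERS; i++) {
        int64_t corrected = totals[i] - n * pair_overhead[i];
        avg[i] = (double)corrected / (double)n;
    }
}
```

For example, with a per-pair overhead of 37 instructions and a region that retires 5 instructions, 1000 pairs accumulate 42000 instructions, and the corrected average is 5.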

The software does let userspace write configurations and counts to the hardware MSRs, and it makes a kernel transition when doing so. The PFCSTART() and PFCEND() macros, however, use rdpmc instructions and specifically do not make kernel transitions, which keeps their run time and cost deterministic. pfcRemoveBias() relies on this determinism to compensate for that fixed cost.
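A user-space rdpmc read can be sketched in GCC-style inline assembly: the instruction takes the counter selector in ECX and returns the value split across EDX:EAX. This is a hypothetical single-counter helper, not the PFCSTART()/PFCEND() macros themselves. It assumes user-mode rdpmc has been enabled (CR4.PCE); presumably that is part of what the libpfc kernel module sets up, but that is an assumption. Without it, the instruction faults.

```c
#include <stdint.h>

/* rdpmc returns the counter value as EDX (high 32 bits) : EAX (low 32 bits). */
static inline uint64_t combine_edx_eax(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical helper: read one performance counter with no system call.
 * Faults (SIGSEGV) unless the kernel has enabled user-mode rdpmc (CR4.PCE). */
static inline uint64_t read_pmc(uint32_t selector)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(selector));
    return combine_edx_eax(hi, lo);
}
#endif
```

Because there is no branch and no kernel entry, the cost of such a read is the same every time, which is what makes a fixed overhead correction possible.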

testp is software in the same vein as libpfc, and supports more OSes and more CPUs. But it is not library-based, and its overhead estimation is not as deterministic as mine. The overhead estimation is written in C, so its code is much larger, touches more icache lines and involves loops (and therefore branches), and there is no guarantee that the code used for overhead estimation is exactly the same as the code that times the hot section. Moreover, the start code and end code are not precisely the same (the former involves an assignment, the latter a subtraction). Lastly, testp stores its rdpmc readouts as int, which is 32 bits, while my macros perform full-bitwidth reads as reported by CPUID (48 bits on Haswell) and accumulate them into a 64-bit integer. It does, however, correctly set the User bit and clear the Operating System bit, just as I do.
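Why the read width matters can be illustrated with wraparound-safe subtraction. A minimal sketch, assuming a counter `width` bits wide (48 on Haswell per the comment above; CPUID leaf 0x0A reports the actual widths of the general-purpose and fixed-function counters). Both helpers are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: events elapsed between two raw readings of a counter
 * that is `width` bits wide. Masking keeps the result correct even if the
 * counter wrapped around once between the two reads. */
static uint64_t counter_delta(uint64_t start, uint64_t end, unsigned width)
{
    uint64_t mask = (width >= 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    return (end - start) & mask;
}

/* Hypothetical helper: what a reader that keeps only the low 32 bits sees. */
static uint64_t counter_delta_32bit(uint64_t start, uint64_t end)
{
    return (uint32_t)((uint32_t)end - (uint32_t)start);
}
```

A 32-bit delta is exact only while fewer than 2^32 events occur in the measured region. For a region that spans 5,000,000,000 events it reports 705,032,704, while the full-width delta reports the true count.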

The PFCSTART() and PFCEND() macros are written in inline assembler. The instructions within them are exactly the same (except for the add/sub distinction), have the same cost and instruction size, and are branchless. pfcRemoveBias() contains a single inline assembler chunk with both of them, to measure their overhead precisely. The PFCSTART() macro subtracts the current rdpmc readouts while the PFCEND() macro adds them, which means you can use multiple pairs to perform fine-grained performance measurements within the code, then invoke pfcRemoveBias(, n) with n equal to the number of such pairs to remove the overhead precisely.
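The subtract-at-start, add-at-end scheme can be modeled in plain C. In this sketch `read_counter` is a hypothetical stand-in backed by a simulated counter array so the logic runs anywhere; `pair_start` and `pair_end` are likewise hypothetical. The real macros are straight-line inline assembly with no loop, so the loops here are for brevity only.

```c
#include <stdint.h>

#define NCOUNTERS 7

/* Simulated hardware counters; a real version would read them with rdpmc. */
static uint64_t fake_counters[NCOUNTERS];
static uint64_t read_counter(int i) { return fake_counters[i]; }

/* Start of a timed region: subtract the current readings. */
static void pair_start(int64_t acc[NCOUNTERS])
{
    for (int i = 0; i < NCOUNTERS; i++)
        acc[i] -= (int64_t)read_counter(i);
}

/* End of a timed region: add the current readings. After a start/end pair,
 * each acc[i] has grown by exactly how far counter i advanced in between. */
static void pair_end(int64_t acc[NCOUNTERS])
{
    for (int i = 0; i < NCOUNTERS; i++)
        acc[i] += (int64_t)read_counter(i);
}
```

Code that runs between pairs is never added to the accumulator, so many pairs can share one array. The accumulated overhead is then n times the cost of one pair, which is exactly what the bias removal subtracts.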

IIRC, PAPI on Linux is built on the same perf_events interface that perf uses, in which case it would suffer from the same overcounting problem as perf.

My pfcdemo code should get you started using my library. The "hot section" is where you'd place your code for isolated snippets, but alternatively you can ditch that and use my code as a library within your larger projects. For that, call my initialization, thread-pinning and counter setup code in your main, define a global array of 7 64-bit integers, and sandwich my PFC* macros around any chunk of code you wish to time. Then, at program exit, call pfcRemoveBias(, n) with the number of times n that chunk of code was executed, divide the counts by n to compute an average, and print out this value.
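Thread pinning matters because the counters are per core: if the thread migrates between the start and end reads, the two readings come from different cores. libpfc ships its own pinning code; the following is a generic Linux sketch using sched_getaffinity(2) and sched_setaffinity(2), with a hypothetical helper name, not libpfc's function.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: pin the calling thread to the first CPU it is
 * currently allowed to run on. Returns that CPU number, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        /* pid 0 = the calling thread */
        if (sched_setaffinity(0, sizeof one, &one) != 0)
            return -1;
        return cpu;
    }
    return -1;
}
```

Picking the first allowed CPU rather than hard-coding CPU 0 keeps the sketch working when the process already runs under a restricted affinity mask, as in containers or under taskset.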

travisdowns · 2017-05-10T23:38:12Z

The 240 cycles is for reading all 8 counters, right? Is there an option to only read a subset?

obilaniu · 2017-05-10T23:51:09Z

@travisdowns Well, technically, 7 counters (3 fixed, 4 general-purpose).
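On Intel CPUs, rdpmc distinguishes the two kinds of counters by ECX bit 30: with the bit set, the low bits index the fixed-function counters (0: instructions retired, 1: unhalted core cycles, 2: reference cycles); with it clear, they index the general-purpose counters. A sketch of building selectors for all 7, with hypothetical names; the ordering (fixed first, then general-purpose) is an assumption and not necessarily the order libpfc uses.

```c
#include <stdint.h>

#define RDPMC_FIXED_FLAG (UINT32_C(1) << 30)  /* ECX bit 30: fixed-function counter */

static uint32_t rdpmc_fixed_selector(unsigned idx) { return RDPMC_FIXED_FLAG | idx; }
static uint32_t rdpmc_gp_selector(unsigned idx) { return idx; }

/* Hypothetical helper: selectors for 3 fixed counters followed by 4
 * general-purpose counters. */
static void build_selectors(uint32_t sel[7])
{
    for (unsigned i = 0; i < 3; i++)
        sel[i] = rdpmc_fixed_selector(i);
    for (unsigned i = 0; i < 4; i++)
        sel[3 + i] = rdpmc_gp_selector(i);
}
```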

It would be possible to read a subset by hacking the inline asm macros, but I wanted to avoid branches in them, both for predictability and to avoid perturbing counters where possible (such as the number of branches executed and mispredicted). Avoiding branches in that code while allowing any subset of the 7 counters would require 2^7 = 128 versions of the macro, which is a bit painful.

Is the overhead of 7 counter reads that considerable?

obilaniu · 2017-05-10T23:52:52Z

@travisdowns Another thing to note: certain performance events can only be counted on certain counters (some L1/L2 events can only be counted in GP1, for instance). I've no idea why.

ms2pony · 2021-07-18T08:32:47Z

I can't build by following README.md; the step `meson.py .. -Dbuildtype=release --prefix=/path/to/prefixdir # Such as $HOME/.local` is hard to understand.
