Will there be a README or other documentation? #1
Comments
@travisdowns Yessir, will get down to writing it. This GitHub repo would probably be the best place to discuss it, either in the README or on the Wiki page. I don't want to just copy over what I wrote there, since I was explaining why the counters must have been set to count in OS mode, but I'll certainly draw on it.
@travisdowns There's the beginnings of a |
Awesome, reading it now. What's the approximate cost of the How does this compare to Agner Fog's http://www.agner.org/optimize/#testp ? How does this compare to PAPI? I'm actually looking for a lightweight way to time smallish sections of code. My current approach is to use Linux perf, but it doesn't have an API (you could, in principle, use the underlying |
@travisdowns They are defined here. The software does allow userspace to write configurations and counts to the hardware MSRs, and makes a kernel transition when doing so, but the macros
The IIRC, PAPI is the interface My |
The 240 cycles is for reading all 8 counters, right? Is there an option to only read a subset?
@travisdowns Well, technically, 7 counters (3 fixed, 4 general-purpose). It would be possible to read a subset by hacking the inline asm macros, but I wanted to keep branches out of them, both for predictability and so that the measurement code itself doesn't increment counters (such as the number of branches encountered and (mis)predicted). Avoiding branches in that code while allowing any subset of the 7 counters would require 2^7 = 128 versions of the macro, which is a bit painful. Is the overhead of 7 counter reads that considerable?
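To make the trade-off concrete, here is a minimal sketch of the alternative being avoided: selecting counters with a runtime bitmask. This is not the tool's actual code. The names `read_subset`, `counter_reader` and `fake_reader` are hypothetical, the fixed-then-general-purpose ordering of the 7 slots is an assumption, and the reader callback stands in for the real hardware read so the sketch can run anywhere.

```c
#include <stdint.h>

/* 3 fixed + 4 general-purpose counters, as described in the thread.
 * Slot order (fixed first, then GP) is an assumption for illustration. */
enum { NUM_FIXED = 3, NUM_GP = 4, NUM_COUNTERS = NUM_FIXED + NUM_GP };

/* Distinct subsets of 7 counters: 2^7 = 128, i.e. the number of
 * branch-free macro variants that full subset support would need. */
enum { NUM_SUBSETS = 1 << NUM_COUNTERS };

/* Hypothetical callback returning the current value of counter i.
 * A real implementation would read the hardware counter here. */
typedef uint64_t (*counter_reader)(unsigned i);

/* Read only the counters whose bit is set in mask; returns how many
 * were read. The `if` is a data-dependent branch per counter, which
 * is exactly what perturbs branch-related events being measured. */
static unsigned read_subset(unsigned mask, counter_reader rd,
                            uint64_t out[NUM_COUNTERS])
{
    unsigned n = 0;
    for (unsigned i = 0; i < NUM_COUNTERS; i++) {
        if (mask & (1u << i)) {
            out[i] = rd(i);
            n++;
        }
    }
    return n;
}

/* Stand-in reader for demonstration only: counter i reads 100*(i+1). */
static uint64_t fake_reader(unsigned i)
{
    return 100u * (i + 1);
}
```

Calling `read_subset(0x05u, fake_reader, out)` fills only slots 0 and 2. The single masked loop replaces 128 specialised macros, but at the cost of up to 7 conditional branches per read, each of which would itself be counted.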
@travisdowns One other thing to note: certain performance events can only be counted on certain counters (some L1/L2 events can only be counted on GP1, for instance). I have no idea why.
I can't build through the readme.d, the |
It would be awesome to have a README or documentation on this tool. A lot of what you've described in this answer could simply be copied over.
Are you willing to answer questions about the tool? What's the best forum for it? Issues here on GitHub? Questions on Stack Overflow? Somewhere else?