Can't Process Perf File #144
Comments
This was the command line used:
|
Thanks for reporting. I'll take a look. |
Thanks! Let me know if you need more information. |
See also: https://lore.kernel.org/lkml/CAHk-=whqCT0BeqBQhW8D-YoLLgp_eFY=8Y=9ieREM5xx0ef08w@mail.gmail.com/ This is blocking our ability to use AutoFDO to improve the performance of the Linux kernel. |
Could you try the following perf command: perf record -a -c 10000019 -e cycles -b -i -m 16 -N -o - sleep 30 That is, replace "--call-graph lbr" in the original command with "-b". The latter explicitly instructs perf to record LBR records, whereas the former records a call graph that is derived from LBR. After that, you can check whether the perf.data file contains any LBR records via: perf script -Fpid,brstack -i perf.data This should dump tons of LBR records, one snapshot per line. Let's see if it helps. I'll follow up on this tomorrow. |
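The check described above (does the dump contain any LBR records?) can be sketched as a small Python helper that counts branch entries in text saved from "perf script -Fpid,brstack". This is only a sketch: the function name is hypothetical, and it assumes each brstack entry is a whitespace-separated token beginning with 0xFROM/0xTO/, which may differ between perf versions.

```python
import re

# Assumption: one brstack entry looks like 0xFROM/0xTO/<flags...>,
# e.g. 0x401000/0x401020/P/-/-/0. Adjust the pattern if your perf differs.
BRANCH_ENTRY = re.compile(r"0x[0-9a-fA-F]+/0x[0-9a-fA-F]+/\S*")


def summarize_brstack(lines):
    """Return (snapshots_with_branches, total_branch_entries).

    `lines` is any iterable of text lines, one perf sample per line.
    """
    snapshots = 0
    entries = 0
    for line in lines:
        found = BRANCH_ENTRY.findall(line)
        if found:
            snapshots += 1
            entries += len(found)
    return snapshots, entries
```

If the first number comes back as zero for a large dump, the profile most likely carries no LBR data, which matches the failure discussed in this thread.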
I'll give it a shot. I tried it with
|
(Just to mention: "-c 128" seems a little intrusive, and we usually use a prime number (e.g., 10007) for "-c" so that sampling is not biased toward some address that is part of a loop. Also, the autofdo tool uses LBR data from perf.data, and LBR data are only recorded when "-b" (an alias for "-j any") is given. When the autofdo tool sees no LBR data, it cannot proceed.) |
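The advice about using a prime value for "-c" can be sketched as follows. The helper names are hypothetical and the trial-division test is only meant for the modest values used as sample periods; this illustrates picking a period, it is not part of perf or autofdo.

```python
def is_prime(n):
    """Trial-division primality test; fine for sample-period sized values."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate
```

For example, next_prime(10000) gives a period that can be passed to "-c" in place of a round number, so the sampling interval is less likely to line up with the length of a hot loop.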
Ah! Okay. I got a Is the LBR data you mention here the same for chips that don't have LBR, like ARM? |
Good to know you got a profraw file. As for LBR, it is only available on Intel architectures: Skylake and later generations have 32-deep LBR records (each snapshot contains the last 32 consecutive branch records), while Haswell has only 16. AMD and ARM architectures do not support LBR for now, so "-b" will probably give an error on those machines. LBR data reflect code paths. For AMD and Intel machines, I believe the code paths are the same most of the time, so binaries optimized with a profraw collected on Intel should see a similar performance boost on both Intel and AMD machines. For ARM, however, the code paths may differ (some libraries have different versions for X86_64 / ARM); in that case, if we use the profraw to optimize code that is to run on ARM, we might get a regression. (We are currently exploring LBR-like perf data on ARM machines, but we are not quite there yet.) |
Closing this. (Please reopen if any further questions..) |
I did get a profraw file, but it wasn't very useful. This was part of the profile summary. The total number of functions is very low.
|
Please reopen this. (I don't have the ability to do that.) |
A tangential note: AMD has added BRS (the counterpart of Intel LBR) support to its Fam19h Model 01h CPUs, and current toolchains support it seamlessly. We've also done an evaluation on AMD BRS and the performance numbers are on par. As for the profile, the number of functions that have counters is too small; usually we either tune down "-c" or tune up the profiling period. Using load tests that saturate kernel functionality is also crucial. |
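To see why lowering "-c" or profiling for longer yields more samples, here is a back-of-the-envelope sketch. It assumes the cycles event fires once per CPU cycle and that every profiled CPU is busy for the whole run; the function name and the clock figure in the usage note are illustrative assumptions, not measurements from this issue.

```python
def estimated_samples(cpu_hz, n_cpus, seconds, period):
    """Rough upper bound on samples: one sample per `period` cycle events.

    Assumes all `n_cpus` run flat out at `cpu_hz` for `seconds`.
    """
    total_cycles = cpu_hz * n_cpus * seconds
    return total_cycles // period
```

Under these assumptions, a single 3 GHz CPU profiled for 30 seconds with "-c 10000019" would produce only a few thousand samples, so halving the period or doubling the run time roughly doubles the data available to the profile.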
See also #138 |
@shenhanc78 Hi, can you share more detail? For example, do you use the same command, "perf record -b ./sort"? |
@bage613 @shenhanc78 I'd also be interested if there is a way to use AutoFDO on AMD Milan or Genoa. |
@bage613 is now able to collect raw perf data, convert it to an autofdo profile, and see some performance improvement for one of his benchmarks. He is now trying to do the same for an open source server and reproduce the improvement. This was done on Zen3 (some models) and Zen4 CPUs. He will share more if he sees wins for the open source server. |
I have a perf file that autofdo doesn't seem to be able to do anything with. Is autofdo not able to handle "use_lbr"?
vmlinux.tcp_rr.not-instrumented.perf.gz