Unable to open performance counter with 'perf_event_open' #2815
Comments
Got a stack trace with line numbers when this happens (as of commit f7c3d17):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7949859 in __GI_abort () at abort.c:79
#2 0x0000555555a92c97 in rr::notifying_abort () at ../src/util.cc:1501
#3 0x00005555558ec7fc in rr::FatalOstream::~FatalOstream (this=0x7fffffffd9a0, __in_chrg=<optimized out>) at ../src/log.cc:360
#4 0x000055555591cfda in rr::start_counter (tid=0, group_fd=-1, attr=0x7fffffffdad0, disabled_txcp=0x0) at ../src/PerfCounters.cc:217
#5 0x000055555591e763 in rr::check_working_counters () at ../src/PerfCounters.cc:274
#6 0x000055555591e9be in rr::check_for_bugs (uarch=rr::IntelSandyBridge) at ../src/PerfCounters.cc:314
#7 0x000055555591ef0d in rr::init_attributes () at ../src/PerfCounters.cc:377
#8 0x000055555591f0af in rr::PerfCounters::default_ticks_semantics () at ../src/PerfCounters.cc:411
#9 0x0000555555a27721 in rr::Session::Session (this=0x555555c9f650) at ../src/Session.cc:50
#10 0x0000555555938327 in rr::RecordSession::RecordSession (this=0x555555c9f650, exe_path="/bin/echo", argv=std::vector of length 2, capacity 4 = {...}, envp=std::vector of length 48, capacity 64 = {...}, disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30, bind_cpu=rr::BIND_CPU, output_trace_dir="", trace_id=0x0, use_audit=false, unmap_vdso=false) at ../src/RecordSession.cc:2238
#11 0x0000555555937e10 in rr::RecordSession::create (argv=std::vector of length 2, capacity 4 = {...}, extra_env=std::vector of length 0, capacity 0, disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30 '\036', bind_cpu=rr::BIND_CPU, output_trace_dir="", trace_id=0x0, use_audit=false, unmap_vdso=false, force_asan_active=false) at ../src/RecordSession.cc:2204
#12 0x000055555592b2be in rr::record (args=std::vector of length 2, capacity 4 = {...}, flags=...) at ../src/RecordCommand.cc:632
#13 0x000055555592c078 in rr::RecordCommand::run (this=0x555555c863b0 <rr::RecordCommand::singleton>, args=std::vector of length 2, capacity 4 = {...}) at ../src/RecordCommand.cc:791
#14 0x0000555555aaebeb in main (argc=4, argv=0x7fffffffe388) at ../src/main.cc:249 |
So, after reading code and docs, it seems the error comes down to But I think the most interesting questions are: α) what does it mean the type not being valid, and β) is it possible to see it somewhere under And a more practical question: what happens if we replace |
To answer that one: the preliminary answer seems to be "yes". I didn't try to modify the sources yet, but when I am at this line with gdb right after Fun fact: this manual does not mention |
Oh, bad news: it turns out, the |
So, looking through kernel sources, α: this means your CPU architecture does not define the specific CPU capability. β: yes, with
In my VM I don't see the All that said, there's still one question left: how does |
So, I think there is a workaround. If you run Will see if |
So, I made the changes, but I can't test them because I stumbled upon an unrelated failure at this line. This failure belongs to a completely different event, represented by a |
Some machines, mainly VMs, may lack hardware support for cpu-cycles counter. However, it is no reason not to support them: there may be a similar software event available: PERF_COUNT_SW_CPU_CLOCK. The idea is taken from the kernel-provided `perf` utility, function `evsel__fallback` at evsel.c file. Fixes: rr-debugger#2815, rr-debugger#1753 Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
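For illustration, the fallback idea described in this commit message can be sketched roughly as below. This is not the actual patch and not rr's code: `open_cycles_counter`, `opener` and `fake_vm_opener` are hypothetical names, and `opener` merely stands in for the `perf_event_open` syscall. The four constants are the values from `linux/perf_event.h`. Treating `ENOENT` as the trigger mirrors what `perf` does, as far as I can tell.

```python
import errno

# Values as defined in linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_TYPE_SOFTWARE = 1
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_COUNT_SW_CPU_CLOCK = 0


def open_cycles_counter(opener):
    """Try the hardware cycle counter, then fall back to the software clock.

    `opener(event_type, config)` is a hypothetical stand-in for the
    perf_event_open syscall: it returns a file descriptor or raises OSError.
    Returns a (fd, used_fallback) tuple.
    """
    try:
        return opener(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES), False
    except OSError as e:
        # Only "event not available" triggers the fallback;
        # any other error (e.g. EACCES) is re-raised unchanged.
        if e.errno != errno.ENOENT:
            raise
    return opener(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK), True


def fake_vm_opener(event_type, config):
    """Illustrative stand-in for a VM that exposes no hardware counters."""
    if event_type == PERF_TYPE_HARDWARE:
        raise OSError(errno.ENOENT, "No such file or directory")
    return 3  # pretend file descriptor


fd, used_fallback = open_cycles_counter(fake_vm_opener)
print(fd, used_fallback)
```

Note that the software clock counts time rather than cycles, so the two events are similar but not interchangeable.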
So you're saying that in this VM setup, |
I don't know what "hardware retired conditional branches" is. I just fixed one problem I had to research, but as I noted there's something else left (which may be the "hardware retired conditional branches" that you mention, I don't know yet, will look into it next week). Either way, I figure the more cases |
OK. I think there is no point in falling back from hardware cycle counting to software time counting if rr isn't going to work anyway. And I am pretty confident that, whenever hardware cycle counting doesn't work, rr isn't going to work anyway because the hardware ticks counter won't work. You will need to get the hardware counters working under VMWare. This is possible usually. |
@rocallahan okay, let me put it that way: I fixed one problem. Now |
On a side note, I don't think this is under my control. It is a very old vSphere, and it is nearly impossible to convince people to update. I think the problems on this VM come down to the vSphere being very old. |
If hardware RCB counting doesn't work there is no fix for that problem. I think we should make the error message better here. We say
We should say specifically "are hardware performance counters enabled?" and instead of saying "Try |
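As a rough sketch of what a more specific message could look like, the snippet below maps the errno from a failed `perf_event_open` to a hint. The hint wording paraphrases the `perf_event_open(2)` man page and is my own, not rr's actual message; `describe_perf_open_failure` and `_HINTS` are hypothetical names.

```python
import errno

# Hints keyed by errno, paraphrased from the perf_event_open(2) man page.
# The wording is illustrative only and is not what rr prints.
_HINTS = {
    errno.ENOENT: "the event type is not valid or not supported here; "
                  "are hardware performance counters enabled?",
    errno.EOPNOTSUPP: "the event needs a hardware feature that is not available",
    errno.EACCES: "permission denied; check /proc/sys/kernel/perf_event_paranoid",
}


def describe_perf_open_failure(err):
    """Return a one-line, human-readable description for an errno value."""
    name = errno.errorcode.get(err, str(err))
    hint = _HINTS.get(err, "unexpected error")
    return "perf_event_open failed (%s): %s" % (name, hint)


print(describe_perf_open_failure(errno.ENOENT))
```

In a VM, the `ENOENT` case usually points at the hypervisor not exposing the virtual PMU to the guest, which is why asking about hardware counters specifically is more helpful than a generic hint.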
So, I just sshed to the VM in question and executed
So… Apparently RCB does work? |
I mean, I don't know what RCB is, but I assume that would be
|
Can you dump the entire output of |
Sure, here: |
What does |
I see, the |
Oh, while we're at it, if you don't mind my curiosity: how did you initially figure out that RCB doesn't work on this machine? Is there some connection between |
Yes, something's broken, but I don't know what exactly.
I guess so. Certainly rr won't work unless you can get that
PERF_COUNT_HW_CPU_CYCLES is usually the best-supported hardware counter. I've never seen a system where that didn't work but retired-conditional-branches did work. |
Such wiki page does not exist yet, right? Unless I'm overlooking… |
Right. |
Okay, so, I created the page, but there's a little problem: I also see |
This isn't really true but it doesn't matter. The wiki page looks fine. The other events are actually not needed normally. |
This is fixed, thanks for your help. |
Similar issue on a VM not under my control here - is it possible to apply the same nice message we now have for rr during cmake? Building rr takes quite a while (and a bunch of dependencies ;-) so it feels a bit bad when you have finally built and installed it just to see the message then... Related question: would it be useful to reorder https://github.com/rr-debugger/rr/wiki/Building-And-Installing and have "Hardware/software configuration" first, then dependencies and build, then tests, then troubleshooting? |
That would complicate the build because we'd have to build a test program first and do whatever it depends on. I'd rather not add that for something that shouldn't happen too often. |
I'm not sure about that either TBH. For people who are already confident they can run rr they have to scan down to find the build instructions. |
Upon running `rr record …` I get this message and a trace. This is similar to #1753 because I also get it under a VM (it is a VMWare, but AFAIK it's a very old one). However, `perf record …` works fine, and a `dmesg | grep PMU` reports events available. So the research done in #1753 does not apply here, and I am creating a new report.
Steps to reproduce
Expected
Either the `rr record` command wouldn't fail, or the `perf record` command would also fail, or `dmesg | grep PMU` would say `software events only`.
Actual
`rr record` fails, but `perf` doesn't, so the reason for the failure is unclear.
Versions
kernel: 4.19.107, custom build
rr: 5.3.0 and the latest git from commit f7c3d17
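The `dmesg | grep PMU` check from the report can be sketched as a small classifier over kernel log text. The function name `pmu_status` and both sample lines are illustrative assumptions, not output from the machine in this report; the matched phrases ("software events only", "PMU driver") are the ones the x86 kernel prints in its "Performance Events:" line, as far as I know.

```python
def pmu_status(dmesg_text):
    """Classify kernel log text as 'software-only', 'hardware' or 'unknown'."""
    pmu_lines = [line for line in dmesg_text.splitlines() if "PMU" in line]
    # Check the software-only case first, because the kernel phrases it as
    # "no PMU driver, software events only", which also contains "PMU driver".
    if any("software events only" in line for line in pmu_lines):
        return "software-only"
    if any("PMU driver" in line for line in pmu_lines):
        return "hardware"
    return "unknown"


# Illustrative sample lines, not taken from the VM discussed here.
sample_vm = "Performance Events: no PMU driver, software events only."
sample_hw = "Performance Events: full-width counters, Intel PMU driver."

print(pmu_status(sample_vm))
print(pmu_status(sample_hw))
```

As this thread shows, a "hardware" result here is not sufficient on its own: the PMU can be reported as present while individual events still fail to open.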