Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to open performance counter with 'perf_event_open' #2815

Closed
Hi-Angel opened this issue Feb 19, 2021 · 29 comments
Closed

Unable to open performance counter with 'perf_event_open' #2815

Hi-Angel opened this issue Feb 19, 2021 · 29 comments

Comments

@Hi-Angel
Copy link
Contributor

Hi-Angel commented Feb 19, 2021

Upon running rr record … I get this message and a trace. This is similar to #1753 because I also get it under a VM (it is a VMWare, but AFAIK it's a very old one). However, perf record … works fine, and a dmesg | grep PMU reports events available. So, the research done in #1753 does not apply here, so I create a new report.

Steps to reproduce

$ rr record echo hey
[FATAL ../src/PerfCounters.cc:217:start_counter() errno: ENOENT] Unable to open performance counter with 'perf_event_open'; are perf events enabled? Try 'perf record'.
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x5d)[0x56204c63fcf4]
rr(_ZN2rr15notifying_abortEv+0x57)[0x56204c63fc92]
rr(_ZN2rr12FatalOstreamD1Ev+0x34)[0x56204c4997fc]
rr(+0x3c8fda)[0x56204c4c9fda]
rr(+0x3ca763)[0x56204c4cb763]
rr(+0x3ca9be)[0x56204c4cb9be]
rr(+0x3caf0d)[0x56204c4cbf0d]
rr(_ZN2rr12PerfCounters23default_ticks_semanticsEv+0x21)[0x56204c4cc0af]
rr(_ZN2rr7SessionC2Ev+0x107)[0x56204c5d4721]
rr(_ZN2rr13RecordSessionC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS6_SaIS6_EESD_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEiNS_7BindCPUES8_PKNS_9TraceUuidEbb+0x65)[0x56204c4e5327]
rr(_ZN2rr13RecordSession6createERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EESB_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEhNS_7BindCPUERKS7_PKNS_9TraceUuidEbbb+0xb40)[0x56204c4e4e10]
rr(+0x3d72be)[0x56204c4d82be]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x3da)[0x56204c4d9078]
rr(main+0x21f)[0x56204c65bbeb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fa59572a0b3]
rr(_start+0x2e)[0x56204c3bf63e]
=== End rr backtrace
[1]    2531188 abort      rr record echo hey
$ perf record -g echo hey
hey
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (4 samples) ]
$ dmesg | grep PMU
[    0.209229] Performance Events: IvyBridge events, core PMU driver.
[    0.211925] core: PMU erratum BJ122, BV98, HSD29 workaround disabled, HT off

Expected

Either rr record command wouldn't fail, or the perf record command would also fail, or dmesg | grep PMU would say software events only

Actual

rr record fails, but perf doesn't, so reason for the fail is unclear.

Versions

kernel 4.19.107, custom build
rr: 5.3.0 and a latest git from commit f7c3d17

@Hi-Angel
Copy link
Contributor Author

Got stacktrace with line numbers when this happens (as of commit f7c3d17):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7949859 in __GI_abort () at abort.c:79
#2  0x0000555555a92c97 in rr::notifying_abort () at ../src/util.cc:1501
#3  0x00005555558ec7fc in rr::FatalOstream::~FatalOstream (this=0x7fffffffd9a0, __in_chrg=<optimized out>) at ../src/log.cc:360
#4  0x000055555591cfda in rr::start_counter (tid=0, group_fd=-1, attr=0x7fffffffdad0, disabled_txcp=0x0) at ../src/PerfCounters.cc:217
#5  0x000055555591e763 in rr::check_working_counters () at ../src/PerfCounters.cc:274
#6  0x000055555591e9be in rr::check_for_bugs (uarch=rr::IntelSandyBridge) at ../src/PerfCounters.cc:314
#7  0x000055555591ef0d in rr::init_attributes () at ../src/PerfCounters.cc:377
#8  0x000055555591f0af in rr::PerfCounters::default_ticks_semantics () at ../src/PerfCounters.cc:411
#9  0x0000555555a27721 in rr::Session::Session (this=0x555555c9f650) at ../src/Session.cc:50
#10 0x0000555555938327 in rr::RecordSession::RecordSession (this=0x555555c9f650, exe_path="/bin/echo", argv=std::vector of length 2, capacity 4 = {...}, envp=std::vector of length 48, capacity 64 = {...}, disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30, bind_cpu=rr::BIND_CPU, output_trace_dir="", trace_id=0x0, use_audit=false, unmap_vdso=false) at ../src/RecordSession.cc:2238
#11 0x0000555555937e10 in rr::RecordSession::create (argv=std::vector of length 2, capacity 4 = {...}, extra_env=std::vector of length 0, capacity 0, disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30 '\036', bind_cpu=rr::BIND_CPU, output_trace_dir="", trace_id=0x0, use_audit=false, unmap_vdso=false, force_asan_active=false) at ../src/RecordSession.cc:2204
#12 0x000055555592b2be in rr::record (args=std::vector of length 2, capacity 4 = {...}, flags=...) at ../src/RecordCommand.cc:632
#13 0x000055555592c078 in rr::RecordCommand::run (this=0x555555c863b0 <rr::RecordCommand::singleton>, args=std::vector of length 2, capacity 4 = {...}) at ../src/RecordCommand.cc:791
#14 0x0000555555aaebeb in main (argc=4, argv=0x7fffffffe388) at ../src/main.cc:249

@Hi-Angel
Copy link
Contributor Author

Hi-Angel commented Feb 20, 2021

So, after reading code and docs, it seems the error comes down to perf_event_attr::type == PERF_TYPE_HARDWARE not being valid in my VM. I'm not sure yet if it's "not being valid" on its own or in conjunction with perf_event_attr::config == PERF_COUNT_HW_CPU_CYCLES.

But I think the most interesting questions are: α) what does it mean the type not being valid, and β) is it possible to see it somewhere under /sys or /proc, so people could immediately know what's wrong?

And a more practical question: what happens if we replace PERF_TYPE_HARDWARE with PERF_TYPE_SOFTWARE, could it work? Might be a good workaround: print a warning that hw events are not supported so it will be slow, but don't bail out on it.

@Hi-Angel
Copy link
Contributor Author

Hi-Angel commented Feb 20, 2021

And a more practical question: what happens if we replace PERF_TYPE_HARDWARE with PERF_TYPE_SOFTWARE, could it work? Might be a good workaround: print a warning that hw events are not supported so it will be slow, but don't bail out on it.

To answer that one: preliminary answer it seems is "yes". I didn't try to modify the sources yet, but when am at this line with gdb right after fd returned -1 (thus I am about to get the stacktrace some lines below), if I replace attr->type with PERF_TYPE_SOFTWARE and call under the debugger syscall(__NR_perf_event_open, attr, tid, -1, group_fd, PERF_FLAG_FD_CLOEXEC) again, it returns a correct fd.

Fun fact: this manual does not mention PERF_COUNT_HW_CPU_CYCLES as supported in conjunction with PERF_TYPE_SOFTWARE. But may be it is outdated. Even, very outdated, because I run an old kernel as well, 4.19.107 is here.

@Hi-Angel
Copy link
Contributor Author

Hi-Angel commented Feb 20, 2021

And a more practical question: what happens if we replace PERF_TYPE_HARDWARE with PERF_TYPE_SOFTWARE, could it work? Might be a good workaround: print a warning that hw events are not supported so it will be slow, but don't bail out on it.

To answer that one: preliminary answer it seems is "yes". I didn't try to modify the sources yet, but when am at this line with gdb right after fd returned -1 (thus I am about to get the stacktrace some lines below), if I replace attr->type with PERF_TYPE_SOFTWARE and call under the debugger syscall(__NR_perf_event_open, attr, tid, -1, group_fd, PERF_FLAG_FD_CLOEXEC) again, it returns a correct fd.

Fun fact: this manual does not mention PERF_COUNT_HW_CPU_CYCLES as supported in conjunction with PERF_TYPE_SOFTWARE. But may be it is outdated. Even, very outdated, because I run an old kernel as well, 4.19.107 is here.

Oh, bad news: it turns out, the perf_event_attr::config is a tagged union of different enums, where "tag" is represented by the type field. So, by varying the tag between PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE you get different, possibly non even intersecting, events. The enum for PERF_TYPE_SOFTWARE is called perf_sw_ids, and I haven't found there anything with infix CYCLES (because event we're interested in is called PERF_COUNT_HW_CPU_CYCLES), so probably switching between hw and sw here can't possibly work, at least not as of writing the words.

@Hi-Angel
Copy link
Contributor Author

But I think the most interesting questions are: α) what does it mean the type not being valid, and β) is it possible to see it somewhere under /sys or /proc, so people could immediately know what's wrong?

So, looking through kernel sources, PERF_COUNT_HW_CPU_CYCLES is defined per-CPU-arch way. In particular, at this line I see it has a description "cpu cycles". This gave me the idea what to look for at perf list; so answers for my questions are:

α: this means your CPU archictecture does not define the specific CPU capability.

β: yes, with perf list --details | grep cycles | grep cpu, for example:

 λ perf list --details | grep cycles | grep cpu
  cpu-cycles OR cycles                               [Hardware event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]
[…]

In my VM I don't see the cpy-cycles event, so that's probably the reason rr doesn't work.


All that said, there's still one question left: how does perf record work, is there perhaps a workaround?

@Hi-Angel
Copy link
Contributor Author

All that said, there's still one question left: how does perf record work, is there perhaps a workaround?

So, I think there is a workaround. If you run perf record with -v option, you'll see a message that it falls back to software counters, here's the code. It uses PERF_COUNT_SW_CPU_CLOCK event.

Will see if rr gonna work if I wire it up or will it stumble upon something else.

@Hi-Angel
Copy link
Contributor Author

Hi-Angel commented Feb 20, 2021

Will see if rr gonna work if I wire it up or will it stumble upon something else.

So, I did the changes, but I can't test it because I stumbled upon an irrelevant fail at this line. This fail belongs to some completely different event, represented by a ticks_attr variable. FWIW, the events variable (that is being checked) is 0 for me.

Hi-Angel added a commit to Hi-Angel/rr that referenced this issue Feb 20, 2021
Some machines, mainly VMs, may lack hardware support for cpu-cycles
counter. However, it is no reason not to support them: there may be a
similar software event available: PERF_COUNT_SW_CPU_CLOCK.

The idea is taken from the kernel-provided `perf` utility, function
`evsel__fallback` at evsel.c file.

Fixes: rr-debugger#2815,
       rr-debugger#1753

Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Hi-Angel added a commit to Hi-Angel/rr that referenced this issue Feb 20, 2021
Some machines, mainly VMs, may lack hardware support for cpu-cycles
counter. However, it is no reason not to support them: there may be a
similar software event available: PERF_COUNT_SW_CPU_CLOCK.

The idea is taken from the kernel-provided `perf` utility, function
`evsel__fallback` at evsel.c file.

Fixes: rr-debugger#2815,
       rr-debugger#1753

Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
@rocallahan
Copy link
Collaborator

So you're saying that in this VM setup, PERF_COUNT_HW_CPU_CYCLES doesn't work, but the hardware retired conditional branches counter does work? Because that seems very strange.

@Hi-Angel
Copy link
Contributor Author

So you're saying that in this VM setup, PERF_COUNT_HW_CPU_CYCLES doesn't work, but the hardware retired conditional branches counter does work? Because that seems very strange.

I don't know what is "hardware retired conditional branches". I just fixed one problem I had to research into, but as I noted there's something else left (which may be the "hardware retired conditional branches" that you mention, I don't know yet, will look into it next week). Either way, I figure the more cases rr is able to handle, the less it is possible some user won't be able to run it on some odd configuration :)

@rocallahan
Copy link
Collaborator

OK. I think there is no point in falling back from hardware cycle counting to software time counting if rr isn't going to work anyway. And I am pretty confident that, whenever hardware cycle counting doesn't work, rr isn't going to work anyway because the hardware ticks counter won't work.

You will need to get the hardware counters working under VMWare. This is possible usually.

@Hi-Angel
Copy link
Contributor Author

@rocallahan okay, let me put it that way: I fixed one problem. Now rr may stumble upon another — but this would be a different one. This implies that when someone like me will try to fix rr problems on a configuration similar to mine, they won't repeat steps I already did, because the fix I crafted up is already in place. Instead they will figure out a fix for the next problem.

@Hi-Angel
Copy link
Contributor Author

You will need to get the hardware counters working under VMWare. This is possible usually.

On a side note, I don't think this is under my control. It is a very old vsphere, and it is nearly impossible to convince people to update. I think problems on this VM comes down to the vsphere being very old.

@rocallahan
Copy link
Collaborator

If hardware RCB counting doesn't work there is no fix for that problem.

I think we should make the error message better here. We say

[FATAL ../src/PerfCounters.cc:217:start_counter() errno: ENOENT] Unable to open performance counter with 'perf_event_open'; are perf events enabled? Try 'perf record'.

We should say specifically "are hardware performance counters enabled?" and instead of saying "Try perf record", which is misleading since perf record can work without hardware counters, we should link to an rr wiki page that gives detailed instructions to check whether hardware counters are enabled by grepping dmesg, perf list, and running perf record and checking whether it's falling back to software.

@Hi-Angel
Copy link
Contributor Author

If hardware RCB counting doesn't work there is no fix for that problem.

So, I just sshed to the VM in question and executed perf list | grep retired | grep conditional. Here's what I get:

╰─$ perf list | grep retired | grep conditional
       [Speculative and retired macro-conditional branches]
       [Speculative and retired macro-unconditional branches excluding calls
       [Taken speculative and retired macro-conditional branches]
       [Taken speculative and retired macro-conditional branch instructions
  br_inst_retired.conditional                       
       [Speculative and retired mispredicted macro conditional branches]
       [Speculative and retired mispredicted macro conditional branches]
       [Not taken speculative and retired mispredicted macro conditional
       [Taken speculative and retired mispredicted macro conditional branches]
  br_misp_retired.conditional                       
       [Mispredicted conditional branch instructions retired (Precise event)]

So… Apparently RCB does work?

@Hi-Angel
Copy link
Contributor Author

I mean, I don't know what RCB is, but I assume that would be br_inst_retired.conditional, right? FWIW, on my laptop where rr works fine I get much less impressive list:

 λ perf list | grep retired | grep conditional                                    
  br_inst_retired.conditional                       
  br_misp_retired.conditional                       
       [Mispredicted conditional branch instructions retired (Precise event)]

@rocallahan
Copy link
Collaborator

Can you dump the entire output of perf list inside the VM somewhere?

@Hi-Angel
Copy link
Contributor Author

Can you dump the entire output of perf list inside the VM somewhere?

Sure, here:
125.perf-list.txt

@rocallahan
Copy link
Collaborator

What does perf stat -e br_inst_retired.conditional /bin/ls show inside the VM?

@Hi-Angel
Copy link
Contributor Author

Hi-Angel commented Feb 21, 2021

What does perf stat -e br_inst_retired.conditional /bin/ls show inside the VM?

╰─$ perf stat -e br_inst_retired.conditional /bin/ls        
125.perf-list.txt  perf.data  perf.data.old  systemd-private-46587d79a71d48f082dd73dbe2cad9cc-ntp.service-W5Xflg  systemd-private-46587d79a71d48f082dd73dbe2cad9cc-systemd-logind.service-h8vT2g

 Performance counter stats for '/bin/ls':

                 0      br_inst_retired.conditional                                   

       0.002635531 seconds time elapsed

       0.000000000 seconds user
       0.002803000 seconds sys

I see, the br_inst_retired.conditional is 0. So, this is a bug in our Virtual Machine, right? And it needs to be solved on our side because machine claims to support it when it really doesn't, right?

@Hi-Angel
Copy link
Contributor Author

Oh, while on it, if you don't mind my curiosity: how did you initially figured out that RCB doesn't work on this machine? Is there some connection between PERF_COUNT_HW_CPU_CYCLES and br_inst_retired.conditional?

@rocallahan
Copy link
Collaborator

I see, the br_inst_retired.conditional is 0. So, this is a bug in our Virtual Machine, right?

Yes, something's broken, but I don't know what exactly.

And it needs to be solved on our side because machine claims to support it when it really doesn't, right?

I guess so. Certainly rr won't work unless you can get that perf stat -e br_inst_retired.conditional to report real numbers.

Is there some connection between PERF_COUNT_HW_CPU_CYCLES and br_inst_retired.conditional?

PERF_COUNT_HW_CPU_CYCLES is usually the best-supported hardware counter. I've never seen a system where that didn't work but retired-conditional-branches did work.

@Hi-Angel
Copy link
Contributor Author

If hardware RCB counting doesn't work there is no fix for that problem.

I think we should make the error message better here. We say

[FATAL ../src/PerfCounters.cc:217:start_counter() errno: ENOENT] Unable to open performance counter with 'perf_event_open'; are perf events enabled? Try 'perf record'.

We should say specifically "are hardware performance counters enabled?" and instead of saying "Try perf record", which is misleading since perf record can work without hardware counters, we should link to an rr wiki page that gives detailed instructions to check whether hardware counters are enabled by grepping dmesg, perf list, and running perf record and checking whether it's falling back to software.

Such wiki page does not exist yet, right? Unless I'm overlooking…

@rocallahan
Copy link
Collaborator

Right.

@Hi-Angel
Copy link
Contributor Author

Hi-Angel commented Feb 24, 2021

Right.

Okay, so, I created the page, but there's a little problem: I also see rr accesses events hw_intr_cntr_event and llsc_cntr_event and I have no idea what are these and how they're named in perf list. There's also minus_ticks_cntr_event which I'm not sure whether mandatory or not because it's inside an if-condition. (The PERF_COUNT_HW_CPU_CYCLES I don't count because it can be emulated in software).

@rocallahan
Copy link
Collaborator

(The PERF_COUNT_HW_CPU_CYCLES I don't count because it can be emulated in software).

This isn't really true but it doesn't matter.

The wiki page looks fine. The other events are actually not needed normally.

@rocallahan
Copy link
Collaborator

This is fixed, thanks for your help.

@GitMensch
Copy link
Contributor

Similar issue on a VM not under control here - is it possible to apply the same nice message we now have for rr during cmake? Building rr takes quite a while (and a bunch of dependencies ;-) so it feels a bit bad when you finally have build and installed it just to see the message then...
(the new message and wiki is definitely useful!)

Related question: would it be useful to reorder https://github.com/rr-debugger/rr/wiki/Building-And-Installing and have "Hardware/software configuration" first, then dependencies and build, then tests, then troubleshooting?

@rocallahan
Copy link
Collaborator

That would complicate the build because we'd have to build a test program first and do whatever it depends on. I'd rather not add that for something that shouldn't happen too often.

@rocallahan
Copy link
Collaborator

would it be useful to reorder https://github.com/rr-debugger/rr/wiki/Building-And-Installing and have "Hardware/software configuration" first, then dependencies and build, then tests, then troubleshooting?

I'm not sure about that either TBH. For people who are already confident they can run rr they have to scan down to find the build instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants