`emulate_async_signal` can be slow if the asynchronous signal to be delivered happens in a tight loop (see #3650 for an example).
Instead of using a software breakpoint and evaluating the conditions from the rr supervisor, we could use a hardware breakpoint and attach a bpf program to it. The bpf program performs a subset of the necessary condition evaluation to rapidly reject as many iterations of the breakpoint as possible (e.g. by comparing the general-purpose registers, which are available to bpf programs attached to perf events), and the rr supervisor performs the final validation. In testing, this reduces the overhead on a pathological trace provided by a customer by 94%.
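To make the two-stage idea concrete, here is a rough userspace simulation of it. This is only a sketch and not the rr or kernel code: `Regs`, `TargetState`, `fast_filter_matches`, `supervisor_validates` and `run_loop` are all hypothetical names, the "registers" are a toy four-element array, and the `ticks` field merely stands in for whatever extra state only the supervisor can check. It counts how often the supervisor would have to be woken with and without the cheap register pre-filter.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical snapshot of the general-purpose registers at a breakpoint hit.
struct Regs {
  std::array<uint64_t, 4> gp;
};

// Hypothetical full target state. `ticks` stands in for the additional
// state that only the supervisor evaluates.
struct TargetState {
  Regs regs;
  uint64_t ticks;
};

// Stage 1: the cheap check that would live in the bpf program.
// It only sees registers, so it can produce false positives.
bool fast_filter_matches(const Regs& now, const Regs& want) {
  return now.gp == want.gp;
}

// Stage 2: the full check performed by the supervisor.
bool supervisor_validates(const TargetState& now, const TargetState& want) {
  return now.regs.gp == want.regs.gp && now.ticks == want.ticks;
}

struct Result {
  std::size_t supervisor_stops;  // how many times the supervisor was woken
  bool found;                    // whether the target state was reached
};

// Simulate a tight loop that hits the breakpoint once per iteration.
// The first register cycles through 0..7, so the register-only filter
// matches on several iterations before the real target.
Result run_loop(uint64_t iterations, const TargetState& want, bool use_filter) {
  Result result{0, false};
  for (uint64_t i = 0; i < iterations; ++i) {
    TargetState now;
    now.regs.gp = {i % 8, 0, 0, 0};
    now.ticks = i;

    // With the filter enabled, most hits are rejected without a stop.
    if (use_filter && !fast_filter_matches(now.regs, want.regs)) {
      continue;
    }

    ++result.supervisor_stops;
    if (supervisor_validates(now, want)) {
      result.found = true;
      break;
    }
  }
  return result;
}
```

In this toy model, only the breakpoint hits whose registers match reach the supervisor, which is where the savings would come from. The pre-filter is allowed to be imprecise as long as it never rejects the real target, since the supervisor still makes the final decision.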
Kernel patches are up at https://lkml.org/lkml/2023/12/4/1384
The rr-side changes are a pile of hacks for now, but I'll clean them up once the kernel patches are accepted.