kretprobes are mysteriously missed #2825
Comments
Which kernel are you using? maxactive may not be supported in earlier kernels. You can check /sys/kernel/debug/tracing/kprobe_events to see whether it is really supported or not. For your second question and link, you are referring to the internal kernel implementation ("have code of your own at the entry"). bpf k[ret]probe support should guarantee the kretprobe execution if the bpf program is not running stacked on top of another bpf program.
Maybe you can share your whole program here so people can help reproduce and investigate.
Hi @yonghong-song, thanks for your reply. I first added the maxactive parameter in the hope that it would help. The kernel version is (from uname -a) … But when I found the data from … But what you say about "a bpf program not running stacked on top of another bpf program" sounds possibly interesting. Looking at … Is there an easy way to clean up those probes? As for showing the program, I'll ask around. Basically it is an extended version of …
I am attaching my program: … Like I said, it is based on … Here is a graph that shows how the exported value …
I cleaned up all the extra reported kprobes, then observed it over the weekend. The problem at first seemed not to occur again, but then it happened on one host this morning: a difference of 70 invocations between the kprobe and the corresponding kretprobe:
I used this little helper script to clean up:
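The script itself was not captured in this thread. A minimal Python reconstruction of such a cleanup helper might look like the following; the `_bcc_` name marker and probe naming scheme are assumptions about how bcc registered the probes, not taken from the original script.

```python
# Hypothetical reconstruction -- the original helper script is not shown
# in this thread. It removes leftover bcc-created probes by appending
# "-:<group>/<name>" lines to kprobe_events (requires root to run).
KPROBE_EVENTS = "/sys/kernel/debug/tracing/kprobe_events"

def removal_lines(events_text, marker="_bcc_"):
    """Given the contents of kprobe_events, return the '-:' commands
    that would remove every probe whose name contains `marker`.

    A kprobe_events line looks roughly like:
      p:kprobes/p_fuse_file_write_iter_bcc_1234 fuse_file_write_iter
    """
    commands = []
    for line in events_text.splitlines():
        if marker not in line:
            continue
        probe = line.split()[0].split(":", 1)[1]  # "kprobes/p_..._bcc_1234"
        commands.append("-:" + probe)
    return commands

def cleanup():
    # Read the currently registered probes, then append removal commands.
    with open(KPROBE_EVENTS) as f:
        commands = removal_lines(f.read())
    with open(KPROBE_EVENTS, "a") as f:
        for cmd in commands:
            f.write(cmd + "\n")
```

`removal_lines` is kept separate from the file I/O so the filtering logic can be checked without root.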
The following is a hypothesis from looking at the linux/arch/x86/kernel/kprobes/core.c file.
For kretprobe, …
There are a few more conditions that may cause a kretprobe to be skipped. These conditions might be triggered in your case. Unfortunately, to really prove this, we would need to build a custom kernel.
@yonghong-song That is an interesting observation. Too bad that I also can't see a way to experiment with this easily. Of course, the problem doesn't happen reliably, so it would be very difficult to reproduce in a test environment.
I have a BCC / Python script that aims to measure the time taken by FUSE I/O requests. For this purpose it sets a kprobe and a kretprobe on (among others, but this is the one I have trouble with) fuse_file_write_iter. I set the kretprobe before the kprobe:

However, it seems that the kretprobe is called significantly less often than the kprobe, which causes my code (which counts enters and exits) to think that some I/O requests are ongoing forever.
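The setup described above might look like the following sketch in BCC. This is not the author's actual script; the map and function names are illustrative, and running it requires bcc and root.

```python
# Sketch (assumed names, not the original program): attach a kretprobe
# and a kprobe to fuse_file_write_iter to track in-flight requests.
bpf_text = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start_ns, u64, u64);   // thread id -> entry timestamp (ns)

int trace_write_enter(struct pt_regs *ctx) {
    u64 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start_ns.update(&tid, &ts);
    return 0;
}

int trace_write_return(struct pt_regs *ctx) {
    u64 tid = bpf_get_current_pid_tgid();
    start_ns.delete(&tid);      // request finished; entry no longer pending
    return 0;
}
"""

def attach_probes(b):
    # Attach the kretprobe first, then the kprobe, so a return can never
    # be observed for an entry that was not counted.
    b.attach_kretprobe(event="fuse_file_write_iter",
                       fn_name="trace_write_return")
    b.attach_kprobe(event="fuse_file_write_iter",
                    fn_name="trace_write_enter")

# Usage (requires bcc and root):
#   from bcc import BPF
#   attach_probes(BPF(text=bpf_text))
```

The attach order only guarantees that no return is seen without its entry; as this issue shows, it cannot prevent returns from being missed in the other direction.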
I see this from these counts in /sys/kernel/debug/tracing/kprobe_profile:

As you can see, the number of caught returns is 338 lower than the number of enters, but nmissed is 0. So the fact that events are lost is itself lost too. Something is very unreliable here. What?

As a second question: I understand from reading about the implementation of kretprobes that a kretprobe is actually a special kprobe which causes the return from the function to trigger some code as well. You can even have code of your own at the entry, as I understand from
https://www.kernel.org/doc/Documentation/kprobes.txt
(section "retprobe entry-handler"). But it seems that this facility is not available from python-bcc. It would be excellently suited to ensure that entries and exits from a function balance properly. By splitting that up into two probes, the guarantee is lost, and it is no doubt less efficient.
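The enter/return imbalance reported earlier in this thread can be checked with a small parser. This is a sketch; each kprobe_profile line has the shape "&lt;probe-name&gt; &lt;nhit&gt; &lt;nmissed&gt;", the p_*/r_* probe names are an assumption about how the probes were registered, and the sample counts are illustrative numbers chosen to reproduce the reported difference of 338.

```python
# Sketch: detect silently lost kretprobe returns by comparing nhit counts
# from /sys/kernel/debug/tracing/kprobe_profile. Probe names and sample
# counts are illustrative assumptions, not data from the original report.
def parse_kprobe_profile(text):
    """Return {probe_name: (nhit, nmissed)} from kprobe_profile contents."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            name, nhit, nmissed = parts
            stats[name] = (int(nhit), int(nmissed))
    return stats

def lost_returns(stats, enter_probe, return_probe):
    """Enters minus returns: a positive value means returns were lost,
    even though nmissed may still read 0."""
    return stats[enter_probe][0] - stats[return_probe][0]

sample = """\
p_fuse_file_write_iter_bcc_1234 21650 0
r_fuse_file_write_iter_bcc_1234 21312 0
"""
stats = parse_kprobe_profile(sample)
print(lost_returns(stats,
                   "p_fuse_file_write_iter_bcc_1234",
                   "r_fuse_file_write_iter_bcc_1234"))   # prints 338
```

Run periodically, such a check would flag the imbalance long before the in-flight accounting drifts.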