bcc: Use bpf_probe_read_{kernel|user} functions if they are available #4109
@dlan17 Please take a look. At least, execsnoop runs and seems to work OK now. Tested on the following platforms:
@davemarchevsky, could you take a look?
Hi, @euspectre. I'm glad to see your patch. I've also been following the bpf_probe_read_{kernel|user} issue recently, and we really should use the new interface whenever possible. However, I noticed from the PR history that there have been attempts to use bpf_probe_read_kernel() for implicit kernel memory reads, and that is not entirely correct. For details, please refer to #2986. So I'm not sure whether this PR will break existing tools, especially those that access user memory through pointers stored in kernel data structures.
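To make the concern concrete, here is a toy model of how an implicit read could be turned into a helper call. It is a hypothetical sketch, not BCC's actual rewriter: the `Deref` class and both `rewrite_*` functions are illustrative names. A rewriter that always emits bpf_probe_read_kernel() produces the wrong call when the pointee is user memory, such as the buffer behind `iov_base` in `struct iovec`.

```python
from dataclasses import dataclass


@dataclass
class Deref:
    """Hypothetical model of one implicit read, e.g. copying the data behind iov->iov_base into buf."""

    dst: str           # destination variable in the BPF program
    src: str           # address expression to read from
    user_memory: bool  # True if `src` points into user space


def rewrite_naive(d: Deref) -> str:
    # Treats every implicit read as a kernel read: wrong when the pointee is user memory.
    return f"bpf_probe_read_kernel(&{d.dst}, sizeof({d.dst}), {d.src});"


def rewrite_aware(d: Deref) -> str:
    # Picks the helper from (assumed) knowledge of the pointee's address space.
    # Loading the pointer field itself (iov->iov_base) is still a kernel read;
    # only the memory it points to lives in user space.
    helper = "bpf_probe_read_user" if d.user_memory else "bpf_probe_read_kernel"
    return f"{helper}(&{d.dst}, sizeof({d.dst}), {d.src});"


if __name__ == "__main__":
    user_read = Deref(dst="buf", src="iov->iov_base", user_memory=True)
    print(rewrite_naive(user_read))  # bpf_probe_read_kernel(&buf, sizeof(buf), iov->iov_base);
    print(rewrite_aware(user_read))  # bpf_probe_read_user(&buf, sizeof(buf), iov->iov_base);
```

The hard part in practice is the `user_memory` flag: the rewriter has to learn it from somewhere, which is exactly what makes implicit reads ambiguous.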
So, if I understand it correctly, the problem was that bcc used However, And your concern is that switching BCC to Right?
In my opinion, implicit accesses to user memory can be error-prone; explicit use of bpf_probe_read_user() would be much clearer. @yonghong-song, you wrote in #3009 (comment):
Two years have passed since then: is it now safe to use
As suggested in https://lore.kernel.org/bpf/YslAxaryvm%2FMfGbq@ofant/ and
in other messages of that mailing list thread, BCC tools should use the
newer bpf_probe_read_{kernel|user} functions whenever possible, rather than
bpf_probe_read().

The main reason is that bpf_probe_read() is unreliable on systems where
kernel and user address spaces can overlap, e.g. on s390x. See commit
6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
in the mainline kernel for more info.

Let us just use bpf_probe_read_kernel() and bpf_probe_read_user(), if they
are available in the kernel, and only try bpf_probe_read() as a fallback.

This fixes the following problem, among other things. On RISC-V, the execsnoop
tool failed to run because the BPF verifier rejected the relevant program:

    root@riscv64-test: # /usr/share/bcc/tools/execsnoop
    bpf: Failed to load program: Invalid argument
    0: (bf) r6 = r1
    1: (79) r8 = *(u64 *)(r6 +88)
    [...]
    56: (85) call bpf_probe_read#4
    unknown func bpf_probe_read#4
    processed 57 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

    Traceback (most recent call last):
      File "/usr/share/bcc/tools/execsnoop", line 229, in <module>
        b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
      File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 837, in attach_kprobe
        fn = self.load_func(fn_name, BPF.KPROBE)
      File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 522, in load_func
        raise Exception("Failed to load BPF program %s: %s" %
    Exception: Failed to load BPF program b'syscall__execve': Invalid argument

This was because BCC incorrectly used a call to bpf_probe_read() in the BPF
program, while bpf_probe_read_kernel() should have been used.

Signed-off-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
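The fallback described in the commit message can be sketched as follows. This is a minimal Python illustration, not the code from this PR: `kallsyms_names()` and `pick_probe_read()` are hypothetical names, and detecting the helpers by looking for their symbols in /proc/kallsyms is an assumption (a heuristic; a more robust check would try to load a tiny BPF program that calls the helper and see whether the verifier accepts it).

```python
from __future__ import annotations


def kallsyms_names(kallsyms_text: str) -> set[str]:
    """Collect symbol names from text in /proc/kallsyms format.

    Each line looks like "<address> <type> <name> [module]".
    """
    names = set()
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def pick_probe_read(space: str, symbols: set[str], string: bool = False) -> str:
    """Return the helper name to emit for a read from `space` ("kernel" or "user").

    Prefers bpf_probe_read_{kernel,user}[_str] when the kernel appears to
    provide it and falls back to the generic bpf_probe_read[_str] otherwise.
    """
    if space not in ("kernel", "user"):
        raise ValueError(f"unknown address space: {space!r}")
    suffix = "_str" if string else ""
    preferred = f"bpf_probe_read_{space}{suffix}"
    if preferred in symbols:
        return preferred
    # Older kernels only offer the generic helper, so use it as the fallback.
    return f"bpf_probe_read{suffix}"
```

On a live system one would feed it the real table, e.g. `kallsyms_names(open("/proc/kallsyms").read())`; the `_str` variants are chosen the same way via `string=True`.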
I'm not sure I understand it correctly, and I'm equally unsure whether this issue still exists. BTW, I'm trying a conservative solution: fix this problem on riscv64 first, so that existing tools running on other architectures won't break. Not sure if this is a good idea. See #4118 for details. Thanks. ^0^
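The conservative approach can be pictured as an architecture gate on the helper used for implicit reads. The policy below is a hypothetical sketch of the idea described in this comment, not the actual change in #4118: `implicit_read_helper()` is an illustrative name, and the architecture string is assumed to come from something like Python's `platform.machine()`, which reports `riscv64` on 64-bit RISC-V Linux.

```python
def implicit_read_helper(arch: str, has_probe_read_kernel: bool) -> str:
    """Pick the helper for reads the rewriter generates implicitly.

    Hypothetical conservative policy: switch to bpf_probe_read_kernel() only
    on riscv64, where the verifier may reject bpf_probe_read(), and keep the
    historical bpf_probe_read() elsewhere so implicit reads of user memory
    (see #2986) keep working on existing architectures.
    """
    if arch == "riscv64" and has_probe_read_kernel:
        return "bpf_probe_read_kernel"
    # On riscv64 without bpf_probe_read_kernel() there is no working option;
    # returning bpf_probe_read() lets the verifier report the error as before.
    return "bpf_probe_read"
```

The trade-off is that on riscv64 an implicit read of user memory would still go through the kernel helper and may not behave as intended, which is the same ambiguity raised in #2986, just confined to one architecture.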
This should work too. It is similar to what I suggested in #4085 (comment). However, the way I'd suggest waiting for @yonghong-song.
Sorry about that; I overlooked that this originated from your suggestion. We will wait for @yonghong-song's suggestion so that we can find the best solution. ^0^/
@yonghong-song and I discussed this today. Ideally In the long term, we can also consider using the BTF decl tag feature to help the rewriter determine whether it should replace implicit reads with
@euspectre, is my understanding correct here: on If so, special-casing the rewriting seems like a reasonable path forward. I'd like to add a test with a minimal repro of the case that caused us to ship #2986. I will do it in the next day or two.
Not quite. It seems only s390x has this problem. On that arch, yes, More info: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ae08ae3dea2

On RISC-V systems, the address spaces do not overlap, if I am not mistaken. However, pre-5.5 kernels do not provide As the issues described in #2986 are still valid, we should indeed take a less invasive approach (#4118) to avoid breaking things.
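A cruder alternative to probing for the helpers is to compare the running kernel's release against 5.5: bpf_probe_read_kernel() and bpf_probe_read_user() were introduced by commit 6ae08ae3dea2, which first appeared in Linux 5.5. The sketch below is hypothetical (`parse_release()` and `has_split_probe_read()` are illustrative names). Distributions often backport BPF features to older kernel versions, so a version check can give false negatives and is best treated as a heuristic.

```python
from __future__ import annotations

import re


def parse_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a `uname -r` string such as "5.4.0-150-generic"."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def has_split_probe_read(release: str) -> bool:
    """Heuristic: True if the release is 5.5 or newer."""
    # Tuples compare element-wise, so (5, 10) >= (5, 5) as intended.
    return parse_release(release) >= (5, 5)
```

On Linux, `platform.release()` returns the same string as `uname -r`, so `has_split_probe_read(platform.release())` would apply the check to the running kernel.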
Yes, this would be more reliable. As far as I can see in the kernel, Besides, newer kernel versions might drop BTW, does BCC require BTF support in the kernel?