bcc: Use bpf_probe_read_{kernel|user} functions if they are available #4109

euspectre · 2022-07-15T17:42:39Z

As suggested in https://lore.kernel.org/bpf/YslAxaryvm%2FMfGbq@ofant/ and
in other messages of that mailing list thread, BCC tools should use the
newer bpf_probe_read_{kernel|user} functions whenever possible, rather than
bpf_probe_read().

The main reason is that bpf_probe_read() is unreliable in the systems where
kernel and user address spaces can overlap, e.g. on s390x. See commit
6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
in the mainline kernel for more info.

Let us just use bpf_probe_read_kernel() and bpf_probe_read_user(), if they
are available in the kernel, and only try bpf_probe_read() as a fallback.

This fixes the following problem among other things. On RISC-V, execsnoop
tool failed to run because the BPF verifier rejected the relevant program:

root@riscv64-test: # /usr/share/bcc/tools/execsnoop

bpf: Failed to load program: Invalid argument
0: (bf) r6 = r1
1: (79) r8 = *(u64 *)(r6 +88)
[...]
56: (85) call bpf_probe_read#4
unknown func bpf_probe_read#4

processed 57 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

Traceback (most recent call last):
File "/usr/share/bcc/tools/execsnoop", line 229, in
b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
File "/usr/lib/python3/dist-packages/bcc/init.py", line 837, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python3/dist-packages/bcc/init.py", line 522, in load_func
raise Exception("Failed to load BPF program %s: %s" %
Exception: Failed to load BPF program b'syscall__execve': Invalid argument

This was because BCC incorrectly used a call to bpf_probe_read() in the BPF
program, while bpf_probe_read_kernel() should have been used.

euspectre · 2022-07-15T17:49:28Z

@dlan17 Please take a look.

At least, execsnoop runs and seems to work OK now.

Tested on the following platforms:

x86_64, Ubuntu 22.04, 5.15.0-41-generic (both bpf_probe_read_kernel() and bpf_probe_read() are available)
x86_64, Ubuntu 20.04, 5.4.0-122-generic (only bpf_probe_read() is available)
riscv64, Ubuntu 22.04, 5.15.0-1014-generic (only bpf_probe_read_kernel() is available)

yonghong-song · 2022-07-15T19:32:13Z

@davemarchevsky could you take a look?

xmzzz · 2022-07-16T02:21:42Z

Hi, @euspectre . I'm glad to see your patch.

I've also been following the bpf_probe_read_{kernel|user} issue recently, and we really should use the new interface whenever possible.

However, I noticed that in the PR history, there have been attempts to use bpf_probe_read_kernel() for implicitly specified kernel memory read. But this is not totally correct. For details, please refer to #2986 .
That's why it reverted to bpf_probe_read() and is treated differently for s390 (s390 has overlap user/kernel addresses and bpf_probe_read() is not available any more.) For details, please refer to #3014 .

So, I'm not sure if this PR will break existing tools, especially those that access user memory in kernel data structures.
Wouldn't it be better to solve this problem in a smoother way? If there is something wrong with what I said, thank you for correcting it.

src/cc/frontends/clang/b_frontend_action.cc

euspectre · 2022-07-18T09:09:42Z

However, I noticed that in the PR history, there have been attempts to use bpf_probe_read_kernel() for implicitly specified kernel memory read. But this is not totally correct. For details, please refer to #2986 .
That's why it reverted to bpf_probe_read() and is treated differently for s390 (s390 has overlap user/kernel addresses and bpf_probe_read() is not available any more.) For details, please refer to #3014 .

So, I'm not sure if this PR will break existing tools, especially those that access user memory in kernel data structures.
Wouldn't it be better to solve this problem in a smoother way?

So, if I understand it correctly, the problem was that bcc used bpf_probe_read_kernel() to access user-space data via pointers in the kernel data structures. The data accesses were implicit, that is, the authors of the BPF program did not have to specify bpf_probe_read_* themselves. The code rewriter did. So, PR #2986 switched the rewriter to bpf_probe_read(), which seemed a more universal way to access user and kernel memory.

However, bpf_probe_read() can be unreliable on s390x due to overlapping address spaces. To work around that, PR #3014 switched the rewriter to using bpf_probe_read_kernel() on s390x. The BPF programs using such implicit accesses became incompatible with pre-5.5 kernels on that arch, but that was OK, because bpf_probe_read() was unreliable anyway.

And your concern is that switching BCC to bpf_probe_read_kernel() on all arches could break some existing tools which rely on such implicit accesses to the user memory.

Right?

euspectre · 2022-07-18T09:20:21Z

In my opinion, implicit accesses to the user memory could be error-prone. Explicit usage of bpf_probe_read_user() would be much clearer.

@yonghong-song : You wrote in #3009 (comment):

Once user space applications are all deprecating bpf_probe_read, we can move to bpf_probe_read_kernel for implicitly generated probe read again.

2 years have passed since then - is it now safe to use bpf_probe_read_kernel in the implicitly generated reads, what do you think?

As suggested in https://lore.kernel.org/bpf/YslAxaryvm%2FMfGbq@ofant/ and in other messages of that mailing list thread, BCC tools should use the newer bpf_probe_read_{kernel|user} functions whenever possible, rather than bpf_probe_read(). The main reason is that bpf_probe_read() is unreliable in the systems where kernel and user address spaces can overlap, e.g. on s390x. See commit 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers") in the mainline kernel for more info. Let us just use bpf_probe_read_kernel() and bpf_probe_read_user(), if they are available in the kernel, and only try bpf_probe_read() as a fallback. This fixes the following problem among other things. On RISC-V, execsnoop tool failed to run because the BPF verifier rejected the relevant program: root@riscv64-test: # /usr/share/bcc/tools/execsnoop bpf: Failed to load program: Invalid argument 0: (bf) r6 = r1 1: (79) r8 = *(u64 *)(r6 +88) [...] 56: (85) call bpf_probe_read#4 unknown func bpf_probe_read#4 processed 57 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 Traceback (most recent call last): File "/usr/share/bcc/tools/execsnoop", line 229, in <module> b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve") File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 837, in attach_kprobe fn = self.load_func(fn_name, BPF.KPROBE) File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 522, in load_func raise Exception("Failed to load BPF program %s: %s" % Exception: Failed to load BPF program b'syscall__execve': Invalid argument This was because BCC incorrectly used a call to bpf_probe_read() in the BPF program, while bpf_probe_read_kernel() should have been used. Signed-off-by: Evgenii Shatokhin <e.shatokhin@yadro.com>

xmzzz · 2022-07-19T03:57:55Z

In my opinion, implicit accesses to the user memory could be error-prone. Explicit usage of bpf_probe_read_user() would be much clearer.

@yonghong-song : You wrote in #3009 (comment):

Once user space applications are all deprecating bpf_probe_read, we can move to bpf_probe_read_kernel for implicitly generated probe read again.

2 years have passed since then - is it now safe to use bpf_probe_read_kernel in the implicitly generated reads, what do you think?

I'm not sure I understand it correctly, and I'm equally concerned if this issue still exists.

BTW, I'm trying a conservative solution, which is to solve this problem on riscv64 first, so that it won't break existing tools running on other architectures. Not sure if this is a good idea. See #4118 for details.

Thanks. ^0^

euspectre · 2022-07-19T08:05:52Z

BTW, I'm trying a conservative solution, which is to solve this problem on riscv64 first, so that it won't break existing tools running on other architectures. Not sure if this is a good idea. See #4118 for details.

This should work too. It is similar to what I suggested in #4085 (comment).

However, the way bpf_probe_read* is used in bcc is far from self-explanatory. I'd like to make it cleaner, hence other changes in my variant of the fix.

I'd suggest to wait for @yonghong-song.

xmzzz · 2022-07-19T10:05:30Z

This should work too. It is similar to what I suggested in #4085 (comment).

However, the way bpf_probe_read* is used in bcc is far from self-explanatory. I'd like to make it cleaner, hence other changes in my variant of the fix.

I'd suggest to wait for @yonghong-song.

I'm sorry about that, I overlooked that this was originated from your suggestion.

We will wait for @yonghong-song suggestion so that we can find the best solution. ^0^/

davemarchevsky · 2022-07-19T22:31:20Z

@yonghong-song and I discussed this today.

Ideally bcc would use bpf_probe_read_{user,kernel} for all architectures so that complexity of its frontend rewriting is reduced. Per #2986 this isn't possible without breaking some existing programs. In the long term we should probably make such a breaking change regardless, but until we do so, limiting bpf_probe_read_{user,kernel} rewriting to architectures w/ overlapping address spaces seems reasonable.

In the long term we can also consider using btf decl tag feature to help the rewriter determine whether it should replace implicit reads with kernel or user variant.

The BPF programs using such implicit accesses became incompatible with pre-5.5 kernels on that arch, but that was OK, because bpf_probe_read() was unreliable anyway.

@euspectre, is my understanding correct here: on s390x and riscv archs, even if bpf_probe_read exists, programs that use it are likely to read garbage data / data from wrong address space?

If so, special-casing rewriting seems like a reasonable path forward.

I'd like to add a test with a minimal repro of the case that caused us to ship #2986. Will do it in the next day or two.

euspectre · 2022-07-20T06:11:53Z

is my understanding correct here: on s390x and riscv archs, even if bpf_probe_read exists, programs that use it are likely to read garbage data / data from wrong address space?

Not quite. It seems, only s390x has such problem. On that arch, yes, bpf_probe_read might read garbage data.

More info: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ae08ae3dea2

In RISC-V systems, the address spaces do not overlap, if I am not mistaken. However, pre-5.5 kernels do not provide bpf_probe_read function on RISC-V, so it cannot be used as a fallback anyway.

As the issues described in #2986 are still valid, then yes, we should take a less invasive approach (#4118) to avoid breaking things.

euspectre · 2022-07-20T06:38:55Z

@davemarchevsky

In the long term we can also consider using btf decl tag feature to help the rewriter determine whether it should replace implicit reads with kernel or user variant.

Yes, this would be more reliable. As far as I can see in the kernel, __user specifier is already defined as either BTF_TYPE_TAG(user) or __attribute__((user)), depending on the options, so it should be possible to detect such pointers.

Besides, the newer kernel versions might drop bpf_probe_read in favour of bpf_probe_read_{kernel,user} even on the arches where it is still present like x86. The tools relying on bpf_probe_read would then break anyway.

BTW, does BCC require BTF support in the kernel?

euspectre requested review from drzaeus77, yonghong-song, 4ast and davemarchevsky as code owners July 15, 2022 17:42

euspectre mentioned this pull request Jul 15, 2022

riscv: initial ebpf support #4085

Merged

dlan17 reviewed Jul 16, 2022

View reviewed changes

src/cc/frontends/clang/b_frontend_action.cc Outdated Show resolved Hide resolved

src/cc/frontends/clang/b_frontend_action.cc Show resolved Hide resolved

euspectre force-pushed the fix-bpf-prog-read branch from bad50be to 2ee5357 Compare July 18, 2022 20:35

euspectre force-pushed the fix-bpf-prog-read branch from 2ee5357 to 4278b6a Compare July 18, 2022 20:41

euspectre mentioned this pull request Jul 19, 2022

riscv: using bpf_probe_read_{kernel|user} on riscv64 #4118

Merged

euspectre closed this Jul 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bcc: Use bpf_probe_read_{kernel|user} functions if they are available #4109

bcc: Use bpf_probe_read_{kernel|user} functions if they are available #4109

euspectre commented Jul 15, 2022 •

edited

euspectre commented Jul 15, 2022

yonghong-song commented Jul 15, 2022

xmzzz commented Jul 16, 2022

euspectre commented Jul 18, 2022

euspectre commented Jul 18, 2022

xmzzz commented Jul 19, 2022

euspectre commented Jul 19, 2022

xmzzz commented Jul 19, 2022

davemarchevsky commented Jul 19, 2022

euspectre commented Jul 20, 2022 •

edited

euspectre commented Jul 20, 2022 •

edited

bcc: Use bpf_probe_read_{kernel|user} functions if they are available #4109

bcc: Use bpf_probe_read_{kernel|user} functions if they are available #4109

Conversation

euspectre commented Jul 15, 2022 • edited

euspectre commented Jul 15, 2022

yonghong-song commented Jul 15, 2022

xmzzz commented Jul 16, 2022

euspectre commented Jul 18, 2022

euspectre commented Jul 18, 2022

xmzzz commented Jul 19, 2022

euspectre commented Jul 19, 2022

xmzzz commented Jul 19, 2022

davemarchevsky commented Jul 19, 2022

euspectre commented Jul 20, 2022 • edited

euspectre commented Jul 20, 2022 • edited

euspectre commented Jul 15, 2022 •

edited

euspectre commented Jul 20, 2022 •

edited

euspectre commented Jul 20, 2022 •

edited