Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcc: Use bpf_probe_read_{kernel|user} functions if they are available #4109

Closed
wants to merge 1 commit into from

Conversation

euspectre
Copy link

@euspectre euspectre commented Jul 15, 2022

As suggested in https://lore.kernel.org/bpf/YslAxaryvm%2FMfGbq@ofant/ and
in other messages of that mailing list thread, BCC tools should use the
newer bpf_probe_read_{kernel|user} functions whenever possible, rather than
bpf_probe_read().

The main reason is that bpf_probe_read() is unreliable in the systems where
kernel and user address spaces can overlap, e.g. on s390x. See commit
6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
in the mainline kernel for more info.

Let us just use bpf_probe_read_kernel() and bpf_probe_read_user(), if they
are available in the kernel, and only try bpf_probe_read() as a fallback.

This fixes the following problem among other things. On RISC-V, execsnoop
tool failed to run because the BPF verifier rejected the relevant program:

root@riscv64-test: # /usr/share/bcc/tools/execsnoop

bpf: Failed to load program: Invalid argument
0: (bf) r6 = r1
1: (79) r8 = *(u64 *)(r6 +88)
[...]
56: (85) call bpf_probe_read#4
unknown func bpf_probe_read#4

processed 57 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

Traceback (most recent call last):
File "/usr/share/bcc/tools/execsnoop", line 229, in
b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
File "/usr/lib/python3/dist-packages/bcc/init.py", line 837, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python3/dist-packages/bcc/init.py", line 522, in load_func
raise Exception("Failed to load BPF program %s: %s" %
Exception: Failed to load BPF program b'syscall__execve': Invalid argument

This was because BCC incorrectly used a call to bpf_probe_read() in the BPF
program, while bpf_probe_read_kernel() should have been used.

@euspectre
Copy link
Author

@dlan17 Please take a look.

At least, execsnoop runs and seems to work OK now.

Tested on the following platforms:

  • x86_64, Ubuntu 22.04, 5.15.0-41-generic (both bpf_probe_read_kernel() and bpf_probe_read() are available)
  • x86_64, Ubuntu 20.04, 5.4.0-122-generic (only bpf_probe_read() is available)
  • riscv64, Ubuntu 22.04, 5.15.0-1014-generic (only bpf_probe_read_kernel() is available)

@yonghong-song
Copy link
Collaborator

@davemarchevsky could you take a look?

@xmzzz
Copy link
Contributor

xmzzz commented Jul 16, 2022

Hi, @euspectre . I'm glad to see your patch.

I've also been following the bpf_probe_read_{kernel|user} issue recently, and we really should use the new interface whenever possible.

However, I noticed that in the PR history, there have been attempts to use bpf_probe_read_kernel() for implicitly specified kernel memory read. But this is not totally correct. For details, please refer to #2986 .
That's why it reverted to bpf_probe_read() and is treated differently for s390 (s390 has overlap user/kernel addresses and bpf_probe_read() is not available any more.) For details, please refer to #3014 .

So, I'm not sure if this PR will break existing tools, especially those that access user memory in kernel data structures.
Wouldn't it be better to solve this problem in a smoother way? If there is something wrong with what I said, thank you for correcting it.

@euspectre
Copy link
Author

However, I noticed that in the PR history, there have been attempts to use bpf_probe_read_kernel() for implicitly specified kernel memory read. But this is not totally correct. For details, please refer to #2986 .
That's why it reverted to bpf_probe_read() and is treated differently for s390 (s390 has overlap user/kernel addresses and bpf_probe_read() is not available any more.) For details, please refer to #3014 .

So, I'm not sure if this PR will break existing tools, especially those that access user memory in kernel data structures.
Wouldn't it be better to solve this problem in a smoother way?

So, if I understand it correctly, the problem was that bcc used bpf_probe_read_kernel() to access user-space data via pointers in the kernel data structures. The data accesses were implicit, that is, the authors of the BPF program did not have to specify bpf_probe_read_* themselves. The code rewriter did. So, PR #2986 switched the rewriter to bpf_probe_read(), which seemed a more universal way to access user and kernel memory.

However, bpf_probe_read() can be unreliable on s390x due to overlapping address spaces. To work around that, PR #3014 switched the rewriter to using bpf_probe_read_kernel() on s390x. The BPF programs using such implicit accesses became incompatible with pre-5.5 kernels on that arch, but that was OK, because bpf_probe_read() was unreliable anyway.

And your concern is that switching BCC to bpf_probe_read_kernel() on all arches could break some existing tools which rely on such implicit accesses to the user memory.

Right?

@euspectre
Copy link
Author

In my opinion, implicit accesses to the user memory could be error-prone. Explicit usage of bpf_probe_read_user() would be much clearer.

@yonghong-song : You wrote in #3009 (comment):

Once user space applications are all deprecating bpf_probe_read, we can move to bpf_probe_read_kernel for implicitly generated probe read again.

2 years have passed since then - is it now safe to use bpf_probe_read_kernel in the implicitly generated reads, what do you think?

As suggested in https://lore.kernel.org/bpf/YslAxaryvm%2FMfGbq@ofant/ and
in other messages of that mailing list thread, BCC tools should use the
newer bpf_probe_read_{kernel|user} functions whenever possible, rather than
bpf_probe_read().

The main reason is that bpf_probe_read() is unreliable in the systems where
kernel and user address spaces can overlap, e.g. on s390x. See commit
6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
in the mainline kernel for more info.

Let us just use bpf_probe_read_kernel() and bpf_probe_read_user(), if they
are available in the kernel, and only try bpf_probe_read() as a fallback.

This fixes the following problem among other things. On RISC-V, execsnoop
tool failed to run because the BPF verifier rejected the relevant program:

  root@riscv64-test: # /usr/share/bcc/tools/execsnoop

  bpf: Failed to load program: Invalid argument
  0: (bf) r6 = r1
  1: (79) r8 = *(u64 *)(r6 +88)
  [...]
  56: (85) call bpf_probe_read#4
  unknown func bpf_probe_read#4

  processed 57 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

  Traceback (most recent call last):
    File "/usr/share/bcc/tools/execsnoop", line 229, in <module>
      b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
    File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 837, in attach_kprobe
      fn = self.load_func(fn_name, BPF.KPROBE)
    File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 522, in load_func
      raise Exception("Failed to load BPF program %s: %s" %
  Exception: Failed to load BPF program b'syscall__execve': Invalid argument

This was because BCC incorrectly used a call to bpf_probe_read() in the BPF
program, while bpf_probe_read_kernel() should have been used.

Signed-off-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
@xmzzz
Copy link
Contributor

xmzzz commented Jul 19, 2022

In my opinion, implicit accesses to the user memory could be error-prone. Explicit usage of bpf_probe_read_user() would be much clearer.

@yonghong-song : You wrote in #3009 (comment):

Once user space applications are all deprecating bpf_probe_read, we can move to bpf_probe_read_kernel for implicitly generated probe read again.

2 years have passed since then - is it now safe to use bpf_probe_read_kernel in the implicitly generated reads, what do you think?

I'm not sure I understand it correctly, and I'm equally concerned if this issue still exists.

BTW, I'm trying a conservative solution, which is to solve this problem on riscv64 first, so that it won't break existing tools running on other architectures. Not sure if this is a good idea. See #4118 for details.

Thanks. ^0^

@euspectre
Copy link
Author

BTW, I'm trying a conservative solution, which is to solve this problem on riscv64 first, so that it won't break existing tools running on other architectures. Not sure if this is a good idea. See #4118 for details.

This should work too. It is similar to what I suggested in #4085 (comment).

However, the way bpf_probe_read* is used in bcc is far from self-explanatory. I'd like to make it cleaner, hence other changes in my variant of the fix.

I'd suggest to wait for @yonghong-song.

@xmzzz
Copy link
Contributor

xmzzz commented Jul 19, 2022

This should work too. It is similar to what I suggested in #4085 (comment).

However, the way bpf_probe_read* is used in bcc is far from self-explanatory. I'd like to make it cleaner, hence other changes in my variant of the fix.

I'd suggest to wait for @yonghong-song.

I'm sorry about that, I overlooked that this was originated from your suggestion.

We will wait for @yonghong-song suggestion so that we can find the best solution. ^0^/

@davemarchevsky
Copy link
Collaborator

@yonghong-song and I discussed this today.

Ideally bcc would use bpf_probe_read_{user,kernel} for all architectures so that complexity of its frontend rewriting is reduced. Per #2986 this isn't possible without breaking some existing programs. In the long term we should probably make such a breaking change regardless, but until we do so, limiting bpf_probe_read_{user,kernel} rewriting to architectures w/ overlapping address spaces seems reasonable.

In the long term we can also consider using btf decl tag feature to help the rewriter determine whether it should replace implicit reads with kernel or user variant.

The BPF programs using such implicit accesses became incompatible with pre-5.5 kernels on that arch, but that was OK, because bpf_probe_read() was unreliable anyway.

@euspectre, is my understanding correct here: on s390x and riscv archs, even if bpf_probe_read exists, programs that use it are likely to read garbage data / data from wrong address space?

If so, special-casing rewriting seems like a reasonable path forward.

I'd like to add a test with a minimal repro of the case that caused us to ship #2986. Will do it in the next day or two.

@euspectre
Copy link
Author

euspectre commented Jul 20, 2022

is my understanding correct here: on s390x and riscv archs, even if bpf_probe_read exists, programs that use it are likely to read garbage data / data from wrong address space?

Not quite. It seems, only s390x has such problem. On that arch, yes, bpf_probe_read might read garbage data.

More info: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ae08ae3dea2

In RISC-V systems, the address spaces do not overlap, if I am not mistaken. However, pre-5.5 kernels do not provide bpf_probe_read function on RISC-V, so it cannot be used as a fallback anyway.

As the issues described in #2986 are still valid, then yes, we should take a less invasive approach (#4118) to avoid breaking things.

@euspectre euspectre closed this Jul 20, 2022
@euspectre
Copy link
Author

euspectre commented Jul 20, 2022

@davemarchevsky

In the long term we can also consider using btf decl tag feature to help the rewriter determine whether it should replace implicit reads with kernel or user variant.

Yes, this would be more reliable. As far as I can see in the kernel, __user specifier is already defined as either BTF_TYPE_TAG(user) or __attribute__((user)), depending on the options, so it should be possible to detect such pointers.

Besides, the newer kernel versions might drop bpf_probe_read in favour of bpf_probe_read_{kernel,user} even on the arches where it is still present like x86. The tools relying on bpf_probe_read would then break anyway.

BTW, does BCC require BTF support in the kernel?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants