[RFC] prepare for bpf_probe_read() split into user/kernel #614
Comments
If we're in a kprobe/kernel tracepoint will all pointers be to kernel memory, and vice versa for uprobes/usdts? |
@ajor Ah right, this was suggested as one possible solution. I'll check all my tools to see how feasible that is. Following that approach, we could leave these with their current syntax and the following meanings, with their behavior changing when bpf_probe_read() is split:
And add these optional qualified versions (going with the *read() version here):
Edit: this is option B. |
What about if you kprobed
And tried to read from |
Right, such functions are the exception, and now that I think of it, all the syscalls tracepoints that use buffers will have the same problem. Eg:
With options A or B you'd need to use uread() to handle this. If this happens for all syscall tracepoints, then I wonder if they can default to user-memory accesses. Would complicate documentation: tracepoints are kernel context unless they are syscall tracepoints. |
I suspect being explicit would be the better long term option for maintainability. Special casing a bunch of things would probably lead to a bunch of tricky gotcha moments. Especially b/c it's hard to tell sometimes if you're reading junk memory or actual data. How likely is the |
Yeah, I don't think we have enough information to identify which kprobe arguments are pointers to user space and which are to kernel space. I'm in favour of option B, as I'd like to keep the familiar C-style syntax as the common one. We could have the same four options from the original post:
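- *addr: dereference kernel
- str(addr): fetch NULL-terminated kernel string
- ucopy(addr): dereference user
- ustr(addr): fetch NULL-terminated user string
for kprobes and kernel tracepoints, but for uprobes and USDTs just use *addr and str(addr) as we currently do, since I don't think they'll ever have kernel pointers. I'd also suggest not allowing ucopy and ustr in user space probes, so we don't introduce multiple ways of doing the same thing.
Also, maybe the name ucopy should be something different, since currently *addr doesn't actually copy the thing being pointed to into BPF memory. That only happens if it then gets assigned to a map. Possibly something like uptr, but not sure. |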
I may save a kernel pointer (eg, bio) in a per-tid map and then want to fetch it and print members during a uprobe/usdt (eg, MySQL query done). So I'd like addr to be probe-context sensitive as proposed, but I think we probably need kstr and kread in uprobe/usdt.
…On Sun, May 12, 2019, 2:51 PM Alastair Robertson ***@***.***> wrote:
Yeah, I don't think we have enough information to identify which kprobe
arguments are pointers to user space and which are to kernel space.
I'm in favour of option B, as I'd like to keep the familiar C-style syntax
as the common one. We could have the same four options from the original
post:
- *addr: dereference kernel
- str(addr): fetch NULL-terminated kernel string
- ucopy(addr): dereference user
- ustr(addr): fetch NULL-terminated user string
for kprobes and kernel tracepoints, but for uprobes and USDTs just use
*addr and str(addr) as we currently do, since I don't think they'll ever
have kernel pointers. I'd also suggest not allowing ucopy and ustr in
user space probes, so we don't introduce multiple ways of doing the same
thing.
Also, maybe the name ucopy should be something different, since currently
*addr doesn't actually copy the thing being pointed to into BPF memory.
That only happens if it then gets assigned to a map. Possibly something
like uptr, but not sure.
|
Summarizing the current plan so far. It involves adding 4 builtins, and setting probe memory context. builtins:
context (for
Context for each probe type would be documented in the reference guide. |
I think |
Ok. Hmm. Maybe it should be kptr/uptr. I'm going through tracepoints and tools to see how feasible the plan is. Eg, to test if these pointers are user or kernel:
testing with bpftest.c:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    char filename[] = "bpftest.c";
    char buf[256];
    int fd;
    ssize_t bytes;
    /* Print the user-space addresses so they can be compared with what bpftrace reports. */
    printf("filename = %p\n", (void *)filename);
    printf("buf = %p\n", (void *)buf);
    fd = open(filename, O_RDONLY);
    if (fd < 0) { printf("ERROR open %d\n", fd); return 1; }
    /* The byte count is not used; the read() only exists so that it can be traced. */
    bytes = read(fd, buf, sizeof(buf));
    (void)bytes;
    close(fd);
    return 0;
}
running and tracing it:
Ok, so args->filename and args->buf are user-space pointers: bpftrace is printing the same address that the user-space program sees. |
What if
So we'd only be adding |
Yes I really like this idea! |
In the documentation, we can also make this clear: kprobes/kretprobes:
uprobes/uretprobes:
tracepoints:
|
Sounds good to me. I'll give it a go |
@sumanthkorikkar do you have time to continue work on this? :) |
ok. sure. I will check and give this a try. Good opportunity to know various use cases. Thanks. |
1. Usecases are described on bpftrace#614
2. bpftrace#1427
Implemented usecases:
1. *addr(), str(addr) -> use bpf_probe_read_kernel or bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, tracepoint with syscalls -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel
4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval on user context -> bpf_probe_read_user
6. args->field -> bpf_probe_read_user
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
1. Usecases are described on bpftrace#614
2. bpftrace#1427
Implemented usecases:
1. *addr(), str(addr) -> use bpf_probe_read_kernel or bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, tracepoint with syscalls -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel
4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval on user context -> bpf_probe_read_user
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
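The address-space rules listed in the commit message above can be sketched as a small lookup. This is only an illustration in C, not bpftrace's actual implementation: the enum names, `default_addrspace()`, `resolve_addrspace()` and `read_helper_name()` are hypothetical, and matching on probe-name prefixes is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical types for this sketch; not taken from the bpftrace sources. */
enum addrspace { ADDRSPACE_KERNEL, ADDRSPACE_USER };
enum ptr_qualifier { PTR_DEFAULT, PTR_KPTR, PTR_UPTR };

static bool has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Default address space for a probe name such as "kprobe:vfs_read". */
static enum addrspace default_addrspace(const char *probe)
{
    assert(probe != NULL);
    /* User-space probes, plus syscall tracepoints whose buffers are user pointers. */
    if (has_prefix(probe, "uprobe:") || has_prefix(probe, "uretprobe:") ||
        has_prefix(probe, "usdt:") || has_prefix(probe, "tracepoint:syscalls:"))
        return ADDRSPACE_USER;
    /* kprobe, kretprobe, kfunc, kretfunc and the remaining tracepoints. */
    return ADDRSPACE_KERNEL;
}

/* An explicit kptr()/uptr() qualifier overrides the probe's default. */
static enum addrspace resolve_addrspace(const char *probe, enum ptr_qualifier q)
{
    if (q == PTR_KPTR)
        return ADDRSPACE_KERNEL;
    if (q == PTR_UPTR)
        return ADDRSPACE_USER;
    return default_addrspace(probe);
}

/* Name of the BPF helper that would be emitted for a read in that space. */
static const char *read_helper_name(enum addrspace as)
{
    return as == ADDRSPACE_USER ? "bpf_probe_read_user" : "bpf_probe_read_kernel";
}
```

With this shape, the earlier use case of reading a saved kernel pointer from a uprobe is just `resolve_addrspace("uprobe:...", PTR_KPTR)`, which selects the kernel helper regardless of the probe's default.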
do we still have todos for this? I think it works? @sumanthkorikkar |
As was discussed at LSFMM, bpf_probe_read() may be split in the future into bpf_probe_read_kernel() and bpf_probe_read_user(), to support other architectures (like SPARC) where the kernel/user address space overlaps, and the pointer isn't sufficient to identify the mode. This will be a major change.
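To illustrate why the pointer value alone has been enough so far, here is a minimal C sketch of the kind of check that is possible on x86-64. It assumes 4-level paging, where kernel addresses live in the upper canonical half starting at 0xffff800000000000; the function name and the sample addresses are made up for illustration. On an architecture where kernel and user address spaces overlap, no such check can be written, which is the motivation for separate _kernel() and _user() helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64 with 4-level paging, kernel in the upper canonical half. */
#define X86_64_KERNEL_HALF_START 0xffff800000000000ULL

/* Hypothetical helper: guesses the address space from the pointer value alone. */
static bool looks_like_kernel_addr_x86_64(uint64_t addr)
{
    return addr >= X86_64_KERNEL_HALF_START;
}

/* Usage example with made-up addresses that merely look typical. */
static void check_examples(void)
{
    assert(looks_like_kernel_addr_x86_64(0xffff888012345678ULL));  /* kernel-half address */
    assert(!looks_like_kernel_addr_x86_64(0x00007ffd12345678ULL)); /* user stack-like address */
}
```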
We currently read memory addresses in at least these two ways:
- *addr: dereference
- str(addr): fetch NULL-terminated string
bpftrace will need a way to determine whether to call the _kernel() or _user() version of bpf_probe_read(). Here's what I propose, which breaks the fewest tools:
- *addr: dereference kernel
- str(addr): fetch NULL-terminated kernel string
- ucopy(addr): dereference user
- ustr(addr): fetch NULL-terminated user string
Right now, ucopy() and ustr() would be dummy functions that do nothing, and later on can be switched to call bpf_probe_read_user(). Adding these dummy functions now will help us prevent breaking the bpftrace API in the future.
I picked "u"something() since we have other user functions that start with "u". ucopy() is short for copy_from_user(), and ustr() is the user version of str(). I think these names make sense.
If we are in agreement, we should add these dummy functions sooner rather than later for API stabilization.
Note that #28 calls for a uwrite() function, so if we want symmetry then we could call it uread() instead of ucopy().
Edit: this is option A.