Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] prepare for bpf_probe_read() split into user/kernel #614

Closed
brendangregg opened this issue May 9, 2019 · 18 comments · Fixed by #1427
Closed

[RFC] prepare for bpf_probe_read() split into user/kernel #614

brendangregg opened this issue May 9, 2019 · 18 comments · Fixed by #1427
Labels
enhancement New feature or request, changes on existing features
Milestone

Comments

@brendangregg
Copy link
Contributor

brendangregg commented May 9, 2019

As was discussed at LSFMM, bpf_probe_read() may be split in the future into bpf_probe_read_kernel() and bpf_probe_read_user(), to support other architectures (like SPARC) where the kernel/user address space overlaps, and the pointer isn't sufficient to identify the mode. This will be a major change.

We currently read memory addresses in at least these two ways:

  • *addr: dereference
  • str(addr): fetch NULL-terminated string

bpftrace will need a way to determine whether to call the _kernel() or _user() version of bpf_probe_read(). Here's what I propose, which breaks the fewest tools:

  • *addr: dereference kernel
  • str(addr): fetch NULL-terminated kernel string
  • ucopy(addr): dereference user
  • ustr(addr): fetch NULL-terminated user string

Right now, ucopy() and ustr() would be dummy functions that do nothing, and later on can be switched to call bpf_probe_read_user(). Adding these dummy functions now will help us prevent breaking the bpftrace API in the future.

I picked "u"something() since we have other user functions that start with "u". ucopy() is short for copy_from_user(), and ustr() is the user version of str(). I think these names make sense.

If we are in agreement, we should add these dummy functions sooner rather than later for API stabilization.

Note that #28 calls for a uwrite() function, so if we want symmetry then we could call it uread() instead of ucopy().

Edit: this is option A.

@ajor
Copy link
Member

ajor commented May 9, 2019

If we're in a kprobe/kernel tracepoint will all pointers be to kernel memory, and vice versa for uprobes/usdts?

@brendangregg
Copy link
Contributor Author

brendangregg commented May 9, 2019

@ajor Ah right, this was suggested as one possible solution. I'll check all my tools to see how feasible that is.

Following that approach, we could leave these with their current syntax and the following meanings, with their behavior changing when bpf_probe_read() is split:

  • *addr: dereference event-context memory (kernel/user based on event)
  • str(addr): fetch NULL-terminated event-context string (kernel/user based on event)

And add these optional qualified versions (going with the *read() version here):

  • kread(addr): dereference kernel
  • kstr(addr): fetch NULL-terminated kernel string
  • uread(addr): dereference user
  • ustr(addr): fetch NULL-terminated user string

Edit: this is option B.

@danobi
Copy link
Member

danobi commented May 10, 2019

What about if you kprobed

ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count);

And tried to read from buf? If my understanding is correct, buf is a userspace address despite us being in kprobe context.

@brendangregg
Copy link
Contributor Author

Right, such functions are the exception, and now that I think of it, all the syscalls tracepoints that use buffers will have the same problem. Eg:

# bpftrace -lv 't:syscalls:sys_enter_connect'
failed to initialize usdt context for path syscalls
tracepoint:syscalls:sys_enter_connect
    int __syscall_nr;
    int fd;
    struct sockaddr * uservaddr;
    int addrlen;

With options A or B you'd need to use uread() to handle this.

If this happens for all syscall tracepoints, then I wonder if they can default to user-memory accesses. Would complicate documentation: tracepoints are kernel context unless they are syscall tracepoints.

@danobi
Copy link
Member

danobi commented May 10, 2019

I suspect being explicit would be the better long term option for maintainability. Special casing a bunch of things would probably lead to a bunch of tricky gotcha moments. Especially b/c it's hard to tell sometimes if you're reading junk memory or actual data.

How likely is the bpf_probe_read split going to happen?

@ajor
Copy link
Member

ajor commented May 12, 2019

Yeah, I don't think we have enough information to identify which kprobe arguments are pointers to user space and which are to kernel space.

I'm in favour of option B, as I'd like to keep the familiar C-style syntax as the common one. We could have the same four options from the original post:

  • *addr: dereference kernel
  • str(addr): fetch NULL-terminated kernel string
  • ucopy(addr): dereference user
  • ustr(addr): fetch NULL-terminated user string

for kprobes and kernel tracepoints, but for uprobes and USDTs just use *addr and str(addr) as we currently do, since I don't think they'll ever have kernel pointers. I'd also suggest not allowing ucopy and ustr in user space probes, so we don't introduce multiple ways of doing the same thing.

Also, maybe the name ucopy should be something different, since currently *addr doesn't actually copy the thing being pointed to into BPF memory. That only happens if it then gets assigned to a map. Possibly something like uptr, but not sure.

@brendangregg
Copy link
Contributor Author

brendangregg commented May 12, 2019 via email

@brendangregg
Copy link
Contributor Author

@brendangregg
Copy link
Contributor Author

Summarizing the current plan so far. It involves adding 4 builtins, and setting probe memory context.

builtins:

  • uread(addr): dereference user address
  • ustr(addr): fetch NULL-terminated user string
  • kread(addr): dereference kernel address
  • kstr(addr): fetch NULL-terminated kernel string

context (for *addr and str()):

  • kprobes/kretprobes: kernel
  • uprobes/uretprobes: user
  • tracepoints: kernel (with the exception of syscall tracepoints: user)
  • other probe types: kernel

Context for each probe type would be documented in the reference guide.

@ajor
Copy link
Member

ajor commented Jun 13, 2019

I think kptr() and uptr() would make more sense than kcopy/kread() and ucopy/uread() if these functions are just to dereference pointers. Dereferencing doesn't actually involve copying any data in bpftrace - that only happens when the dereferenced pointer is assigned to a map.

@brendangregg brendangregg changed the title [RFC] add copy_from_user() equivalent [RFC] prepare for bpf_probe_read() split into user/kernel Jun 13, 2019
@brendangregg
Copy link
Contributor Author

Ok. Hmm. Maybe it should be kptr/uptr.

I'm going through tracepoints and tools to see how feasible the plan is. Eg, to test if these pointers are user or kernel:

# bpftrace -lv 't:syscalls:sys_enter_read'
tracepoint:syscalls:sys_enter_read
    int __syscall_nr;
    unsigned int fd;
    char * buf;
    size_t count;
# bpftrace -lv 't:syscalls:sys_enter_openat'
tracepoint:syscalls:sys_enter_openat
    int __syscall_nr;
    int dfd;
    const char * filename;
    int flags;
    umode_t mode;

testing with bpftest.c:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
	char filename[] = "bpftest.c";
	char buf[256];
	int fd, bytes;

	printf("filename = 0x%llx\n", &filename);
	printf("buf = 0x%llx\n", &buf);

	fd = open(filename, O_RDONLY);
	if (fd < 0) { printf("ERROR open %d\n", fd); return(1); }

	bytes = read(fd, &buf, sizeof(buf));
	close(fd);

	return 0;
}

running and tracing it:

# bpftrace -e 't:syscalls:sys_enter_openat /comm == "bpftest"/ { printf("%llx %s\n", args->filename, str(args->filename)); }' -c ./bpftest
Attaching 1 probe...
7f51deb6f428 /etc/ld.so.cache
7f51ded77dd0 /lib/x86_64-linux-gnu/libc.so.6
filename = 0x7fff21006526
buf = 0x7fff21006530
7fff21006526 bpftest.c
# bpftrace -e 't:syscalls:sys_enter_read /comm == "bpftest"/ { printf("%llx\n", args->buf); }' -c ./bpftest
Attaching 1 probe...
7ffd241c1208
filename = 0x7ffd241c1906
buf = 0x7ffd241c1910
7ffd241c1910

Ok, so args->filename and args->buf are user-space pointers: bpftrace is printing the same address that the user-space program sees.

@brendangregg
Copy link
Contributor Author

What if kptr() and uptr() just set the address mode for the address argument, and no more? ie, did not dereference. We'd then have:

  • *addr: dereference address from probe address space context
  • str(): fetch NULL-terminated string from probe address space context
  • *uptr(addr): dereference user address
  • str(uptr(addr)): fetch NULL-terminated user string
  • *kptr(addr): dereference kernel address
  • str(kptr(addr)): fetch NULL-terminated kernel string

So we'd only be adding kptr() and uptr(). I think their usage will be super rare, so maybe this is sufficient.

@ajor
Copy link
Member

ajor commented Jun 24, 2019

Yes I really like this idea!

@brendangregg
Copy link
Contributor Author

In the documentation, we can also make this clear:

kprobes/kretprobes:

  • arg0...argN, retval: when dereferenced, are kernel addresses.

uprobes/uretprobes:

  • arg0...argN, retval: when dereferenced, are user addresses.

tracepoints:

  • args->field: when dereferenced, are kernel addresses, except syscall tracepoints which are user.

@mmarchini mmarchini added the enhancement New feature or request, changes on existing features label Jul 24, 2019
@fbs
Copy link
Contributor

fbs commented Feb 11, 2020

Sounds good to me. I'll give it a go

@danobi danobi linked a pull request Aug 18, 2020 that will close this issue
@fbs fbs mentioned this issue Aug 19, 2020
@fbs fbs added this to the 0.12.0 milestone Aug 19, 2020
@fbs fbs closed this as completed in #1427 Aug 28, 2020
@fbs
Copy link
Contributor

fbs commented Sep 1, 2020

@sumanthkorikkar do you have time to continue work on this? :)

@sumanthkorikkar
Copy link
Contributor

ok. sure. I will check and give this a try. Good opportunity to know various use cases. Thanks.

@fbs fbs reopened this Sep 2, 2020
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Sep 8, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user
6. args->field -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Sep 8, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user
6. args->field -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Sep 8, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Sep 13, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Sep 21, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Sep 22, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Sep 24, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Oct 7, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
sumanthkorikkar added a commit to sumanthkorikkar/bpftrace that referenced this issue Oct 7, 2020
1. Usecases are described on bpftrace#614
2. bpftrace#1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
fbs pushed a commit that referenced this issue Oct 9, 2020
1. Usecases are described on #614
2. #1427

Implemented usecases:
1. *addr(), str(addr) -> Use bpf_probe_read_kernel or
   bpf_probe_read_user based on addrspace context.
   Addrspace context:
   1. kprobe, kretprobe, kfunc, kretfunc, tracepoints without syscalls
        -> bpf_probe_read_kernel
   2. usdt, uprobes, uretprobes, usdt, tracepoint with syscalls
        -> bpf_probe_read_user
2. *uptr(addr), str(uptr(addr)) -> use bpf_probe_read_user
3. *kptr(addr), str(kptr(addr)) -> use bpf_probe_read_kernel

4. argX, retval on kernel context -> bpf_probe_read_kernel
5. argX, retval  on user context -> bpf_probe_read_user

Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
@fbs fbs modified the milestones: 0.12.0, 0.13.0 Mar 1, 2021
@fbs
Copy link
Contributor

fbs commented May 4, 2021

do we still have todos for this? I think it works? @sumanthkorikkar

@fbs fbs closed this as completed Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request, changes on existing features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants