New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid reads: type=inv expected=fp #209

Closed
brendangregg opened this Issue Sep 12, 2015 · 9 comments

Comments

Projects
None yet
3 participants
@brendangregg
Member

brendangregg commented Sep 12, 2015

I can't figure out some errors like the following:

bpf: Permission denied
0: (79) r6 = *(u64 *)(r1 +112)
1: (85) call 14
2: (63) *(u32 *)(r10 -12) = r0
3: (15) if r6 == 0x0 goto pc+55
 R0=inv R6=inv R10=fp
4: (b7) r7 = 0
5: (7b) *(u64 *)(r10 -24) = r7
6: (7b) *(u64 *)(r10 -32) = r7
7: (07) r6 += 16
8: (bf) r1 = r10
9: (07) r1 += -32
10: (b7) r2 = 16
11: (bf) r3 = r6
12: (85) call 4
13: (79) r1 = *(u64 *)(r10 -24)
14: (15) if r1 == 0x0 goto pc+44
 R0=inv R1=inv R6=inv R7=imm0 R10=fp
15: (7b) *(u64 *)(r10 -40) = r7
16: (7b) *(u64 *)(r10 -48) = r7
17: (bf) r1 = r10
18: (07) r1 += -48
19: (b7) r2 = 16
20: (bf) r3 = r6
21: (85) call 4
22: (79) r1 = *(u64 *)(r10 -48)
23: (15) if r1 == 0x0 goto pc+35
 R0=inv R1=inv R6=inv R7=imm0 R10=fp
24: (b7) r7 = 0
25: (7b) *(u64 *)(r10 -56) = r7
26: (7b) *(u64 *)(r10 -64) = r7
27: (7b) *(u64 *)(r10 -72) = r7
28: (bf) r1 = r10
29: (07) r1 += -72
30: (b7) r2 = 16
31: (bf) r3 = r6
32: (85) call 4
33: (79) r6 = *(u64 *)(r10 -64)
34: (07) r6 += 56
35: (bf) r1 = r10
36: (07) r1 += -56
37: (b7) r2 = 8
38: (bf) r3 = r6
39: (85) call 4
40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

Traceback (most recent call last):
  File "./vfsslower", line 83, in <module>
    b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 498, in attach_kprobe
    fn = self.load_func(fn_name, BPF.KPROBE)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 388, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program trace_entry

What actually is "type=inv expected=fp"?

Here's a test program, that in this state works:

#!/usr/bin/python
#
# vfsslower Trace file system read/writes (via VFS) slower than a threshold.
#       For Linux, uses BCC, eBPF.
#
# In development. Incomplete. DO NOT USE.
#
# Copyright (c) 2015 Brendan Gregg.
# Licensed under the Apache License, Version 2.0 (the "License")
#
#12-Sep-2015   Brendan Gregg   Created this.

from __future__ import print_function
from bcc import BPF

REQ_WRITE = 1       # from include/linux/blk_types.h

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
#include <linux/mount.h>

BPF_HASH(start, u32);
BPF_HASH(filebypid, u32, char *);

int trace_entry(struct pt_regs *ctx, struct file *file)
{
    u64 ts;
    u32 pid = bpf_get_current_pid_tgid();


    if (file != NULL && file->f_path.dentry != NULL &&
        file->f_path.mnt != NULL) {
// XXX can't get this to work:
//      struct vfsmount *mnt = file->f_path.mnt;
//      if (!(mnt->mnt_flags & MNT_NODEV)) {
//          return 0;
//      }

        void *name = 0;
        struct dentry *dent = file->f_path.dentry;

        // get filename via d_iname
        bpf_probe_read(&name, sizeof(name), dent->d_iname);
// XXX can't get this to work:
//      filebypid.update(&pid, (char **)&dent->d_iname);

// debug only:
        bpf_trace_printk("read d_iname %s\\n", &dent->d_iname);
    }

    ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx)
{
    u64 *tsp, delta;
    u32 pid = bpf_get_current_pid_tgid();

    pid = bpf_get_current_pid_tgid();
    tsp = start.lookup(&pid);

    if (tsp != 0) {
        delta = bpf_ktime_get_ns() - *tsp;
        if (delta > MIN_LATENCY_NS) {
            char *file = (char *)filebypid.lookup(&pid);
            bpf_trace_printk("%d us %s\\n", delta / 1000, file);
            if (file != NULL) {
                filebypid.delete(&pid);
            }
        }
        start.delete(&pid);
    }

    return 0;
}
"""
bpf_text = bpf_text.replace('MIN_LATENCY_NS', '100000')
b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

b.trace_print()

There's a couple of blocks, that when uncommented, break it.

It's pretty noisy to run, since it prints tty vfs_read()s from ssh (hence trying to filter it based on MNT_NODEV, which I don't know yet if it works). So I'd been testing it like this:

window1# ./vfsslower > out
window2# echo 1 > /proc/sys/vm/drop_caches; cksum /usr/bin/*
window1# grep cksum out
@yonghong-song

This comment has been minimized.

Show comment
Hide comment
@yonghong-song

yonghong-song Sep 12, 2015

Collaborator

for map operations. the verifier has the following constraints for safely
reason (check linux:kernel/bpf/verifier.c)

  • ARG_PTR_TO_MAP_KEY is one of such argument constraints.
  • It means that the register type passed to this function must be
  • PTR_TO_STACK and it will be used inside the function as
  • 'pointer to map element key'

I think the following line

// filebypid.update(&pid, (char **)&dent->d_iname);

may cause trouble as pid is a pointer to an internal data structure and not

a pointer to stack.

On Sat, Sep 12, 2015 at 2:23 PM, Brendan Gregg notifications@github.com
wrote:

I can't figure out some errors like the following:

bpf: Permission denied
0: (79) r6 = *(u64 *)(r1 +112)
1: (85) call 14
2: (63) *(u32 *)(r10 -12) = r0
3: (15) if r6 == 0x0 goto pc+55
R0=inv R6=inv R10=fp
4: (b7) r7 = 0
5: (7b) *(u64 *)(r10 -24) = r7
6: (7b) *(u64 *)(r10 -32) = r7
7: (07) r6 += 16
8: (bf) r1 = r10
9: (07) r1 += -32
10: (b7) r2 = 16
11: (bf) r3 = r6
12: (85) call 4
13: (79) r1 = *(u64 *)(r10 -24)
14: (15) if r1 == 0x0 goto pc+44
R0=inv R1=inv R6=inv R7=imm0 R10=fp
15: (7b) *(u64 *)(r10 -40) = r7
16: (7b) *(u64 *)(r10 -48) = r7
17: (bf) r1 = r10
18: (07) r1 += -48
19: (b7) r2 = 16
20: (bf) r3 = r6
21: (85) call 4
22: (79) r1 = *(u64 *)(r10 -48)
23: (15) if r1 == 0x0 goto pc+35
R0=inv R1=inv R6=inv R7=imm0 R10=fp
24: (b7) r7 = 0
25: (7b) *(u64 *)(r10 -56) = r7
26: (7b) *(u64 *)(r10 -64) = r7
27: (7b) *(u64 *)(r10 -72) = r7
28: (bf) r1 = r10
29: (07) r1 += -72
30: (b7) r2 = 16
31: (bf) r3 = r6
32: (85) call 4
33: (79) r6 = *(u64 *)(r10 -64)
34: (07) r6 += 56
35: (bf) r1 = r10
36: (07) r1 += -56
37: (b7) r2 = 8
38: (bf) r3 = r6
39: (85) call 4
40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

Traceback (most recent call last):
File "./vfsslower", line 83, in
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 498, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 388, in load_func
raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program trace_entry

What actually is "type=inv expected=fp"?

Here's a test program, that in this state works:

#!/usr/bin/python## vfsslower Trace file system read/writes (via VFS) slower than a threshold.# For Linux, uses BCC, eBPF.## In development. Incomplete. DO NOT USE.## Copyright (c) 2015 Brendan Gregg.# Licensed under the Apache License, Version 2.0 (the "License")## 12-Sep-2015 Brendan Gregg Created this.
from future import print_functionfrom bcc import BPF

REQ_WRITE = 1 # from include/linux/blk_types.h

load BPF program

bpf_text = """#include <uapi/linux/ptrace.h>#include <linux/blkdev.h>#include <linux/mount.h>BPF_HASH(start, u32);BPF_HASH(filebypid, u32, char );int trace_entry(struct pt_regs *ctx, struct file *file){ u64 ts; u32 pid = bpf_get_current_pid_tgid(); if (file != NULL && file->f_path.dentry != NULL && file->f_path.mnt != NULL) {// XXX can't get this to work:// struct vfsmount *mnt = file->f_path.mnt;// if (!(mnt->mnt_flags & MNT_NODEV)) {// return 0;// } void *name = 0; struct dentry *dent = file->f_path.dentry; // get filename via d_iname bpf_probe_read(&name, sizeof(name), dent->d_iname);// XXX can't get this to work:// filebypid.update(&pid, (char *)&dent->d_iname);// debug only: bpf_trace_printk("read d_iname %s\n", &dent->d_iname); } ts = bpf_ktime_get_ns(); start.update(&pid, &ts); return 0;}int trace_return(struct pt_regs *ctx){ u64 *tsp, delta; u32 pid = bpf_get_current_pid_tgid(); pid = bpf_get_current_pid_tgid(); tsp = start.lookup(&pid); if (tsp != 0) { delta = bpf_ktime_get_ns() - *tsp; if (delta > MIN_LATENCY_NS) { char *file = (char *)filebypid.lookup(&pid); bpf_trace_printk("%d us %s\n", delta / 1000, file); if (file != NULL) { filebypid.delete(&pid); } } start.delete(&pid); } return 0;}"""
bpf_text = bpf_text.replace('MIN_LATENCY_NS', '100000')
b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

b.trace_print()

There's a couple of blocks, that when uncommented, break it.

It's pretty noisy to run, since it prints tty vfs_read()s from ssh (hence
trying to filter it based on MNT_NODEV, which I don't know yet if it
works). So I'd been testing it like this:

window1# ./vfsslower > out
window2# echo 1 > /proc/sys/vm/drop_caches; cksum /usr/bin/*
window1# grep cksum out


Reply to this email directly or view it on GitHub
#209.

Collaborator

yonghong-song commented Sep 12, 2015

for map operations. the verifier has the following constraints for safely
reason (check linux:kernel/bpf/verifier.c)

  • ARG_PTR_TO_MAP_KEY is one of such argument constraints.
  • It means that the register type passed to this function must be
  • PTR_TO_STACK and it will be used inside the function as
  • 'pointer to map element key'

I think the following line

// filebypid.update(&pid, (char **)&dent->d_iname);

may cause trouble as pid is a pointer to an internal data structure and not

a pointer to stack.

On Sat, Sep 12, 2015 at 2:23 PM, Brendan Gregg notifications@github.com
wrote:

I can't figure out some errors like the following:

bpf: Permission denied
0: (79) r6 = *(u64 *)(r1 +112)
1: (85) call 14
2: (63) *(u32 *)(r10 -12) = r0
3: (15) if r6 == 0x0 goto pc+55
R0=inv R6=inv R10=fp
4: (b7) r7 = 0
5: (7b) *(u64 *)(r10 -24) = r7
6: (7b) *(u64 *)(r10 -32) = r7
7: (07) r6 += 16
8: (bf) r1 = r10
9: (07) r1 += -32
10: (b7) r2 = 16
11: (bf) r3 = r6
12: (85) call 4
13: (79) r1 = *(u64 *)(r10 -24)
14: (15) if r1 == 0x0 goto pc+44
R0=inv R1=inv R6=inv R7=imm0 R10=fp
15: (7b) *(u64 *)(r10 -40) = r7
16: (7b) *(u64 *)(r10 -48) = r7
17: (bf) r1 = r10
18: (07) r1 += -48
19: (b7) r2 = 16
20: (bf) r3 = r6
21: (85) call 4
22: (79) r1 = *(u64 *)(r10 -48)
23: (15) if r1 == 0x0 goto pc+35
R0=inv R1=inv R6=inv R7=imm0 R10=fp
24: (b7) r7 = 0
25: (7b) *(u64 *)(r10 -56) = r7
26: (7b) *(u64 *)(r10 -64) = r7
27: (7b) *(u64 *)(r10 -72) = r7
28: (bf) r1 = r10
29: (07) r1 += -72
30: (b7) r2 = 16
31: (bf) r3 = r6
32: (85) call 4
33: (79) r6 = *(u64 *)(r10 -64)
34: (07) r6 += 56
35: (bf) r1 = r10
36: (07) r1 += -56
37: (b7) r2 = 8
38: (bf) r3 = r6
39: (85) call 4
40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

Traceback (most recent call last):
File "./vfsslower", line 83, in
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 498, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 388, in load_func
raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program trace_entry

What actually is "type=inv expected=fp"?

Here's a test program, that in this state works:

#!/usr/bin/python## vfsslower Trace file system read/writes (via VFS) slower than a threshold.# For Linux, uses BCC, eBPF.## In development. Incomplete. DO NOT USE.## Copyright (c) 2015 Brendan Gregg.# Licensed under the Apache License, Version 2.0 (the "License")## 12-Sep-2015 Brendan Gregg Created this.
from future import print_functionfrom bcc import BPF

REQ_WRITE = 1 # from include/linux/blk_types.h

load BPF program

bpf_text = """#include <uapi/linux/ptrace.h>#include <linux/blkdev.h>#include <linux/mount.h>BPF_HASH(start, u32);BPF_HASH(filebypid, u32, char );int trace_entry(struct pt_regs *ctx, struct file *file){ u64 ts; u32 pid = bpf_get_current_pid_tgid(); if (file != NULL && file->f_path.dentry != NULL && file->f_path.mnt != NULL) {// XXX can't get this to work:// struct vfsmount *mnt = file->f_path.mnt;// if (!(mnt->mnt_flags & MNT_NODEV)) {// return 0;// } void *name = 0; struct dentry *dent = file->f_path.dentry; // get filename via d_iname bpf_probe_read(&name, sizeof(name), dent->d_iname);// XXX can't get this to work:// filebypid.update(&pid, (char *)&dent->d_iname);// debug only: bpf_trace_printk("read d_iname %s\n", &dent->d_iname); } ts = bpf_ktime_get_ns(); start.update(&pid, &ts); return 0;}int trace_return(struct pt_regs *ctx){ u64 *tsp, delta; u32 pid = bpf_get_current_pid_tgid(); pid = bpf_get_current_pid_tgid(); tsp = start.lookup(&pid); if (tsp != 0) { delta = bpf_ktime_get_ns() - *tsp; if (delta > MIN_LATENCY_NS) { char *file = (char *)filebypid.lookup(&pid); bpf_trace_printk("%d us %s\n", delta / 1000, file); if (file != NULL) { filebypid.delete(&pid); } } start.delete(&pid); } return 0;}"""
bpf_text = bpf_text.replace('MIN_LATENCY_NS', '100000')
b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

b.trace_print()

There's a couple of blocks, that when uncommented, break it.

It's pretty noisy to run, since it prints tty vfs_read()s from ssh (hence
trying to filter it based on MNT_NODEV, which I don't know yet if it
works). So I'd been testing it like this:

window1# ./vfsslower > out
window2# echo 1 > /proc/sys/vm/drop_caches; cksum /usr/bin/*
window1# grep cksum out


Reply to this email directly or view it on GitHub
#209.

@yonghong-song

This comment has been minimized.

Show comment
Hide comment
@yonghong-song

yonghong-song Sep 12, 2015

Collaborator

To answer the question, what is "inv" and "fp"?
In linux:kernel/bpf/verifier.c,
static const char * const reg_type_str[] = {
[NOT_INIT] = "?",
[UNKNOWN_VALUE] = "inv",
[PTR_TO_CTX] = "ctx",
[CONST_PTR_TO_MAP] = "map_ptr",
[PTR_TO_MAP_VALUE] = "map_value",
[PTR_TO_MAP_VALUE_OR_NULL] = "map_value_or_null",
[FRAME_PTR] = "fp",
[PTR_TO_STACK] = "fp",
[CONST_IMM] = "imm",
};
so "inv" represents an unknown value and "fp" is a ptr to stack.

On Sat, Sep 12, 2015 at 2:23 PM, Brendan Gregg notifications@github.com
wrote:

I can't figure out some errors like the following:

bpf: Permission denied
0: (79) r6 = *(u64 *)(r1 +112)
1: (85) call 14
2: (63) *(u32 *)(r10 -12) = r0
3: (15) if r6 == 0x0 goto pc+55
R0=inv R6=inv R10=fp
4: (b7) r7 = 0
5: (7b) *(u64 *)(r10 -24) = r7
6: (7b) *(u64 *)(r10 -32) = r7
7: (07) r6 += 16
8: (bf) r1 = r10
9: (07) r1 += -32
10: (b7) r2 = 16
11: (bf) r3 = r6
12: (85) call 4
13: (79) r1 = *(u64 *)(r10 -24)
14: (15) if r1 == 0x0 goto pc+44
R0=inv R1=inv R6=inv R7=imm0 R10=fp
15: (7b) *(u64 *)(r10 -40) = r7
16: (7b) *(u64 *)(r10 -48) = r7
17: (bf) r1 = r10
18: (07) r1 += -48
19: (b7) r2 = 16
20: (bf) r3 = r6
21: (85) call 4
22: (79) r1 = *(u64 *)(r10 -48)
23: (15) if r1 == 0x0 goto pc+35
R0=inv R1=inv R6=inv R7=imm0 R10=fp
24: (b7) r7 = 0
25: (7b) *(u64 *)(r10 -56) = r7
26: (7b) *(u64 *)(r10 -64) = r7
27: (7b) *(u64 *)(r10 -72) = r7
28: (bf) r1 = r10
29: (07) r1 += -72
30: (b7) r2 = 16
31: (bf) r3 = r6
32: (85) call 4
33: (79) r6 = *(u64 *)(r10 -64)
34: (07) r6 += 56
35: (bf) r1 = r10
36: (07) r1 += -56
37: (b7) r2 = 8
38: (bf) r3 = r6
39: (85) call 4
40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

Traceback (most recent call last):
File "./vfsslower", line 83, in
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 498, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 388, in load_func
raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program trace_entry

What actually is "type=inv expected=fp"?

Here's a test program, that in this state works:

#!/usr/bin/python## vfsslower Trace file system read/writes (via VFS) slower than a threshold.# For Linux, uses BCC, eBPF.## In development. Incomplete. DO NOT USE.## Copyright (c) 2015 Brendan Gregg.# Licensed under the Apache License, Version 2.0 (the "License")## 12-Sep-2015 Brendan Gregg Created this.
from future import print_functionfrom bcc import BPF

REQ_WRITE = 1 # from include/linux/blk_types.h

load BPF program

bpf_text = """#include <uapi/linux/ptrace.h>#include <linux/blkdev.h>#include <linux/mount.h>BPF_HASH(start, u32);BPF_HASH(filebypid, u32, char );int trace_entry(struct pt_regs *ctx, struct file *file){ u64 ts; u32 pid = bpf_get_current_pid_tgid(); if (file != NULL && file->f_path.dentry != NULL && file->f_path.mnt != NULL) {// XXX can't get this to work:// struct vfsmount *mnt = file->f_path.mnt;// if (!(mnt->mnt_flags & MNT_NODEV)) {// return 0;// } void *name = 0; struct dentry *dent = file->f_path.dentry; // get filename via d_iname bpf_probe_read(&name, sizeof(name), dent->d_iname);// XXX can't get this to work:// filebypid.update(&pid, (char *)&dent->d_iname);// debug only: bpf_trace_printk("read d_iname %s\n", &dent->d_iname); } ts = bpf_ktime_get_ns(); start.update(&pid, &ts); return 0;}int trace_return(struct pt_regs *ctx){ u64 *tsp, delta; u32 pid = bpf_get_current_pid_tgid(); pid = bpf_get_current_pid_tgid(); tsp = start.lookup(&pid); if (tsp != 0) { delta = bpf_ktime_get_ns() - *tsp; if (delta > MIN_LATENCY_NS) { char *file = (char *)filebypid.lookup(&pid); bpf_trace_printk("%d us %s\n", delta / 1000, file); if (file != NULL) { filebypid.delete(&pid); } } start.delete(&pid); } return 0;}"""
bpf_text = bpf_text.replace('MIN_LATENCY_NS', '100000')
b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

b.trace_print()

There's a couple of blocks, that when uncommented, break it.

It's pretty noisy to run, since it prints tty vfs_read()s from ssh (hence
trying to filter it based on MNT_NODEV, which I don't know yet if it
works). So I'd been testing it like this:

window1# ./vfsslower > out
window2# echo 1 > /proc/sys/vm/drop_caches; cksum /usr/bin/*
window1# grep cksum out


Reply to this email directly or view it on GitHub
#209.

Collaborator

yonghong-song commented Sep 12, 2015

To answer the question, what is "inv" and "fp"?
In linux:kernel/bpf/verifier.c,
static const char * const reg_type_str[] = {
[NOT_INIT] = "?",
[UNKNOWN_VALUE] = "inv",
[PTR_TO_CTX] = "ctx",
[CONST_PTR_TO_MAP] = "map_ptr",
[PTR_TO_MAP_VALUE] = "map_value",
[PTR_TO_MAP_VALUE_OR_NULL] = "map_value_or_null",
[FRAME_PTR] = "fp",
[PTR_TO_STACK] = "fp",
[CONST_IMM] = "imm",
};
so "inv" represents an unknown value and "fp" is a ptr to stack.

On Sat, Sep 12, 2015 at 2:23 PM, Brendan Gregg notifications@github.com
wrote:

I can't figure out some errors like the following:

bpf: Permission denied
0: (79) r6 = *(u64 *)(r1 +112)
1: (85) call 14
2: (63) *(u32 *)(r10 -12) = r0
3: (15) if r6 == 0x0 goto pc+55
R0=inv R6=inv R10=fp
4: (b7) r7 = 0
5: (7b) *(u64 *)(r10 -24) = r7
6: (7b) *(u64 *)(r10 -32) = r7
7: (07) r6 += 16
8: (bf) r1 = r10
9: (07) r1 += -32
10: (b7) r2 = 16
11: (bf) r3 = r6
12: (85) call 4
13: (79) r1 = *(u64 *)(r10 -24)
14: (15) if r1 == 0x0 goto pc+44
R0=inv R1=inv R6=inv R7=imm0 R10=fp
15: (7b) *(u64 *)(r10 -40) = r7
16: (7b) *(u64 *)(r10 -48) = r7
17: (bf) r1 = r10
18: (07) r1 += -48
19: (b7) r2 = 16
20: (bf) r3 = r6
21: (85) call 4
22: (79) r1 = *(u64 *)(r10 -48)
23: (15) if r1 == 0x0 goto pc+35
R0=inv R1=inv R6=inv R7=imm0 R10=fp
24: (b7) r7 = 0
25: (7b) *(u64 *)(r10 -56) = r7
26: (7b) *(u64 *)(r10 -64) = r7
27: (7b) *(u64 *)(r10 -72) = r7
28: (bf) r1 = r10
29: (07) r1 += -72
30: (b7) r2 = 16
31: (bf) r3 = r6
32: (85) call 4
33: (79) r6 = *(u64 *)(r10 -64)
34: (07) r6 += 56
35: (bf) r1 = r10
36: (07) r1 += -56
37: (b7) r2 = 8
38: (bf) r3 = r6
39: (85) call 4
40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

Traceback (most recent call last):
File "./vfsslower", line 83, in
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 498, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python2.7/dist-packages/bcc/init.py", line 388, in load_func
raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program trace_entry

What actually is "type=inv expected=fp"?

Here's a test program, that in this state works:

#!/usr/bin/python## vfsslower Trace file system read/writes (via VFS) slower than a threshold.# For Linux, uses BCC, eBPF.## In development. Incomplete. DO NOT USE.## Copyright (c) 2015 Brendan Gregg.# Licensed under the Apache License, Version 2.0 (the "License")## 12-Sep-2015 Brendan Gregg Created this.
from future import print_functionfrom bcc import BPF

REQ_WRITE = 1 # from include/linux/blk_types.h

load BPF program

bpf_text = """#include <uapi/linux/ptrace.h>#include <linux/blkdev.h>#include <linux/mount.h>BPF_HASH(start, u32);BPF_HASH(filebypid, u32, char );int trace_entry(struct pt_regs *ctx, struct file *file){ u64 ts; u32 pid = bpf_get_current_pid_tgid(); if (file != NULL && file->f_path.dentry != NULL && file->f_path.mnt != NULL) {// XXX can't get this to work:// struct vfsmount *mnt = file->f_path.mnt;// if (!(mnt->mnt_flags & MNT_NODEV)) {// return 0;// } void *name = 0; struct dentry *dent = file->f_path.dentry; // get filename via d_iname bpf_probe_read(&name, sizeof(name), dent->d_iname);// XXX can't get this to work:// filebypid.update(&pid, (char *)&dent->d_iname);// debug only: bpf_trace_printk("read d_iname %s\n", &dent->d_iname); } ts = bpf_ktime_get_ns(); start.update(&pid, &ts); return 0;}int trace_return(struct pt_regs *ctx){ u64 *tsp, delta; u32 pid = bpf_get_current_pid_tgid(); pid = bpf_get_current_pid_tgid(); tsp = start.lookup(&pid); if (tsp != 0) { delta = bpf_ktime_get_ns() - *tsp; if (delta > MIN_LATENCY_NS) { char *file = (char *)filebypid.lookup(&pid); bpf_trace_printk("%d us %s\n", delta / 1000, file); if (file != NULL) { filebypid.delete(&pid); } } start.delete(&pid); } return 0;}"""
bpf_text = bpf_text.replace('MIN_LATENCY_NS', '100000')
b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

b.trace_print()

There's a couple of blocks, that when uncommented, break it.

It's pretty noisy to run, since it prints tty vfs_read()s from ssh (hence
trying to filter it based on MNT_NODEV, which I don't know yet if it
works). So I'd been testing it like this:

window1# ./vfsslower > out
window2# echo 1 > /proc/sys/vm/drop_caches; cksum /usr/bin/*
window1# grep cksum out


Reply to this email directly or view it on GitHub
#209.

@brendangregg

This comment has been minimized.

Show comment
Hide comment
@brendangregg

brendangregg Sep 13, 2015

Member

Thanks. So pid is used successfully as a key in the "start" hash, and well as in the https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c example. I think the problem is with (char **)&dent->d_iname.

Member

brendangregg commented Sep 13, 2015

Thanks. So pid is used successfully as a key in the "start" hash, and well as in the https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c example. I think the problem is with (char **)&dent->d_iname.

@4ast

This comment has been minimized.

Show comment
Hide comment
@4ast

4ast Sep 13, 2015

Member

yes. it's the value that is causing problems.
In the last few lines of the dump:

40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

r1 - is a map pointer (1st argument to bpf_map_lookup call)
r2 - pointer to key (2nd argument)
r3 - pointer to value (3rd argument)

in this case r2 = r10 - 12 which is ok
but r3 is something unknown to verifier.

On Sat, Sep 12, 2015 at 7:58 PM, Brendan Gregg notifications@github.com
wrote:

Thanks. So pid is used successfully as a key in the "start" hash, and well
as in the https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c
example. I think the problem is with (char **)&dent->d_iname.


Reply to this email directly or view it on GitHub
#209 (comment).

Member

4ast commented Sep 13, 2015

yes. it's the value that is causing problems.
In the last few lines of the dump:

40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

r1 - is a map pointer (1st argument to bpf_map_lookup call)
r2 - pointer to key (2nd argument)
r3 - pointer to value (3rd argument)

in this case r2 = r10 - 12 which is ok
but r3 is something unknown to verifier.

On Sat, Sep 12, 2015 at 7:58 PM, Brendan Gregg notifications@github.com
wrote:

Thanks. So pid is used successfully as a key in the "start" hash, and well
as in the https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c
example. I think the problem is with (char **)&dent->d_iname.


Reply to this email directly or view it on GitHub
#209 (comment).

@yonghong-song

This comment has been minimized.

Show comment
Hide comment
@yonghong-song

yonghong-song Sep 13, 2015

Collaborator

Thanks for explanation. Need to look at verifier code to understand more.☺

On Sat, Sep 12, 2015 at 8:08 PM, 4ast notifications@github.com wrote:

yes. it's the value that is causing problems.
In the last few lines of the dump:

40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

r1 - is a map pointer (1st argument to bpf_map_lookup call)
r2 - pointer to key (2nd argument)
r3 - pointer to value (3rd argument)

in this case r2 = r10 - 12 which is ok
but r3 is something unknown to verifier.

On Sat, Sep 12, 2015 at 7:58 PM, Brendan Gregg notifications@github.com
wrote:

Thanks. So pid is used successfully as a key in the "start" hash, and
well
as in the
https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c
example. I think the problem is with (char **)&dent->d_iname.


Reply to this email directly or view it on GitHub
#209 (comment).


Reply to this email directly or view it on GitHub
#209 (comment).

Collaborator

yonghong-song commented Sep 13, 2015

Thanks for explanation. Need to look at verifier code to understand more.☺

On Sat, Sep 12, 2015 at 8:08 PM, 4ast notifications@github.com wrote:

yes. it's the value that is causing problems.
In the last few lines of the dump:

40: (18) r1 = 0x93c8960
42: (bf) r2 = r10
43: (07) r2 += -12
44: (bf) r3 = r6
45: (b7) r4 = 0
46: (85) call 2
R3 type=inv expected=fp

r1 - is a map pointer (1st argument to bpf_map_lookup call)
r2 - pointer to key (2nd argument)
r3 - pointer to value (3rd argument)

in this case r2 = r10 - 12 which is ok
but r3 is something unknown to verifier.

On Sat, Sep 12, 2015 at 7:58 PM, Brendan Gregg notifications@github.com
wrote:

Thanks. So pid is used successfully as a key in the "start" hash, and
well
as in the
https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c
example. I think the problem is with (char **)&dent->d_iname.


Reply to this email directly or view it on GitHub
#209 (comment).


Reply to this email directly or view it on GitHub
#209 (comment).

@yonghong-song

This comment has been minimized.

Show comment
Hide comment
@yonghong-song

yonghong-song Sep 13, 2015

Collaborator

Right. I made a mistake.

u32 pid = bpf_get_current_pid_tgid();

pid is on the stack, so &pid is perfect fine as the pointer-to-key used later.

if you do a map lookup, get a value returned as a pointer, and want to use

returned value as the key, then you may need to copy to the stack first.

On Sat, Sep 12, 2015 at 7:58 PM, Brendan Gregg notifications@github.com
wrote:

Thanks. So pid is used successfully as a key in the "start" hash, and well
as in the https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c
example. I think the problem is with (char **)&dent->d_iname.


Reply to this email directly or view it on GitHub
#209 (comment).

Collaborator

yonghong-song commented Sep 13, 2015

Right. I made a mistake.

u32 pid = bpf_get_current_pid_tgid();

pid is on the stack, so &pid is perfect fine as the pointer-to-key used later.

if you do a map lookup, get a value returned as a pointer, and want to use

returned value as the key, then you may need to copy to the stack first.

On Sat, Sep 12, 2015 at 7:58 PM, Brendan Gregg notifications@github.com
wrote:

Thanks. So pid is used successfully as a key in the "start" hash, and well
as in the https://github.com/iovisor/bcc/blob/master/examples/vfsreadlat.c
example. I think the problem is with (char **)&dent->d_iname.


Reply to this email directly or view it on GitHub
#209 (comment).

@brendangregg

This comment has been minimized.

Show comment
Hide comment
@brendangregg

brendangregg Sep 13, 2015

Member

Thanks; well, given I just did a bpf_probe_read() of name, can't I use name?

                filebypid.update(&pid, (char **)&name);

When I try this, I get the first 8 chars of the string ok, and then garbage.

Member

brendangregg commented Sep 13, 2015

Thanks; well, given I just did a bpf_probe_read() of name, can't I use name?

                filebypid.update(&pid, (char **)&name);

When I try this, I get the first 8 chars of the string ok, and then garbage.

@brendangregg

This comment has been minimized.

Show comment
Hide comment
@brendangregg

brendangregg Sep 13, 2015

Member

Finally have it working:

#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
#include <linux/mount.h>

BPF_HASH(start, u32);
BPF_HASH(filebypid, u32, struct dentry *);

int trace_entry(struct pt_regs *ctx, struct file *file)
{
    struct dentry *dent;
    u64 ts;
    u32 pid = bpf_get_current_pid_tgid();

    if (file != NULL && file->f_path.dentry != NULL &&
        file->f_path.mnt != NULL) {

// compiles (finally), but this test doesn't exclude what I wanted
//      int mnt_flags = 0;
//      bpf_probe_read(&mnt_flags, sizeof(mnt_flags), &file->f_path.mnt->mnt_flags);
//      if (mnt_flags & MNT_NODEV) {
//          return 0;
//      }

        // map pid to dentry pointer
        dent = file->f_path.dentry;
        filebypid.update(&pid, &dent);
        bpf_trace_printk("read flags %x\\n", mnt_flags);
    }

    ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx)
{
    u64 *tsp, delta;
    u32 pid = bpf_get_current_pid_tgid();
    tsp = start.lookup(&pid);

    if (tsp != 0) {
        delta = bpf_ktime_get_ns() - *tsp;
        if (delta > MIN_LATENCY_NS) {
            struct dentry **dentpp, *dent = 0;
            dentpp = filebypid.lookup(&pid);
            bpf_probe_read(&dent, sizeof(dent), dentpp);
            bpf_trace_printk("slow %llx %d us %s\\n", dent, delta / 1000, dent->d_iname);
            if (dent != NULL) {
                filebypid.delete(&pid);
            }
        }
        start.delete(&pid);
    }

    return 0;
}

I'm passing over the dentry pointer, instead of the string.

This took about a dozen attempts (some I still don't understand why they failed). It should get a little easier now that I have two working examples to copy from, but this was really hard to figure out. Maybe an error message that said "you need to bpf_probe_read(X)".

Member

brendangregg commented Sep 13, 2015

Finally have it working:

#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>
#include <linux/mount.h>

BPF_HASH(start, u32);
BPF_HASH(filebypid, u32, struct dentry *);

int trace_entry(struct pt_regs *ctx, struct file *file)
{
    struct dentry *dent;
    u64 ts;
    u32 pid = bpf_get_current_pid_tgid();

    if (file != NULL && file->f_path.dentry != NULL &&
        file->f_path.mnt != NULL) {

// compiles (finally), but this test doesn't exclude what I wanted
//      int mnt_flags = 0;
//      bpf_probe_read(&mnt_flags, sizeof(mnt_flags), &file->f_path.mnt->mnt_flags);
//      if (mnt_flags & MNT_NODEV) {
//          return 0;
//      }

        // map pid to dentry pointer
        dent = file->f_path.dentry;
        filebypid.update(&pid, &dent);
        bpf_trace_printk("read flags %x\\n", mnt_flags);
    }

    ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx)
{
    u64 *tsp, delta;
    u32 pid = bpf_get_current_pid_tgid();
    tsp = start.lookup(&pid);

    if (tsp != 0) {
        delta = bpf_ktime_get_ns() - *tsp;
        if (delta > MIN_LATENCY_NS) {
            struct dentry **dentpp, *dent = 0;
            dentpp = filebypid.lookup(&pid);
            bpf_probe_read(&dent, sizeof(dent), dentpp);
            bpf_trace_printk("slow %llx %d us %s\\n", dent, delta / 1000, dent->d_iname);
            if (dent != NULL) {
                filebypid.delete(&pid);
            }
        }
        start.delete(&pid);
    }

    return 0;
}

I'm passing over the dentry pointer, instead of the string.

This took about a dozen attempts (some I still don't understand why they failed). It should get a little easier now that I have two working examples to copy from, but this was really hard to figure out. Maybe an error message that said "you need to bpf_probe_read(X)".

@brendangregg

This comment has been minimized.

Show comment
Hide comment
@brendangregg

brendangregg Sep 13, 2015

Member

FWIW, I can't match on MNT_NODEV because FSes don't respect it:

# mount
/dev/xvda on / type ext4 (rw,noatime,barrier=0)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,size=250M)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
/dev/md0 on /mnt type xfs (rw,noatime)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)

Many things (I think) should have nodev, but don't. Eg, /sys/kernel/debug, from which we're doing VFS operations (reading trace_pipe).

Member

brendangregg commented Sep 13, 2015

FWIW, I can't match on MNT_NODEV because FSes don't respect it:

# mount
/dev/xvda on / type ext4 (rw,noatime,barrier=0)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,size=250M)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
/dev/md0 on /mnt type xfs (rw,noatime)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)

Many things (I think) should have nodev, but don't. Eg, /sys/kernel/debug, from which we're doing VFS operations (reading trace_pipe).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment