Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't get hello_world.py to work on Fedora 30 #2525

Closed
jgehrcke opened this issue Sep 23, 2019 · 10 comments

Comments

@jgehrcke
Copy link
Contributor

commented Sep 23, 2019

Hey. Environment: Fedora 30, SELinux disabled, kernel 5.2.15, see:

# uname -a
Linux jpx1carb 5.2.15-200.fc30.x86_64 #1 SMP Mon Sep 16 15:17:36 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# sestatus
SELinux status:                 disabled
# cat /etc/*-release
Fedora release 30 (Thirty)
NAME=Fedora
VERSION="30 (Workstation Edition)"
...

I do run things as root:

# id
uid=0(root) gid=0(root) groups=0(root)

I have tried both, bcc built from source (head of master from today) as well as from the Fedora 30 package repository. The following output is from bcc installed via the latter:

# python3 /usr/share/bcc/examples/hello_world.py 
bpf: Failed to load program: Operation not permitted

Traceback (most recent call last):
  File "/usr/share/bcc/examples/hello_world.py", line 12, in <module>
    BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 344, in __init__
    self._trace_autoload()
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 1090, in _trace_autoload
    fn = self.load_func(func_name, BPF.KPROBE)
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 380, in load_func
    raise Exception("Need super-user privileges to run")
Exception: Need super-user privileges to run

I used strace to see what exactly fails how. Here's the relevant output excerpt:

...
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7eff6c2ab000
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=15, insns=0x7eff6c2ab7d8, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 2, 15), prog_flags=0, prog_name="sys_clone", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0}, 112) = -1 EPERM (Operation not permitted)
prlimit64(0, RLIMIT_MEMLOCK, NULL, {rlim_cur=64*1024, rlim_max=64*1024}) = 0
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = 0
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=15, insns=0x7eff6c2ab7d8, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 2, 15), prog_flags=0, prog_name="sys_clone", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0}, 112) = -1 EPERM (Operation not permitted)
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=15, insns=0x7eff6c2ab7d8, license="GPL", log_level=1, log_size=65536, log_buf="", kern_version=KERNEL_VERSION(5, 2, 15), prog_flags=0, prog_name="sys_clone", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0}, 112) = -1 EPERM (Operation not permitted)
write(2, "bpf: Failed to load program: Ope"..., 53bpf: Failed to load program: Operation not permitted
...

This seems somewhat similar to #1532 which has no definite resolution.

Does the following configuration look OK to you?

# cat /boot/config-5.2.15-200.fc30.x86_64 | grep BPF
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_IPV6_SEG6_BPF=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
# CONFIG_BPFILTER is not set
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_LIRC_MODE2=y
CONFIG_BPF_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
# CONFIG_TEST_BPF is not set
# cat /boot/config-5.2.15-200.fc30.x86_64 | grep KPROBE
CONFIG_KPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
# CONFIG_BPF_KPROBE_OVERRIDE is not set
# CONFIG_KPROBES_SANITY_TEST is not set

I would very much appreciate some further debugging help from here.

@yonghong-song

This comment has been minimized.

Copy link
Collaborator

commented Sep 24, 2019

Your config seems okay. From the strace output, I did not see any issues. Maybe have to play it by myself to find out why...

@jgehrcke

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2019

Thanks for the feedback.

Maybe have to play it by myself to find out why...

That would be much appreciated. However, towards a quick turnaround, I would also be really happy to debug things a little further myself.

Curious: did anyone ever test the bcc examples with a 5.2.x kernel? Would be good to know for sure that it's just about my system specifics, and not so much about the cutting edge kernel version itself.

@jgehrcke

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2019

From the strace output, I did not see any issues

You mean except for the EPERM, right? :-) I mean, we probably agree that this EPERM is the symptom, and that we do not know where it comes from, right?

This snippet is the first bpf() line from my strace output above, manually formatted, to show the problem a little clearer:

bpf(
    BPF_PROG_LOAD,
    {
        prog_type=BPF_PROG_TYPE_KPROBE,
        insn_cnt=15,
        insns=0x7eff6c2ab7d8,
        license="GPL",
        log_level=0,
        log_size=0,
        log_buf=NULL,
        kern_version=KERNEL_VERSION(5, 2, 15),
        prog_flags=0,
        prog_name="sys_clone",
        prog_ifindex=0,
        expected_attach_type=BPF_CGROUP_INET_INGRESS,
        prog_btf_fd=0,
        func_info_rec_size=0,
        func_info=NULL,
        func_info_cnt=0,
        line_info_rec_size=0,
        line_info=NULL,
        line_info_cnt=0
    },
    112
) = -1 EPERM (Operation not permitted)
@yonghong-song

This comment has been minimized.

Copy link
Collaborator

commented Sep 24, 2019

I checked with the above output from strace and it looks okay. expected_attach_type=BPF_CGROUP_INET_INGRESS is actually expected_attach_type=0 which is not used for BPF_PROG_TYPE_KPROBE. bcc does work fine with our internal 5.2 kernels so there are probably some setup in Fedora 30 (mostly probably security related) preventing bpf syscall from running.

@yonghong-song

This comment has been minimized.

Copy link
Collaborator

commented Sep 24, 2019

Tried with manually build bcc, it seems working for me.

[yhs@localhost tools]$ sudo ./trace.py '__x64_sys_clone'
PID     TID     COMM            FUNC             
22475   22475   bash            __x64_sys_clone  
22475   22475   bash            __x64_sys_clone  
^C
[yhs@localhost tools]$ uname -a
Linux localhost.localdomain 5.2.16-200.fc30.x86_64 #1 SMP Thu Sep 19 16:14:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
sudo ./hello_world.py 
^CTraceback (most recent call last):
  File "./hello_world.py", line 12, in <module>
    BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 1192, in trace_print
    line = self.trace_readline(nonblocking=False)
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 1172, in trace_readline
    line = trace.readline(1024).rstrip()
KeyboardInterrupt
[yhs@localhost examples]$
[yhs@localhost tools]$
@jgehrcke

This comment has been minimized.

Copy link
Contributor Author

commented Sep 26, 2019

I needed to lift the kernel lockdown to get this working. We can be very thankful to @deg00 for providing this hint over here: #1532 (comment)

I have documented the problem & solution here, in a quick blog post:

This PR adds a corresponding FAQ entry: #2532

Context: kernel lockdown was seemingly introduced in 4.17, and is activated on many (all?) newer Fedora installations.

I am closing this as resolved. Thank you @yonghong-song for the splendid help.

Open questions:

  • @yonghong-song why did you not need to lift kernel lockdown? Or did you, w/o knowing?
  • does it really make sense that a locked-down kernel does not allow bpf() being called?
@jgehrcke

This comment has been minimized.

Copy link
Contributor Author

commented Sep 26, 2019

In the discussion about the lockdown patch, Linus wrote:

I do not want my kernel to act differently depending on some really esoteric detail in how it was booted. That is fundamentally wrong.

Is that really so hard to understand?

Look at it this way: maybe lockdown breaks some application because that app does something odd. I get a report of that happening, and it so happens that the reporter is running the same distro I am, so I try it with his exact kernel configuration, and it works for me.

It is entirely non-obvious that the reporter happened to run a distro kernel that had secure boot enabled, and I obviously do not.

See what the problem is? Tying these things magically together IS A BAD IDEA.

Now, that's exactly what happened here!

@yonghong-song

This comment has been minimized.

Copy link
Collaborator

commented Sep 26, 2019

@yonghong-song why did you not need to lift kernel lockdown? Or did you, w/o knowing?
Looks like lockdown patch is not in mainline kernel.
I do not know how to detect whether a fc30 is in a lockdown mode or not.
I just downloaded a fc30 server from fedora website and I did not do anything special and it works for me. BTW, I am using vmware fusion to create VMs, not sure whether it makes a difference or not.

@yonghong-song

This comment has been minimized.

Copy link
Collaborator

commented Sep 26, 2019

I do not have 'lockdown' in my dmesg.
Some discussion in redhat mailing list https://bugzilla.redhat.com/show_bug.cgi?id=1599197

@jgehrcke

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2019

Hindsight...: my dmesg tells us everything we need to know, cool. It even told me "BPF is restricted; see man kernel_lockdown.7" :-)

[    0.000000] Kernel is locked down from EFI secure boot; see man kernel_lockdown.7
[...]
[    2.198433] Lockdown: systemd: BPF is restricted; see man kernel_lockdown.7
[...]
[58310.913828] Lifting lockdown

I do not have 'lockdown' in my dmesg.

Most likely you were not affected because in your VM setup you do not have secure boot enabled (not even sure if that's possible).

So really what happened here between the two of us is most likely this quite annoying scenario predicted by Linus:

I do not want my kernel to act differently depending on some really esoteric detail in how it was booted [...] It is entirely non-obvious that the reporter happened to run a distro kernel that had secure boot enabled, and I obviously do not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.