Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Falco with eBPF probe locks 4.19 kernel mainline when running lsiutil #896

Closed
a0145 opened this issue Oct 22, 2019 · 29 comments · Fixed by #1131
Closed

Falco with eBPF probe locks 4.19 kernel mainline when running lsiutil #896

a0145 opened this issue Oct 22, 2019 · 29 comments · Fixed by #1131
Assignees
Labels

Comments

@a0145
Copy link

a0145 commented Oct 22, 2019

What happened:
While running the stock config of falco on kernel 4.19.80 (longterm), falco operates normally, and alerts as expected. Then, after running the binary lsiutil, the system hangs. This behavior only happens after engaging falco + bpf probe.

Falco running properly:

[root@localhost ~]# /start.sh 
* Setting up /usr/src links from host
* Mounting debugfs
Found kernel config at /host/boot/config-4.19.80
* BPF probe located, it's now possible to start sysdig
2019-10-22T20:37:16+0000: Falco initialized with configuration file /etc/falco/falco.yaml
2019-10-22T20:37:16+0000: Loading rules from file /etc/falco/falco_rules.yaml:
2019-10-22T20:37:16+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
2019-10-22T20:37:16+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
2019-10-22T20:37:18+0000: Starting internal webserver, listening on port 8765
2019-10-22T20:37:18.051015127+0000: Notice Privileged container started (user=root command=container:04a6af964761 blissful_mendeleev (id=04a6af964761) image=falcosecurity/falco:latest)

Invocation of lsituil after bpf instrumentation:

[root@localhost ~]# /halt.sh 
execve("/lsiutil", ["/lsiutil"], [/* 19 vars */]) = 0
brk(NULL)                               = 0x1f04000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f297db3e000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27883, ...}) = 0
mmap(NULL, 27883, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f297db37000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20&\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2156160, ...}) = 0
mmap(NULL, 3985888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f297d550000
mprotect(0x7f297d713000, 2097152, PROT_NONE) = 0
mmap(0x7f297d913000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f297d913000
mmap(0x7f297d919000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f297d919000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f297db36000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f297db34000
arch_prctl(ARCH_SET_FS, 0x7f297db34740) = 0
mprotect(0x7f297d913000, 16384, PROT_READ) = 0
mprotect(0x668000, 4096, PROT_READ)     = 0
mprotect(0x7f297db3f000, 4096, PROT_READ) = 0
munmap(0x7f297db37000, 27883)           = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f297db3d000
write(1, "\n", 1
)                       = 1
write(1, "LSI Logic MPT Configuration Util"..., 64LSI Logic MPT Configuration Utility, Version 1.71, Sep 18, 2013
) = 64
open("/dev/mptctl", O_RDWR)             = -1 ENOENT (No such file or directory)
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7f297d5863b0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7f297d5863b0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7ffc68f451a0) = 7781
wait4(7781, 
 kernel:watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [lsiutil:7781]

What you expected to happen:
Falco to run with the bpf probe without soft-locking the system after a copy of lsiutil is launched. See output from above with kernel:watchdog: BUG

Clean output of lsitutil without ebpf sysdig falco running:

[root@localhost ~]# /halt.sh 
execve("/lsiutil", ["/lsiutil"], [/* 17 vars */]) = 0
brk(NULL)                               = 0x9a7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f029df8a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27883, ...}) = 0
mmap(NULL, 27883, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f029df83000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20&\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2156160, ...}) = 0
mmap(NULL, 3985888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f029d99c000
mprotect(0x7f029db5f000, 2097152, PROT_NONE) = 0
mmap(0x7f029dd5f000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f029dd5f000
mmap(0x7f029dd65000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f029dd65000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f029df82000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f029df80000
arch_prctl(ARCH_SET_FS, 0x7f029df80740) = 0
mprotect(0x7f029dd5f000, 16384, PROT_READ) = 0
mprotect(0x668000, 4096, PROT_READ)     = 0
mprotect(0x7f029df8b000, 4096, PROT_READ) = 0
munmap(0x7f029df83000, 27883)           = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f029df89000
write(1, "\n", 1
)                       = 1
write(1, "LSI Logic MPT Configuration Util"..., 64LSI Logic MPT Configuration Utility, Version 1.71, Sep 18, 2013
) = 64
open("/dev/mptctl", O_RDWR)             = -1 ENOENT (No such file or directory)
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7f029d9d23b0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7f029d9d23b0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7ffe0e695a90) = 8094
wait4(8094, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8094
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f029d9d23b0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7f029d9d23b0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8094, si_uid=0, si_status=0, si_utime=0, si_stime=3} ---
open("/dev/mptctl", O_RDWR)             = 3
open("/dev/mpt2ctl", O_RDWR)            = -1 ENOENT (No such file or directory)
open("/dev/mpt3ctl", O_RDWR)            = -1 ENOENT (No such file or directory)
brk(NULL)                               = 0x9a7000
brk(0x9c8000)                           = 0x9c8000
brk(NULL)                               = 0x9c8000
open("/proc/mpt/ioc0", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc1", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc2", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc3", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc4", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc5", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc6", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc7", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc8", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc9", O_RDONLY)        = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc10", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc11", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc12", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc13", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc14", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc15", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc16", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc17", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc18", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc19", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc20", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc21", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc22", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc23", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc24", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc25", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc26", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc27", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc28", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc29", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc30", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc31", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc32", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc33", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc34", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc35", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc36", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc37", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc38", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc39", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc40", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc41", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc42", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc43", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc44", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc45", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc46", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc47", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc48", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc49", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc50", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc51", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc52", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc53", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc54", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc55", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc56", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc57", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc58", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc59", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc60", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc61", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc62", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
open("/proc/mpt/ioc63", O_RDONLY)       = -1 ENOENT (No such file or directory)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0x15, 0x14), 0x7ffe0e695de0) = -1 ENODEV (No such device)
close(3)                                = 0
write(1, "\n0 MPT Ports found\n", 19
0 MPT Ports found
)   = 19
close(3)                                = -1 EBADF (Bad file descriptor)
exit_group(0)                           = ?
+++ exited with 0 +++

How to reproduce it (as minimally and precisely as possible):
Manual Steps:

  1. Install kernel 4.19.80 on centos7 from source, or obtain via a prebuilt package
  2. Boot into that kernel
  3. Run falco container with stock config through docker, with the bpf environment var
  4. See falco working successfully
  5. Obtain a lsiutil binary
  6. Run strace ./lsiutil, and see a hang. The system will begin soft-locking and panicking shortly after:

Example Vagrant File:

$mainlinekernel = <<SCRIPT
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install -y https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
yum update -y 
yum --enablerepo=elrepo-kernel -y install kernel-ml kernel-ml-devel kernel-ml-headers
SCRIPT

skernelver = "4.19.80"
$buildkernel = <<SCRIPT
yum install -y ncurses-devel make gcc bison wget perl flex bc openssl-devel elfutils-libelf-devel
cd /root
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-#{skernelver}.tar.xz
tar xf linux-#{skernelver}.tar.xz
cd linux-#{skernelver}
make olddefconfig 
make -j16
make modules -j16
make modules_install
make install
cp .config /boot/config-#{skernelver}
SCRIPT

kernelscript = $buildkernel	

$samplescript = <<SCRIPT
#{kernelscript}
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
yum install -y yum-utils device-mapper-persistent-data tmux lvm2 strace wget
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
systemctl enable docker
systemctl start docker
echo -e '#!/bin/sh\nsetenforce 0\ndocker run --rm -i -t --net=host --privileged -e SYSDIG_BPF_PROBE="" -v /root:/root -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro -v /etc:/host/etc falcosecurity/falco' > /start.sh
chmod +x /start.sh
curl https://raw.githubusercontent.com/mute55/LSIUtil/master/Binaries/LSIutil_1.71_binaries/Linux/lsiutil.x86_64 > /lsiutil
chmod +x /lsiutil
echo -e '#!/bin/sh\nstrace /lsiutil' > /halt.sh
chmod +x /halt.sh
reboot
SCRIPT

Vagrant.configure("2") do |config|
	config.vm.provision "shell", inline: $samplescript
	config.vm.provider "vmware_fusion" do |v|
 	  v.vmx["memsize"] = "8192"
	  v.vmx["numvcpus"] = "16"
	end
  config.vm.box = "centos/7"
end
  1. vagrant up && vagrant ssh
  2. sudo su -, /start.sh (in one terminal), /halt.sh (in another terminal)

Anything else we need to know?:

When using mainline kernel 5.3.7, using the bpf probe, falco and the sysdig probe function correctly.

The hang was tested in 4.19.61, 4.19.80.

Environment:

  • Falco version (use falco --version):
    Falco version: 0.17.1

  • System info

{
  "machine": "x86_64",
  "nodename": "localhost.localdomain",
  "release": "4.19.80",
  "sysname": "Linux",
  "version": "#1 SMP Tue Oct 22 20:07:32 UTC 2019"
}
  • Cloud provider or hardware configuration:
    centos7/vmware

  • OS (e.g: cat /etc/os-release):

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
Linux localhost.localdomain 4.19.80 #1 SMP Tue Oct 22 20:07:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Not a Contribution

@a0145 a0145 added the kind/bug label Oct 22, 2019
@fntlnz
Copy link
Contributor

fntlnz commented Oct 24, 2019

Thanks for opening @a0145 - Me and @leodido will take a look a this to confirm.

/assign @leodido
/assign @fntlnz

@a0145
Copy link
Author

a0145 commented Oct 30, 2019

I tested this on the latest kernels today. It appears that 4.19 is the only branch that hangs. This was tested with falco version 0.18.0

stable 5.3.8: works
longterm 4.19.81: hangs
longterm 4.14.151: works

@a0145
Copy link
Author

a0145 commented Nov 9, 2019

To help with reproduction, I ported the Vagrantfile to use virtualbox and have more detailed steps:

# host commands
cd
mkdir reproduce896
cd reproduce896
vi Vagrantfile #copy contents from below
vagrant up
# wait for ~20 minutes for kernel to compile
vagrant ssh

#guest commands
sudo su -
# start a tmux session with two windows, or have 2 ssh sessions up 
/start.sh
# wait for falco to download the image, and to boot
/hang.sh
# see lsiutil fail at wait4, and eventually (~10s), the kernel will begin to softlock

Vagrantfile


skernelver = "4.19.82"
$buildkernel = <<SCRIPT
yum install -y ncurses-devel make gcc bison wget perl flex bc openssl-devel elfutils-libelf-devel
cd /root
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-#{skernelver}.tar.xz
tar xf linux-#{skernelver}.tar.xz
cd linux-#{skernelver}
make olddefconfig 
make -j16
make modules -j16
make modules_install
make install
cp .config /boot/config-#{skernelver}
SCRIPT

kernelscript = $buildkernel	

$samplescript = <<SCRIPT
#{kernelscript}
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
yum install -y yum-utils device-mapper-persistent-data tmux lvm2 strace wget
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
systemctl enable docker
systemctl start docker
echo -e '#!/bin/sh\nsetenforce 0\ndocker run --rm -i -t --net=host --privileged -e SYSDIG_BPF_PROBE="" -v /root:/root -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro -v /etc:/host/etc falcosecurity/falco' > /start.sh
chmod +x /start.sh
curl https://raw.githubusercontent.com/mute55/LSIUtil/master/Binaries/LSIutil_1.71_binaries/Linux/lsiutil.x86_64 > /lsiutil
chmod +x /lsiutil
echo -e '#!/bin/sh\nstrace /lsiutil' > /halt.sh
chmod +x /halt.sh
reboot
SCRIPT

Vagrant.configure("2") do |config|
	config.vm.provision "shell", inline: $samplescript
	config.vm.provider "virtualbox" do |v|
 	  v.memory = 8192
 	  v.cpus = 8
	end
  config.vm.box = "centos/7"
end

@fntlnz
Copy link
Contributor

fntlnz commented Nov 13, 2019

Thanks for the update @a0145 - I will try this

@fntlnz
Copy link
Contributor

fntlnz commented Dec 9, 2019

@a0145
Copy link
Author

a0145 commented Jan 24, 2020

I had some spare time to try this out on the latest 4.19 kernel (4.19.98), and it still hangs with the bpf sysdig probe. I don't think this is related to the kernel lockdown feature - the bpf program is able to load and function, however it hangs at the wait4 syscall

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

I'm looking at this again, here are my repro steps using the Vagrant file from @a0145 but using driverkit to build the BPF probe:

Prepare the driverkit file

cd /root
cat <<EOT >> vanilla.yaml
kernelrelease: 4.19.82
kernelversion: 1
target: vanilla
driverversion: be1ea2d9482d0e6e2cb14a0fd7e08cbecf517f94
output:
  probe: /build/falco-probe.o
EOT

Add the kernel config data to it

cat /boot/config-4.19.82 | base64 -w0 | awk '{print "kernelconfigdata: " $1;}' >> vanilla.yaml 

Build the probe

docker run -v /root/vanilla.yaml:/vanilla.yaml -v /build:/build -it -v /var/run/docker.sock:/var/run/docker.sock falcosecurity/driverkit:latest driverkit -c /vanilla.yaml docker

Start Falco

setenforce 0
docker run --rm -i -t -v /build:/build --net=host --privileged -e FALCO_BPF_PROBE="/build/falco-probe.o" -v /root:/root -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro -v /etc:/host/etc falcosecurity/falco

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

@a0145 I created a small C program to do a raw wait4 syscall and with your vagrantfile (it uses 4.19.82) and I'm not able to reproduce the soft lockup.

#define _GNU_SOURCE

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main() {
  int pid = fork();

  if (pid == 0) {
    for (;;) {
    }
  }
  if (pid != 0) {
    syscall(SYS_wait4, pid, 0, 0, NULL);
  }
}

Here is my output of /proc/interrupts on the VM

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7         0:        112          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          0          0          9          0          0          0          0   IO-APIC   1-edge      i8042
  8:          0          0          0          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 12:          0          0        156          0          0          0          0          0   IO-APIC  12-edge      i8042
 14:          0          0          0          0          0       7200          0          0   IO-APIC  14-edge      ata_piix
 15:          0          0          0          0          0          0          0          0   IO-APIC  15-edge      ata_piix
 19:       9573          0          0          0          0          0          0          0   IO-APIC  19-fasteoi   eth0
 21:          0          0          0          0          0          0          0          0   IO-APIC  21-fasteoi   snd_intel8x0
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:      29106      27861      22234      36876      31543      22903      49918      41931   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:      14652      13528       7201       9139      20134      10441      10076      19014   Rescheduling interrupts
CAL:       7855       8344       6605       8128       7749       7743       8378       8222   Function call interrupts
TLB:          9         22         19         26         20         12         10         11   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          5          5          5          5          5          5          5          5   Machine check polls
HYP:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
HRE:          0          0          0          0          0          0          0          0   Hyper-V reenlightenment interrupts
HVS:          0          0          0          0          0          0          0          0   Hyper-V stimer0 interrupts
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event

At this point I think we could have two roads here:

  1. There was a kernel bug in that specific version (I'm verifying)
  2. On your system there are problems with resources scheduling (either too much resources given to the VM or too few resources on the host)

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

Ok, suddenly it happens to me too after leaving it there for a while:

Look at CPU 5:

 cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:        112          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          0          0          9          0          0          0          0   IO-APIC   1-edge      i8042
  8:          0          0          0          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 12:          0          0        156          0          0          0          0          0   IO-APIC  12-edge      i8042
 14:          0          0          0          0          0      32675          0          0   IO-APIC  14-edge      ata_piix
 15:          0          0          0          0          0          0          0          0   IO-APIC  15-edge      ata_piix
 19:      33408          0          0          0          0          0          0          0   IO-APIC  19-fasteoi   eth0
 21:          0          0          0          0          0          0          0          0   IO-APIC  21-fasteoi   snd_intel8x0
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:    1222290    1221210    1213930    1233574    1227982    1271763     980255    1293899   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:          3          6          5          8         14          7          4          8   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:      24349      29874      20369      22046      34824      17451     258141      37749   Rescheduling interrupts
CAL:       8526      15359       9672      10447      14072       7917       9213      13468   Function call interrupts
TLB:         17         49         20         26         21         12         16         11   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          9          9          9          9          9          5          9          9   Machine check polls
HYP:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
HRE:          0          0          0          0          0          0          0          0   Hyper-V reenlightenment interrupts
HVS:          0          0          0          0          0          0          0          0   Hyper-V stimer0 interrupts
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

While it was happening I was compiling the newer kernel version.

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

Kernel stack trace during the lock

image

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

This seems to be related to the events frequency. Higher frequency means higher probability for this to crash a kernel.

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

It seems to happen regardless the syscall, here I'm getting it with a read

read(5, 
Message from syslogd@localhost at Mar 19 12:20:14 ...
 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [conf:9755]

@a0145
Copy link
Author

a0145 commented Mar 19, 2020

Hey @fntlnz - thanks so much for looking into this. I was able to reproduce in multiple 4.19 LTS patch releases, and on VMs and bare metal with the same results, and my digging into the issue led me to believe it’s a problem with the BPF facility in 4.19.

In 5.4 LTS I’m unable to reproduce the lockup however so that led me to suspect the JIT in 4.19 due to the changes between 5.4 and 4.19.

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

I will try to debug more to discover where exactly this is happening but everything 100% points to a kernel bug. Thanks for making us aware @a0145

@fntlnz
Copy link
Contributor

fntlnz commented Mar 19, 2020

Here's a full stack trace I've been able to extract with @gnosek and @nathan-b

  284.542668] RSP: 0018:ffffa7f301887c70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff13
[  284.543392] RAX: fffffffffffffff2 RBX: ffffa7f301887ea8 RCX: 0000000000000000
[  284.543989] RDX: 0000000000000000 RSI: fffffffffffffff2 RDI: fffffff200000000
[  284.544755] RBP: ffffa7f301887e98 R08: ffffa7f301ba90ed R09: ffffa7f301ba90ec
[  284.545367] R10: 0000000000000000 R11: 00007ffffffff000 R12: 000000000000003b
[  284.545966] R13: ffff93da55844170 R14: ffffa7f301ba90d0 R15: ffffa7f301964410
[  284.546585] FS:  00007f9cb596d740(0000) GS:ffff93da57a80000(0000) knlGS:0000000000000000
[  284.547345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  284.547880] CR2: 00007f9cb5510cc9 CR3: 000000018a626006 CR4: 00000000000606e0
[  284.548547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  284.549399] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  284.550007] Call Trace:
[  284.550413]  ? __alloc_pages_nodemask+0x126/0x2d0
[  284.550959]  ? mem_cgroup_commit_charge+0x80/0x130
[  284.551552]  ? page_add_new_anon_rmap+0x73/0xc0
[  284.552282]  ? do_anonymous_page+0x406/0x670
[  284.552774]  ? __handle_mm_fault+0xc9c/0xef0
[  284.553492]  ? bpf_trace_run2+0x2d/0x50
[  284.553971]  ? syscall_trace_enter+0x298/0x2c0
[  284.554569]  ? do_syscall_64+0x139/0x1b0
[  284.555030]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  290.397054] rcu: INFO: rcu_sched self-detected stall on CPU
[  290.397604] rcu:     2-....: (57073 ticks this GP) idle=47e/1/0x4000000000000002 softirq=17110/17110 fqs=13527 
[  290.398523] rcu:      (t=60001 jiffies g=32281 q=431)
[  290.398991] NMI backtrace for cpu 2
[  290.399399] CPU: 2 PID: 5450 Comm: conf Kdump: loaded Tainted: G             L    4.19.82 #1
[  290.400249] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  290.401387] Call Trace:
[  290.401778]  <IRQ>
[  290.402177]  dump_stack+0x5a/0x73
[  290.402664]  nmi_cpu_backtrace+0x94/0xa0
[  290.403165]  ? lapic_can_unplug_cpu+0xa0/0xa0
[  290.403707]  nmi_trigger_cpumask_backtrace+0xd5/0x110
[  290.404340]  rcu_dump_cpu_stacks+0x8c/0xbc
[  290.404907]  rcu_check_callbacks+0x66e/0x7e0
[  290.405506]  ? tick_sched_do_timer+0x80/0x80
[  290.406105]  update_process_times+0x28/0x50
[  290.406792]  tick_sched_handle+0x25/0x60
[  290.407339]  tick_sched_timer+0x37/0x70
[  290.407747]  __hrtimer_run_queues+0xfb/0x270
[  290.408181]  hrtimer_interrupt+0x122/0x270
[  290.408595]  smp_apic_timer_interrupt+0x6a/0x140
[  290.409041]  apic_timer_interrupt+0xf/0x20
[  290.409528]  </IRQ>
[  290.409878] RIP: 0010:bpf_prog_1d34ec13f1920fc4+0x768/0x1000
[  290.410397] Code: 81 e6 ff ff 01 00 4c 89 f7 48 01 f7 be e8 fd 00 00 e8 dc c9 e0 c3 48 89 c7 48 c1 e7 20 48 89 fe 48 c1 fe 20 48 83 fe ff 7f 02 <eb> fe 48 c1 ef 20 b9 ff ff ff ff 48 81 ff e8 fd 00 00 0f 87 5b 03
[  290.413641] RSP: 0018:ffffa7f301887c70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff13
[  290.416410] RAX: fffffffffffffff2 RBX: ffffa7f301887ea8 RCX: 0000000000000000
[  290.417045] RDX: 0000000000000000 RSI: fffffffffffffff2 RDI: fffffff200000000
[  290.417710] RBP: ffffa7f301887e98 R08: ffffa7f301ba90ed R09: ffffa7f301ba90ec
[  290.418492] R10: 0000000000000000 R11: 00007ffffffff000 R12: 000000000000003b
[  290.419152] R13: ffff93da55844170 R14: ffffa7f301ba90d0 R15: ffffa7f301964410
[  290.419839]  ? __alloc_pages_nodemask+0x126/0x2d0
[  290.420434]  ? mem_cgroup_commit_charge+0x80/0x130
[  290.420925]  ? page_add_new_anon_rmap+0x73/0xc0
[  290.421400]  ? do_anonymous_page+0x406/0x670
[  290.421867]  ? __handle_mm_fault+0xc9c/0xef0
[  290.422333]  ? bpf_trace_run2+0x2d/0x50
[  290.422758]  ? syscall_trace_enter+0x298/0x2c0
[  290.423274]  ? do_syscall_64+0x139/0x1b0
[  290.423766]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  316.533927] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [conf:5450]
[  316.535299] Modules linked in: xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc overlay crct10dif_pclmul crc32_pclmul snd_intel8x0 ghash_clmulni_intel snd_ac97_codec pcbc ac97_bus snd_pcm aesni_intel sunrpc snd_timer crypto_simd snd cryptd glue_helper e1000 soundcore video sg pcspkr i2c_piix4 ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi ata_piix libata serio_raw crc32c_intel
[  316.540983] CPU: 2 PID: 5450 Comm: conf Kdump: loaded Tainted: G             L    4.19.82 #1
[  316.542050] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  316.543073] RIP: 0010:bpf_prog_1d34ec13f1920fc4+0x768/0x1000
[  316.543760] Code: 81 e6 ff ff 01 00 4c 89 f7 48 01 f7 be e8 fd 00 00 e8 dc c9 e0 c3 48 89 c7 48 c1 e7 20 48 89 fe 48 c1 fe 20 48 83 fe ff 7f 02 <eb> fe 48 c1 ef 20 b9 ff ff ff ff 48 81 ff e8 fd 00 00 0f 87 5b 03
[  316.546078] RSP: 0018:ffffa7f301887c70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff13
[  316.547036] RAX: fffffffffffffff2 RBX: ffffa7f301887ea8 RCX: 0000000000000000
[  316.547864] RDX: 0000000000000000 RSI: fffffffffffffff2 RDI: fffffff200000000
[  316.548685] RBP: ffffa7f301887e98 R08: ffffa7f301ba90ed R09: ffffa7f301ba90ec
[  316.549861] R10: 0000000000000000 R11: 00007ffffffff000 R12: 000000000000003b
[  316.550700] R13: ffff93da55844170 R14: ffffa7f301ba90d0 R15: ffffa7f301964410
[  316.551526] FS:  00007f9cb596d740(0000) GS:ffff93da57a80000(0000) knlGS:0000000000000000
[  316.552544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  316.553240] CR2: 00007f9cb5510cc9 CR3: 000000018a626006 CR4: 00000000000606e0
[  316.554135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  316.555364] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  316.556176] Call Trace:
[  316.556599]  ? __alloc_pages_nodemask+0x126/0x2d0
[  316.557219]  ? mem_cgroup_commit_charge+0x80/0x130
[  316.557837]  ? page_add_new_anon_rmap+0x73/0xc0
[  316.558434]  ? do_anonymous_page+0x406/0x670
[  316.559002]  ? __handle_mm_fault+0xc9c/0xef0
[  316.560056]  ? bpf_trace_run2+0x2d/0x50
[  316.560599]  ? syscall_trace_enter+0x298/0x2c0
[  316.561184]  ? do_syscall_64+0x139/0x1b0
[  316.561720]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  344.533768] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [conf:5450]
[  344.534421] Modules linked in: xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc overlay crct10dif_pclmul crc32_pclmul snd_intel8x0 ghash_clmulni_intel snd_ac97_codec pcbc ac97_bus snd_pcm aesni_intel sunrpc snd_timer crypto_simd snd cryptd glue_helper e1000 soundcore video sg pcspkr i2c_piix4 ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi ata_piix libata serio_raw crc32c_intel
[  344.538745] CPU: 2 PID: 5450 Comm: conf Kdump: loaded Tainted: G             L    4.19.82 #1
[  344.539659] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  344.540807] RIP: 0010:bpf_prog_1d34ec13f1920fc4+0x768/0x1000
[  344.541387] Code: 81 e6 ff ff 01 00 4c 89 f7 48 01 f7 be e8 fd 00 00 e8 dc c9 e0 c3 48 89 c7 48 c1 e7 20 48 89 fe 48 c1 fe 20 48 83 fe ff 7f 02 <eb> fe 48 c1 ef 20 b9 ff ff ff ff 48 81 ff e8 fd 00 00 0f 87 5b 03
[  344.543055] RSP: 0018:ffffa7f301887c70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff13
[  344.543820] RAX: fffffffffffffff2 RBX: ffffa7f301887ea8 RCX: 0000000000000000
[  344.544479] RDX: 0000000000000000 RSI: fffffffffffffff2 RDI: fffffff200000000
[  344.545130] RBP: ffffa7f301887e98 R08: ffffa7f301ba90ed R09: ffffa7f301ba90ec
[  344.545751] R10: 0000000000000000 R11: 00007ffffffff000 R12: 000000000000003b
[  344.546369] R13: ffff93da55844170 R14: ffffa7f301ba90d0 R15: ffffa7f301964410
[  344.547027] FS:  00007f9cb596d740(0000) GS:ffff93da57a80000(0000) knlGS:0000000000000000
[  344.547911] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  344.548480] CR2: 00007f9cb5510cc9 CR3: 000000018a626006 CR4: 00000000000606e0
[  344.549113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  344.549800] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  344.550604] Call Trace:
[  344.551033]  ? __alloc_pages_nodemask+0x126/0x2d0
[  344.551494]  ? mem_cgroup_commit_charge+0x80/0x130
[  344.552192]  ? page_add_new_anon_rmap+0x73/0xc0
[  344.553866]  ? do_anonymous_page+0x406/0x670
[  344.554468]  ? __handle_mm_fault+0xc9c/0xef0
[  344.555028]  ? bpf_trace_run2+0x2d/0x50
[  344.555706]  ? syscall_trace_enter+0x298/0x2c0
[  344.556698]  ? do_syscall_64+0x139/0x1b0
[  344.557278]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

@fntlnz
Copy link
Contributor

fntlnz commented Mar 20, 2020

Just did another experiment with @leodido and this also happen in kernels that are compiled without eBPF JIT support.

@fntlnz
Copy link
Contributor

fntlnz commented Mar 20, 2020

Stack trace without JIT

.727905] RAX: ffffffffffffffda RBX: 00007f560b605760 RCX: 00007f560b5149d6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     [0/27956]
[  615.728825] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000                                                                               
[  615.731250] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffbeb0                                                                               
[  615.732127] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f560b605760                                                                               
[  615.733236] R13: 0000000000000117 R14: 00007f560b60e428 R15: 0000000000000000                                                                               
[  615.734156] INFO: task grub2-probe:10004 blocked for more than 120 seconds.                                                                                 
[  615.735161]       Tainted: G             L    4.19.82 #2                                                                                                    
[  615.735982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                                                                       
[  615.736987] grub2-probe     D    0 10004  10003 0x00000080                                                                                                  
[  615.737524] Call Trace:                                                                                                                                     
[  615.738145]  ? __schedule+0x2ab/0x880                                                                                                                       
[  615.738626]  schedule+0x32/0x80                                                                                                                             
[  615.739082]  schedule_timeout+0x1d8/0x300                                                                                                                   
[  615.739607]  ? find_get_pages_range_tag+0x165/0x2c0                                                                                                         
[  615.740299]  wait_for_completion+0x123/0x190                                                                                                                
[  615.740798]  ? wake_up_q+0x70/0x70                                                                                                                          
[  615.741236]  __flush_work.isra.31+0x125/0x1d0                                                                                                               
[  615.741797]  ? wake_up_worker+0x30/0x30                                                                                                                     
[  615.742512]  lru_add_drain_all+0x11c/0x160                                                                                                                  
[  615.743257]  invalidate_bdev+0x3b/0x50                                                                                                                      
[  615.743837]  blkdev_ioctl+0x75a/0x930                                                                                                                       
[  615.744448]  ? selinux_file_ioctl+0x137/0x1f0                                                                                                               
[  615.745567]  block_ioctl+0x3d/0x40                                                                                                                          
[  615.746126]  do_vfs_ioctl+0xa9/0x630                                                                                                                        
[  615.746699]  ksys_ioctl+0x60/0x90                                                                                                                           
[  615.749161]  __x64_sys_ioctl+0x16/0x20                                                                                                                      
[  615.749717]  do_syscall_64+0x5b/0x1b0                                                                                                                       
[  615.750484]  entry_SYSCALL_64_after_hwframe+0x44/0xa9                                                                                                       
[  615.751606] RIP: 0033:0x7fa2a45c42b7                                                                                                                        
[  615.752140] Code: Bad RIP value.                                                                                                                            
[  615.752633] RSP: 002b:00007fff26144f08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010                                                                          
[  615.753640] RAX: ffffffffffffffda RBX: 6b73696474736f68 RCX: 00007fa2a45c42b7                                                                               
[  615.754327] RDX: 0000000000000000 RSI: 0000000000001261 RDI: 0000000000000004                                                                               
[  615.754977] RBP: 00007fff26144fc0 R08: 0000000000000000 R09: 0000000000000012                                                                               
[  615.755712] R10: 00007fff26144960 R11: 0000000000000202 R12: 0000000000402bd0                                                                               
[  615.756447] R13: 00007fff261653f0 R14: 0000000000000000 R15: 0000000000000000                                                                               
[  632.449763] watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [dracut-install:7899]                                                                         
[  632.452277] Modules linked in: dm_mod xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc overlay crct10dif_pclmul sunrpc crc32_pclmul snd_intel8x0 ghash_clmulni_intel snd_ac97_codec pcbc ac97_bus snd_pcm snd_timer sg aesni_intel snd video i2c_piix4 crypto_simd soundcore pcspkr cryptd glue_helper ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi ata_piix crc32c_intel e1000 libata serio_raw
[  632.455904] CPU: 7 PID: 7899 Comm: dracut-install Tainted: G             L    4.19.82 #2                                                                    
[  632.457886] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006                                                                   
[  632.458866] RIP: 0010:___bpf_prog_run+0x7fa/0xf20                                                                                                           
[  632.459296] Code: 20 0f 87 36 07 00 00 89 d2 41 83 c4 01 48 8d 84 d0 d0 00 00 00 48 8b 00 48 85 c0 0f 84 1c 07 00 00 48 8d 58 38 e9 18 f8 ff ff <48> 0f bf 43 02 48 8d 5c c3 08 e9 09 f8 ff ff 0f b6 43 01 48 89 c2                                                                                                         
[  632.460777] RSP: 0018:ffff96a741517c18 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13                                                                          
[  632.461630] RAX: ffffffff8bbad4fa RBX: ffff96a740d09b78 RCX: 0000000000000020                                                                               
[  632.462341] RDX: 0000000000000002 RSI: 0000000000000005 RDI: ffff96a74564bed4                                                                               
[  632.463052] RBP: ffff96a741517c38 R08: ffff96a74563c0ed R09: ffff96a74563c0ec                                                                               
[  632.466222] R10: 0000000000000000 R11: 00007ffffffff000 R12: 0000000000000001                                                                               
[  632.466223] R13: ffff88b652294528 R14: 0000000000000000 R15: 0000000000000000                                                                               
[  632.466225] FS:  00007f7f5956e740(0000) GS:ffff88b657bc0000(0000) knlGS:0000000000000000                                                                    
[  632.466226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                               
[  632.466227] CR2: 00007f7f58f00cc9 CR3: 0000000169a50005 CR4: 00000000000606e0                                                                               
[  632.466231] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                               
[  632.466232] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                                                               
[  632.466233] Call Trace:                                                                                                                                     
[  632.466245]  ? __bpf_prog_run512+0x3e/0x60                                                                                                                  
[  632.471128]  ? flush_tlb_func_common.isra.7+0x144/0x230                                                                                                     
[  632.471599]  ? mem_cgroup_commit_charge+0x80/0x130                                                                                                          
[  632.472031]  ? page_add_new_anon_rmap+0x73/0xc0                                                                                                             
[  632.472538]  ? wp_page_copy+0x369/0x7d0                                                                                                                     
[  632.472946]  ? do_wp_page+0x8d/0x580                                                                                                                        
[  632.473321]  ? __handle_mm_fault+0x7b6/0xef0                                                                                                                
[  632.473771]  ? bpf_trace_run2+0x2d/0x50                                                                                                                     
[  632.474129]  ? syscall_trace_enter+0x298/0x2c0                                                                                                              
[  632.475257]  ? do_syscall_64+0x139/0x1b0                                                                                                                    
[  632.475640]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9                                                                                                     
[  647.552717] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:                                                                                             
[  647.553389] rcu:     7-....: (340066 ticks this GP) idle=f6e/1/0x4000000000000000 softirq=7708/7711 fqs=86643                                                                                                                                                                                                               
[  647.554448] rcu:     (detected by 5, t=600021 jiffies, g=20449, q=242741)                                                                                   
[  647.555124] Sending NMI from CPU 5 to CPUs 7:                                                                                                               
[  647.555699] NMI backtrace for cpu 7                                                                                                                         
[  647.555700] CPU: 7 PID: 7899 Comm: dracut-install Tainted: G             L    4.19.82 #2                                                                    
[  647.555701] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006                                                                   
[  647.555701] RIP: 0010:___bpf_prog_run+0x7ff/0xf20                                                                                                           
[  647.555702] Code: 00 00 89 d2 41 83 c4 01 48 8d 84 d0 d0 00 00 00 48 8b 00 48 85 c0 0f 84 1c 07 00 00 48 8d 58 38 e9 18 f8 ff ff 48 0f bf 43 02 <48> 8d 5c c3 08 e9 09 f8 ff ff 0f b6 43 01 48 89 c2 c0 e8 04 83 e2                                                                                                         
[  647.555702] RSP: 0018:ffff96a741517c18 EFLAGS: 00000286                                                                                                     
[  647.555703] RAX: ffffffffffffffff RBX: ffff96a740d09b78 RCX: 0000000000000020                                                                               
[  647.555703] RDX: 0000000000000002 RSI: 0000000000000005 RDI: ffff96a74564bed4                                                                               
[  647.555704] RBP: ffff96a741517c38 R08: ffff96a74563c0ed R09: ffff96a74563c0ec                                                                               
[  647.555704] R10: 0000000000000000 R11: 00007ffffffff000 R12: 0000000000000001                                                                               
[  647.555705] R13: ffff88b652294528 R14: 0000000000000000 R15: 0000000000000000                                                                               
[  647.555705] FS:  00007f7f5956e740(0000) GS:ffff88b657bc0000(0000) knlGS:0000000000000000                                                                    
[  647.555706] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                               
[  647.555706] CR2: 00007f7f58f00cc9 CR3: 0000000169a50005 CR4: 00000000000606e0                                                                               
[  647.555707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                               
[  647.555707] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                                                               
[  647.555707] Call Trace:                                                                                                                                     
[  647.555708]  ? __bpf_prog_run512+0x3e/0x60                                                                                                                  
[  647.555708]  ? flush_tlb_func_common.isra.7+0x144/0x230                                                                                                     
[  647.555709]  ? mem_cgroup_commit_charge+0x80/0x130                                                                                                          
[  647.555709]  ? page_add_new_anon_rmap+0x73/0xc0                                                                                                             
[  647.555709]  ? wp_page_copy+0x369/0x7d0                                                                                                                     
[  647.555710]  ? do_wp_page+0x8d/0x580                                                                                                                        
[  647.555710]  ? __handle_mm_fault+0x7b6/0xef0                                                                                                                
[  647.555710]  ? bpf_trace_run2+0x2d/0x50                                                                                                                     
[  647.555711]  ? syscall_trace_enter+0x298/0x2c0                                                                                                              
[  647.555711]  ? do_syscall_64+0x139/0x1b0                                                                                                                    
[  647.555712]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9                                                                                                     
[  672.447322] watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [dracut-install:7899]                                                                         
[  672.448242] Modules linked in: dm_mod xt_conntrack ipt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc overlay crct10dif_pclmul sunrpc crc32_pclmul snd_intel8x0 ghash_clmulni_intel snd_ac97_codec pcbc ac97_bus snd_pcm snd_timer sg aesni_intel snd video i2c_piix4 crypto_simd soundcore pcspkr cryptd glue_helper ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi ata_piix crc32c_intel e1000 libata serio_raw
[  672.453626] CPU: 7 PID: 7899 Comm: dracut-install Tainted: G             L    4.19.82 #2                                                                    
[  672.454397] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006                                                                   
[  672.455228] RIP: 0010:__x86_indirect_thunk_rax+0x10/0x20                                                                                                    
[  672.455887] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 <c3> 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 e8 07 00 00 00 f3                                                                                                         
[  672.458020] RSP: 0018:ffff96a741517c10 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13                                                                          
[  672.461792] RAX: ffffffff8bbad4fa RBX: ffff96a740d09b78 RCX: 0000000000000020                                                                               
[  672.462737] RDX: 0000000000000002 RSI: 0000000000000005 RDI: ffff96a74564bed4                                                                               
[  672.463705] RBP: ffff96a741517c38 R08: ffff96a74563c0ed R09: ffff96a74563c0ec                                                                               
[  672.464542] R10: 0000000000000000 R11: 00007ffffffff000 R12: 0000000000000001                                                                               
[  672.465262] R13: ffff88b652294528 R14: 0000000000000000 R15: 0000000000000000                                                                               
[  672.466878] FS:  00007f7f5956e740(0000) GS:ffff88b657bc0000(0000) knlGS:0000000000000000                                                                    
[  672.466879] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                               
[  672.466880] CR2: 00007f7f58f00cc9 CR3: 0000000169a50005 CR4: 00000000000606e0                                                                               
[  672.466883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                               
[  672.466884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                                                               
[  672.466886] Call Trace:                                                                                                                                     
[  672.466894]  ? ___bpf_prog_run+0x7fa/0xf20                                                                                                                  
[  672.466897]  ? __bpf_prog_run512+0x3e/0x60                                                                                                                  
[  672.466903]  ? flush_tlb_func_common.isra.7+0x144/0x230                                                                                                     
[  672.473918]  ? mem_cgroup_commit_charge+0x80/0x130                                                                                                          
[  672.476020]  ? page_add_new_anon_rmap+0x73/0xc0                                                                                                             
[  672.476501]  ? wp_page_copy+0x369/0x7d0                                                                                                                     
[  672.476931]  ? do_wp_page+0x8d/0x580                                                                                                                        
[  672.477301]  ? __handle_mm_fault+0x7b6/0xef0                                                                                                                
[  672.477730]  ? bpf_trace_run2+0x2d/0x50                                                                                                                     
[  672.478115]  ? syscall_trace_enter+0x298/0x2c0                                                                                                              
[  672.478538]  ? do_syscall_64+0x139/0x1b0                                                                                                                    
[  672.478958]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9       

@leodido
Copy link
Member

leodido commented Mar 20, 2020

Another try in this direction (to exclude that is JIT the cause) would be to enable JIT in debugging mode (/proc/sys/net/core/bpf_jit_enable equal to 2), so to have the opcodes dumped in the kernel log.

And using tools/bpf/bpf_jit_disasm.c (provided in the kernel source tree) to disassemble them.


Wait, it only seems to work for BPF not eBPF. Nevermind

@natalysheinin
Copy link
Contributor

natalysheinin commented Mar 26, 2020

I am also getting a soft lockup:
watchdog: BUG: soft lockup - CPU#16 stuck for 22s!

Running kernel 4.19.76+ on COS GKE. Is there a workaround/recommendation?

fntlnz added a commit that referenced this issue Apr 7, 2020
Bump version of the driver to (commit: cd3d10123eef161d9f4e237581c1056fca29c130) that fixes #896
Summary of the needed fix can be found at patch [0]

[0] https://patch-diff.githubusercontent.com/raw/draios/sysdig/pull/1612.patch

Co-Authored-By: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Lorenzo Fontana <lo@linux.com>
fntlnz added a commit that referenced this issue Apr 8, 2020
Bump version of the driver to (commit: cd3d10123eef161d9f4e237581c1056fca29c130) that fixes #896
Summary of the needed fix can be found at patch [0]

[0] https://patch-diff.githubusercontent.com/raw/draios/sysdig/pull/1612.patch

Co-Authored-By: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Lorenzo Fontana <lo@linux.com>
poiana pushed a commit that referenced this issue Apr 8, 2020
Bump version of the driver to (commit: cd3d10123eef161d9f4e237581c1056fca29c130) that fixes #896
Summary of the needed fix can be found at patch [0]

[0] https://patch-diff.githubusercontent.com/raw/draios/sysdig/pull/1612.patch

Co-Authored-By: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Lorenzo Fontana <lo@linux.com>
@smijolovic
Copy link

smijolovic commented Apr 15, 2020

This bug also present in CentOS 8.1 (4.18.0-147.8.1).

This fix appears to work for that as well...running overnight on k8s 1.18.0 node to verify the driver is working properly.

@leodido
Copy link
Member

leodido commented Apr 15, 2020

@smijolovic thabks for reporting it.

Could you provide a log for that kernel version? I’m curious :)

@fntlnz
Copy link
Contributor

fntlnz commented Apr 15, 2020

Interesting! @smijolovic I didn’t expect this to be on a 4.18 - probably some backport on the kernel.

For The log @leodido is asking, please do a dmesg -w in a new terminal while Falco (The one without the fix) is running and keep it until it crashes because of this, it should produce a stack trace.

@smijolovic
Copy link

smijolovic commented Apr 15, 2020

Fix ran overnight and no issues so far. Seeing a few "Falco internal: syscall event drop. 1 system calls dropped in last second" and looking to tweak the syscall_event_drops rate. I also built a systemd service file for falco vs an init script for service control.

As requested:

[42328.030993] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [sh:6839]
[42328.031185] Modules linked in: binfmt_misc ipt_REJECT nf_reject_ipv4 vxlan ip6_udp_tunnel udp_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic xt_conntrack ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6t_MASQUERADE nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 veth nft_chain_route_ipv4 ipt_MASQUERADE xt_comment nft_counter xt_mark nft_compat nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_tables overlay nfnetlink vfat fat intel_rapl dell_smbios iTCO_wdt dell_wmi_descriptor wmi_bmof iTCO_vendor_support skx_edac nfit libnvdimm dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif i2c_algo_bit irqbypass ttm drm_kms_helper syscopyarea sysfillrect crc32_pclmul sysimgblt intel_cstate fb_sys_fops intel_uncore mei_me sg intel_rapl_perf
[42328.031225] pcspkr drm joydev mei lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod ahci libahci bnxt_en megaraid_sas libata dm_mirror dm_region_hash dm_log dm_mod crypto_user ansi_cprng cmac authenc ccm xts des3_ede_x86_64 des_generic ghash_clmulni_intel crct10dif_pclmul crc32c_intel sha3_generic sha512_mb sha512_ssse3 sha512_generic sha256_mb sha1_mb mcryptd br_netfilter bridge stp llc [last unloaded: tcrypt]
[42328.031251] Features: eBPF/rawtrace
[42328.031255] CPU: 31 PID: 6839 Comm: sh Not tainted 4.18.0-147.8.1.el8_1.x86_64 #1
[42328.031256] Hardware name: Dell Inc. PowerEdge T640/0H8GYJ, BIOS 2.5.4 01/14/2020
[42328.031262] RIP: 0010:bpf_prog_1d34ec13f1920fc4+0x84d/0x1000
[42328.031264] Code: 81 e6 ff ff 01 00 4c 89 f7 48 01 f7 be e8 fd 00 00 e8 b7 65 04 f6 48 89 c7 48 c1 e7 20 48 89 fe 48 c1 fe 20 48 83 fe ff 7f 02 fe 48 c1 ef 20 b9 ff ff ff ff 48 81 ff e8 fd 00 00 0f 87 5b 03
[42328.031265] RSP: 0018:ffffa47dca5ffc70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff13
[42328.031267] RAX: fffffffffffffff2 RBX: ffffa47dca5ffea8 RCX: 0000000000000000
[42328.031268] RDX: 000000000000fde8 RSI: fffffffffffffff2 RDI: fffffff200000000
[42328.031268] RBP: ffffa47dca5ffe70 R08: 000055b4b3df8400 R09: ffffa47de9f5912c
[42328.031269] R10: 00007ffffffff000 R11: 0000000000000000 R12: 0000000000000000
[42328.031270] R13: ffff90d44f681ac8 R14: ffffa47de9f59110 R15: ffffa47dcc46c450
[42328.031271] FS: 00007fb9d59a2580(0000) GS:ffff90d61fbc0000(0000) knlGS:0000000000000000
[42328.031272] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42328.031273] CR2: 000055b4b3df844c CR3: 0000000f3a1a6001 CR4: 00000000007606e0
[42328.031274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42328.031275] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42328.031276] PKRU: 55555554
[42328.031276] Call Trace:
[42328.031284] ? bpf_perf_event_output_raw_tp+0x121/0x170
[42328.031288] ? touch_atime+0x82/0xe0
[42328.031293] ? generic_file_buffered_read+0x87a/0xb10
[42328.031298] ? init_object+0x6b/0x80
[42328.031300] ? free_debug_processing+0x1bc/0x210
[42328.031303] ? __slab_free+0x1c7/0x340
[42328.031347] ? xfs_iunlock+0x9d/0x100 [xfs]
[42328.031350] ? bpf_trace_run2+0x2d/0x50
[42328.031353] ? syscall_trace_enter+0x29e/0x2c0
[42328.031356] ? _cond_resched+0x15/0x30
[42328.031358] ? do_syscall_64+0x13d/0x1b0
[42328.031361] ? entry_SYSCALL_64_after_hwframe+0x65/0xca
[42339.985902] watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [migration/16:108]
[42339.986115] Modules linked in: binfmt_misc ipt_REJECT nf_reject_ipv4 vxlan ip6_udp_tunnel udp_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic xt_conntrack ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6t_MASQUERADE nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 veth nft_chain_route_ipv4 ipt_MASQUERADE xt_comment nft_counter xt_mark nft_compat nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_tables overlay nfnetlink vfat fat intel_rapl dell_smbios iTCO_wdt dell_wmi_descriptor wmi_bmof iTCO_vendor_support skx_edac nfit libnvdimm dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif i2c_algo_bit irqbypass ttm drm_kms_helper syscopyarea sysfillrect crc32_pclmul sysimgblt intel_cstate fb_sys_fops intel_uncore mei_me sg intel_rapl_perf
[42339.986157] pcspkr drm joydev mei lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod ahci libahci bnxt_en megaraid_sas libata dm_mirror dm_region_hash dm_log dm_mod crypto_user ansi_cprng cmac authenc ccm xts des3_ede_x86_64 des_generic ghash_clmulni_intel crct10dif_pclmul crc32c_intel sha3_generic sha512_mb sha512_ssse3 sha512_generic sha256_mb sha1_mb mcryptd br_netfilter bridge stp llc [last unloaded: tcrypt]
[42339.986185] Features: eBPF/rawtrace
[42339.986189] CPU: 16 PID: 108 Comm: migration/16 Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42339.986190] Hardware name: Dell Inc. PowerEdge T640/0H8GYJ, BIOS 2.5.4 01/14/2020
[42339.986198] RIP: 0010:multi_cpu_stop+0x4b/0xf0
[42339.986200] Code: 44 00 00 48 89 04 24 48 8b 47 18 48 85 c0 0f 84 95 00 00 00 89 db 48 0f a3 18 41 0f 92 c7 4c 8d 65 24 45 31 f6 45 31 ed f3 90 <8b> 5d 20 44 39 eb 74 3f 83 fb 02 74 4c 83 fb 03 75 15 45 84 ff 74
[42339.986201] RSP: 0000:ffffa47dc6ab7e78 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[42339.986203] RAX: ffffffffb76194a0 RBX: 0000000000000001 RCX: dead000000000200
[42339.986204] RDX: ffff90ce1fe1d710 RSI: ffffa47dc94b3b30 RDI: ffffa47dc94b3bb0
[42339.986205] RBP: ffffa47dc94b3bb0 R08: 0000000000000001 R09: 0000000000000001
[42339.986206] R10: 0000000000000000 R11: 0000000000000010 R12: ffffa47dc94b3bd4
[42339.986207] R13: 0000000000000001 R14: 0000000000000000 R15: ffff90ce1fe1d700
[42339.986208] FS: 0000000000000000(0000) GS:ffff90ce1fe00000(0000) knlGS:0000000000000000
[42339.986209] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42339.986210] CR2: 00007f22cefb9300 CR3: 00000008f0e0a004 CR4: 00000000007606e0
[42339.986211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42339.986212] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42339.986213] PKRU: 55555554
[42339.986213] Call Trace:
[42339.986222] ? cpu_stopper_thread+0x100/0x100
[42339.986223] cpu_stopper_thread+0x47/0x100
[42339.986228] ? sort_range+0x20/0x20
[42339.986229] smpboot_thread_fn+0xc5/0x160
[42339.986233] kthread+0x112/0x130
[42339.986235] ? kthread_flush_work_fn+0x10/0x10
[42339.986240] ret_from_fork+0x35/0x40
[42356.028437] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [sh:6839]
[42356.028627] Modules linked in: binfmt_misc ipt_REJECT nf_reject_ipv4 vxlan ip6_udp_tunnel udp_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic xt_conntrack ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6t_MASQUERADE nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 veth nft_chain_route_ipv4 ipt_MASQUERADE xt_comment nft_counter xt_mark nft_compat nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_tables overlay nfnetlink vfat fat intel_rapl dell_smbios iTCO_wdt dell_wmi_descriptor wmi_bmof iTCO_vendor_support skx_edac nfit libnvdimm dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif i2c_algo_bit irqbypass ttm drm_kms_helper syscopyarea sysfillrect crc32_pclmul sysimgblt intel_cstate fb_sys_fops intel_uncore mei_me sg intel_rapl_perf
[42356.028670] pcspkr drm joydev mei lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod ahci libahci bnxt_en megaraid_sas libata dm_mirror dm_region_hash dm_log dm_mod crypto_user ansi_cprng cmac authenc ccm xts des3_ede_x86_64 des_generic ghash_clmulni_intel crct10dif_pclmul crc32c_intel sha3_generic sha512_mb sha512_ssse3 sha512_generic sha256_mb sha1_mb mcryptd br_netfilter bridge stp llc [last unloaded: tcrypt]
[42356.028697] Features: eBPF/rawtrace
[42356.028701] CPU: 31 PID: 6839 Comm: sh Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42356.028702] Hardware name: Dell Inc. PowerEdge T640/0H8GYJ, BIOS 2.5.4 01/14/2020
[42356.028708] RIP: 0010:bpf_prog_1d34ec13f1920fc4+0x84d/0x1000
[42356.028711] Code: 81 e6 ff ff 01 00 4c 89 f7 48 01 f7 be e8 fd 00 00 e8 b7 65 04 f6 48 89 c7 48 c1 e7 20 48 89 fe 48 c1 fe 20 48 83 fe ff 7f 02 fe 48 c1 ef 20 b9 ff ff ff ff 48 81 ff e8 fd 00 00 0f 87 5b 03
[42356.028712] RSP: 0018:ffffa47dca5ffc70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff13
[42356.028713] RAX: fffffffffffffff2 RBX: ffffa47dca5ffea8 RCX: 0000000000000000
[42356.028714] RDX: 000000000000fde8 RSI: fffffffffffffff2 RDI: fffffff200000000
[42356.028715] RBP: ffffa47dca5ffe70 R08: 000055b4b3df8400 R09: ffffa47de9f5912c
[42356.028716] R10: 00007ffffffff000 R11: 0000000000000000 R12: 0000000000000000
[42356.028716] R13: ffff90d44f681ac8 R14: ffffa47de9f59110 R15: ffffa47dcc46c450
[42356.028718] FS: 00007fb9d59a2580(0000) GS:ffff90d61fbc0000(0000) knlGS:0000000000000000
[42356.028719] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42356.028720] CR2: 000055b4b3df844c CR3: 0000000f3a1a6001 CR4: 00000000007606e0
[42356.028721] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42356.028722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42356.028722] PKRU: 55555554
[42356.028723] Call Trace:
[42356.028731] ? bpf_perf_event_output_raw_tp+0x121/0x170
[42356.028734] ? touch_atime+0x82/0xe0
[42356.028738] ? generic_file_buffered_read+0x87a/0xb10
[42356.028742] ? init_object+0x6b/0x80
[42356.028745] ? free_debug_processing+0x1bc/0x210
[42356.028747] ? __slab_free+0x1c7/0x340
[42356.028791] ? xfs_iunlock+0x9d/0x100 [xfs]
[42356.028794] ? bpf_trace_run2+0x2d/0x50
[42356.028797] ? syscall_trace_enter+0x29e/0x2c0
[42356.028800] ? _cond_resched+0x15/0x30
[42356.028803] ? do_syscall_64+0x13d/0x1b0
[42356.028805] ? entry_SYSCALL_64_after_hwframe+0x65/0xca
[42362.725824] rcu: INFO: rcu_sched self-detected stall on CPU
[42362.725985] rcu: 31-....: (59999 ticks this GP) idle=736/1/0x4000000000000002 softirq=3302951/3302951 fqs=14046
[42362.726279] rcu: (t=60001 jiffies g=19183753 q=2019490)
[42362.726437] NMI backtrace for cpu 31
[42362.726440] CPU: 31 PID: 6839 Comm: sh Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42362.726441] Hardware name: Dell Inc. PowerEdge T640/0H8GYJ, BIOS 2.5.4 01/14/2020
[42362.726441] Call Trace:
[42362.726444]
[42362.726454] dump_stack+0x5c/0x80
[42362.726458] nmi_cpu_backtrace.cold.6+0x13/0x58
[42362.726463] ? lapic_can_unplug_cpu.cold.25+0x3b/0x3b
[42362.726464] nmi_trigger_cpumask_backtrace+0xde/0xe0
[42362.726469] rcu_dump_cpu_stacks+0x9c/0xca
[42362.726471] rcu_check_callbacks.cold.70+0x16e/0x329
[42362.726475] ? tick_sched_do_timer+0x60/0x60
[42362.726478] update_process_times+0x28/0x60
[42362.726481] tick_sched_handle+0x22/0x60
[42362.726483] tick_sched_timer+0x37/0x70
[42362.726485] __hrtimer_run_queues+0x100/0x280
[42362.726488] hrtimer_interrupt+0x100/0x220
[42362.726492] smp_apic_timer_interrupt+0x6a/0x140
[42362.726494] apic_timer_interrupt+0xf/0x20
[42362.726495]
[42362.726499] RIP: 0010:bpf_prog_1d34ec13f1920fc4+0x84d/0x1000
[42362.726501] Code: 81 e6 ff ff 01 00 4c 89 f7 48 01 f7 be e8 fd 00 00 e8 b7 65 04 f6 48 89 c7 48 c1 e7 20 48 89 fe 48 c1 fe 20 48 83 fe ff 7f 02 fe 48 c1 ef 20 b9 ff ff ff ff 48 81 ff e8 fd 00 00 0f 87 5b 03
[42362.726502] RSP: 0018:ffffa47dca5ffc70 EFLAGS: 00000297 ORIG_RAX: ffffffffffffff13
[42362.726503] RAX: fffffffffffffff2 RBX: ffffa47dca5ffea8 RCX: 0000000000000000
[42362.726504] RDX: 000000000000fde8 RSI: fffffffffffffff2 RDI: fffffff200000000
[42362.726505] RBP: ffffa47dca5ffe70 R08: 000055b4b3df8400 R09: ffffa47de9f5912c
[42362.726506] R10: 00007ffffffff000 R11: 0000000000000000 R12: 0000000000000000
[42362.726506] R13: ffff90d44f681ac8 R14: ffffa47de9f59110 R15: ffffa47dcc46c450
[42362.726515] ? bpf_perf_event_output_raw_tp+0x121/0x170
[42362.726519] ? touch_atime+0x82/0xe0
[42362.726522] ? generic_file_buffered_read+0x87a/0xb10
[42362.726526] ? init_object+0x6b/0x80
[42362.726529] ? free_debug_processing+0x1bc/0x210
[42362.726531] ? __slab_free+0x1c7/0x340
[42362.726577] ? xfs_iunlock+0x9d/0x100 [xfs]
[42362.726580] ? bpf_trace_run2+0x2d/0x50
[42362.726583] ? syscall_trace_enter+0x29e/0x2c0
[42362.726584] ? _cond_resched+0x15/0x30
[42362.726587] ? do_syscall_64+0x13d/0x1b0
[42362.726589] ? entry_SYSCALL_64_after_hwframe+0x65/0xca

a bit further down the stack there's the bad RIP values

[42507.970565] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:108]
[42507.970778] Modules linked in: binfmt_misc ipt_REJECT nf_reject_ipv4 vxlan ip6_udp_tunnel udp_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic xt_conntrack ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6t_MASQUERADE nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 veth nft_chain_route_ipv4 ipt_MASQUERADE xt_comment nft_counter xt_mark nft_compat nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_tables overlay nfnetlink vfat fat intel_rapl dell_smbios iTCO_wdt dell_wmi_descriptor wmi_bmof iTCO_vendor_support skx_edac nfit libnvdimm dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif i2c_algo_bit irqbypass ttm drm_kms_helper syscopyarea sysfillrect crc32_pclmul sysimgblt intel_cstate fb_sys_fops intel_uncore mei_me sg intel_rapl_perf
[42507.970822] pcspkr drm joydev mei lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod ahci libahci bnxt_en megaraid_sas libata dm_mirror dm_region_hash dm_log dm_mod crypto_user ansi_cprng cmac authenc ccm xts des3_ede_x86_64 des_generic ghash_clmulni_intel crct10dif_pclmul crc32c_intel sha3_generic sha512_mb sha512_ssse3 sha512_generic sha256_mb sha1_mb mcryptd br_netfilter bridge stp llc [last unloaded: tcrypt]
[42507.970849] Features: eBPF/rawtrace
[42507.970853] CPU: 16 PID: 108 Comm: migration/16 Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42507.970855] Hardware name: Dell Inc. PowerEdge T640/0H8GYJ, BIOS 2.5.4 01/14/2020
[42507.970862] RIP: 0010:multi_cpu_stop+0x4b/0xf0
[42507.970865] Code: 44 00 00 48 89 04 24 48 8b 47 18 48 85 c0 0f 84 95 00 00 00 89 db 48 0f a3 18 41 0f 92 c7 4c 8d 65 24 45 31 f6 45 31 ed f3 90 <8b> 5d 20 44 39 eb 74 3f 83 fb 02 74 4c 83 fb 03 75 15 45 84 ff 74
[42507.970866] RSP: 0000:ffffa47dc6ab7e78 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[42507.970868] RAX: ffffffffb76194a0 RBX: 0000000000000001 RCX: dead000000000200
[42507.970869] RDX: ffff90ce1fe1d710 RSI: ffffa47dc94b3b30 RDI: ffffa47dc94b3bb0
[42507.970870] RBP: ffffa47dc94b3bb0 R08: 0000000000000001 R09: 0000000000000001
[42507.970871] R10: 0000000000000000 R11: 0000000000000010 R12: ffffa47dc94b3bd4
[42507.970871] R13: 0000000000000001 R14: 0000000000000000 R15: ffff90ce1fe1d700
[42507.970873] FS: 0000000000000000(0000) GS:ffff90ce1fe00000(0000) knlGS:0000000000000000
[42507.970874] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42507.970875] CR2: 00007f22cefb9300 CR3: 00000008f0e0a004 CR4: 00000000007606e0
[42507.970876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42507.970877] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42507.970878] PKRU: 55555554
[42507.970878] Call Trace:
[42507.970887] ? cpu_stopper_thread+0x100/0x100
[42507.970889] cpu_stopper_thread+0x47/0x100
[42507.970894] ? sort_range+0x20/0x20
[42507.970895] smpboot_thread_fn+0xc5/0x160
[42507.970899] kthread+0x112/0x130
[42507.970902] ? kthread_flush_work_fn+0x10/0x10
[42507.970907] ret_from_fork+0x35/0x40
[42513.287204] INFO: task node_exporter:2682 blocked for more than 120 seconds.
[42513.287415] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.287664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.287903] node_exporter D 0 2682 1 0x90000084
[42513.287909] Call Trace:
[42513.287931] ? __schedule+0x253/0x830
[42513.287937] ? __switch_to_asm+0x35/0x70
[42513.287942] schedule+0x28/0x70
[42513.287947] schedule_timeout+0x26d/0x390
[42513.287952] ? __schedule+0x25b/0x830
[42513.287958] ? check_preempt_curr+0x7a/0x90
[42513.287963] wait_for_completion+0x11f/0x190
[42513.287968] ? wake_up_q+0x70/0x70
[42513.287976] stop_two_cpus+0x1e4/0x230
[42513.287983] ? cpu_stopper_thread+0x100/0x100
[42513.287992] ? perf_output_sample+0x26d/0x830
[42513.287998] ? cpu_stopper_thread+0x100/0x100
[42513.288004] ? __migrate_swap_task.part.71+0x80/0x80
[42513.288012] migrate_swap+0x98/0x110
[42513.288024] task_numa_migrate+0x5b0/0x800
[42513.288031] ? __probe_kernel_read+0x54/0x80
[42513.288042] task_numa_fault+0x8b2/0xae0
[42513.288050] ? __sigqueue_alloc+0x45/0xa0
[42513.288059] do_numa_page+0x25a/0x280
[42513.288065] __handle_mm_fault+0x68e/0x6b0
[42513.288073] handle_mm_fault+0xda/0x200
[42513.288081] __do_page_fault+0x22b/0x4e0
[42513.288132] do_page_fault+0x32/0x110
[42513.288148] ? page_fault+0x8/0x30
[42513.288159] page_fault+0x1e/0x30
[42513.288172] RIP: 0033:0x42dbbc
[42513.288202] Code: Bad RIP value.
[42513.288212] RSP: 002b:000000c0005b53c0 EFLAGS: 00010297
[42513.288240] RAX: 0000000000023660 RBX: 000000c000800000 RCX: 0aaaeba27d37d8f0
[42513.288254] RDX: 0000203d08ba3682 RSI: 000000c0005b5468 RDI: 000000000000000b
[42513.288266] RBP: 000000c0005b5410 R08: 00007f22cee9e000 R09: 0000000000008000
[42513.288278] R10: 588b825b6a3dce1e R11: 588b825b6a3b97be R12: 0000000000000002
[42513.288289] R13: 0000000000f4fe80 R14: 0000000000000000 R15: 0000000000468190
[42513.288869] INFO: task exe:6858 blocked for more than 120 seconds.
[42513.289059] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.289337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.289568] exe D 0 6858 6852 0x90000080
[42513.289578] Call Trace:
[42513.289590] ? __schedule+0x253/0x830
[42513.289601] ? select_idle_sibling+0x22/0x3d0
[42513.289608] schedule+0x28/0x70
[42513.289618] schedule_timeout+0x26d/0x390
[42513.289623] ? enqueue_entity+0xf6/0x630
[42513.289635] ? free_debug_processing+0x1bc/0x210
[42513.289644] ? enqueue_task_fair+0x7d/0x3e0
[42513.289651] wait_for_completion+0x11f/0x190
[42513.289659] ? wake_up_q+0x70/0x70
[42513.289668] __wait_rcu_gp+0x12c/0x160
[42513.289681] synchronize_rcu+0x76/0x80
[42513.289686] ? __call_rcu+0x290/0x290
[42513.289694] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.289704] namespace_unlock+0x67/0x80
[42513.289715] ksys_umount+0x234/0x470
[42513.289732] ? syscall_trace_enter+0x1d3/0x2c0
[42513.289740] ? bpf_trace_run2+0x2d/0x50
[42513.289753] __x64_sys_umount+0x12/0x20
[42513.289759] do_syscall_64+0x5b/0x1b0
[42513.289769] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.289775] RIP: 0033:0x7f5512a2a16b
[42513.289785] Code: Bad RIP value.
[42513.289788] RSP: 002b:00007ffd24becde8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[42513.289796] RAX: ffffffffffffffda RBX: 00007ffd24becdf0 RCX: 00007f5512a2a16b
[42513.289801] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007ffd24becdf0
[42513.289803] RBP: 0000000000000005 R08: 0000558536fcc921 R09: 00007f5512aac300
[42513.289807] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000010
[42513.289814] R13: 0000000000000018 R14: 000055853950b280 R15: 000055853950b273
[42513.289828] INFO: task exe:6876 blocked for more than 120 seconds.
[42513.290011] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.290280] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.290513] exe D 0 6876 6870 0x90000080
[42513.290517] Call Trace:
[42513.290523] ? __schedule+0x253/0x830
[42513.290529] ? select_idle_sibling+0x22/0x3d0
[42513.290536] schedule+0x28/0x70
[42513.290540] schedule_timeout+0x26d/0x390
[42513.290544] ? enqueue_entity+0xf6/0x630
[42513.290550] ? free_debug_processing+0x1bc/0x210
[42513.290553] ? enqueue_task_fair+0x7d/0x3e0
[42513.290558] wait_for_completion+0x11f/0x190
[42513.290562] ? wake_up_q+0x70/0x70
[42513.290566] __wait_rcu_gp+0x12c/0x160
[42513.290571] synchronize_rcu+0x76/0x80
[42513.290576] ? __call_rcu+0x290/0x290
[42513.290584] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.290595] namespace_unlock+0x67/0x80
[42513.290603] ksys_umount+0x234/0x470
[42513.290614] ? syscall_trace_enter+0x1d3/0x2c0
[42513.290622] ? bpf_trace_run2+0x2d/0x50
[42513.290636] __x64_sys_umount+0x12/0x20
[42513.290649] do_syscall_64+0x5b/0x1b0
[42513.290661] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.290664] RIP: 0033:0x7f626f1d316b
[42513.290671] Code: Bad RIP value.
[42513.290673] RSP: 002b:00007fff7508f2d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[42513.290676] RAX: ffffffffffffffda RBX: 00007fff7508f2e0 RCX: 00007f626f1d316b
[42513.290678] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007fff7508f2e0
[42513.290680] RBP: 0000000000000005 R08: 000055fa4ebc1921 R09: 00007f626f255300
[42513.290682] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000010
[42513.290684] R13: 0000000000000018 R14: 000055fa50193280 R15: 000055fa50193273
[42513.290693] INFO: task exe:6911 blocked for more than 120 seconds.
[42513.290869] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.291132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.291354] exe D 0 6911 6905 0x90000080
[42513.291357] Call Trace:
[42513.291363] ? __schedule+0x253/0x830
[42513.291372] ? select_idle_sibling+0x22/0x3d0
[42513.291377] schedule+0x28/0x70
[42513.291384] schedule_timeout+0x26d/0x390
[42513.291387] ? enqueue_entity+0xf6/0x630
[42513.291395] ? free_debug_processing+0x1bc/0x210
[42513.291401] ? enqueue_task_fair+0x7d/0x3e0
[42513.291405] wait_for_completion+0x11f/0x190
[42513.291412] ? wake_up_q+0x70/0x70
[42513.291418] __wait_rcu_gp+0x12c/0x160
[42513.291425] synchronize_rcu+0x76/0x80
[42513.291435] ? __call_rcu+0x290/0x290
[42513.291442] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.291449] namespace_unlock+0x67/0x80
[42513.291457] ksys_umount+0x234/0x470
[42513.291469] ? syscall_trace_enter+0x1d3/0x2c0
[42513.291475] ? bpf_trace_run2+0x2d/0x50
[42513.291485] __x64_sys_umount+0x12/0x20
[42513.291491] do_syscall_64+0x5b/0x1b0
[42513.291499] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.291501] RIP: 0033:0x7f45fa32216b
[42513.291511] Code: Bad RIP value.
[42513.291513] RSP: 002b:00007fff2721ba18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[42513.291516] RAX: ffffffffffffffda RBX: 00007fff2721ba20 RCX: 00007f45fa32216b
[42513.291523] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007fff2721ba20
[42513.291526] RBP: 0000000000000005 R08: 0000560c5827d921 R09: 00007f45fa3a4300
[42513.291527] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000010
[42513.291529] R13: 0000000000000018 R14: 0000560c5a2af280 R15: 0000560c5a2af273
[42513.291541] INFO: task exe:6931 blocked for more than 120 seconds.
[42513.291718] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.291960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.292227] exe D 0 6931 6930 0x90000080
[42513.292253] Call Trace:
[42513.292270] ? __schedule+0x253/0x830
[42513.292282] schedule+0x28/0x70
[42513.292298] schedule_timeout+0x26d/0x390
[42513.292316] ? init_object+0x6b/0x80
[42513.292336] ? free_debug_processing+0x1bc/0x210
[42513.292355] wait_for_completion+0x11f/0x190
[42513.292372] ? wake_up_q+0x70/0x70
[42513.292389] __wait_rcu_gp+0x12c/0x160
[42513.292406] synchronize_rcu+0x76/0x80
[42513.292423] ? __call_rcu+0x290/0x290
[42513.292439] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.292455] namespace_unlock+0x67/0x80
[42513.292472] ksys_umount+0x234/0x470
[42513.292487] ? syscall_trace_enter+0x1d3/0x2c0
[42513.292499] ? bpf_trace_run2+0x2d/0x50
[42513.292505] __x64_sys_umount+0x12/0x20
[42513.292514] do_syscall_64+0x5b/0x1b0
[42513.292520] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.292528] RIP: 0033:0x7f78bf5eb16b
[42513.292534] Code: Bad RIP value.
[42513.292536] RSP: 002b:00007fff7c2ed928 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[42513.292540] RAX: ffffffffffffffda RBX: 00007fff7c2ed930 RCX: 00007f78bf5eb16b
[42513.292544] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007fff7c2ed930
[42513.292545] RBP: 0000000000000005 R08: 000055754dd2d921 R09: 00007f78bf66d300
[42513.292551] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000010
[42513.292554] R13: 0000000000000018 R14: 000055754e88c280 R15: 000055754e88c273
[42513.292564] INFO: task exe:6956 blocked for more than 120 seconds.
[42513.292753] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.292997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.293247] exe D 0 6956 6949 0x90000080
[42513.293256] Call Trace:
[42513.293264] ? __schedule+0x253/0x830
[42513.293274] ? select_idle_sibling+0x22/0x3d0
[42513.293280] schedule+0x28/0x70
[42513.293288] schedule_timeout+0x26d/0x390
[42513.293294] ? enqueue_entity+0xf6/0x630
[42513.293302] ? free_debug_processing+0x1bc/0x210
[42513.293318] ? enqueue_task_fair+0x7d/0x3e0
[42513.293335] wait_for_completion+0x11f/0x190
[42513.293348] ? wake_up_q+0x70/0x70
[42513.293365] __wait_rcu_gp+0x12c/0x160
[42513.293375] synchronize_rcu+0x76/0x80
[42513.293381] ? __call_rcu+0x290/0x290
[42513.293390] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.293398] namespace_unlock+0x67/0x80
[42513.293408] ksys_umount+0x234/0x470
[42513.293416] ? syscall_trace_enter+0x1d3/0x2c0
[42513.293422] ? bpf_trace_run2+0x2d/0x50
[42513.293435] __x64_sys_umount+0x12/0x20
[42513.293443] do_syscall_64+0x5b/0x1b0
[42513.293452] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.293456] RIP: 0033:0x7fba1e68f16b
[42513.293465] Code: Bad RIP value.
[42513.293480] RSP: 002b:00007ffcf9a18e58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[42513.293504] RAX: ffffffffffffffda RBX: 00007ffcf9a18e60 RCX: 00007fba1e68f16b
[42513.293519] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007ffcf9a18e60
[42513.293531] RBP: 0000000000000005 R08: 000055f21f968921 R09: 00007fba1e711300
[42513.293536] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000010
[42513.293539] R13: 0000000000000018 R14: 000055f22187b280 R15: 000055f22187b273
[42513.293554] INFO: task exe:6974 blocked for more than 120 seconds.
[42513.293738] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.293981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.294248] exe D 0 6974 6972 0x90000080
[42513.294256] Call Trace:
[42513.294266] ? __schedule+0x253/0x830
[42513.294273] ? select_idle_sibling+0x22/0x3d0
[42513.294282] schedule+0x28/0x70
[42513.294301] schedule_timeout+0x26d/0x390
[42513.294313] ? enqueue_entity+0xf6/0x630
[42513.294319] ? free_debug_processing+0x1bc/0x210
[42513.294322] ? enqueue_task_fair+0x7d/0x3e0
[42513.294327] wait_for_completion+0x11f/0x190
[42513.294337] ? wake_up_q+0x70/0x70
[42513.294343] __wait_rcu_gp+0x12c/0x160
[42513.294350] synchronize_rcu+0x76/0x80
[42513.294369] ? __call_rcu+0x290/0x290
[42513.294386] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.294406] namespace_unlock+0x67/0x80
[42513.294422] ksys_umount+0x234/0x470
[42513.294428] ? syscall_trace_enter+0x1d3/0x2c0
[42513.294438] ? bpf_trace_run2+0x2d/0x50
[42513.294446] __x64_sys_umount+0x12/0x20
[42513.294455] do_syscall_64+0x5b/0x1b0
[42513.294463] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.294473] RIP: 0033:0x7f51f799d16b
[42513.294493] Code: Bad RIP value.
[42513.294507] RSP: 002b:00007ffc151fce68 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[42513.294521] RAX: ffffffffffffffda RBX: 00007ffc151fce70 RCX: 00007f51f799d16b
[42513.294523] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007ffc151fce70
[42513.294526] RBP: 0000000000000005 R08: 000055fc529b1921 R09: 00007f51f7a1f300
[42513.294532] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000010
[42513.294535] R13: 0000000000000018 R14: 000055fc53363280 R15: 000055fc53363273
[42513.294548] INFO: task exe:7002 blocked for more than 120 seconds.
[42513.294729] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.294976] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.295232] exe D 0 7002 7000 0x90000080
[42513.295242] Call Trace:
[42513.295270] ? __schedule+0x253/0x830
[42513.295284] ? select_idle_sibling+0x22/0x3d0
[42513.295294] schedule+0x28/0x70
[42513.295300] schedule_timeout+0x26d/0x390
[42513.295310] ? free_debug_processing+0x1bc/0x210
[42513.295316] ? enqueue_task_fair+0x7d/0x3e0
[42513.295323] wait_for_completion+0x11f/0x190
[42513.295334] ? wake_up_q+0x70/0x70
[42513.295339] __wait_rcu_gp+0x12c/0x160
[42513.295348] synchronize_rcu+0x76/0x80
[42513.295360] ? __call_rcu+0x290/0x290
[42513.295367] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.295377] namespace_unlock+0x67/0x80
[42513.295386] ksys_umount+0x234/0x470
[42513.295394] ? syscall_trace_enter+0x1d3/0x2c0
[42513.295402] ? bpf_trace_run2+0x2d/0x50
[42513.295411] __x64_sys_umount+0x12/0x20
[42513.295417] do_syscall_64+0x5b/0x1b0
[42513.295427] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.295434] RIP: 0033:0x7f71b3cbd16b
[42513.295451] Code: Bad RIP value.
[42513.295463] RSP: 002b:00007ffe2ed52e28 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[42513.295494] RAX: ffffffffffffffda RBX: 00007ffe2ed52e30 RCX: 00007f71b3cbd16b
[42513.295512] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007ffe2ed52e30
[42513.295531] RBP: 0000000000000005 R08: 0000557b8c18f921 R09: 00007f71b3d3f300
[42513.295546] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000010
[42513.295559] R13: 0000000000000018 R14: 0000557b8d113280 R15: 0000557b8d113273
[42513.295571] INFO: task exe:7069 blocked for more than 120 seconds.
[42513.295758] Tainted: G L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[42513.296003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42513.296266] exe D 0 7069 7067 0x90000080
[42513.296274] Call Trace:
[42513.296281] ? __schedule+0x253/0x830
[42513.296288] ? select_idle_sibling+0x22/0x3d0
[42513.296299] schedule+0x28/0x70
[42513.296305] schedule_timeout+0x26d/0x390
[42513.296313] ? free_debug_processing+0x1bc/0x210
[42513.296328] ? enqueue_task_fair+0x7d/0x3e0
[42513.296339] wait_for_completion+0x11f/0x190
[42513.296350] ? wake_up_q+0x70/0x70
[42513.296355] __wait_rcu_gp+0x12c/0x160
[42513.296361] synchronize_rcu+0x76/0x80
[42513.296366] ? __call_rcu+0x290/0x290
[42513.296372] ? __bpf_trace_rcu_utilization+0x10/0x10
[42513.296379] namespace_unlock+0x67/0x80
[42513.296390] ksys_umount+0x234/0x470
[42513.296396] ? syscall_trace_enter+0x1d3/0x2c0
[42513.296413] ? bpf_trace_run2+0x2d/0x50
[42513.296431] __x64_sys_umount+0x12/0x20
[42513.296446] do_syscall_64+0x5b/0x1b0
[42513.296459] entry_SYSCALL_64_after_hwframe+0x65/0xca
[42513.296463] RIP: 0033:0x7f5ac4ec316b
[42513.296475] Code: Bad RIP value.
[42513.296479] RSP: 002b:00007fffecc4a2d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[42513.296485] RAX: ffffffffffffffda RBX: 00007fffecc4a2e0 RCX: 00007f5ac4ec316b
[42513.296488] RDX: 0000000000280000 RSI: 0000000000000002 RDI: 00007fffecc4a2e0
[42513.296492] RBP: 0000000000000005 R08: 000055e263fca921 R09: 00007f5ac4f45300
[42513.296495] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000010
[42513.296508] R13: 0000000000000018 R14: 000055e26609a280 R15: 000055e26609a273

@smijolovic
Copy link

smijolovic commented Apr 15, 2020

This looks like a second issue identified in the 4.18.0-147 kernel. These backports can be tricky because they pull in major bug and security fixes from mainline branches...and can be a real pain to pinpoint the inclusions. While backports are great for uniformity...this is where they hurt.

This is the one you pinged me on yesterday:
#1129

@smijolovic
Copy link

No longer an issue on 0.22.1

@leodido
Copy link
Member

leodido commented Apr 24, 2020 via email

@opsnull
Copy link

opsnull commented Jul 13, 2023

@smijolovic Same problem here.

I am using libbpf library to write eBPF kernel program,which using bpf_probe_read_user_str() liking:

int ret = bpf_probe_read_user_str(&event->filename, TASK_FILENAME_LEN, (const char *)ctx->args[0]);
it sometimes return -EFAULT。When I run it on 4.19.91 kernel,after about 30 seconds,the kernel dump lots of soft lockup message,like this:

[ 8515.309704] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [agentd:109734]
[ 8515.317298] Modules linked in: tcp_diag udp_diag inet_diag binfmt_misc intel_rapl_msr intel_rapl_common 。。。
[ 8515.374270] CPU: 24 PID: 109734 Comm: agentd Kdump: loaded Tainted: G           OE     4.19.91-013.x86_64 #1
[ 8515.393182] RIP: 0010:0xffffffffc00669c6
[ 8515.397115] Code: 5b c9 c3 48 8b 5d c8 48 8b 53 10 4c 89 ef 48 83 c7 40 be 00 02 00 00 e8 38 4a 15 c1 bf 00 00 00 80 48 21 f8 48 83 f8 00 74 12 <eb> fe 40 88 7d e8 48 89 e9 48 83 c1 e8 48 89 df eb a1 4c 8b 73 20
[ 8515.415911] RSP: 0018:ffffc90006523da0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 8515.423495] RAX: 0000000080000000 RBX: ffffc90006523e48 RCX: 0000000000000000
[ 8515.430645] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000080000000
[ 8515.437794] RBP: ffffc90006523e00 R08: 0000000000898600 R09: ffffe8ffffc37f38
[ 8515.444944] R10: 00007ffffffff000 R11: 0000000000000008 R12: ffffc90006523e48
[ 8515.452094] R13: ffffe8ffffc37ef8 R14: 0001aca60001aca6 R15: 000000000003594c
[ 8515.459244] FS:  00007f7f650f4700(0000) GS:ffff889fffa00000(0000) knlGS:0000000000000000
[ 8515.467352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8515.473105] CR2: 0000000000898684 CR3: 0000001feada0001 CR4: 00000000001606e0
[ 8515.480251] Call Trace:
[ 8515.482716]  ? do_fault+0x2aa/0x5d0
[ 8515.486217]  trace_call_bpf+0x67/0x90
[ 8515.489892]  perf_syscall_enter+0x174/0x260
[ 8515.494087]  syscall_trace_enter+0x268/0x2c0
[ 8515.498369]  do_syscall_64+0x180/0x200
[ 8515.502128]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8515.507192] RIP: 0033:0x7f7f651bac37
[ 8515.510774] Code: ff ff 76 df 89 c6 f7 de 64 41 89 32 eb d5 89 c6 f7 de 64 41 89 32 eb db 66 2e 0f 1f 84 00 00 00 00 00 90 b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 08 12 30 00 f7 d8 64 89 02
[ 8515.529571] RSP: 002b:00007f7f650f3818 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
[ 8515.537158] RAX: ffffffffffffffda RBX: 00007f7ee8003c80 RCX: 00007f7f651bac37
[ 8515.544306] RDX: 00000000023a54c0 RSI: 00007f7f650f3840 RDI: 0000000000898684
[ 8515.551456] RBP: 00007f7ee80040ce R08: 00007f7f650f36c0 R09: 00007f7f650f3610
[ 8515.558606] R10: 00007f7f650f32a0 R11: 0000000000000206 R12: 00007f7ee8001c88
[ 8515.565756] R13: 0000000000000006 R14: 00007f7ee8004090 R15: 0000000000000000
[ 8519.309217] watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [agentd:109763]
[ 8519.309241] watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [agentd:109778]
[ 8519.309638] watchdog: BUG: soft lockup - CPU#26 stuck for 23s! [agent:109780]
[ 8519.309640] Modules linked in: tcp_diag udp_diag inet_diag binfmt_misc intel_rapl_msr 。。。。
[ 8519.309656] CPU: 26 PID: 109780 Comm: agent Kdump: loaded Tainted: G           OEL    4.19.91-01.x86_64 #1
[ 8519.309659] RIP: 0010:0xffffffffc00669c6
[ 8519.309661] Code: 5b c9 c3 48 8b 5d c8 48 8b 53 10 4c 89 ef 48 83 c7 40 be 00 02 00 00 e8 38 4a 15 c1 bf 00 00 00 80 48 21 f8 48 83 f8 00 74 12 <eb> fe 40 88 7d e8 48 89 e9 48 83 c1 e8 48 89 df eb a1 4c 8b 73 20
[ 8519.309662] RSP: 0018:ffffc9000fe9bda0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 8519.309664] RAX: 0000000080000000 RBX: ffffc9000fe9be48 RCX: 0000000000000000
[ 8519.309665] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000080000000
[ 8519.309667] RBP: ffffc9000fe9be00 R08: 00007f530af96d00 R09: ffffe8ffffcb7f38
[ 8519.309668] R10: 00007ffffffff000 R11: 0000000000000008 R12: ffffc9000fe9be48
[ 8519.309670] R13: ffffe8ffffcb7ef8 R14: 0001acd40001acd4 R15: 00000000000359a8
[ 8519.309671] FS:  00007f529bfff700(0000) GS:ffff889fffa80000(0000) knlGS:0000000000000000
[ 8519.309673] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8519.309674] CR2: 00007f530af96dc9 CR3: 0000001f96dc0005 CR4: 00000000001606e0
[ 8519.309675] Call Trace:
[ 8519.309679]  trace_call_bpf+0x67/0x90
[ 8519.309682]  perf_syscall_enter+0x174/0x260
[ 8519.309685]  syscall_trace_enter+0x268/0x2c0
[ 8519.309687]  do_syscall_64+0x180/0x200
[ 8519.309690]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8519.309692] RIP: 0033:0x7f530aed4c37
[ 8519.309694] Code: ff ff 76 df 89 c6 f7 de 64 41 89 32 eb d5 89 c6 f7 de 64 41 89 32 eb db 66 2e 0f 1f 84 00 00 00 00 00 90 b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 08 12 30 00 f7 d8 64 89 02
[ 8519.309695] RSP: 002b:00007f529bffb968 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[ 8519.309697] RAX: ffffffffffffffda RBX: 0000000000000028 RCX: 00007f530aed4c37
[ 8519.309699] RDX: 00007fff2a752b08 RSI: 00007f529bffb9a0 RDI: 00007f530af96dc9
[ 8519.309700] RBP: 0000000000000004 R08: 00007f529bffb9b8 R09: 00007f530ae0e700
[ 8519.309701] R10: 00007f52f37fe7f8 R11: 0000000000000246 R12: 0000000000000003
[ 8519.309702] R13: 00007f530af96dc9 R14: 00007f529bffb9a0 R15: 00007f529bffb9a0

I don't understand how to resolve it using your path,could you please give some details?

@nathan-b
Copy link
Contributor

nathan-b commented Aug 8, 2023

Hi @opsnull! This issue is focused specifically on getting the open-source project Falco working with eBPF. If you are having problems with your own eBPF program then I suggest the Linux kernel's eBPF mailing list. It's possible that the pointers in this ticket will help you figure out if you are being hit by the same kernel backport issue, but we can't really help you debug your own eBPF program.

Best of luck!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants