When a device with procd-ujail installed has been running for a while (I hit it today at 28 days of uptime), restarting dnsmasq results in dnsmasq no longer being started; only the ujail process is left running. No errors are displayed on stdout/stderr while restarting, nor in syslog.
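For anyone trying to confirm they are seeing the same symptom, here is a minimal sketch that classifies `ps` output into "service running", "only the ujail wrapper left", or "nothing running". The function name `classify_jail_state` is made up for illustration, and it assumes the command path is the first field on each `ps` line that starts with `/`.

```shell
#!/bin/sh
# classify_jail_state (hypothetical helper): read ps-style lines on stdin and
# print "running", "jail-only" or "stopped" for the service binary given as $1.
classify_jail_state() {
    awk -v bin="$1" '
        {
            # Take the first field that starts with "/" as the command path.
            cmd = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^\//) { cmd = $i; break }
            }
            if (cmd == bin) {
                # The service binary itself shows up as a process.
                service++
            } else if (cmd ~ /\/ujail$/ && index($0, bin) > 0) {
                # A ujail wrapper whose arguments mention the service binary.
                jail++
            }
        }
        END {
            if (service) {
                print "running"
            } else if (jail) {
                print "jail-only"
            } else {
                print "stopped"
            }
        }
    '
}
```

Something like `ps w | classify_jail_state /usr/sbin/dnsmasq` would be the intended use; "jail-only" corresponds to the state described above.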
Commenting out the lines in the init script starting with procd_add_jail and then restarting the service solves the problem. The problem also does not occur when dnsmasq is started during boot.
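The workaround can be scripted roughly as follows. This is only a sketch: `comment_out_jail` is a hypothetical helper, and the sample init script content is illustrative rather than a copy of the real dnsmasq init script. It prefixes every line whose first token starts with `procd_add_jail` with `#` and leaves everything else alone.

```shell
#!/bin/sh
# comment_out_jail (hypothetical helper): comment out all lines in file $1
# whose first token starts with "procd_add_jail". Try it on a copy first.
comment_out_jail() {
    tmp="$1.tmp.$$"
    # Keep the leading whitespace, insert "#" in front of procd_add_jail*.
    sed 's/^\([[:space:]]*\)\(procd_add_jail\)/\1#\2/' "$1" > "$tmp" || {
        rm -f "$tmp"
        return 1
    }
    # Write back with cat so the original file keeps its mode and owner.
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Demo on a throwaway file with illustrative content (not the real script).
sample=$(mktemp)
cat > "$sample" <<'EOF'
start_service() {
	procd_open_instance
	procd_set_param command /usr/sbin/dnsmasq
	procd_add_jail dnsmasq ubus log
	procd_add_jail_mount /etc/hosts
	procd_close_instance
}
EOF
comment_out_jail "$sample"
cat "$sample"
rm -f "$sample"
```

After editing the real init script this way, the service still has to be restarted for the change to take effect.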
I've seen this problem before and mentioned it a few times on IRC. The first time was in October 2020, before 21.02 was branched, so it's very likely this problem exists there as well.
I didn't reboot the system where I'm currently experiencing this; I've commented out the procd_add_jail lines instead. Uncommenting those lines brings back the problem, so further investigation is possible.
This seems to be a general problem with ujail, as even a simple echo refuses to start:
# ujail -d1 -n blah -r /tmp -- /bin/echo test
jail: adding mount /tmp /tmp bind(1) ro(1) err(0)
jail: Using namespaces(0x28020000), capabilities(0), seccomp(0)
jail: adding mount /bin/echo /bin/echo bind(1) ro(1) err(1)
jail: adding mount /lib/ld-musl-x86_64.so.1 /lib/ld-musl-x86_64.so.1 bind(1) ro(1) err(1)
jail: adding library /lib/libgcc_s.so.1 (libgcc_s.so.1)
jail: adding library /lib/libc.so (libc.so)
The process hangs here until killed with kill -9.
Running in strace, the process hangs on epoll_pwait.
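To reproduce this without leaving a stuck process behind, the test command can be wrapped in a deadline. The sketch below assumes a `timeout` that accepts `-s KILL SECONDS COMMAND` (coreutils and recent BusyBox do); `run_with_deadline` is a hypothetical helper name. It treats exit status 124 or 137 as "hit the deadline", so a command that legitimately exits with one of those codes would be misreported.

```shell
#!/bin/sh
# run_with_deadline (hypothetical helper): run a command and SIGKILL it if it
# is still alive after $1 seconds. Prints "hung" or "exited <status>".
run_with_deadline() {
    secs="$1"
    shift
    timeout -s KILL "$secs" "$@"
    rc=$?
    case "$rc" in
        # 124: timeout's own "timed out" status; 137: 128 + SIGKILL (9).
        124|137) echo "hung" ;;
        *) echo "exited $rc" ;;
    esac
}
```

On an affected system, `run_with_deadline 5 ujail -d1 -n blah -r /tmp -- /bin/echo test` would be expected to report "hung", while on a healthy one the echo output should appear followed by "exited 0".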
Backtrace with gdbserver:
#0 epoll_pwait (fd=3, ev=ev@entry=0x7ffff7f802c0 , cnt=cnt@entry=10, to=1834444156, sigs=sigs@entry=0x0) at ./arch/x86_64/syscall_arch.h:61
#1 0x00007ffff7fa7ada in epoll_wait (fd=, ev=ev@entry=0x7ffff7f802c0 , cnt=cnt@entry=10, to=)
at src/linux/epoll.c:36
#2 0x00007ffff7f7805f in uloop_fetch_events (timeout=)
at /home/stijn/Development/OpenWrt/openwrt/build_dir/target-x86_64_musl/libubox-2021-08-19-d716ac4b/uloop-epoll.c:73
#3 uloop_run_events (timeout=)
at /home/stijn/Development/OpenWrt/openwrt/build_dir/target-x86_64_musl/libubox-2021-08-19-d716ac4b/uloop.c:170
#4 uloop_run_timeout (timeout=-1) at /home/stijn/Development/OpenWrt/openwrt/build_dir/target-x86_64_musl/libubox-2021-08-19-d716ac4b/uloop.c:555
#5 0x000055555555b915 in ?? ()
#6 0x00007ffff7f69bb8 in ?? ()
#7 0xffffffff01203ff2 in ?? ()
#8 0x00007ffff7ffd880 in ?? () from /home/stijn/Development/OpenWrt/openwrt/scripts/../staging_dir/target-x86_64_musl/root-x86/lib/ld-musl-x86_64.so.1
#9 0x00007ffff7fe30a0 in do_init_fini (queue=) at ldso/dynlink.c:1545
#10 0x00007ffff7ff80e0 in ?? () from /home/stijn/Development/OpenWrt/openwrt/scripts/../staging_dir/target-x86_64_musl/root-x86/lib/ld-musl-x86_64.so.1
#11 0x00007fffffffecb8 in ?? ()
#12 0x00007ffff7fa5d3c in libc_start_main_stage2 (main=0x6b77814f93720b1f, argc=1431674893, argv=0x55555555a00d) at src/env/__libc_start_main.c:94
#13 0x000055555555ba5b in ?? ()
#14 0x0000000000000008 in ?? ()
#15 0x00007fffffffeeca in ?? ()
#16 0x00007fffffffeed6 in ?? ()
#17 0x00007fffffffeed9 in ?? ()
#18 0x00007fffffffeede in ?? ()
#19 0x00007fffffffeee1 in ?? ()
#20 0x00007fffffffeee6 in ?? ()
#21 0x00007fffffffeee9 in ?? ()
#22 0x00007fffffffeef3 in ?? ()
#23 0x0000000000000000 in ?? ()
openwrt-bot commented Jul 20, 2021
stintel:
root@ar0:~# /etc/init.d/dnsmasq restart
udhcpc: started, v1.33.1
udhcpc: sending discover
udhcpc: no lease, failing
udhcpc: started, v1.33.1
udhcpc: sending discover
udhcpc: no lease, failing
udhcpc: started, v1.33.1
udhcpc: sending discover
udhcpc: no lease, failing
udhcpc: started, v1.33.1
udhcpc: sending discover
udhcpc: no lease, failing

Tue Jul 20 15:17:15 2021 user.notice dnsmasq: DNS rebinding protection is active, will discard upstream RFC1918 responses!
Tue Jul 20 15:17:15 2021 user.notice dnsmasq: Allowing 127.0.0.0/8 responses
Tue Jul 20 15:17:15 2021 user.notice dnsmasq: Allowing RFC1918 responses for domain plex.direct

root@ar0:~# ps aux | grep dnsmasq
root 21289 0.0 0.0 2088 872 ? S 15:17 0:00 /sbin/ujail -n dnsmasq -u -l -r /dev/null -r /dev/urandom -r /etc/TZ -r /etc/dnsmasq.conf -r /etc/ethers -r /etc/group -r /etc/hosts -r /etc/passwd -r /sbin/hotplug-call -r /tftpboot -r /tmp/dnsmasq.d -r /tmp/etc/dnsmasq.conf.main -r /tmp/hosts/dhcp.main -r /usr/lib/dnsmasq/dhcp-script.sh -r /usr/share/dnsmasq/dhcpbogushostname.conf -r /usr/share/dnsmasq/rfc6761.conf -r /usr/share/dnsmasq/trust-anchors.conf -w /var/lib/dhcp.leases -w /var/run/dnsmasq/ -- /usr/sbin/dnsmasq -C /tmp/etc/dnsmasq.conf.main -k -x /var/run/dnsmasq/dnsmasq.main.pid
root 21455 0.0 0.0 1132 468 pts/1 S+ 15:19 0:00 grep dnsmasq
root@ar0:~# ss -anput | grep dnsmasq
root@ar0:~#