Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segtault in systemd-networkd #16299

Closed
pomac303 opened this issue Jun 27, 2020 · 18 comments · Fixed by #16381 or #16470
Closed

segtault in systemd-networkd #16299

pomac303 opened this issue Jun 27, 2020 · 18 comments · Fixed by #16381 or #16470
Labels
bug 🐛 Programming errors, that need preferential fixing dhcp needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer network
Milestone

Comments

@pomac303
Copy link

systemd version the issue has been seen with

systemd 245 (245)
+PAM +AUDIT -SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

Used distribution

gentoo

Expected behaviour you didn't see

Seeing systemd-networkd not crashing?

Unexpected behaviour you saw

Systemd-networkd crashing

Steps to reproduce the problem
Unknown, happens all the time, filing this bugreport so that you can get more information.

journactl:
jun 27 22:20:03 byte systemd-networkd[707]: vnet7: Link UP
jun 27 22:20:03 byte systemd-networkd[707]: vnet7: Gained carrier
jun 27 22:20:04 byte systemd[1]: systemd-networkd.service: Main process exited, code=dumped, status=11/SEGV
jun 27 22:20:04 byte systemd[1]: systemd-networkd.service: Failed with result 'core-dump'.
jun 27 22:20:04 byte systemd[1]: systemd-networkd.service: Scheduled restart job, restart counter is at 1.
jun 27 22:20:04 byte systemd[1]: Stopped Network Service.
jun 27 22:20:04 byte systemd[1]: Starting Network Service...

dmesg:
[ 11.926344] systemd-network[707]: segfault at 8 ip 000055a60fc4d2dd sp 00007fff929ffcd0 error 4 in systemd-networkd[55a60fbe1000+87000]
[ 11.926354] Code: 83 fb 77 0f 87 a4 01 00 00 48 8b 5c 24 18 4c 01 e3 e9 53 ff ff ff 0f 1f 80 00 00 00 00 48 8b 85 08 02 00 00 41 bc 00 87 93 03 <8b> 58 08 2b 58 04 83 fb 77 0f 87 5c 01 00 00 48 8b 5c 24 18 4c 01

I also have the coredump if it would be useful

@vcaputo
Copy link
Member

vcaputo commented Jun 27, 2020

If you have the coredump, load it into gdb like gdb /path/to/systemd-networkd /path/to/coredump and paste the output of bt full in gdb.

What's shown above isn't really actionable in isolation.

@pomac303
Copy link
Author

This was odd... I wasn't expecting .reg-xstate/707 or missing symbols...

gdb /lib/systemd/systemd-networkd /root/core.systemd-network.114.eeaf36bce83842ebb0b8637976e623fa.707.1593289203000000000000
GNU gdb (Gentoo 9.2 vanilla) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://bugs.gentoo.org/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /lib/systemd/systemd-networkd...
(No debugging symbols found in /lib/systemd/systemd-networkd)

warning: core file may not match specified executable file.
[New LWP 707]

warning: Unexpected size of section .reg-xstate/707' in core file. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by /lib/systemd/systemd-networkd'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Unexpected size of section `.reg-xstate/707' in core file.
#0 0x000055a60fc4d2dd in ?? ()
(gdb) bt full
#0 0x000055a60fc4d2dd in ?? ()
No symbol table info available.
#1 0x00007f2e5556b0e6 in source_dispatch () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#2 0x00007f2e5556d6e0 in sd_event_dispatch () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#3 0x00007f2e5556d8a8 in sd_event_run () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#4 0x00007f2e5556dac7 in sd_event_loop () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#5 0x000055a60fbe3111 in ?? ()
No symbol table info available.
#6 0x00007f2e55201c3a in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#7 0x000055a60fbe356a in ?? ()
No symbol table info available.
(gdb)

And decodecode gives very little..

Code: 83 fb 77 0f 87 a4 01 00 00 48 8b 5c 24 18 4c 01 e3 e9 53 ff ff ff 0f 1f 80 00 00 00 00 48 8b 85 08 02 00 00 41 bc 00 87 93 03 <8b> 58 08 2b 58 04 83 fb 77 0f 87 5c 01 00 00 48 8b 5c 24 18 4c 01
All code

0: 83 fb 77 cmp $0x77,%ebx
3: 0f 87 a4 01 00 00 ja 0x1ad
9: 48 8b 5c 24 18 mov 0x18(%rsp),%rbx
e: 4c 01 e3 add %r12,%rbx
11: e9 53 ff ff ff jmpq 0xffffffffffffff69
16: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
1d: 48 8b 85 08 02 00 00 mov 0x208(%rbp),%rax
24: 41 bc 00 87 93 03 mov $0x3938700,%r12d
2a:* 8b 58 08 mov 0x8(%rax),%ebx <-- trapping instruction
2d: 2b 58 04 sub 0x4(%rax),%ebx
30: 83 fb 77 cmp $0x77,%ebx
33: 0f 87 5c 01 00 00 ja 0x195
39: 48 8b 5c 24 18 mov 0x18(%rsp),%rbx
3e: 4c rex.WR
3f: 01 .byte 0x1

Code starting with the faulting instruction

0: 8b 58 08 mov 0x8(%rax),%ebx
3: 2b 58 04 sub 0x4(%rax),%ebx
6: 83 fb 77 cmp $0x77,%ebx
9: 0f 87 5c 01 00 00 ja 0x16b
f: 48 8b 5c 24 18 mov 0x18(%rsp),%rbx
14: 4c rex.WR
15: 01 .byte 0x1

@vcaputo
Copy link
Member

vcaputo commented Jun 27, 2020

If you're able to easily reproduce this, rebuilding with debug symbols as this is gentoo, and getting another core dump && backtrace with the symbols would be helpful. I think you'd just re-emerge systemd w/nostrip.

However, with what you already have, it'd be useful to know if %rax is non-NULL, so could you paste the output of info registers from gdb as well.

@pomac303
Copy link
Author

(gdb) info registers
rax 0x0 0
rbx 0x7fff929ffd50 140735653346640
rcx 0x7f2e555d62c8 139836977406664
rdx 0x55a60fc891b8 94171717734840
rsi 0x0 0
rdi 0x55a610313f28 94171724594984
rbp 0x55a61035a320 0x55a61035a320
rsp 0x7fff929ffcd0 0x7fff929ffcd0
r8 0x1 1
r9 0x55a6103538d4 94171724855508
r10 0x0 0
r11 0x8 8
r12 0x3938700 60000000
r13 0x2 2
r14 0x0 0
r15 0x0 0
rip 0x55a60fc4d2dd 0x55a60fc4d2dd
eflags 0x10213 [ CF AF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0

@pomac303
Copy link
Author

the thing is, it should be built with debug already, split but still debug - will look in to it, =)

@pomac303
Copy link
Author

As a sidenote, the /lib/ path is what breaks the split debug - have a newly built version, and I'll see when it happens, don't really want to force it atm

@vcaputo
Copy link
Member

vcaputo commented Jun 28, 2020

So %rax is NULL, it segfaulted on a NULL pointer dereference. That's far more tractable than some random memory address.

I don't have time to try figure out where exactly that code is, but if you get debugging symbols I imagine this shouldn't be too bad to figure out.

At a glance, those cmps against 0x77 are suggestive of SD_DHCP_OPTION_DOMAIN_SEARCH_LIST that is 119.

@poettering
Copy link
Member

Please provide a backtrace with debug symbols, otherwise this simply isn't actionable for us. If you distro doesn't provide debug symbols then this isn't debuggable to us and there's nothing we can do.

@poettering poettering added needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer network labels Jun 28, 2020
@yuwata
Copy link
Member

yuwata commented Jun 29, 2020

Please also provide your configs (.network and .netdev files) and debugging logs.

@pomac303
Copy link
Author

pomac303 commented Jul 5, 2020

Something similar just happened...

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/lib/systemd/systemd-networkd'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Unexpected size of section `.reg-xstate/831639' in core file.
#0  0x0000562e7a9b7b5d in client_timeout_resend (s=<optimized out>, usec=<optimized out>, userdata=0x562e7ab14e60) at ../systemd-stable-245.5/src/libsystemd-network/sd-dhcp-client.c:1074
1074	../systemd-stable-245.5/src/libsystemd-network/sd-dhcp-client.c: No such file or directory.
(gdb) bt full
#0  0x0000562e7a9b7b5d in client_timeout_resend (s=<optimized out>, usec=<optimized out>, userdata=0x562e7ab14e60) at ../systemd-stable-245.5/src/libsystemd-network/sd-dhcp-client.c:1074
        client = 0x562e7ab14e60
        _dont_destroy_client = 0x562e7ab14e60
        next_timeout = 0
        time_now = 619511910366
        time_left = <optimized out>
        r = 0
        __PRETTY_FUNCTION__ = "client_timeout_resend"
        __func__ = "client_timeout_resend"
#1  0x00007f7753800e96 in source_dispatch (s=s@entry=0x562e7ab0c230) at ../systemd-stable-245.5/src/libsystemd/sd-event/sd-event.c:3201
        saved_type = SOURCE_TIME_BOOTTIME
        r = <optimized out>
        __PRETTY_FUNCTION__ = "source_dispatch"
        __func__ = "source_dispatch"
#2  0x00007f7753803580 in sd_event_dispatch (e=e@entry=0x562e7aac2f20) at ../systemd-stable-245.5/src/libsystemd/sd-event/sd-event.c:3634
        ref = <optimized out>
        p = <optimized out>
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_dispatch"
#3  0x00007f7753803748 in sd_event_run (e=e@entry=0x562e7aac2f20, timeout=timeout@entry=18446744073709551615) at ../systemd-stable-245.5/src/libsystemd/sd-event/sd-event.c:3692
        r = 1
        __PRETTY_FUNCTION__ = "sd_event_run"
#4  0x00007f7753803967 in sd_event_loop (e=0x562e7aac2f20) at ../systemd-stable-245.5/src/libsystemd/sd-event/sd-event.c:3714
        ref = 0x562e7aac2f20
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_loop"
#5  0x0000562e7a94d111 in run (argv=<optimized out>, argc=1) at ../systemd-stable-245.5/src/network/networkd.c:123
        notify_message = 0x562e7a9d2000 "STOPPING=1\nSTATUS=Shutting down..."
        m = 0x562e7aac2d40
        r = 0
        notify_message = <optimized out>
        m = <optimized out>
        r = <optimized out>
        __func__ = "run"
        __PRETTY_FUNCTION__ = "run"
        _level = <optimized out>
        _e = <optimized out>

@yuwata yuwata added bug 🐛 Programming errors, that need preferential fixing dhcp labels Jul 6, 2020
@yuwata yuwata modified the milestone: v246 Jul 6, 2020
@yuwata
Copy link
Member

yuwata commented Jul 6, 2020

Please provide journal logs and configs.

@pomac303
Copy link
Author

pomac303 commented Jul 6, 2020

(gdb) p client
$1 = (sd_dhcp_client *) 0x562e7ab14e60
(gdb) p client->lease
$2 = (sd_dhcp_lease *) 0x0

So that explains it...

@pomac303
Copy link
Author

pomac303 commented Jul 6, 2020

journal:

jul 04 23:49:37 byte systemd[1]: Starting Network Service...
jul 04 23:49:38 byte systemd-networkd[831639]: /etc/systemd/network/virtual-net.network:13: Failed to parse NTP server address 'se.pool.ntp.org', ignoring: Invalid argument
jul 04 23:49:38 byte systemd-networkd[831639]: /etc/systemd/network/virtual-net.network:14: Timezone is not valid, ignoring assignment: "Europe/Stockholm"
jul 04 23:49:38 byte systemd-networkd[831639]: virtual-net: netdev ready
jul 04 23:49:38 byte systemd-networkd[831639]: local-lan: netdev ready
jul 04 23:49:38 byte systemd-networkd[831639]: external-net: netdev ready
jul 04 23:49:38 byte systemd-networkd[831639]: vnet7: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: vnet6: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: vnet5: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: vnet4: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: vnet3: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: vnet2: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: vnet1: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: vnet0: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: virtual-net: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: local-lan: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: external-net: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: eno3: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: eno1: Gained IPv6LL
jul 04 23:49:38 byte systemd-networkd[831639]: Enumeration completed
jul 04 23:49:38 byte systemd[1]: Started Network Service.
jul 04 23:49:38 byte systemd-networkd[831639]: external-net: netdev exists, using existing without changing its parameters
jul 04 23:49:38 byte systemd-networkd[831639]: local-lan: netdev exists, using existing without changing its parameters
jul 04 23:49:38 byte systemd-networkd[831639]: virtual-net: netdev exists, using existing without changing its parameters
jul 04 23:49:38 byte systemd-networkd[831639]: virtual-net: IPv6 successfully enabled
jul 04 23:49:38 byte systemd-networkd[831639]: local-lan: IPv6 successfully enabled
jul 04 23:49:38 byte systemd-networkd[831639]: external-net: IPv6 successfully enabled
jul 04 23:49:38 byte systemd-networkd[831639]: eno3: IPv6 successfully enabled
jul 04 23:49:38 byte systemd-networkd[831639]: eno1: IPv6 successfully enabled
jul 04 23:49:38 byte systemd-networkd[831639]: eno1: Could not enable IP masquerading, ignoring: Operation not supported
jul 04 23:49:42 byte systemd-networkd[831639]: eno3: DHCPv4 address 92.34.207.214/21 via 92.34.200.1
jul 05 02:24:34 byte systemd-networkd[831639]: eno2: Lost carrier
jul 05 02:24:35 byte systemd-networkd[831639]: eno4: Lost carrier
jul 05 02:24:35 byte systemd-networkd[831639]: eno3: Lost carrier
jul 05 02:24:35 byte systemd-networkd[831639]: eno3: DHCP lease lost
jul 05 02:24:35 byte systemd-networkd[831639]: eno1: Lost carrier
jul 05 02:24:37 byte systemd-networkd[831639]: eno2: Gained carrier
jul 05 02:24:37 byte systemd-networkd[831639]: eno4: Gained carrier
jul 05 02:24:37 byte systemd-networkd[831639]: eno3: Gained carrier
jul 05 02:24:37 byte systemd-networkd[831639]: eno1: Gained carrier
jul 05 02:24:37 byte systemd-networkd[831639]: eno1: Could not enable IP masquerading, ignoring: Operation not supported
jul 05 02:25:04 byte systemd[1]: systemd-networkd.service: Main process exited, code=dumped, status=11/SEGV
jul 05 02:25:04 byte systemd[1]: systemd-networkd.service: Failed with result 'core-dump'.

@pomac303
Copy link
Author

Sorry, it seems like there was a new segfault during the night... And this is with the patches applied

warning: Unexpected size of section `.reg-xstate/694' in core file.
#0 0x0000556f5076cb7d in client_timeout_resend (s=, usec=, userdata=0x556f50ff1330) at ../systemd-stable-245.5/src/libsystemd-network/sd-dhcp-client.c:1074
1074 time_left = (client->lease->t2 - client->lease->t1) / 2;
(gdb) bt full
#0 0x0000556f5076cb7d in client_timeout_resend (s=, usec=, userdata=0x556f50ff1330) at ../systemd-stable-245.5/src/libsystemd-network/sd-dhcp-client.c:1074
client = 0x556f50ff1330
_dont_destroy_client = 0x556f50ff1330
next_timeout = 0
time_now = 299470868990
time_left =
r = 0
PRETTY_FUNCTION = "client_timeout_resend"
func = "client_timeout_resend"
#1 0x00007f1f4295ae96 in source_dispatch (s=s@entry=0x556f50ff6800) at ../systemd-stable-245.5/src/libsystemd/sd-event/sd-event.c:3201
saved_type = SOURCE_TIME_BOOTTIME
r =
PRETTY_FUNCTION = "source_dispatch"
func = "source_dispatch"
#2 0x00007f1f4295d580 in sd_event_dispatch (e=e@entry=0x556f50faade0) at ../systemd-stable-245.5/src/libsystemd/sd-event/sd-event.c:3634
ref =
p =
r =
PRETTY_FUNCTION = "sd_event_dispatch"
#3 0x00007f1f4295d748 in sd_event_run (e=e@entry=0x556f50faade0, timeout=timeout@entry=18446744073709551615) at ../systemd-stable-245.5/src/libsystemd/sd-event/sd-event.c:3692
r = 1
PRETTY_FUNCTION = "sd_event_run"

@yuwata
Copy link
Member

yuwata commented Jul 15, 2020

Hmm, could you test PR #16470? Thank you for the report and your cooperation.

@pomac303
Copy link
Author

Built and running now...

Btw, does systemd-networkd's dhcp act correctly? I get a new ip for each lease which seems odd to me, esp when I should have time left on the last lease...

@yuwata
Copy link
Member

yuwata commented Jul 15, 2020

If you experienced a strange behavior in networkd, then please enable to generate debugging logs by creating /etc/systemd/system/systemd-networkd.service.d/override.conf as

[Service]
Environment=SYSTEMD_LOG_LEVEL=debug

, and then please provide the logs.

@mamarley
Copy link
Contributor

Built and running now...

Btw, does systemd-networkd's dhcp act correctly? I get a new ip for each lease which seems odd to me, esp when I should have time left on the last lease...

That's probably because systemd-networkd is sending a DHCP RELEASE when it stops, causing the server to terminate the lease. Try putting "SendRelease=no" in the "[DHCPv4]" section. That should help.

yuwata added a commit that referenced this issue Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Programming errors, that need preferential fixing dhcp needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer network
5 participants