-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
Description
systemd version the issue has been seen with
systemd 246
$ systemctl --version
systemd 246 (246.1-1-arch)
+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid
Used distribution
Arch Linux
Expected behaviour you didn't see
It was expected that after the carrier lost the interface would transition to the "unconfigured" state, so it could transition to the configured state upon gaining the carrier.
It was also expected that if interface is failing at least an error message would be provided for the cause of the failure.
Below is how it was handling the lost carrier normally on the same system:
Aug 12 19:13:14 archlinux kernel: e1000e 0000:00:05.0 enp0s5: NIC Link is Down
Aug 12 19:13:14 archlinux systemd-networkd[261]: enp0s5: Lost carrier
Aug 12 19:13:14 archlinux systemd-networkd[261]: enp0s5: DHCP lease lost
Aug 12 21:13:20 archlinux kernel: e1000e 0000:00:05.0 enp0s5: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Aug 12 21:13:20 archlinux systemd-networkd[261]: enp0s5: Gained carrier
Aug 12 21:13:20 archlinux systemd-networkd[261]: enp0s5: DHCPv4 address 10.211.55.8/24 via 10.211.55.1
Aug 12 21:13:21 archlinux systemd-resolved[327]: Using degraded feature set (UDP) for DNS server 10.211.55.1.
Unexpected behaviour you saw
I enabled the debugging output for networkd to get more insight into the problem:
Aug 18 17:06:10 archlinux systemd-networkd[63932]: rtnl: received non-static neighbor, ignoring.
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Flags change: -LOWER_UP -RUNNING
Aug 18 17:06:10 archlinux systemd-networkd[63932]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_32 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=42 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Lost carrier
Aug 18 17:06:10 archlinux systemd-networkd[63932]: DHCP CLIENT (0x1f42fb36): STOPPED
Aug 18 17:06:10 archlinux systemd-networkd[63932]: DHCP CLIENT (0x1f42fb36): RELEASE
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: DHCP lease lost
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Removing route: dst: 10.211.55.1/32, src: n/a, gw: n/a, prefsrc: 10.211.55.8, scope: link, table: main, proto: dhcp, type: unicast
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Removing route: dst: n/a, src: n/a, gw: 10.211.55.1, prefsrc: 10.211.55.8, scope: global, table: main, proto: dhcp, type: unicast
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Removing address 10.211.55.8
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Failed
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: State changed: configured -> failed
As you can see from the above debugging excerpt, enp0s5 lost its carrier, which triggered de-initialisation of the DHCP client (it is going through the dhcp4_remove_all() routine as part of the dhcp_lease_lost(). But one it finishes with removing routes and addresses it fails, leaving the interface in the failed state. So when the carrier is gained back the interface won't even try to get configured (there are multiple checks to skip the initialisation of failed interfaces):
Aug 18 17:06:10 archlinux systemd-networkd[63932]: rtnl: received non-static neighbor, ignoring.
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Flags change: +LOWER_UP +RUNNING
Aug 18 17:06:10 archlinux systemd-networkd[63932]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_32 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=47 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Aug 18 17:06:10 archlinux systemd-networkd[63932]: enp0s5: Gained carrier
Aug 18 17:06:10 archlinux systemd-networkd[63932]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=48 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
After looking into the source code, my suspicion lands on the dhcp_reset_mtu() routine since it is the one that immediately follows dhcp4_remove_all() and calls sd_dhcp_lease_get_mtu() which would return an error (specifically, EINVAL) and no debugging messages if the mtu for the lease is negative or zero.
Steps to reproduce the problem
I don't know how to reproduce it elsewhere, but I can describe my setup if it helps. I run Parallels Desktop and a Arch Linux in a VM. This setup was very stable for the last 12 months, but recently I updated Parallels Desktop from 15 to 16. Such an update also affected the Guest tools (e.g. guest drivers, including the virtual network driver).
Unfortunately, on the same day I did a system update of Arch Linux and got my systemd updated from 245 to 246. I went for a coffee, so my laptop went to sleep, and an hour later I experience the issue for the first time.
What I did so far:
- I tried different NIC drivers (e1000e and virtio_net) to see whether it is an issue with the newer Parallels' provided network driver;
- I read through the source code of systemd (networkd) and compared outputs when everything was working and now.
I will further experiment and update this issue, but my primary concern is that there are code paths in networkd where it would silently switch the interface into the failed mode which makes it really hard to investigate and to find the root cause.