systemd-networkd-wait-online: Wait ALL links to gain a carrier #2037

Closed
TCB13 opened this Issue Nov 26, 2015 · 13 comments


TCB13 commented Nov 26, 2015

Hello,

According to documentation systemd-networkd-wait-online works as follows:

it will wait for all links it is aware of and which are managed by systemd-networkd.service(8) to be fully configured or failed, and for at least one link to gain a carrier.

Is there any specific reason why "and for at least one link to gain a carrier" isn't actually "and for ALL links to gain a carrier"? It looks like the current approach has issues when there are multiple IPv4 and IPv6 addresses on the same interface.

systemd-networkd-wait-online seems to stop waiting as soon as ONE link (IP) is assigned/online and the network is working.

Is there a way its behavior can be configured to wait for all links to be online? Otherwise this can cause bind failures in other services, because a specific IP is not yet available to bind to even though network-online.target was reached.

For example, I set up two units that start after network-online.target; one pings a host over IPv4 and the other over IPv6. The following happens:

IPv4 Test Result:

server.example:~# systemctl status abootping.service
● abootping.service - A boot-time ping for network testing
   Loaded: loaded (/lib/systemd/system/abootping.service; enabled)
   Active: inactive (dead) since Thu 2015-11-26 11:01:58 CET; 45s ago
 Main PID: 433 (code=exited, status=0/SUCCESS)

Nov 26 11:01:54 server.example ping[433]: PING test-ping-host.example (89.-----.8) from 5.----.149 : 56(84) bytes of data.
Nov 26 11:01:54 server.example ping[433]: 64 bytes from test-ping-host.example (89.-----.8): icmp_seq=1 ttl=59 time=4.92 ms
Nov 26 11:01:55 server.example ping[433]: 64 bytes from test-ping-host.example (89.-----.8): icmp_seq=2 ttl=59 time=4.92 ms
Nov 26 11:01:56 server.example ping[433]: 64 bytes from test-ping-host.example (89.-----.8): icmp_seq=3 ttl=59 time=4.84 ms
Nov 26 11:01:57 server.example ping[433]: 64 bytes from test-ping-host.example (89.-----.8): icmp_seq=4 ttl=59 time=5.41 ms
Nov 26 11:01:58 server.example ping[433]: 64 bytes from test-ping-host.example (89.-----.8): icmp_seq=5 ttl=59 time=4.95 ms
Nov 26 11:01:58 server.example ping[433]: --- test-ping-host.example ping statistics ---
Nov 26 11:01:58 server.example ping[433]: 5 packets transmitted, 5 received, 0% packet loss, time 4007ms
Nov 26 11:01:58 server.example ping[433]: rtt min/avg/max/mdev = 4.843/5.010/5.414/0.209 ms

IPv6 Test Result:

server.example:~# systemctl status abootping6.service
● abootping6.service - A boot-time ping6 for network testing
   Loaded: loaded (/lib/systemd/system/abootping6.service; enabled)
   Active: failed (Result: exit-code) since Thu 2015-11-26 11:01:54 CET; 56s ago
 Main PID: 436 (code=exited, status=2)

Nov 26 11:01:54 server.example ping6[436]: ping: bind icmp socket: Cannot assign requested address
Nov 26 11:01:54 server.example systemd[1]: abootping6.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 26 11:01:54 server.example systemd[1]: Failed to start A boot-time ping6 for network testing.
Nov 26 11:01:54 server.example systemd[1]: Unit abootping6.service entered failed state.

As you can see, the IPv4 network is working perfectly after boot. However, IPv6 is not ready yet, so ping6 and all boot processes that need IPv6 fail to bind.

Before someone asks whether my networking is set up properly: if I SSH into the machine right after boot and run the same ping6 command, everything works fine. It really looks like systemd-networkd-wait-online is not waiting for the IPv6 addresses to be online and routable since it already has an IPv4 link working.

Thank you.

ohsix commented Nov 26, 2015

what constitutes the network being 'up' is poorly defined, mostly

http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

if you have a more specific / better way to signal that your network is 'online', you can make it do what NetworkManager-wait-online.service does for network-online.target

Contributor

arvidjaar commented Nov 26, 2015

however IPv6 is not ready yet

Is IPv6 managed by networkd?

TCB13 commented Nov 26, 2015

@ohsix and @arvidjaar I'm using systemd-networkd not NetworkManager, my current network is controlled by a file I placed at /etc/systemd/network/10-static-eth0.network as follows:

[Match]
Name=eth0

[Network]
Address=2a04:----::149/64
Address=2a04:----::242/64
Address=2a04:----::243/64
Gateway=2a04:----:0001

Address=5.----.149/25
Address=5.----.242/25
Address=5.----.243/25
Gateway=5.----.129

DNS=89.----.4
DNS=46.----.104

"It really looks like systemd-networkd-wait-online is not waiting for the IPv6 addresses to be online and routable since it already has an IPv4 link working." => I guess this could be expected, since the documentation says "and for at least one link to gain a carrier".

But how can I make sure it waits until all IPs are there, routable and working properly?

Thank you.

Slair1 commented Mar 15, 2016

I have this same problem: my services are starting and failing because my IPv6 address is not yet up, even though they wait for network-online.

TCB13 commented Jul 3, 2016

I managed to find out that this issue is related to IPv6 DAD. Some race condition described here: https://www.agwa.name/blog/post/beware_the_ipv6_dad_race_condition.

Basically, while DAD is still running, no program can bind to that specific IPv6 address. If a program tries to bind during that time, the bind fails.

Disabling DAD for the interface fixes the issue:

# Append the setting to /etc/sysctl.conf (as root), then reload it
echo 'net.ipv6.conf.eth0.accept_dad = 0' >> /etc/sysctl.conf
sysctl -p

However, this is only a quick fix. I would like to see DAD working properly with systemd-networkd-wait-online.
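For services that cannot wait for a fix and where disabling DAD is undesirable, another possible workaround is to retry the bind in the application while the address is still tentative. The following is only a minimal sketch of that idea, not something systemd or any of the affected programs does; the helper name `bind_with_retry` and its parameters are made up for illustration. It retries only on EADDRNOTAVAIL, the errno behind the "Cannot assign requested address" message shown above.

```python
import errno
import time


def bind_with_retry(sock, address, attempts=10, delay=0.5, sleep=time.sleep):
    """Bind sock to address, retrying while the address is not yet assignable.

    Returns the number of the attempt that succeeded. Any error other than
    EADDRNOTAVAIL, or EADDRNOTAVAIL on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            sock.bind(address)
            return attempt
        except OSError as exc:
            # EADDRNOTAVAIL is what a bind to a tentative address reports.
            if exc.errno != errno.EADDRNOTAVAIL or attempt == attempts:
                raise
            sleep(delay)
```

A caller would pass an AF_INET6 socket and an address tuple such as `(address, port)`. The attempt count and delay are arbitrary defaults and would need tuning to how long DAD takes on the system in question.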

I experienced this issue as well. I am using a VPS that is assigned an IPv6 pool, so I added specific addresses manually in /etc/network/interfaces:

auto eth0
allow-hotplug eth0
iface eth0 inet static
        address x.x.x.x
        netmask 255.255.255.0
        broadcast x.x.x.255
        network x.x.x.0
        gateway x.x.x.1
        pre-up iptables-restore < /etc/network/iptables.rules
        pre-up ip6tables-restore < /etc/network/ip6tables.rules
        up /sbin/ip -6 addr add 2400:xxx:xxx:xxx::1234/64 dev eth0
        up /sbin/ip -6 addr add 2400:xxx:xxx:xxx::4321/64 dev eth0
        down /sbin/ip -6 addr del 2400:xxx:xxx:xxxx::1234/64 dev eth0
        down /sbin/ip -6 addr del 2400:xxxx:xxxx:xxx::4321/64 dev eth0

And my Nginx and Dovecot failed to bind to the IPv6 addresses on boot...

I fixed it by appending net.ipv6.conf.eth0.accept_dad = 0 to /etc/sysctl.conf, thanks @TCB13

Contributor

jgunthorpe commented Dec 21, 2016

I am also having this issue, and also only using systemd-networkd (on Ubuntu Xenial)

Here is enough detail for someone to fix it:

This is happening because systemd-networkd-wait-online completes even though IPv6 addresses are still in the tentative state (i.e. DAD is ongoing).

For instance at the moment of failure 'ip addr' says this:

 2: enp0s2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     link/ether 4a:f6:d4:0f:14:3e brd ff:ff:ff:ff:ff:ff
     inet 10.0.0.152/24 brd 10.0.0.255 scope global dynamic enp0s2
        valid_lft 3599sec preferred_lft 3599sec
     inet6 fd83:609c:bdc8:1:48f6:d4ff:fe0f:143e/64 scope global tentative mngtmpaddr noprefixroute dynamic
        valid_lft 86399sec preferred_lft 14399sec
     inet6 fe80::48f6:d4ff:fe0f:143e/64 scope link
        valid_lft forever preferred_lft forever

The above is showing an interface that got an IPv4 address via DHCP, and an IPv6 address via radv.

However DAD is ongoing and the IPv6 address is marked 'tentative'. Those addresses cannot be used for much in Linux.

The first thing to fix is to have systemd-networkd-wait-online wait until all IPv6 addresses leave the tentative state (check ifa_flags for IFA_F_TENTATIVE), that would at least cover off cases where networkd itself sets static IPv6 addresses, and make the radvd & dhcp case more likely to work.

The second fix is to enhance networkd to know what addresses to expect (add DHCP=both?) and have systemd-networkd-wait-online wait until all expected dynamic address configuration completes.

This is a huge PITA for NFS, where I need NFS mounts to not start until the IPv6 network actually works, otherwise systemd tries the mount once, fails the mount, and then breaks the boot.
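To illustrate the tentative-flag check from the first fix above, here is a rough userspace sketch. It is not how networkd does it (networkd receives ifa_flags over netlink); instead it assumes the usual six-column layout of /proc/net/if_inet6 (address, ifindex, prefix length, scope, flags, device name, all hex except the name). The function name `tentative_addresses` and the `SAMPLE` text are made up for illustration; the sample rows are hand-written to resemble the 'ip addr' output above, not captured from a real system.

```python
import ipaddress

IFA_F_TENTATIVE = 0x40  # value from <linux/if_addr.h>


def tentative_addresses(if_inet6_text, ifname=None):
    """Return IPv6 addresses flagged tentative in /proc/net/if_inet6 text.

    If ifname is given, only addresses on that interface are considered.
    """
    result = []
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        addr_hex, _ifindex, _prefixlen, _scope, flags_hex, dev = fields
        if ifname is not None and dev != ifname:
            continue
        if int(flags_hex, 16) & IFA_F_TENTATIVE:
            result.append(str(ipaddress.IPv6Address(int(addr_hex, 16))))
    return result


# Hand-written sample: one tentative global address, one ready link-local
# address, and the loopback address.
SAMPLE = """\
fd83609cbdc8000148f6d4fffe0f143e 02 40 00 40   enp0s2
fe8000000000000048f6d4fffe0f143e 02 40 20 80   enp0s2
00000000000000000000000000000001 01 80 10 80       lo
"""

print(tentative_addresses(SAMPLE, "enp0s2"))
```

On a live Linux system the text would come from reading /proc/net/if_inet6. A pre-start script could poll until the returned list is empty, although that only covers addresses that have already been added, not dynamic ones that are still expected.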

jgunthorpe referenced this issue Dec 21, 2016: Retry option for mount units #4468 (Closed)
Contributor

jgunthorpe commented Jan 9, 2017

This seems fixed in v232 at least. It looks like fe30727 largely took care of it. The 'address_is_ready' check in 'link_check_ready' does check that the radv address is !tentative, and that does block systemd-networkd-wait-online.

v229 (Xenial) does not work.

I think this can be closed.

Contributor

martinpitt commented Jan 9, 2017

Thanks for checking again! Closing then.

martinpitt closed this Jan 9, 2017

v229 (Xenial) does not work.

@jgunthorpe what did you mean by that? Should I assume this isn't fixed on Ubuntu Xenial, and that I have to disable DAD?

Contributor

jgunthorpe commented Mar 20, 2017

@koenpunt my testing showed it isn't fixed on Xenial and IPv6 is not reliably working before network-online

braiam commented Apr 1, 2017

Here's the tracking bug for Xenial/Yakkety https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1673092

braiam commented Jun 21, 2017

Is there a patch for backporting this fix? @jgunthorpe's comment says that commit fe30727 took care of it, but the commit makes no reference to !tentative or anything similar.
