systemd-networkd prunes floating virtual IPs for high-availability environments #12050

Open
chr4 opened this issue Mar 20, 2019 · 7 comments

4 participants
@chr4
commented Mar 20, 2019

Is your feature request related to a problem? Please describe.

When maintaining virtual floating IPs (VIPs) on highly available systems, I see them pruned whenever systemd-networkd is restarted (even without config changes), which results in downtime.

This problem is especially severe when DHCP is used, as it is also triggered when the DHCP lease is renewed (some cloud providers use DHCP by default to configure network interfaces). But even with static IP configurations, systemd-networkd may get restarted from time to time (automatic security patches, etc.).

I'm aware that this is a feature (which I agree is useful for desktops, etc.). In highly available environments with VIPs, however, it backfires on Linux distributions that rely on systemd-networkd (like Ubuntu, via netplan).

Describe the solution you'd like

I'm suggesting a flag that disables this behaviour, or one that allows whitelisting certain VIPs so they are not purged. I've tried CriticalConnection=true, but this doesn't prevent the pruning.
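For readers who want to repeat the CriticalConnection=true attempt, here is a minimal sketch of where that setting would go. It assumes the option sits in the [DHCP] section of a .network file, as it did in networkd releases from around the time of this report. The function name, the file name 50-vip-test.network, the interface name eth0 and the DHCP=yes line are placeholders and not taken from the report.

```shell
#!/bin/sh
# Write a sample .network file into the directory given as $1.
# File name, interface name and DHCP= value are hypothetical placeholders.
write_vip_test_network() {
    dir="$1"
    cat > "$dir/50-vip-test.network" <<'EOF'
[Match]
Name=eth0

[Network]
DHCP=yes

[DHCP]
# The setting the reporter tried; per the report it does not stop the pruning.
CriticalConnection=true
EOF
}
```

The target directory is left to the caller; /etc/systemd/network is the usual location for administrator-provided files. As the report says, this setting alone did not keep the extra addresses in place.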

Describe alternatives you've considered

  1. Using a dummy interface and relying on the weak host model (ES): This doesn't work, as the gratuitous ARPs/unsolicited neighbour advertisements can't be sent using a dummy interface. There's another issue with this approach, which I described in #12051.
  2. Filing a patch to keepalived to support sending out NA/GARP via another interface (which they are currently considering, see acassen/keepalived#1170, but which might not work)
  3. Migrating back to ifupdown
  4. Migrating to a distribution without systemd-networkd
  5. Migrating to BSD
@poettering

Member

commented Mar 20, 2019

I figure instead of removing them right away we could just mark them with some short remaining lifetime.

In general though: either networkd manages an interface or it doesn't. Just leaving old configuration on the interface will become a problem sooner or later. I mean, somebody needs to clean that up, and simply ignoring everything is just going to be a major source of headaches.
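To make the "short remaining lifetime" idea above more concrete: the kernel already supports finite lifetimes on addresses, and iproute2 exposes them through the valid_lft and preferred_lft keywords. The sketch below only illustrates that mechanism by hand. It is not what networkd does, and how networkd would implement the suggestion is not stated in this thread. The function name expire_addr and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Give an existing address a finite lifetime so the kernel removes it
# once the timer runs out. Prints the command by default (DRY_RUN=1);
# set DRY_RUN=0 to actually run it (requires root).
expire_addr() {
    addr="$1"; iface="$2"; secs="$3"
    set -- ip addr change "$addr" dev "$iface" valid_lft "$secs" preferred_lft "$secs"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `expire_addr 1.2.3.4/24 eth0 60` prints the command that would let the address expire after 60 seconds unless something refreshes it first.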

@poettering poettering added the network label Mar 20, 2019

@chr4

Author

commented Mar 20, 2019

In general though: either networkd manages an interface or it doesn't. Just leaving old configuration on the interface will become a problem sooner or later. I mean, somebody needs to clean that up, and simply ignoring everything is just going to be a major source of headaches.

I was wondering what the best practice regarding floating IPs is, then? As far as I'm aware, this is quite a common solution for implementing high availability, especially with protocols like VRRP and CARP available. Mixing systemd-networkd and ifupdown (or reverting to the latter) increases the likelihood of headaches imho...

@ssahani

Contributor

commented Mar 20, 2019

It seems to me your link is going down and then coming back up. Try IgnoreCarrierLoss.

@chr4

Author

commented Apr 11, 2019

I've tried that, but IgnoreCarrierLoss doesn't help.

After further discussion in acassen/keepalived#1170, there is an option to make it work using macvlans, but only when using multicast.

Currently, there seems to be no option to run systemd-networkd alongside a high availability service that doesn't create a custom interface (e.g. via macvlan) but uses existing interfaces. I suppose this doesn't only affect VRRP but also things like CARP.

I'm not sure how this would best be solved. I can't think of anything better than a flag that prevents address pruning altogether, or a flag that allows whitelisting certain IPs so they won't be pruned upon reload.

@ssahani

Contributor

commented May 3, 2019

Could you please give a reproducer?

@chr4

Author

commented May 8, 2019

Could you please give a reproducer?

A simplified version can be reproduced by just adding an address to an interface and then restarting systemd-networkd. The address will be purged.

ip a a 1.2.3.4/24 dev eth0
systemctl restart systemd-networkd

This becomes a problem in cases where the additional address is used as a virtual or floating IP in highly available environments (where a master IP is swapped to a different service upon failovers, e.g. by services like Pacemaker, keepalived and the like, usually using protocols like Corosync, VRRP or CARP). Those addresses are not handled by systemd-networkd and are therefore purged upon DHCP renewals or when an update triggers a restart of systemd-networkd, potentially resulting in downtime or at least unnecessary failovers.
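The two-command reproducer above can be wrapped in a small script that also checks whether the address survived the restart. This is only a sketch: the IFACE and VIP defaults mirror the example values, while the RUN_REPRO guard, the addr_present helper and the 5-second wait are assumptions added for illustration. The wait in particular is arbitrary and may need adjusting.

```shell
#!/bin/sh
# Hypothetical reproducer sketch; IFACE and VIP are placeholder values.
IFACE="${IFACE:-eth0}"
VIP="${VIP:-1.2.3.4/24}"

# Succeeds if `ip -o addr show` output on stdin lists the given address.
# The surrounding spaces avoid matching e.g. 11.2.3.4/24.
addr_present() {
    grep -qF -- " $1 "
}

# Only touch the system when explicitly asked to (requires root).
if [ "${RUN_REPRO:-0}" = "1" ]; then
    ip addr add "$VIP" dev "$IFACE"
    ip -o addr show dev "$IFACE" | addr_present "$VIP" \
        && echo "before restart: present"

    systemctl restart systemd-networkd
    sleep 5   # arbitrary wait for networkd to reconfigure the link

    if ip -o addr show dev "$IFACE" | addr_present "$VIP"; then
        echo "after restart: still present"
    else
        echo "after restart: pruned"
    fi
fi
```

Run it as root with RUN_REPRO=1 set; without that variable the script only defines the helper and changes nothing.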

ssahani added a commit to ssahani/systemd that referenced this issue May 8, 2019

networkd: Allow networkd to work with keepalived / high availability
This looks pretty legit. networkd drops the foreign addresses upon
restart. This becomes a problem in cases where the additional address
is used as a virtual or floating IP in highly available environments
(where a master IP is swapped to a different service upon failovers,
e.g. by services like Pacemaker, keepalived and the like, usually using
protocols like Corosync, VRRP or CARP). Those addresses are not handled
by systemd-networkd and are therefore purged upon DHCP renewals or
when an update triggers a restart of systemd-networkd, potentially
resulting in downtime or at least unnecessary failovers.

closes systemd#12050
@ssahani

Contributor

commented May 8, 2019

Please try with #12511

@yuwata yuwata added the has-pr label May 8, 2019

ssahani added a commit to ssahani/systemd that referenced this issue May 14, 2019

networkd: Allow networkd to work with keepalived / high availability