This repository has been archived by the owner on Dec 13, 2022. It is now read-only.

[keepalived] Clear the vips net namespace #932

Closed
mythus opened this issue Apr 12, 2014 · 2 comments

Comments


mythus commented Apr 12, 2014

I have two controllers in an HA setup, provisioned with Chef using the Havana tag. I use Ubuntu 12.04 LTS with kernel 3.11.0-19-generic.
When one of the controllers is rebooted, VRRP goes directly into BACKUP state on boot and triggers the notify.sh script, which deletes each VIP. After the server has fully started, "ip netns exec vips" returns "setting the network namespace failed: Invalid argument" for any command, which effectively leaves me without HA. The only way to recover is to delete the vips net namespace and restart keepalived to set it up again. I could not find the exact reason for this behavior; the commands in notify.sh seem to work just fine. I think it is a kernel/iproute2 issue combined with the boot process, because if I do not run keepalived on boot, but start it after boot has completed (when no vips net namespace exists), it works fine.
I worked around this by having notify.sh delete the vips namespace when only the 169.254.x.x address exists on the vip-ns interface. In that case the vips network namespace and veth interfaces are not needed, because the controller is passive at the moment. I think it is better for notify.sh to behave like this anyway, and it solved the boot issue too.
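The workaround above can be sketched roughly as follows. This is an illustrative sketch only, not the actual notify.sh from the cookbook; the function name and the text-based interface are assumptions made so the check is easy to see in isolation.

```shell
#!/bin/sh
# Succeed (exit 0) when the only IPv4 addresses in the given `ip -4 addr`
# output are 169.254.x.x link-local addresses, i.e. keepalived has already
# removed every VIP and the namespace is no longer needed.
only_link_local() {
    # $1: output of `ip -4 addr show <iface>` (passed as text for clarity)
    addrs=$(printf '%s\n' "$1" | grep -c 'inet ')
    ll=$(printf '%s\n' "$1" | grep -c 'inet 169\.254\.')
    [ "$addrs" -gt 0 ] && [ "$addrs" -eq "$ll" ]
}

# In a real notify.sh this would be driven by live state, e.g.:
#   if only_link_local "$(ip netns exec vips ip -4 addr show vip-ns)"; then
#       ip netns delete vips
#   fi
```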


Apsu commented Apr 15, 2014

I'd have to look into this more carefully, but I can make a few comments right now.

First of all, we're not currently using or testing 12.04 with the 3.11.x line of kernels. I don't know that there's anything particularly wrong or right about them in 12.04 with respect to this kind of issue, but I do know there are namespace bugs in certain revisions of the 3.2/3.8 lines. It's quite possible that's the cause here as well.

That said, what you're describing is a race condition: different behavior between starting keepalived "early" vs. "late". This makes me wonder whether your physical network configuration on the controllers is what we refer to as a "combined plane" setup; i.e., a single logical interface in your OVS provider bridge, with an IP on the bridge or a bridge interface, used both for management/services (control plane) and for the VM network(s) through the same bridge (data plane).

If this is the case, there are many known issues with a combined-plane setup and OVS, not the least of which is keepalived startup race conditions. Please provide more information on your host network topology so I can better advise you.


mythus commented Apr 16, 2014

Sorry, this turned out to be an LXC bug with unmounting the MS_SHARED /run/netns. It has been reported to the Ubuntu team.
Thank you
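For readers hitting the same "setting the network namespace failed: Invalid argument" symptom: the propagation flags on /run/netns can be checked by parsing /proc/self/mountinfo, where the optional fields after the mount options carry tags such as "shared:N" for MS_SHARED mounts. The helper below is a hedged sketch (function name and text-based interface are assumptions, not part of any mentioned tool):

```shell
#!/bin/sh
# Succeed (exit 0) when the given mount point appears in mountinfo text
# with a "shared:N" peer-group tag. mountinfo fields: id, parent id,
# major:minor, root, mount point (field 5), options, then optional tags
# until a "-" separator.
is_shared_mount() {
    # $1: mount point path, $2: contents of /proc/self/mountinfo
    printf '%s\n' "$2" | awk -v mp="$1" '
        $5 == mp {
            for (i = 7; i <= NF; i++) {
                if ($i == "-") break
                if ($i ~ /^shared:/) found = 1
            }
        }
        END { exit !found }'
}

# On a live system (as root), one possible check and recovery might be:
#   if is_shared_mount /run/netns "$(cat /proc/self/mountinfo)"; then
#       mount --make-private /run/netns  # stop propagating ns bind mounts
#   fi
```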

@claco claco closed this as completed May 21, 2014