This repository has been archived by the owner on Dec 13, 2022. It is now read-only.

[keepalived] Clear the vips net namespace #932

Closed
mythus opened this issue Apr 12, 2014 · 2 comments

Comments


mythus commented Apr 12, 2014

I have two controllers in an HA setup, provisioned with Chef using the Havana tag. I use Ubuntu 12.04 LTS with kernel 3.11.0-19-generic.
When one of the controllers is rebooted, VRRP goes directly into BACKUP state on boot and triggers the notify.sh script, which deletes each VIP. After the server has fully started, "ip netns exec vips" returns "setting the network namespace failed: Invalid argument" for any command, which effectively leaves me without HA. The only way to recover is to delete the vips net namespace and restart keepalived to set it up again. I could not find the exact reason for this behavior; the commands in notify.sh seem to work just fine. I think it is a kernel/iproute2 issue combined with the boot process, because if I do not run keepalived on boot, but start it after boot has completed (when no vips net namespace exists), it works fine.
I worked around this by having notify.sh delete the vips namespace when only the 169.254.x.x address exists on the vip-ns interface. In that case the vips network namespace and veth interfaces are not needed, because the controller is passive at the moment. I think it is better for notify.sh to behave like this anyway, and it solved the boot issue too.
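The workaround above can be sketched roughly as follows. This is an illustrative sketch only, not the actual notify.sh from the cookbook; the function name and the text-based interface are assumptions made so the check is easy to see in isolation.

```shell
#!/bin/sh
# Succeed (exit 0) when the only IPv4 addresses in the given `ip -4 addr`
# output are 169.254.x.x link-local addresses, i.e. keepalived has already
# removed every VIP and the namespace is no longer needed.
only_link_local() {
    # $1: output of `ip -4 addr show <iface>` (passed as text for clarity)
    addrs=$(printf '%s\n' "$1" | grep -c 'inet ')
    ll=$(printf '%s\n' "$1" | grep -c 'inet 169\.254\.')
    [ "$addrs" -gt 0 ] && [ "$addrs" -eq "$ll" ]
}

# In a real notify.sh this would be driven by live state, e.g.:
#   if only_link_local "$(ip netns exec vips ip -4 addr show vip-ns)"; then
#       ip netns delete vips
#   fi
```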


Apsu commented Apr 15, 2014

I'd have to look into this more carefully, but I can make a few comments right now.

First of all, we're not currently using or testing 12.04 with the 3.11.x line of kernels. I don't know that there's anything particularly wrong or right about them in 12.04 with respect to this kind of issue, but I do know there are namespace bugs in certain revisions of the 3.2/3.8 lines. It's quite possible that's the cause here as well.

That said, what you're describing is a race condition: different behavior between starting keepalived "early" vs. "late". This makes me wonder whether your physical network configuration on the controllers is what we refer to as a "combined plane" setup; i.e., a single logical interface in your OVS provider bridge, with an IP on the bridge or a bridge interface, used both for management/services (control plane) and for the VM network(s) through the same bridge (data plane).

If this is the case, there are many known issues with a combined-plane setup and OVS, not the least of which is keepalived startup race conditions. Please provide more information on your host network topology so I can better advise you.


mythus commented Apr 16, 2014

Sorry, this turned out to be an LXC bug with unmounting the MS_SHARED /run/netns. It has been reported to the Ubuntu team.
Thank you
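For readers hitting the same "setting the network namespace failed: Invalid argument" symptom: the propagation flags on /run/netns can be checked by parsing /proc/self/mountinfo, where the optional fields after the mount options carry tags such as "shared:N" for MS_SHARED mounts. The helper below is a hedged sketch (function name and text-based interface are assumptions, not part of any mentioned tool):

```shell
#!/bin/sh
# Succeed (exit 0) when the given mount point appears in mountinfo text
# with a "shared:N" peer-group tag. mountinfo fields: id, parent id,
# major:minor, root, mount point (field 5), options, then optional tags
# until a "-" separator.
is_shared_mount() {
    # $1: mount point path, $2: contents of /proc/self/mountinfo
    printf '%s\n' "$2" | awk -v mp="$1" '
        $5 == mp {
            for (i = 7; i <= NF; i++) {
                if ($i == "-") break
                if ($i ~ /^shared:/) found = 1
            }
        }
        END { exit !found }'
}

# On a live system (as root), one possible check and recovery might be:
#   if is_shared_mount /run/netns "$(cat /proc/self/mountinfo)"; then
#       mount --make-private /run/netns  # stop propagating ns bind mounts
#   fi
```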

@claco claco closed this as completed May 21, 2014