All VIPs managed by ipfailover always move to the just rebooted node #21188

Open · Mad-ness opened this Issue Oct 5, 2018 · 1 comment

Mad-ness commented Oct 5, 2018

Hi,
Maybe I have missed something in the documentation or didn't understand how ipfailover works, so please point me in the right direction. I'm following this page: https://docs.openshift.com/container-platform/3.10/admin_guide/high_availability.html.

The setup

I have OpenShift Origin 3.10 installed by openshift-ansible. The setup includes 3 masters, 2 infra nodes (registry, routers), app nodes, and GlusterFS + Heketi. It's a pretty standard configuration.

One more thing: OpenShift is deployed on OpenStack instances. The VIPs are set up as allowed_address_pairs and allowed on both infra nodes. VRRP traffic is allowed in the security group, and I can see VRRP packets going in both directions between infra-1 and infra-2 using tcpdump.
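For reference, this is roughly how I check the VRRP traffic (VRRP is IP protocol 112; eth0 matches my setup):

    # run on infra-1 and infra-2; shows VRRP advertisements from both peers
    sudo tcpdump -ni eth0 ip proto 112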

What I need

I want to distribute the workload coming to *.apps.example.com between the infra nodes. So my idea is to have 2 VIPs, each assigned to its own infra node, so that each infra node handles about 50% of the traffic.

What I do

  1. I label the infra nodes with router=us-west-ha.
  2. I create the ipfailover deployment with this command (see the keepalived sketch after this list):
    oc adm ipfailover --selector="router=us-west-ha" --virtual-ips="10.0.0.43,10.0.0.46" --watch-port 443 --replicas=2 --create --preemption-strategy "preempt_delay 30" --interface eth0
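I haven't dumped the generated configuration out of the pod, but my mental model of what this command produces is roughly the keepalived config below (instance names, router IDs and priorities are my guesses, not taken from the pod):

    vrrp_instance VIP_1 {              # illustrative name
        interface eth0
        state BACKUP
        virtual_router_id 10           # guessed VRID
        priority 100
        preempt_delay 30               # from --preemption-strategy "preempt_delay 30"
        virtual_ipaddress {
            10.0.0.43
        }
    }
    vrrp_instance VIP_2 {
        interface eth0
        state BACKUP
        virtual_router_id 11
        priority 100
        preempt_delay 30
        virtual_ipaddress {
            10.0.0.46
        }
    }
    # the real config presumably also adds a check script for --watch-port 443; omitted here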

How I test the ipfailover

  • After running the oc adm ipfailover ... command, VIP-1 is bound to infra-1 and VIP-2 to infra-2. That's fine, as it should be: VIP-1's master is infra-1 and VIP-2's master is infra-2.
  • I shut down/reboot infra-1. VIP-1 moves to infra-2. That's also fine: both VIPs are now maintained by the single ipfailover pod that stays alive.
  • When infra-1 becomes available again, both VIPs move to this infra-1 node.
    If I reboot infra-2, both VIPs move to infra-2 when it's up again. It doesn't matter in which order I reboot the infra nodes: both VIPs move to the just-rebooted node and stay there until the currently active node is rebooted (I check the VIP placement as shown below).
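To see where the VIPs land after each reboot, I check the interface on both nodes, roughly like this:

    # run on each infra node; lists the VIPs keepalived has added to eth0
    ip -4 addr show dev eth0 | grep -E '10\.0\.0\.(43|46)'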

My questions

  • Why does VIP-2 move from infra-2 to infra-1 after infra-1 comes back from a reboot? At the very beginning its master is infra-2 and, if I understand this right, it should stay there as long as infra-2 is alive (see my note on preemption below).
  • If I reboot infra-2, both VIPs move to infra-1 and stay there until infra-2 comes back. And once infra-2 is up and running again, both VIPs move to that node. Why?
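My understanding of the preemption strategy, for what it's worth: with preempt_delay 30, a higher-priority node reclaims MASTER 30 seconds after coming back, while nopreempt (the other value the docs list for --preemption-strategy) leaves a VIP on the current MASTER. So I would expect a variant like the one below to at least stop the VIPs from moving back, even though it's not the 50/50 split I'm after:

    oc adm ipfailover --selector="router=us-west-ha" \
        --virtual-ips="10.0.0.43,10.0.0.46" --watch-port 443 \
        --replicas=2 --create --interface eth0 \
        --preemption-strategy nopreempt   # instead of "preempt_delay 30"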

The problems which I see

Once any of the infra nodes is rebooted/crashed/shut down/etc. and comes up again, both VIPs move to this just-rebooted node.

Provided that both infra nodes are up, I expect VIP-1 to always be assigned to infra-1 and VIP-2 to infra-2, with traffic load balanced between both infra nodes. But in fact both VIPs end up on a single infra node right after it has rebooted.

VIP-1 bound to infra-1 and VIP-2 to infra-2 is just an example. If VIP-1 selects infra-2 as its master and VIP-2 selects infra-1, the situation doesn't change at all; everything still behaves the same.

Delete and re-create ipfailover

I repeated this test many times, deleting the failover (oc delete dc/ipfailover sa/ipfailover) and then running the creation command again as mentioned at the very beginning (oc adm ipfailover ...). The result is always the same: it works great until the first reboot of any of the infra nodes.
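The exact sequence I repeat is roughly:

    oc delete dc/ipfailover sa/ipfailover
    oc adm ipfailover --selector="router=us-west-ha" \
        --virtual-ips="10.0.0.43,10.0.0.46" --watch-port 443 \
        --replicas=2 --create --preemption-strategy "preempt_delay 30" \
        --interface eth0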

Testing Keepalived out of OpenShift

After a lot of attempts at the above, I reinstalled OpenShift and ran keepalived as a plain system service on the same infra nodes, without the OpenShift built-in ipfailover; I just configured it manually. Rebooting the infra nodes one by one, the VIPs migrate between the nodes only when they really need to, and each of the VIPs is assigned to its master correctly.
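A minimal sketch of what I mean by the manual setup, with each node being MASTER for one VIP (priorities and router IDs are illustrative; infra-2 gets the mirrored configuration):

    # /etc/keepalived/keepalived.conf on infra-1
    vrrp_instance VIP_1 {
        state MASTER
        interface eth0
        virtual_router_id 10
        priority 150                   # infra-2 uses 100 for this instance
        virtual_ipaddress {
            10.0.0.43
        }
    }
    vrrp_instance VIP_2 {
        state BACKUP
        interface eth0
        virtual_router_id 11
        priority 100                   # infra-2 uses 150 for this instance
        virtual_ipaddress {
            10.0.0.46
        }
    }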

Additional Information

I know that the best setup would include an odd number of infra nodes to avoid a split-brain situation. But in my case only two are needed, and I'm not sure split brain is happening here, because both ipfailover pods communicate with each other just fine, including via the VRRP protocol.

So please help me figure out where my misconfiguration is.
Thank you,
Dmitrii

jwforres (Member) commented Oct 11, 2018 (comment minimized).
