
OpenStack: HA proxy on service vm switches back and forth from/to bootstrap and production cluster #1854

Closed
jomeier opened this issue Jun 15, 2019 · 8 comments

Comments

@jomeier
Contributor

jomeier commented Jun 15, 2019

Hi,

What happened?

On the API VM with HAProxy and CoreDNS, HAProxy starts restarting continuously after around 10-15 minutes.

It looks like the haproxy-watcher triggers a switchover from the bootstrap node to the production cluster too early.

In my case the production control plane wasn't ready yet: the API server hadn't started, which led to a broken install.

If I turn off the haproxy-watcher, I get almost all the way through the installation procedure.

What you expected to happen?

HAProxy should only switch over to the production cluster once the control plane is really stable.

Greetings,

Josef

Version

$ openshift-install version
./openshift-install unreleased-master-1112-g7a54a6b120a0044cf9c07b14331760af8e3c216b-dirty
built from commit 7a54a6b120a0044cf9c07b14331760af8e3c216b
release image registry.svc.ci.openshift.org/origin/release:4.2

Platform (aws|libvirt|openstack):

OpenStack (Stein), 4GByte RAM, 2 VCPUs, 60 GByte Disk

@jomeier
Contributor Author

jomeier commented Jun 15, 2019

[core@test-cluster-njrtc-api ~]$ sudo podman run --rm -ti --net=host -v /etc/haproxy:/usr/local/etc/haproxy:ro docker.io/library/haproxy:1.7
<7>haproxy-systemd-wrapper: executing /usr/local/sbin/haproxy -p /run/haproxy.pid -db -f /usr/local/etc/haproxy/haproxy.cfg -Ds 
[WARNING] 165/051754 (9) : config : missing timeouts for proxy 'test-cluster-njrtc-api-masters'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 165/051754 (9) : config : missing timeouts for proxy 'test-cluster-njrtc-api-workers'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
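
For reference, those warnings should go away once explicit timeouts are set for the proxies. A minimal sketch of a defaults block (the values here are illustrative assumptions, not taken from the generated haproxy.cfg):

defaults
    mode tcp
    timeout connect 10s
    timeout client  1m
    timeout server  1m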

@jomeier
Contributor Author

jomeier commented Jun 15, 2019

Maybe it has something to do with the haproxy-watcher service. The watcher script might get some nonsense back from the unhealthy cluster, which then triggers actions (a new haproxy config, a restart of the haproxy service, ...) that are not wanted at that point in time.
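
A minimal sketch of the kind of conservative switchover gate I have in mind (purely hypothetical, not the actual haproxy-watcher code; the master API address, the /readyz endpoint and the reload step are assumptions for illustration):

#!/bin/bash
# Hypothetical gate: only point HAProxy at the production control plane after
# the API server has answered readiness probes several times in a row.
MASTER_API="https://192.0.2.10:6443"   # illustrative master address
REQUIRED_SUCCESSES=5
successes=0

while [ "$successes" -lt "$REQUIRED_SUCCESSES" ]; do
    if curl -ksf "${MASTER_API}/readyz" >/dev/null; then
        successes=$((successes + 1))
    else
        successes=0   # any failure resets the streak
    fi
    sleep 10
done

# Only now regenerate the HAProxy backend config and reload the service
# (config generation omitted here).
systemctl reload haproxy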

@jomeier jomeier changed the title api-vm on OpenStack: HA proxy fails after around 10 minutes and continuously crashes OpenStack: HA proxy on service vm switches to early to the production cluster Jun 15, 2019
@jomeier jomeier changed the title OpenStack: HA proxy on service vm switches to early to the production cluster OpenStack: HA proxy on service vm switches back and forth from/to bootstrap and production cluster Jun 16, 2019
@mandre
Member

mandre commented Jun 18, 2019

Yes, I also noticed that the haproxy-watcher service does weird things that could cause bouncing back and forth between the bootstrap and the production control plane. In my experience, though, the LB setup will eventually converge to the production control plane once it is stable.
FYI, we're in the process of removing the API VM (since it's a single point of failure) and moving the LB service to the master nodes. This part is going to be reworked.

@bjhuangr

@mandre I guess the DNS service will also be moved to the master nodes once the API VM is removed?

@mandre
Member

mandre commented Jul 10, 2019

@mandre I guess the DNS service will also be moved to the master nodes once the API VM is removed?

That's correct, we'll be running haproxy, keepalived and coredns as static pods on the master nodes.
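
A rough sketch of what such a static pod could look like on a master (a hypothetical manifest; the file path, namespace, image and mounts are illustrative assumptions, not the final implementation):

# /etc/kubernetes/manifests/haproxy.yaml -- illustrative static pod manifest
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: openshift-openstack-infra   # assumed namespace
spec:
  hostNetwork: true
  containers:
  - name: haproxy
    image: docker.io/library/haproxy:1.7   # image reused from the reproduction above
    volumeMounts:
    - name: conf
      mountPath: /usr/local/etc/haproxy
      readOnly: true
  volumes:
  - name: conf
    hostPath:
      path: /etc/haproxy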

@mandre
Member

mandre commented Aug 1, 2019

/label platform/openstack

@mandre
Member

mandre commented Aug 20, 2019

/close

With #1959 we've removed the service VM that was causing the back and forth.

@openshift-ci-robot
Contributor

@mandre: Closing this issue.

In response to this:

/close

With #1959 we've removed the service VM that was causing the back and forth.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
