Description
Is this a bug report or a feature request?:
Both, probably.
What happened:
I tested layer 2 mode and simulated a node failure by shutting down the node that held the load balancer IP. It then took approx. 5 minutes for the IP to be switched to the other node in the cluster (1 master, 2 nodes). After much experimentation I came to the conclusion that the node being down and "NotReady" does not itself initiate the switch of the IP address. The 5-minute delay seems to be caused by the default pod eviction timeout of Kubernetes, which is 5 minutes. That means it takes 5 minutes for a pod on an unavailable node to be deleted. The default node monitor grace period is 40 seconds, btw. So with the default configuration it currently takes almost 6 minutes for an IP address to be switched.
I made things a lot better by decreasing both settings like this:
- --pod-eviction-timeout=20s
- --node-monitor-grace-period=20s
in /etc/kubernetes/manifests/kube-controller-manager.yaml
With these settings MetalLB switches the IP in under a minute after a node failure.
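The timing reasoning above can be checked with a little arithmetic. This is only a rough sketch: it assumes the delay is simply the node monitor grace period plus the pod eviction timeout, as described in this report, and `failover_delay` is a hypothetical helper name, not part of MetalLB or Kubernetes.

```shell
# Rough worst-case failover delay in seconds, ASSUMING it is just
# node-monitor-grace-period + pod-eviction-timeout (as observed above).
# failover_delay is a hypothetical helper, not a MetalLB/Kubernetes tool.
failover_delay() {
  fd_grace_s=$1      # --node-monitor-grace-period, in seconds
  fd_eviction_s=$2   # --pod-eviction-timeout, in seconds
  echo $(( fd_grace_s + fd_eviction_s ))
}

failover_delay 40 300   # defaults (40s + 5m): prints 340
failover_delay 20 20    # reduced settings: prints 40
```

340 seconds is 5 minutes 40 seconds, which matches the "almost 6 minutes" seen with defaults, and 40 seconds matches the sub-minute behaviour with the reduced settings.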
What you expected to happen:
To be honest, I would expect the whole process to take 5 seconds at most.
How to reproduce it (as minimally and precisely as possible):
Create a Kubernetes 1.11.1 cluster with kubeadm (single master, two nodes). Calico networking.
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.2/manifests/metallb.yaml
kubectl apply -f metallb-cfg.yml
kubectl apply -f tutorial-2.yaml
➜ metallb-test cat metallb-cfg.yml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.115.195.206-10.115.195.208
Then
watch curl --connect-timeout 1 http://10.115.195.206
to see if the nginx app is reachable.
Then
kubectl logs -f --namespace metallb-system speaker-xxxxxxxxx
To see which node has the IP address assigned at the moment.
SSH into that machine and run "poweroff".
Measure how long it takes until the "watch curl" is successful again.
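To get a number instead of eyeballing "watch curl", the outage can be timed with a small loop. This is a minimal sketch, not something from the MetalLB docs: `measure_outage` is a hypothetical helper that takes any probe command, waits for it to start failing, then reports how many seconds pass until it succeeds again (1-second resolution).

```shell
# measure_outage PROBE [ARGS...]
# Hypothetical helper: runs the probe once per second, waits for the first
# failure, then prints the number of seconds until the probe succeeds again.
measure_outage() {
  # Phase 1: service is still up; poll until the probe starts failing.
  while "$@" >/dev/null 2>&1; do
    sleep 1
  done
  mo_start=$(date +%s)

  # Phase 2: service is down; poll until the probe succeeds again.
  until "$@" >/dev/null 2>&1; do
    sleep 1
  done
  mo_end=$(date +%s)

  echo $(( mo_end - mo_start ))
}

# Example (start it before powering off the node):
#   measure_outage curl --silent --connect-timeout 1 http://10.115.195.206
```

Passing the probe as arguments keeps the loop independent of curl, so the same function works with any other reachability check.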
Anything else we need to know?:
Environment:
- MetalLB version: v0.7.2
- Kubernetes version: v1.11.1
- BGP router type/version: N/A
- OS (e.g. from /etc/os-release): CentOS 7
- Kernel (e.g. uname -a): Linux cp-k8s-ghdev02-node-01.ewslab.eos.lcl 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux