
kube-proxy ipvs doesn't update IPs after node failure #77903

Closed
petermetz opened this issue May 15, 2019 · 12 comments

@petermetz commented May 15, 2019

What happened:
Simulated a node failure on my 3-master, 1-worker cluster by pausing the second master in VirtualBox.
The round-robin entry for a load balancer service of mine still has the IP of the pod that went down with the node (after an hour of waiting).
What you expected to happen:
I expected kube-proxy to receive an event forcing it to update the IPVS table, dropping the pod IP that no longer exists. The load balancer service is an AppsCode Voyager HAProxy ingress (Voyager maps ingresses to actual load balancer services). It's nothing fancy.

How to reproduce it (as minimally and precisely as possible):

  1. Pull up a cluster with Kubespray v2.8.1 via Vagrant + VirtualBox: 3 masters, 1 worker
  2. Install the AppsCode Voyager ingress controller with default settings
  3. Create a Voyager ingress of type LoadBalancer: https://appscode.com/products/voyager/8.0.1/concepts/ingress-types/loadbalancer/
    3.1 Scale the load balancer to 4 so that it has a pod on each node
  4. Simulate sudden and complete node failure on the second master node
  5. Wait however long it takes for the node to be considered dead (5 minutes by default, I think)
  6. SSH into one of the nodes that are still up
  7. sudo ipvsadm --list --numeric
  8. Observe that there's an entry for the load balancer's external IP to be round-robined onto all the pods, and that the dead pod that sank with the failed node is still there, causing some of the requests hitting the LB service to fail (see the sketch after this list).
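
A minimal sketch of steps 6-8, assuming the node name and the service VIP from this report (they will differ in other environments):

vagrant ssh k8s-1                      # or SSH into any node that is still up
sudo ipvsadm --list --numeric          # dump the full IPVS table
sudo ipvsadm -Ln -t 192.168.55.1:80    # or list just the load balancer VIP
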
Anything else we need to know?:
The node gets recognized as dead and the pod goes into a state of 'Terminating', where it remains forever (> 1 hour); see the check below.
I use a VirtualBox bridged interface on each host in addition to the ones that Kubespray provisions by default. These make the boxes directly visible on my LAN, but my issue is internal cluster networking related.
I use MetalLB in BGP mode, with an OpenWRT router running Quagga peering with the MetalLB speakers. You probably don't need to know this because my issue is intra-cluster networking related, but I want to be thorough here.
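
For illustration, one way to spot the stuck pod described above (no specific pod name assumed):

kubectl get pods --all-namespaces -o wide | grep Terminating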

Environment:

  • Kubernetes version (use kubectl version): 1.13.4
  • Cloud provider or hardware configuration: VBox
  • OS (e.g: cat /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a):
    Linux k8s-1 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: Kubespray
  • Network plugin and version (if this is a network-related bug): flannel
  • Others:
@petermetz (Author) commented May 15, 2019

@kubernetes/sig-network-bugs

@k8s-ci-robot added sig/network and removed needs-sig labels May 15, 2019

@k8s-ci-robot (Contributor) commented May 15, 2019

@petermetz: Reiterating the mentions to trigger a notification:
@kubernetes/sig-network-bugs

In response to this:

@kubernetes/sig-network-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@athenabot commented May 15, 2019

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

@andrewsykim (Member) commented May 16, 2019

@petermetz can you share any logs from kube-proxy?

@petermetz (Author) commented May 17, 2019

@andrewsykim

FYI:
10.233.65.116 -> the pod backing the Voyager ingress controller service
10.233.32.110 -> the service's internal (cluster) IP
192.168.55.1 -> the service's external IP (assigned by MetalLB)
10.233.65.119 -> the CoreDNS pod on the failed node

I0517 23:42:46.043950       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/172.17.8.102:6443
I0517 23:42:46.045098       1 graceful_termination.go:174] Deleting rs: 10.233.0.1:443/TCP/172.17.8.102:6443
I0517 23:42:46.045245       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/172.17.8.103:6443
I0517 23:42:46.045457       1 graceful_termination.go:174] Deleting rs: 10.233.0.1:443/TCP/172.17.8.103:6443
I0517 23:42:46.144543       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/172.17.8.101:6443
I0517 23:42:46.144814       1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/172.17.8.101:6443: 5 ActiveConn, 0 InactiveConn
W0517 23:42:46.938109       1 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Service ended with: too old resource version: 1561235 (1561243)
I0517 23:43:16.242798       1 graceful_termination.go:160] Trying to delete rs: 10.233.32.110:443/TCP/10.233.67.249:8443
I0517 23:43:16.243084       1 graceful_termination.go:171] Not deleting, RS 10.233.32.110:443/TCP/10.233.67.249:8443: 0 ActiveConn, 1 InactiveConn
I0517 23:43:16.243556       1 graceful_termination.go:160] Trying to delete rs: 10.233.32.110:56791/TCP/10.233.67.249:56791
I0517 23:43:16.243768       1 graceful_termination.go:174] Deleting rs: 10.233.32.110:56791/TCP/10.233.67.249:56791
I0517 23:43:28.325452       1 graceful_termination.go:160] Trying to delete rs: 10.233.31.221:80/TCP/10.233.65.121:8080
I0517 23:43:28.326313       1 graceful_termination.go:174] Deleting rs: 10.233.31.221:80/TCP/10.233.65.121:8080
I0517 23:43:28.435532       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:43:28.435619       1 graceful_termination.go:171] Not deleting, RS 10.233.0.3:53/UDP/10.233.65.119:53: 0 ActiveConn, 8 InactiveConn
I0517 23:43:28.436047       1 graceful_termination.go:160] Trying to delete rs: 10.233.58.116:80/TCP/10.233.65.120:80
I0517 23:43:28.436248       1 graceful_termination.go:174] Deleting rs: 10.233.58.116:80/TCP/10.233.65.120:80
I0517 23:43:28.436725       1 graceful_termination.go:160] Trying to delete rs: 10.233.20.188:80/TCP/10.233.65.116:80
I0517 23:43:28.436916       1 graceful_termination.go:174] Deleting rs: 10.233.20.188:80/TCP/10.233.65.116:80
I0517 23:43:28.437414       1 graceful_termination.go:160] Trying to delete rs: 192.168.55.1:80/TCP/10.233.65.116:80
I0517 23:43:28.437617       1 graceful_termination.go:171] Not deleting, RS 192.168.55.1:80/TCP/10.233.65.116:80: 7 ActiveConn, 0 InactiveConn
I0517 23:43:28.439983       1 graceful_termination.go:160] Trying to delete rs: 192.168.66.131:32544/TCP/10.233.65.116:80
I0517 23:43:28.440155       1 graceful_termination.go:174] Deleting rs: 192.168.66.131:32544/TCP/10.233.65.116:80
I0517 23:43:28.440675       1 graceful_termination.go:160] Trying to delete rs: 10.0.2.15:32544/TCP/10.233.65.116:80
I0517 23:43:28.440766       1 graceful_termination.go:174] Deleting rs: 10.0.2.15:32544/TCP/10.233.65.116:80
I0517 23:43:28.441231       1 graceful_termination.go:160] Trying to delete rs: 10.233.64.0:32544/TCP/10.233.65.116:80
I0517 23:43:28.441359       1 graceful_termination.go:174] Deleting rs: 10.233.64.0:32544/TCP/10.233.65.116:80
I0517 23:43:28.441799       1 graceful_termination.go:160] Trying to delete rs: 10.233.64.1:32544/TCP/10.233.65.116:80
I0517 23:43:28.442070       1 graceful_termination.go:174] Deleting rs: 10.233.64.1:32544/TCP/10.233.65.116:80
I0517 23:43:28.442416       1 graceful_termination.go:160] Trying to delete rs: 127.0.0.1:32544/TCP/10.233.65.116:80
I0517 23:43:28.442711       1 graceful_termination.go:174] Deleting rs: 127.0.0.1:32544/TCP/10.233.65.116:80
I0517 23:43:28.443096       1 graceful_termination.go:160] Trying to delete rs: 172.17.8.101:32544/TCP/10.233.65.116:80
I0517 23:43:28.443338       1 graceful_termination.go:174] Deleting rs: 172.17.8.101:32544/TCP/10.233.65.116:80
I0517 23:43:28.443826       1 graceful_termination.go:160] Trying to delete rs: 172.18.0.1:32544/TCP/10.233.65.116:80
I0517 23:43:28.444010       1 graceful_termination.go:174] Deleting rs: 172.18.0.1:32544/TCP/10.233.65.116:80
I0517 23:43:28.444820       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:9153/TCP/10.233.65.119:9153
I0517 23:43:28.444908       1 graceful_termination.go:174] Deleting rs: 10.233.0.3:9153/TCP/10.233.65.119:9153
I0517 23:43:28.446798       1 graceful_termination.go:160] Trying to delete rs: 10.233.27.58:5000/TCP/10.233.65.117:5000
I0517 23:43:28.447090       1 graceful_termination.go:174] Deleting rs: 10.233.27.58:5000/TCP/10.233.65.117:5000
I0517 23:43:28.447574       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/TCP/10.233.65.119:53
I0517 23:43:28.447778       1 graceful_termination.go:174] Deleting rs: 10.233.0.3:53/TCP/10.233.65.119:53
I0517 23:43:41.233925       1 graceful_termination.go:160] Trying to delete rs: 10.233.32.110:443/TCP/10.233.67.249:8443
I0517 23:43:41.234528       1 graceful_termination.go:174] Deleting rs: 10.233.32.110:443/TCP/10.233.67.249:8443
I0517 23:43:41.234718       1 graceful_termination.go:93] lw: remote out of the list: 10.233.32.110:443/TCP/10.233.67.249:8443
I0517 23:43:41.235531       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:43:41.235846       1 graceful_termination.go:171] Not deleting, RS 10.233.0.3:53/UDP/10.233.65.119:53: 0 ActiveConn, 8 InactiveConn
I0517 23:43:41.235909       1 graceful_termination.go:160] Trying to delete rs: 192.168.55.1:80/TCP/10.233.65.116:80
I0517 23:43:41.236110       1 graceful_termination.go:171] Not deleting, RS 192.168.55.1:80/TCP/10.233.65.116:80: 6 ActiveConn, 0 InactiveConn
I0517 23:44:41.236691       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:44:41.237031       1 graceful_termination.go:171] Not deleting, RS 10.233.0.3:53/UDP/10.233.65.119:53: 0 ActiveConn, 8 InactiveConn
I0517 23:44:41.237103       1 graceful_termination.go:160] Trying to delete rs: 192.168.55.1:80/TCP/10.233.65.116:80
I0517 23:44:41.237779       1 graceful_termination.go:174] Deleting rs: 192.168.55.1:80/TCP/10.233.65.116:80
I0517 23:44:41.237919       1 graceful_termination.go:93] lw: remote out of the list: 192.168.55.1:80/TCP/10.233.65.116:80
I0517 23:45:41.241233       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:45:41.241643       1 graceful_termination.go:171] Not deleting, RS 10.233.0.3:53/UDP/10.233.65.119:53: 0 ActiveConn, 7 InactiveConn
I0517 23:46:41.244267       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:46:41.244449       1 graceful_termination.go:171] Not deleting, RS 10.233.0.3:53/UDP/10.233.65.119:53: 0 ActiveConn, 7 InactiveConn
I0517 23:47:41.247946       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:47:41.248866       1 graceful_termination.go:171] Not deleting, RS 10.233.0.3:53/UDP/10.233.65.119:53: 0 ActiveConn, 4 InactiveConn
I0517 23:48:41.249137       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:48:41.249575       1 graceful_termination.go:174] Deleting rs: 10.233.0.3:53/UDP/10.233.65.119:53
I0517 23:48:41.249726       1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/UDP/10.233.65.119:53

@andrewsykim (Member) commented May 18, 2019

@petermetz are you able to confirm that this is only reproducible for UDP backends? If so, #77802 might fix this.

@petermetz (Author) commented May 18, 2019

@andrewsykim I can't; my service is TCP (HTTP).

@andrewsykim (Member) commented May 18, 2019

@petermetz sorry, I was confused because I thought the "Not deleting, RS" message was only occurring for your DNS backends using UDP; I see now it applies to both. It looks like those backends aren't being deleted because IPVS sees they still have active connections (connections left in the ESTABLISHED state) even though the backend no longer exists. There's some logic in there to gracefully terminate a backend, but the timeout for waiting until the active connection count reaches 0 is pretty long. cc @lbernail possible similar situation as #77802 for the TCP case?
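
As an illustrative way to watch that drain (assuming the service VIP from earlier in this thread, 192.168.55.1:80):

sudo watch -n 10 'ipvsadm -Ln -t 192.168.55.1:80'   # ActiveConn should fall to 0 before the entry is removed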

@andrewsykim (Member) commented May 18, 2019

Worth noting here that the weight on the IPVS backend is set to 0; the entry just isn't removed until IPVS sees there are no connections left (for up to 15m). Can you confirm whether the weight on those IPVS backends is 0 (via sudo ipvsadm -L)?
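
For example (VIP taken from earlier in this thread; the Weight column is the one right after the forwarding method, e.g. Masq):

sudo ipvsadm -L -n | grep -A 8 '192.168.55.1:80'   # drained backends should show Weight 0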

@petermetz (Author) commented May 18, 2019

@andrewsykim No worries, and thanks for taking the time to respond and investigate.
I can confirm that the weights do get updated to zero for the dead pods. In the test I performed just now I even witnessed the entries getting deleted properly, so it seems I'm not able to reproduce the issue consistently either. It may have something to do with the HA master setup (3 masters, 1 worker), because the node I'm shutting down is a master.

@petermetz (Author) commented May 19, 2019

@andrewsykim Re-ran the test again just now and I have to report that the IPVS weights don't get updated either. The dead pods' entries are still in the IPVS rules with weight 1, 20 minutes after the node that contained the pods went down. The differences compared to my previous test run:

  • I shut down the third master node, not the second
  • I scaled the service from 4 pods to 8

When I look at the logs of the kube-proxy pod running on my first master, it shows that it's unable to connect to the Kubernetes API and that it keeps trying to hit the IP of the third master (which is down as per my test); a rough sketch for checking this follows the logs below.
So I can see now that my issue isn't necessarily with the way kube-proxy maintains IPVS rules, but rather with the way it handles API failover in a multi-master scenario. I'm not sure yet whether this is something in kube-proxy itself or in the way Kubespray sets up the HA multi-master failover.
What is concerning is that the kube-apiserver and kube-controller-manager pods on the third master node still appear to be in state 'Running' when I run kubectl get pods,svc,node --all-namespaces --output=wide from the first master, which is wrong because the node has been dead for 20+ minutes.

TCP  192.168.55.1:80 rr
  -> 10.233.64.163:80             Masq    1      0          25
  -> 10.233.64.164:80             Masq    1      0          25
  -> 10.233.65.136:80             Masq    1      0          24
  -> 10.233.65.137:80             Masq    1      0          24
  -> 10.233.66.169:80             Masq    1      13         0
  -> 10.233.66.171:80             Masq    1      12         0
  -> 10.233.67.19:80              Masq    1      0          25
  -> 10.233.67.20:80              Masq    1      0          24

These are the dead pods' entries, extracted from the output above (a hedged manual-cleanup sketch follows):

  -> 10.233.66.169:80             Masq    1      13         0
  -> 10.233.66.171:80             Masq    1      12         0
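
For completeness, the two dead backends above could be removed by hand with ipvsadm; this is only a stopgap, since kube-proxy owns these entries and should eventually remove them itself:

sudo ipvsadm -d -t 192.168.55.1:80 -r 10.233.66.169:80
sudo ipvsadm -d -t 192.168.55.1:80 -r 10.233.66.171:80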

kube-proxy logs:

E0519 00:28:11.468227       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:11.469216       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:14.545646       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:14.546172       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:17.613326       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:17.613700       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:20.684131       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:20.684532       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:23.757822       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:23.758383       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:26.827865       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:26.827882       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:29.901647       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:29.901647       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:32.973089       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:32.973137       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:36.043681       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:36.044547       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:39.117288       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:39.118974       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:42.193213       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Endpoints: Get https://172.17.8.103:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
E0519 00:28:42.193704       1 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Service: Get https://172.17.8.103:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.17.8.103:6443: connect: no route to host
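
A rough sketch for confirming which API server endpoint kube-proxy is pinned to (the kubeconfig path is an assumption about a typical Kubespray layout, and the pod name is a placeholder; adjust as needed):

# On the node running the affected kube-proxy: the endpoint it dials is in its kubeconfig
sudo grep -r 'server:' /etc/kubernetes/ 2>/dev/null
# Or grep the pod's logs for the endpoint it keeps retrying
kubectl -n kube-system logs <kube-proxy-pod-name> | grep ':6443' | tail -n 5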

@petermetz (Author) commented May 19, 2019

I finally ended up at #18174 and realized that this issue is just a duplicate of it.

@petermetz closed this May 19, 2019
