
control plane load balancing does not work #454

Closed

willzhang opened this issue Sep 21, 2022 · 9 comments
@willzhang

willzhang commented Sep 21, 2022

Describe the bug
control plane load balancing does not work

To Reproduce
Steps to reproduce the behavior:

root@master1:~# cat kubeadm.yaml 
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.72.30
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: Node,RBAC
  certSANs:
  - apiserver.k8s.local
  - master1
  - master2
  - master3
  - worker1
  - 192.168.72.30
  - 192.168.72.31
  - 192.168.72.32
  - 192.168.72.33
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.25.0
controlPlaneEndpoint: apiserver.k8s.local:6443
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"

Initialize the cluster with:

 kubeadm init --upload-certs --config kubeadm.yaml

Expected behavior
Control plane load balancing via IPVS using the VIP.


Environment:

  • OS/Distro: Ubuntu 22.04
  • Kubernetes Version: v1.25.0
  • Kube-vip Version: 0.5.0

Kube-vip.yaml:

root@master1:~# cat /etc/kubernetes/manifests/kube-vip.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: port
      value: "6443"
    - name: vip_interface
      value: ens160
    - name: vip_cidr
      value: "32"
    - name: cp_enable
      value: "true"
    - name: cp_namespace
      value: kube-system
    - name: vip_ddns
      value: "false"
    - name: svc_enable
      value: "true"
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: lb_enable
      value: "true"
    - name: lb_port
      value: "6443"
    - name: lb_fwdmethod
      value: local
    - name: address
      value: apiserver.k8s.local
    - name: prometheus_server
      value: :2112
    image: ghcr.io/kube-vip/kube-vip:v0.5.0
    imagePullPolicy: IfNotPresent
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
  hostAliases:
  - hostnames:
    - kubernetes
    ip: 127.0.0.1
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
status: {}

Additional context

Cannot see IPVS load balancing with the VIP 192.168.72.200:

root@master1:~# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.72.30:6443           Masq    1      0          0         
TCP  10.96.0.10:53 rr
TCP  10.96.0.10:9153 rr
UDP  10.96.0.10:53 rr
root@master1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:ad:69:e1 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 192.168.72.30/24 brd 192.168.72.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 192.168.72.200/32 scope global deprecated dynamic ens160
       valid_lft 59sec preferred_lft 0sec
    inet6 fe80::250:56ff:fead:69e1/64 scope link 
       valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether 46:88:38:90:21:f1 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
root@master1:~# 

kube-vip pod logs

root@master1:~# kubectl -n kube-system  logs  kube-vip-master1 | more
time="2022-09-21T14:34:44Z" level=info msg="Starting kube-vip.io [v0.5.0]"
time="2022-09-21T14:34:44Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
time="2022-09-21T14:34:44Z" level=info msg="prometheus HTTP server started"
time="2022-09-21T14:34:44Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2022-09-21T14:34:44Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [master1]"
I0921 14:34:44.427389       1 leaderelection.go:248] attempting to acquire leader lease kube-system/plndr-svcs-lock...
time="2022-09-21T14:34:44Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [master1]"
I0921 14:34:44.434777       1 leaderelection.go:248] attempting to acquire leader lease kube-system/plndr-cp-lock...
E0921 14:34:44.435347       1 leaderelection.go:330] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": dial tcp 127.0.0.1:6443: connect: connection refused
E0921 14:34:44.435313       1 leaderelection.go:330] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": dial tcp 127.0.0.1:6443: connect: connection refused
I0921 14:34:48.449944       1 leaderelection.go:258] successfully acquired lease kube-system/plndr-cp-lock
time="2022-09-21T14:34:48Z" level=info msg="Node [master1] is assuming leadership of the cluster"
time="2022-09-21T14:34:48Z" level=info msg="starting the DNS updater for the address apiserver.k8s.local"
I0921 14:34:48.450467       1 leaderelection.go:258] successfully acquired lease kube-system/plndr-svcs-lock
time="2022-09-21T14:34:48Z" level=info msg="Starting IPVS LoadBalancer"
time="2022-09-21T14:34:48Z" level=info msg="IPVS Loadbalancer enabled for 1.2.1"
time="2022-09-21T14:34:48Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [192.168.72.200]"
time="2022-09-21T14:34:48Z" level=info msg="Kube-Vip is watching nodes for control-plane labels"
time="2022-09-21T14:34:48Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:51Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:52Z" level=error msg="Error querying backends file does not exist"
time="2022-09-21T14:34:52Z" level=info msg="Created Load-Balancer services on [192.168.72.200:6443]"
time="2022-09-21T14:34:52Z" level=info msg="Added backend for [192.168.72.200:6443] on [192.168.72.30:6443]"
time="2022-09-21T14:34:54Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:57Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:00Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:03Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:06Z" level=info msg="setting 192.168.72.200 as an IP"
@lwabish

lwabish commented Jan 11, 2023

Same here.

@lwabish

lwabish commented Jan 13, 2023

Kubekey uses kube-vip to deploy HA clusters; we hit the same issue there, and it can be solved by adding the node CIDR to the kube-proxy configuration (see the sketch below).
Refer to: kubekey-1702
Maybe we should add some instructions to kube-vip's website to warn users about this bug.
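
For background, kube-proxy's IPVS proxier periodically cleans up IPVS virtual servers it does not manage, which removes the VIP service that kube-vip creates unless that address range is excluded. A minimal sketch of the exclusion, written as the KubeProxyConfiguration document passed to kubeadm init --config (the CIDR is the node network from this issue and is only an assumption; use your own node/VIP range):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  # excludeCIDRs tells the IPVS proxier not to touch virtual servers in this
  # range, so kube-vip's 192.168.72.200:6443 entry survives the cleanup loop.
  excludeCIDRs:
  - 192.168.72.0/24   # assumption: node/VIP network of this issue

The same exclusion can also be set on the kube-proxy command line with --ipvs-exclude-cidrs.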

@cuiliang0302

Me too.

OS: Rocky Linux release 9.2
Kernel: 5.14.0-284.18.1.el9_2.x86_64
Kubernetes: 1.27.4
Containerd: 1.6.20
kube-vip: 0.6.0

@ii2day
Contributor

ii2day commented Dec 15, 2023

Has anyone solved this problem yet?

@smokes2345

I'm still fuzzy on the details, but in the issue linked by @lwabish it looks like the problem was resolved by telling kube-proxy to ignore the subnet the control-plane nodes live on. My guess is that something like a race condition is being triggered. I added that subnet to the no_proxy config for my Kubespray deployment, but it does not seem to have made a difference after running the playbook again.

@blackliner

blackliner commented Jan 12, 2024

I had a similar issue, but using Kubespray and MetalLB. The LB IP for the control plane was gone, and I got the same error messages as above. Fortunately, Kubespray has a way to specify this exclusion: https://github.com/kubernetes-sigs/kubespray/blob/747d8bb4c2d31669b2d7eed2b38bc4da2c689fab/roles/kubernetes/control-plane/defaults/main/kube-proxy.yml#L68
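
For reference, the Kubespray default that link points to appears to be kube_proxy_exclude_cidrs; the variable name, file path, and CIDR below are assumptions taken from that file and the addresses in this comment, so verify them against your Kubespray version. A minimal inventory sketch:

# inventory/<cluster>/group_vars/k8s_cluster/k8s-cluster.yml (hypothetical path)
# Exclude the node/VIP network so kube-proxy's IPVS cleanup leaves the VIP alone.
kube_proxy_exclude_cidrs:
  - 10.128.5.0/24   # assumption: node/VIP network from this comment's logs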

Correction: after applying the config changes and rerunning the Kubespray playbooks, the error still occurs. I need to double-check whether the config made its way through or not...

time="2024-01-12T19:37:34Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:37:34Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:37:34Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:37:34Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]"
time="2024-01-12T19:37:39Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:05Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:38:05Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:38:05Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:38:05Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]"
time="2024-01-12T19:38:10Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:30Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:38:30Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:38:30Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:36Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:38:36Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]

EDIT2: the kube-proxy ConfigMap was not modified yet. But I doubt it is actually the issue, since there are no logs mentioning kube-proxy deleting this IP (10.128.5.1), and I have watch -n 0 ip a show bond0 running on all three nodes, and one of them holds the right IP all the time.

[screenshot: ip a show bond0 output]

@rraj-gautam

Kubekey uses kube-vip to deploy HA clusters; we hit the same issue there, and it can be solved by adding the node CIDR to the kube-proxy configuration. Refer to: kubekey-1702. Maybe we should add some instructions to kube-vip's website to warn users about this bug.

This worked for me. My kubeadm-config.yaml is below; a sketch for applying the same exclusion to an already-running cluster follows it.

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.2.160" #control plane node local ip
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.k8s.io
kubernetesVersion: 1.28.2
kubeProxyArgs: ["--ipvs-exclude-cidrs=192.168.2.0/24"] ###### cidr of node network #######
controlPlaneEndpoint: "192.168.2.159:6443" # loadbalancer VIP 
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: "10.32.0.0/12"  
apiServer:
  timeoutForControlPlane: 4m0s
  certSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
    serverCertSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
    peerCertSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: "systemd"
  #cgroupDriver: cgroupfs
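
For an already-running kubeadm cluster, a roughly equivalent change can be made without re-initializing. A sketch, assuming the standard kubeadm-managed kube-proxy ConfigMap (data key config.conf) and DaemonSet in kube-system:

# Add the node/VIP CIDR under ipvs.excludeCIDRs in the config.conf key
kubectl -n kube-system edit configmap kube-proxy

# Restart kube-proxy so the new exclusion takes effect
kubectl -n kube-system rollout restart daemonset kube-proxy

# The VIP virtual server created by kube-vip should now survive kube-proxy's cleanup
ipvsadm -Ln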

@lubronzhan
Contributor

Yeah, we should add this to the docs.

@thebsdbox
Collaborator

This is now part of the documentation.
