
control plane load balancing does not work #454

Closed

willzhang opened this issue Sep 21, 2022 · 9 comments
@willzhang

willzhang commented Sep 21, 2022

Describe the bug
control plane load balancing does not work

To Reproduce
Steps to reproduce the behavior:

root@master1:~# cat kubeadm.yaml 
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.72.30
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: Node,RBAC
  certSANs:
  - apiserver.k8s.local
  - master1
  - master2
  - master3
  - worker1
  - 192.168.72.30
  - 192.168.72.31
  - 192.168.72.32
  - 192.168.72.33
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.25.0
controlPlaneEndpoint: apiserver.k8s.local:6443
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"

Initialize the cluster with:

 kubeadm init --upload-certs --config kubeadm.yaml

Expected behavior
Control plane load balancing via IPVS using the VIP.


Environment:

  • OS/Distro: Ubuntu 22.04
  • Kubernetes Version: v1.25.0
  • Kube-vip Version: 0.5.0

Kube-vip.yaml:

root@master1:~# cat /etc/kubernetes/manifests/kube-vip.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: port
      value: "6443"
    - name: vip_interface
      value: ens160
    - name: vip_cidr
      value: "32"
    - name: cp_enable
      value: "true"
    - name: cp_namespace
      value: kube-system
    - name: vip_ddns
      value: "false"
    - name: svc_enable
      value: "true"
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: lb_enable
      value: "true"
    - name: lb_port
      value: "6443"
    - name: lb_fwdmethod
      value: local
    - name: address
      value: apiserver.k8s.local
    - name: prometheus_server
      value: :2112
    image: ghcr.io/kube-vip/kube-vip:v0.5.0
    imagePullPolicy: IfNotPresent
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
  hostAliases:
  - hostnames:
    - kubernetes
    ip: 127.0.0.1
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
status: {}

Additional context

Cannot see IPVS load balancing with the VIP 192.168.72.200:

root@master1:~# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.72.30:6443           Masq    1      0          0         
TCP  10.96.0.10:53 rr
TCP  10.96.0.10:9153 rr
UDP  10.96.0.10:53 rr
root@master1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:ad:69:e1 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 192.168.72.30/24 brd 192.168.72.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 192.168.72.200/32 scope global deprecated dynamic ens160
       valid_lft 59sec preferred_lft 0sec
    inet6 fe80::250:56ff:fead:69e1/64 scope link 
       valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether 46:88:38:90:21:f1 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
root@master1:~# 

kube-vip pod logs

root@master1:~# kubectl -n kube-system  logs  kube-vip-master1 | more
time="2022-09-21T14:34:44Z" level=info msg="Starting kube-vip.io [v0.5.0]"
time="2022-09-21T14:34:44Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
time="2022-09-21T14:34:44Z" level=info msg="prometheus HTTP server started"
time="2022-09-21T14:34:44Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2022-09-21T14:34:44Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [master1]"
I0921 14:34:44.427389       1 leaderelection.go:248] attempting to acquire leader lease kube-system/plndr-svcs-lock...
time="2022-09-21T14:34:44Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [master1]"
I0921 14:34:44.434777       1 leaderelection.go:248] attempting to acquire leader lease kube-system/plndr-cp-lock...
E0921 14:34:44.435347       1 leaderelection.go:330] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": dial tcp 127.0.0.1:6443: connect: connection refused
E0921 14:34:44.435313       1 leaderelection.go:330] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": dial tcp 127.0.0.1:6443: connect: connection refused
I0921 14:34:48.449944       1 leaderelection.go:258] successfully acquired lease kube-system/plndr-cp-lock
time="2022-09-21T14:34:48Z" level=info msg="Node [master1] is assuming leadership of the cluster"
time="2022-09-21T14:34:48Z" level=info msg="starting the DNS updater for the address apiserver.k8s.local"
I0921 14:34:48.450467       1 leaderelection.go:258] successfully acquired lease kube-system/plndr-svcs-lock
time="2022-09-21T14:34:48Z" level=info msg="Starting IPVS LoadBalancer"
time="2022-09-21T14:34:48Z" level=info msg="IPVS Loadbalancer enabled for 1.2.1"
time="2022-09-21T14:34:48Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [192.168.72.200]"
time="2022-09-21T14:34:48Z" level=info msg="Kube-Vip is watching nodes for control-plane labels"
time="2022-09-21T14:34:48Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:51Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:52Z" level=error msg="Error querying backends file does not exist"
time="2022-09-21T14:34:52Z" level=info msg="Created Load-Balancer services on [192.168.72.200:6443]"
time="2022-09-21T14:34:52Z" level=info msg="Added backend for [192.168.72.200:6443] on [192.168.72.30:6443]"
time="2022-09-21T14:34:54Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:57Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:00Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:03Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:06Z" level=info msg="setting 192.168.72.200 as an IP"
@lwabish

lwabish commented Jan 11, 2023

Same here.

@lwabish

lwabish commented Jan 13, 2023

Kubekey uses kube-vip to deploy HA clusters; we hit the same issue there, and it can be solved by adding the node CIDR to the kube-proxy configuration (see the sketch below).
Refer to: kubekey-1702
Maybe we should add some instructions to kube-vip's website to warn users about this bug.
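
For background, kube-proxy's IPVS proxier periodically cleans up IPVS virtual servers it does not manage, which removes the VIP service that kube-vip creates unless that address range is excluded. A minimal sketch of the exclusion, written as the KubeProxyConfiguration document passed to kubeadm init --config (the CIDR is the node network from this issue and is only an assumption; use your own node/VIP range):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  # excludeCIDRs tells the IPVS proxier not to touch virtual servers in this
  # range, so kube-vip's 192.168.72.200:6443 entry survives the cleanup loop.
  excludeCIDRs:
  - 192.168.72.0/24   # assumption: node/VIP network of this issue

The same exclusion can also be set on the kube-proxy command line with --ipvs-exclude-cidrs.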

@cuiliang0302

Me too.

OS: Rocky Linux release 9.2
Kernel: 5.14.0-284.18.1.el9_2.x86_64
Kubernetes: 1.27.4
Containerd: 1.6.20
kube-vip: 0.6.0

@ii2day
Contributor

ii2day commented Dec 15, 2023

Has anyone solved this problem yet?

@smokes2345

I'm still fuzzy on the details, but in the issue linked by @lwabish it looks like the problem was resolved by telling kube-proxy to ignore the subnet the control-plane nodes live on. My guess is that something like a race condition is being triggered. I added that subnet to the no_proxy config for my Kubespray deployment, but it does not seem to have made a difference after running the playbook again.

@blackliner

blackliner commented Jan 12, 2024

I had a similar issue, but using Kubespray and MetalLB. The LB IP for the control plane was gone, and I got the same error messages as above. Fortunately, Kubespray has a way to specify this exclusion: https://github.com/kubernetes-sigs/kubespray/blob/747d8bb4c2d31669b2d7eed2b38bc4da2c689fab/roles/kubernetes/control-plane/defaults/main/kube-proxy.yml#L68
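
For reference, the Kubespray default that link points to appears to be kube_proxy_exclude_cidrs; the variable name, file path, and CIDR below are assumptions taken from that file and the addresses in this comment, so verify them against your Kubespray version. A minimal inventory sketch:

# inventory/<cluster>/group_vars/k8s_cluster/k8s-cluster.yml (hypothetical path)
# Exclude the node/VIP network so kube-proxy's IPVS cleanup leaves the VIP alone.
kube_proxy_exclude_cidrs:
  - 10.128.5.0/24   # assumption: node/VIP network from this comment's logs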

Correction: after applying the config changes and rerunning the Kubespray playbooks, the error still occurs. I need to double-check whether the config made its way through or not...

time="2024-01-12T19:37:34Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:37:34Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:37:34Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:37:34Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]"
time="2024-01-12T19:37:39Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:05Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:38:05Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:38:05Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:38:05Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]"
time="2024-01-12T19:38:10Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:30Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:38:30Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:38:30Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:36Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:38:36Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]

EDIT2: the kube-proxy ConfigMap was not modified yet. But I doubt it is actually the issue, since there are no logs mentioning kube-proxy deleting this IP (10.128.5.1), and I have watch -n 0 ip a show bond0 running on all three nodes, and one of them holds the right IP all the time.

[screenshot: ip a show bond0 output]

@rraj-gautam

Kubekey uses kube-vip to deploy HA clusters; we hit the same issue there, and it can be solved by adding the node CIDR to the kube-proxy configuration. Refer to: kubekey-1702. Maybe we should add some instructions to kube-vip's website to warn users about this bug.

This worked for me. My kubeadm-config.yaml is below; a sketch for applying the same exclusion to an already-running cluster follows it.

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.2.160" #control plane node local ip
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.k8s.io
kubernetesVersion: 1.28.2
kubeProxyArgs: ["--ipvs-exclude-cidrs=192.168.2.0/24"] ###### cidr of node network #######
controlPlaneEndpoint: "192.168.2.159:6443" # loadbalancer VIP 
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: "10.32.0.0/12"  
apiServer:
  timeoutForControlPlane: 4m0s
  certSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
    serverCertSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
    peerCertSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: "systemd"
  #cgroupDriver: cgroupfs
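
For an already-running kubeadm cluster, a roughly equivalent change can be made without re-initializing. A sketch, assuming the standard kubeadm-managed kube-proxy ConfigMap (data key config.conf) and DaemonSet in kube-system:

# Add the node/VIP CIDR under ipvs.excludeCIDRs in the config.conf key
kubectl -n kube-system edit configmap kube-proxy

# Restart kube-proxy so the new exclusion takes effect
kubectl -n kube-system rollout restart daemonset kube-proxy

# The VIP virtual server created by kube-vip should now survive kube-proxy's cleanup
ipvsadm -Ln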

@lubronzhan
Contributor

Yeah, we should add this to the docs.

@thebsdbox
Collaborator

This is now part of the documentation.
