Initial setup of a k8s cluster with kubespray breaks if kube-vip is enabled #11229

Closed
Mazorius opened this issue May 22, 2024 · 9 comments · Fixed by #11422

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Mazorius
What happened?

An initial cluster creation always breaks on registering the first master if kube-vip is enabled.

What did you expect to happen?

In the initial phase, kube-vip should not block the registration of the first control plane.

How can we reproduce it (as minimally and precisely as possible)?

Deploy a minimal cluster in a fresh environment with kube-vip activated beforehand via addons.yml (see the sketch below).
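
For illustration, a minimal sketch of the kube-vip block in addons.yml that triggers this (variable names as used by Kubespray's kube-vip support and in the comments below; the interface and VIP address are placeholders for your environment):

# addons.yml (sketch) — enable kube-vip for the control plane before the first run
kube_vip_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_arp_enabled: true
kube_vip_interface: eth0          # placeholder: NIC that should carry the VIP
kube_vip_address: 10.12.3.100     # placeholder: virtual IP for the apiserver
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443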

OS

Linux 5.15.0-102-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

ansible [core 2.16.7]
config file = ansible.cfg
configured module search path = ['library']
ansible python module location = venv/lib/python3.12/site-packages/ansible
ansible collection location = /Users/****/.ansible/collections:/usr/share/ansible/collections:/etc/ansible/collections:collections
executable location = venv/bin/ansible
python version = 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] (venv/bin/python)
jinja version = 3.1.4
libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

Collection (2.25.0)

Network plugin used

calico

Full inventory with variables

all:
  children:
    bastion:
      hosts:
        bastion:
          ansible_host: 10.12.3.61
          ip: 10.12.3.61
    kube_control_plane:
      hosts:
        hk8scpfra1:
          ansible_host: 10.12.3.11
          ip: 10.12.3.11
        hk8scpfra2:
          ansible_host: 10.12.3.12
          ip: 10.12.3.12
        hk8scpfra3:
          ansible_host: 10.12.3.13
          ip: 10.12.3.13
    worker_node:
      hosts:
        hk8swfra1:
          ansible_host: 10.12.3.21
          ip: 10.12.3.21
        hk8swfra2:
          ansible_host: 10.12.3.22
          ip: 10.12.3.22
        hk8swfra3:
          ansible_host: 10.12.3.23
          ip: 10.12.3.23
      vars:
        node_labels:
          node-role.kubernetes.io/worker: ""
          node.cluster.x-k8s.io/nodegroup: worker
    database_node:
      hosts:
        hk8sdbfra1:
          ansible_host: 10.12.3.31
          ip: 10.12.3.31
        hk8sdbfra2:
          ansible_host: 10.12.3.32
          ip: 10.12.3.32
        hk8sdbfra3:
          ansible_host: 10.12.3.33
          ip: 10.12.3.33
      vars:
        node_taints:
          - 'dedicated=database:NoSchedule'
        node_labels:
          node-role.kubernetes.io/database: ""
          node.cluster.x-k8s.io/nodegroup: database
    monitor_node:
      hosts:
        hk8smfra1:
          ansible_host: 10.12.3.41
          ip: 10.12.3.41
        hk8smfra2:
          ansible_host: 10.12.3.42
          ip: 10.12.3.42
        hk8smfra3:
          ansible_host: 10.12.3.43
          ip: 10.12.3.43
      vars:
        node_taints:
          - 'dedicated=monitor:NoSchedule'
        node_labels:
          node-role.kubernetes.io/monitor: ""
          node.cluster.x-k8s.io/nodegroup: monitor
    teleport_node:
      hosts:
        hk8stfra1:
          ansible_host: 10.12.3.51
          ip: 10.12.3.51
        hk8stfra2:
          ansible_host: 10.12.3.52
          ip: 10.12.3.52
        hk8stfra3:
          ansible_host: 10.12.3.53
          ip: 10.12.3.53
      vars:
        node_taints:
          - 'dedicated=teleport:NoSchedule'
        node_labels:
          node-role.kubernetes.io/teleport: ""
          node.cluster.x-k8s.io/nodegroup: teleport
    k8s_cluster:
      children:
        kube_control_plane:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    etcd:
      children:
        kube_control_plane:
    kube_node:
      children:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    calico_rr:
      hosts: {}

Command used to invoke ansible

ansible-playbook --inventory inventory-local.yml --become --become-user=root --private-key=~/.ssh/key_2024-04-10 cluster.yml

Output of ansible run

kubeadm | Initialize first master

failed

Anything else we need to know

The kubelet log shows a connection timeout to the apiserver endpoint.

Mazorius added the kind/bug label on May 22, 2024
@vladvetu

vladvetu commented May 26, 2024

Same for me. On a fresh cluster deployment, if kube-vip is enabled the deployment fails.
Variables used in setting up kube-vip:

# Kube VIP
kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: "{{ hostvars[groups['kube_control_plane'][0]]['virtual_ip_addresses'][0] }}" # evaluates to an IP
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
kube_vip_interface: ens192
kube_vip_services_enabled: false
kube_vip_dns_mode: first
kube_vip_cp_detect: false
kube_vip_leasename: plndr-cp-lock
kube_vip_enable_node_labeling: true
kube_vip_lb_enable: true

These are the logs from the kube-vip container:

E0526 16:28:30.201192       1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

E0526 16:28:32.303578       1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

And this from journalctl:

May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: I0526 16:30:11.764300   14529 csi_plugin.go:880] Failed to contact API server when waiting for CSINode publishing: Get "https://lb-apiserver.kubernetes.local:6443/apis/storage.k8s.io/v1/csinodes/k8s-g1-cplane-1-56a631.example.com": dial tcp 172.19.20.99:6443: connect: no route to host

May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: W0526 16:30:11.764339   14529 reflector.go:539] k8s.io/client-go@v0.0.0/tools/cache/reflector.go:229: failed to list *v1.Node: Get "https://lb-apiserver.kubernetes.local:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-g1-cplane-1-56a631.example.com&limit=500&resourceVersion=0": dial tcp 172.19.20.99:6443: connect: no route to host

I redacted the domain with example.com

Workaround:

  1. Deploy a fresh cluster without kube-vip; the deployment succeeds.
  2. Enable kube-vip and re-run cluster.yml; the cluster deployment succeeds and kube-vip works as expected (see the sketch below).
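
In practice the toggle is just the kube-vip switch in addons.yml (a sketch; kube_vip_enabled is the same variable shown above, and cluster.yml is re-run after each change):

# addons.yml — first run: deploy the cluster with kube-vip disabled
kube_vip_enabled: false

# addons.yml — second run: enable kube-vip, then re-run cluster.yml to converge
kube_vip_enabled: true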

@raider444

Same issue for me.

@wandersonlima

kube-vip requires workarounds to support k8s v1.29+

@sathieu
Contributor

sathieu commented May 28, 2024

It would be great to add kube-vip to the test matrix as well ...

sathieu added a commit to sathieu/kubespray that referenced this issue May 28, 2024
Fixes: kubernetes-sigs#11229
Signed-off-by: Mathieu Parent <mathieu.parent@insee.fr>
@sathieu
Contributor

sathieu commented May 28, 2024

Proposed PR: #11242

@theLockesmith

Workaround:

1. Deploy a fresh cluster without kube-vip; the deployment succeeds.

2. Enable kube-vip and re-run cluster.yml; the cluster deployment succeeds and kube-vip works as expected.

Thank you for saving my sanity!

@WladyX

WladyX commented Jul 8, 2024

I edited roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 in the kubespray Docker image as in the PR and it worked OK.
Thanks!

@sathieu
Contributor

sathieu commented Aug 22, 2024

Quoting kube-vip/kube-vip#684 (comment):

Without ControlPlaneKubeletLocalMode and when referencing admin.conf for kube-vip:

* The kubelet got started

* The kubelet wanted to bootstrap itself by using the control-plane IP

* This failed until kube-vip comes up

* The kubelet can't start kube-vip because the `admin.conf` does not yet exist

With ControlPlaneKubeletLocalMode and when referencing admin.conf for kube-vip:

* The kubelet got started

* The kubelet bootstraps itself using the local control-plane IP (not depending on kube-vip being up)

* The admin.conf gets created

* The kubelet should be able to start kube-vip now

So, a better solution is available now.
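
For context, ControlPlaneKubeletLocalMode is a kubeadm feature gate (available in newer kubeadm releases, v1.31+) that lets the kubelet on a control-plane node bootstrap against its local apiserver instead of the load-balanced endpoint. A sketch of what it looks like at the kubeadm level (not Kubespray-specific; whether and how Kubespray exposes it depends on the Kubespray version):

# kubeadm ClusterConfiguration (sketch)
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.31.0                                # placeholder; the gate needs v1.31+
controlPlaneEndpoint: lb-apiserver.kubernetes.local:6443  # the VIP/LB endpoint, as in the logs above
featureGates:
  ControlPlaneKubeletLocalMode: true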

@Cloud-Mak

Quoting kube-vip/kube-vip#684 (comment):

Without ControlPlaneKubeletLocalMode and when referencing admin.conf for kube-vip:

* The kubelet got started

* The kubelet wanted to bootstrap itself by using the control-plane IP

* This failed until kube-vip comes up

* The kubelet can't start kube-vip because the `admin.conf` does not yet exist

With ControlPlaneKubeletLocalMode and when referencing admin.conf for kube-vip:

* The kubelet got started

* The kubelet bootstraps itself using the local control-plane IP (not depending on kube-vip being up)

* The admin.conf gets created

* The kubelet should be able to start kube-vip now

So, a better solution is available now.

Nope. This isn't working. kube-vip/kube-vip#684 is still an issue:

kube-vip/kube-vip#684 (comment)
kube-vip/kube-vip#684 (comment)
