
3 control-plane node setup not starting #2744

Closed
sreejithpunnapuzha opened this issue May 10, 2022 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@sreejithpunnapuzha

I am trying to launch a kind cluster with 3 control-plane nodes and 1 worker on Ubuntu 22.04, but it fails to start. Below is the config YAML that I used to create the cluster.

# a cluster with 3 control-plane nodes and 1 worker
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker

The cluster create fails with the error below (logs are trimmed).

kind create cluster --name multi --config multi-controller-worker.yaml
Creating cluster "multi" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Configuring the external load balancer ⚖️
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining more control-plane nodes 🎮
✗ Joining worker nodes 🚜
ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged multi-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
Command Output: I0510 14:08:38.916853 132 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName
I0510 14:08:38.924953 132 joinconfiguration.go:76] loading configuration from "/kind/kubeadm.conf"
I0510 14:08:38.926027 132 controlplaneprepare.go:220] [download-certs] Skipping certs download
I0510 14:08:38.926052 132 join.go:530] [preflight] Discovering cluster-info
I0510 14:08:38.926062 132 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "multi-external-load-balancer:6443"
I0510 14:08:38.941343 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 14 milliseconds
I0510 14:08:38.942048 132 token.go:105] [discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server "multi-external-load-balancer:6443"
I0510 14:08:38.942071 132 discovery.go:52] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0510 14:08:38.942078 132 join.go:544] [preflight] Fetching init configuration
I0510 14:08:38.942081 132 join.go:590] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
I0510 14:08:38.969023 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s 200 OK in 26 milliseconds
I0510 14:08:38.973664 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy?timeout=10s 200 OK in 3 milliseconds
I0510 14:08:38.974834 132 kubelet.go:91] attempting to download the KubeletConfiguration from the new format location (UnversionedKubeletConfigMap=true)
I0510 14:08:38.976505 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 1 milliseconds
I0510 14:08:38.989390 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 0 milliseconds
I0510 14:08:39.048096 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 1 milliseconds
I0510 14:08:39.311348 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 2 milliseconds
I0510 14:08:39.311647 132 kubelet.go:94] attempting to download the KubeletConfiguration from the DEPRECATED location (UnversionedKubeletConfigMap=false)
I0510 14:08:39.314451 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.23?timeout=10s 200 OK in 2 milliseconds
I0510 14:08:39.315764 132 interface.go:432] Looking for default routes with IPv4 addresses
I0510 14:08:39.315785 132 interface.go:437] Default route transits interface "eth0"
I0510 14:08:39.315942 132 interface.go:209] Interface eth0 is up
I0510 14:08:39.315996 132 interface.go:257] Interface "eth0" has 3 addresses :[172.18.0.5/16 fc00:f853:ccd:e793::5/64 fe80::42:acff:fe12:5/64].
I0510 14:08:39.316007 132 interface.go:224] Checking addr 172.18.0.5/16.
I0510 14:08:39.316012 132 interface.go:231] IP found 172.18.0.5
I0510 14:08:39.316029 132 interface.go:263] Found valid IPv4 address 172.18.0.5 for interface "eth0".
I0510 14:08:39.316034 132 interface.go:443] Found active IP 172.18.0.5
I0510 14:08:39.321562 132 kubelet.go:119] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0510 14:08:39.322302 132 kubelet.go:134] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0510 14:08:39.322814 132 loader.go:372] Config loaded from file: /etc/kubernetes/bootstrap-kubelet.conf
I0510 14:08:39.323308 132 kubelet.go:155] [kubelet-start] Checking for an existing Node in the cluster with name "multi-worker" and status "Ready"
I0510 14:08:39.326531 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 3 milliseconds
I0510 14:08:39.326759 132 kubelet.go:170] [kubelet-start] Stopping the kubelet
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I0510 14:08:44.471944 132 loader.go:372] Config loaded from file: /etc/kubernetes/kubelet.conf
I0510 14:08:44.473636 132 loader.go:372] Config loaded from file: /etc/kubernetes/kubelet.conf
I0510 14:08:44.474330 132 kubelet.go:218] [kubelet-start] preserving the crisocket information for the node
I0510 14:08:44.474383 132 patchnode.go:31] [patchnode] Uploading the CRI Socket information "unix:///run/containerd/containerd.sock" to the Node API object "multi-worker" as an annotation
I0510 14:08:44.474432 132 cert_rotation.go:137] Starting client certificate rotation controller
I0510 14:08:45.125555 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 150 milliseconds
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:19.486825 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 11 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:24.479348 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 4 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:34.480058 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 5 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:54.481436 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 4 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:10:34.483231 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 8 milliseconds
nodes "multi-worker" not found
error uploading crisocket
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runKubeletStartJoinPhase
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join/kubelet.go:220
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:178
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:255
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:178
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:255
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581

Environment:

  • kind version: (use kind version): v0.12.0
  • Kubernetes version: (use kubectl version): v1.24.0
  • Docker version: (use docker info): Server Version: 20.10.12
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS

The expectation is that kind should be able to create a cluster with multiple control-plane nodes and multiple workers.

sreejithpunnapuzha added the kind/bug label May 10, 2022
@aojea
Contributor

aojea commented May 10, 2022

nodes "multi-worker" not found
error uploading crisocket

Do you have enough resources? That setup will require a fair amount of RAM and CPU.
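
For example, a quick way to check the host (standard Linux tools, nothing kind-specific):

nproc                   # CPU count
free -h                 # memory
df -h /var/lib/docker   # disk space where docker keeps the node containers (default path)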

@sreejithpunnapuzha
Author

sreejithpunnapuzha commented May 10, 2022

The node has 2 CPUs and 10 GB RAM. Won't this be sufficient to start the cluster? I have successfully created a 1 control-plane, 3 worker cluster on this same node without any issues. The problem comes when I increase the control planes to 3 and the workers to 1.

@aojea
Contributor

aojea commented May 10, 2022

I can't say with this data, but try creating a cluster with fewer components so you can rule out a resource problem.

@sreejithpunnapuzha
Author

A 1 control-plane, 3 worker cluster comes up fine.

kind create cluster --name single --config=single-controller-multi-worker.yaml
Creating cluster "single" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-single"
You can now use your cluster with:

kubectl cluster-info --context kind-single

Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/

kubectl get nodes
NAME                   STATUS   ROLES                  AGE     VERSION
single-control-plane   Ready    control-plane,master   3m31s   v1.23.4
single-worker          Ready    <none>                 2m59s   v1.23.4
single-worker2         Ready    <none>                 2m59s   v1.23.4
single-worker3         Ready    <none>                 3m12s   v1.23.4

This is the kind config that I used to create this cluster.

cat single-controller-multi-worker.yaml

# a cluster with 1 control-plane node and 3 workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker

@aojea
Contributor

aojea commented May 10, 2022

3 control-planes = 3 etcd 💣 😄

@sreejithpunnapuzha
Author

I tested this with kind version v0.11.1 on the same node and it works as expected.

./kind-linux-amd64 create cluster --name multi --config multi-controller-worker.yaml
Creating cluster "multi" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Configuring the external load balancer ⚖️
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining more control-plane nodes 🎮
✓ Joining worker nodes 🚜
Set kubectl context to "kind-multi"
You can now use your cluster with:

kubectl cluster-info --context kind-multi

Thanks for using kind! 😊

kubectl get nodes
NAME                   STATUS   ROLES                  AGE     VERSION
multi-control-plane    Ready    control-plane,master   2m11s   v1.21.1
multi-control-plane2   Ready    control-plane,master   104s    v1.21.1
multi-control-plane3   Ready    control-plane,master   47s     v1.21.1
multi-worker           Ready    <none>                 34s     v1.21.1

./kind-linux-amd64 version
kind v0.11.1 go1.16.4 linux/amd64

cat multi-controller-worker.yaml

# a cluster with 3 control-plane nodes and 1 worker
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker

@BenTheElder
Member

This is most likely a resource issue; resources have many dimensions besides CPU and RAM, e.g. disk IO for the Kubernetes API servers (by way of etcd), which are write-heavy.

If you run kind create cluster with --retain it will prevent cleanup, and then you can run kind export logs to dump lots of cluster logs, then kind delete cluster to clean up the cluster. If you share the logs we can see more detail about why the kubelet is unhealthy.
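
A minimal sketch of that workflow, using the cluster name and config file from this issue (the ./kind-logs output directory is just an illustrative choice):

kind create cluster --name multi --config multi-controller-worker.yaml --retain
# the create may still fail, but the node containers are kept around
kind export logs ./kind-logs --name multi
# inspect ./kind-logs/<node>/ for kubelet and container logs, then clean up
kind delete cluster --name multi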

@BenTheElder
Member

FWIW: I highly recommend single-node clusters unless you have a specific use case that absolutely requires the three control planes.
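
For reference, a single-node cluster needs no config file at all; something like the following should be enough (the cluster name is arbitrary):

kind create cluster --name dev
kubectl cluster-info --context kind-dev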

@sreejithpunnapuzha
Author

@BenTheElder thanks, "--retain" helped me retain the cluster. It seems like kube-proxy is failing to start with the error "command failed" err="failed complete: too many open files". I will do some more troubleshooting.

@sreejithpunnapuzha
Author

I googled and found that the above issue is caused by low inotify limits. I have updated fs.inotify.max_user_watches and fs.inotify.max_user_instances to higher values, and now the kind cluster comes up without any problem.

echo fs.inotify.max_user_watches=655360 | sudo tee -a /etc/sysctl.conf
echo fs.inotify.max_user_instances=1280 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Closing this issue since the above commands fixed it.
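
As a quick sanity check, the new values can be confirmed after reloading:

sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
# expected after the change:
# fs.inotify.max_user_watches = 655360
# fs.inotify.max_user_instances = 1280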

@BenTheElder
Member

FWIW: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files

abayer added a commit to abayer/plumbing that referenced this issue Jul 6, 2022
Prompted by seeing failures which could be caused by kubernetes-sigs/kind#2744

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
tekton-robot pushed a commit to tektoncd/plumbing that referenced this issue Jul 6, 2022
Prompted by seeing failures which could be caused by kubernetes-sigs/kind#2744

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
@strawgate

i googled and found that the above issue is caused by ulimit. I have updated the ulimit for max_user_watches and max_user_instances to a higher value and now the kind cluster is coming up without any problem.

echo fs.inotify.max_user_watches=655360 | sudo tee -a /etc/sysctl.conf
echo fs.inotify.max_user_instances=1280 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

closing this issue since the above commands fixed this.

I ran into this issue when attempting to deploy a large kind cluster with 25 worker nodes and this also fixed my issue.

@GH-Djeff

This works for me too:
echo fs.inotify.max_user_watches=655360 | sudo tee -a /etc/sysctl.conf
echo fs.inotify.max_user_instances=1280 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
