
3 control-plane node setup not starting #2744

Closed
sreejithpunnapuzha opened this issue May 10, 2022 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@sreejithpunnapuzha

I am trying to launch a kind cluster with 3 control-plane nodes and 1 worker on Ubuntu 22.04, but it fails to start. Below is the config YAML that I used to create the cluster.

# a cluster with 3 control-plane nodes and 1 worker
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker

The cluster create fails with the error below (logs are trimmed).

kind create cluster --name multi --config multi-controller-worker.yaml
Creating cluster "multi" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Configuring the external load balancer ⚖️
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining more control-plane nodes 🎮
✗ Joining worker nodes 🚜
ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged multi-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
Command Output: I0510 14:08:38.916853 132 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName
I0510 14:08:38.924953 132 joinconfiguration.go:76] loading configuration from "/kind/kubeadm.conf"
I0510 14:08:38.926027 132 controlplaneprepare.go:220] [download-certs] Skipping certs download
I0510 14:08:38.926052 132 join.go:530] [preflight] Discovering cluster-info
I0510 14:08:38.926062 132 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "multi-external-load-balancer:6443"
I0510 14:08:38.941343 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 14 milliseconds
I0510 14:08:38.942048 132 token.go:105] [discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server "multi-external-load-balancer:6443"
I0510 14:08:38.942071 132 discovery.go:52] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0510 14:08:38.942078 132 join.go:544] [preflight] Fetching init configuration
I0510 14:08:38.942081 132 join.go:590] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
I0510 14:08:38.969023 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s 200 OK in 26 milliseconds
I0510 14:08:38.973664 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy?timeout=10s 200 OK in 3 milliseconds
I0510 14:08:38.974834 132 kubelet.go:91] attempting to download the KubeletConfiguration from the new format location (UnversionedKubeletConfigMap=true)
I0510 14:08:38.976505 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 1 milliseconds
I0510 14:08:38.989390 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 0 milliseconds
I0510 14:08:39.048096 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 1 milliseconds
I0510 14:08:39.311348 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 403 Forbidden in 2 milliseconds
I0510 14:08:39.311647 132 kubelet.go:94] attempting to download the KubeletConfiguration from the DEPRECATED location (UnversionedKubeletConfigMap=false)
I0510 14:08:39.314451 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.23?timeout=10s 200 OK in 2 milliseconds
I0510 14:08:39.315764 132 interface.go:432] Looking for default routes with IPv4 addresses
I0510 14:08:39.315785 132 interface.go:437] Default route transits interface "eth0"
I0510 14:08:39.315942 132 interface.go:209] Interface eth0 is up
I0510 14:08:39.315996 132 interface.go:257] Interface "eth0" has 3 addresses :[172.18.0.5/16 fc00:f853:ccd:e793::5/64 fe80::42:acff:fe12:5/64].
I0510 14:08:39.316007 132 interface.go:224] Checking addr 172.18.0.5/16.
I0510 14:08:39.316012 132 interface.go:231] IP found 172.18.0.5
I0510 14:08:39.316029 132 interface.go:263] Found valid IPv4 address 172.18.0.5 for interface "eth0".
I0510 14:08:39.316034 132 interface.go:443] Found active IP 172.18.0.5
I0510 14:08:39.321562 132 kubelet.go:119] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0510 14:08:39.322302 132 kubelet.go:134] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0510 14:08:39.322814 132 loader.go:372] Config loaded from file: /etc/kubernetes/bootstrap-kubelet.conf
I0510 14:08:39.323308 132 kubelet.go:155] [kubelet-start] Checking for an existing Node in the cluster with name "multi-worker" and status "Ready"
I0510 14:08:39.326531 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 3 milliseconds
I0510 14:08:39.326759 132 kubelet.go:170] [kubelet-start] Stopping the kubelet
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I0510 14:08:44.471944 132 loader.go:372] Config loaded from file: /etc/kubernetes/kubelet.conf
I0510 14:08:44.473636 132 loader.go:372] Config loaded from file: /etc/kubernetes/kubelet.conf
I0510 14:08:44.474330 132 kubelet.go:218] [kubelet-start] preserving the crisocket information for the node
I0510 14:08:44.474383 132 patchnode.go:31] [patchnode] Uploading the CRI Socket information "unix:///run/containerd/containerd.sock" to the Node API object "multi-worker" as an annotation
I0510 14:08:44.474432 132 cert_rotation.go:137] Starting client certificate rotation controller
I0510 14:08:45.125555 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 150 milliseconds
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:19.486825 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 11 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:24.479348 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 4 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:34.480058 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 5 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:09:54.481436 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 4 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I0510 14:10:34.483231 132 round_trippers.go:553] GET https://multi-external-load-balancer:6443/api/v1/nodes/multi-worker?timeout=10s 404 Not Found in 8 milliseconds
nodes "multi-worker" not found
error uploading crisocket
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runKubeletStartJoinPhase
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join/kubelet.go:220
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:178
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:255
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:178
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:255
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581

Environment:

  • kind version: (use kind version): v0.12.0
  • Kubernetes version: (use kubectl version): v1.24.0
  • Docker version: (use docker info): Server Version: 20.10.12
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS

The expectation is that kind should be able to create a cluster with multiple control-plane nodes and multiple workers.

sreejithpunnapuzha added the kind/bug label May 10, 2022
@aojea
Contributor

aojea commented May 10, 2022

nodes "multi-worker" not found
error uploading crisocket

Do you have enough resources? That setup will require a fair amount of RAM and CPU.
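
For example, a quick way to check the host (standard Linux tools, nothing kind-specific):

nproc                   # CPU count
free -h                 # memory
df -h /var/lib/docker   # disk space where docker keeps the node containers (default path)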

@sreejithpunnapuzha
Author

sreejithpunnapuzha commented May 10, 2022

The node has 2 CPUs and 10 GB RAM. Won't this be sufficient to start the cluster? I have successfully created a 1 control-plane, 3 worker cluster on this same node without any issues. The problem comes when I increase the control planes to 3 and the workers to 1.

@aojea
Contributor

aojea commented May 10, 2022

I can't say with this data, but try creating a cluster with fewer components so you can rule out a resource problem.

@sreejithpunnapuzha
Author

A 1 control-plane, 3 worker cluster comes up fine.

kind create cluster --name single --config=single-controller-multi-worker.yaml
Creating cluster "single" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-single"
You can now use your cluster with:

kubectl cluster-info --context kind-single

Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/

kubectl get nodes
NAME                   STATUS   ROLES                  AGE     VERSION
single-control-plane   Ready    control-plane,master   3m31s   v1.23.4
single-worker          Ready    <none>                 2m59s   v1.23.4
single-worker2         Ready    <none>                 2m59s   v1.23.4
single-worker3         Ready    <none>                 3m12s   v1.23.4

This is the kind config that I used to create this cluster.

cat single-controller-multi-worker.yaml

# a cluster with 1 control-plane node and 3 workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker

@aojea
Contributor

aojea commented May 10, 2022

3 control-planes = 3 etcd 💣 😄

@sreejithpunnapuzha
Author

I tested this with kind version v0.11.1 on the same node and it works as expected.

./kind-linux-amd64 create cluster --name multi --config multi-controller-worker.yaml
Creating cluster "multi" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Configuring the external load balancer ⚖️
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining more control-plane nodes 🎮
✓ Joining worker nodes 🚜
Set kubectl context to "kind-multi"
You can now use your cluster with:

kubectl cluster-info --context kind-multi

Thanks for using kind! 😊

kubectl get nodes
NAME                   STATUS   ROLES                  AGE     VERSION
multi-control-plane    Ready    control-plane,master   2m11s   v1.21.1
multi-control-plane2   Ready    control-plane,master   104s    v1.21.1
multi-control-plane3   Ready    control-plane,master   47s     v1.21.1
multi-worker           Ready    <none>                 34s     v1.21.1

./kind-linux-amd64 version
kind v0.11.1 go1.16.4 linux/amd64

cat multi-controller-worker.yaml

# a cluster with 3 control-plane nodes and 1 worker
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker

@BenTheElder
Member

This is most likely a resource issue; resources have many dimensions besides CPU and RAM, e.g. disk IO for the Kubernetes API servers (by way of etcd), which are write-heavy.

If you run kind create cluster with --retain it will prevent cleanup, and then you can run kind export logs to dump lots of cluster logs, then kind delete cluster to clean up the cluster. If you share the logs we can see more detail about why the kubelet is unhealthy.
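
A minimal sketch of that workflow, using the cluster name and config file from this issue (the ./kind-logs output directory is just an illustrative choice):

kind create cluster --name multi --config multi-controller-worker.yaml --retain
# the create may still fail, but the node containers are kept around
kind export logs ./kind-logs --name multi
# inspect ./kind-logs/<node>/ for kubelet and container logs, then clean up
kind delete cluster --name multi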

@BenTheElder
Member

FWIW: I highly recommend single-node clusters unless you have a specific use case that absolutely requires the three control planes.
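
For reference, a single-node cluster needs no config file at all; something like the following should be enough (the cluster name is arbitrary):

kind create cluster --name dev
kubectl cluster-info --context kind-dev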

@sreejithpunnapuzha
Author

@BenTheElder thanks, "--retain" helped me retain the cluster. It seems like kube-proxy is failing to start with the error "command failed" err="failed complete: too many open files". I will do some more troubleshooting.

@sreejithpunnapuzha
Author

I googled and found that the above issue is caused by low inotify limits. I have updated fs.inotify.max_user_watches and fs.inotify.max_user_instances to higher values, and now the kind cluster comes up without any problem.

echo fs.inotify.max_user_watches=655360 | sudo tee -a /etc/sysctl.conf
echo fs.inotify.max_user_instances=1280 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Closing this issue since the above commands fixed it.
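
As a quick sanity check, the new values can be confirmed after reloading:

sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
# expected after the change:
# fs.inotify.max_user_watches = 655360
# fs.inotify.max_user_instances = 1280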

@BenTheElder
Member

FWIW: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files

abayer added a commit to abayer/plumbing that referenced this issue Jul 6, 2022
Prompted by seeing failures which could be caused by kubernetes-sigs/kind#2744

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
tekton-robot pushed a commit to tektoncd/plumbing that referenced this issue Jul 6, 2022
Prompted by seeing failures which could be caused by kubernetes-sigs/kind#2744

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
@strawgate

i googled and found that the above issue is caused by ulimit. I have updated the ulimit for max_user_watches and max_user_instances to a higher value and now the kind cluster is coming up without any problem.

echo fs.inotify.max_user_watches=655360 | sudo tee -a /etc/sysctl.conf
echo fs.inotify.max_user_instances=1280 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

closing this issue since the above commands fixed this.

I ran into this issue when attempting to deploy a large kind cluster with 25 worker nodes and this also fixed my issue.

@GH-Djeff

This works for me too:
echo fs.inotify.max_user_watches=655360 | sudo tee -a /etc/sysctl.conf
echo fs.inotify.max_user_instances=1280 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
