CPU spikes on the first control plane machine, when the second machine tries to join #214
Comments
This is the problem I described here: https://twitter.com/nickwperry/status/1385687098322214919. VMware's recommended workaround of backing off the LeaderElection params from the defaults of 15, 10, 2 to 30, 20, 4 works in our environments. Running oversized control plane nodes (8 vCPU) also mitigates the issue successfully.
I think the first port of call is to start with larger params… (we introduce a lot of
Another option for this is to support etcd learner mode in kubeadm and CAPI: https://etcd.io/docs/v3.3/learning/learner/#background cc @fabriziopandini
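As a rough sketch of what learner-based joins look like at the etcd level (endpoints, member names, and IPs below are hypothetical; learner mode requires etcd v3.4 or newer):

```shell
# Add the joining node as a learner: it replicates the raft log
# but has no vote, so it cannot disrupt quorum while catching up.
etcdctl --endpoints=https://10.0.0.1:2379 member add etcd-2 \
  --peer-urls=https://10.0.0.2:2380 --learner

# Once the learner is in sync with the leader, promote it to a
# full voting member (MEMBER_ID comes from `etcdctl member list`).
etcdctl --endpoints=https://10.0.0.1:2379 member promote <MEMBER_ID>
```

The appeal for this issue is that the expensive catch-up phase happens before the new member can affect quorum, which is exactly the window where the CPU spike causes bootstrapping to fail.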
☝️ That seems like a great idea; it would certainly stop the etcd/api-server flakiness.
There is an existing RFE for joining as learner: kubernetes/kubeadm#1793
From the kubeadm side the idea is still on the table, but I don't see this happening soon unless someone picks up the work.
@nickperry could you give me details of the workaround you used for this? I'm struggling to build a cluster and seeing this behaviour — thanks.
@sammcgeown the workaround is to make the LeaderElection timings more relaxed. On VMware TKGM 1.3.x you can do this in a YTT overlay file before building your workload cluster. Copy the attached overlay file to ~/.tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-overlay.yaml (ensure you remove the .txt suffix). This workaround was provided by VMware R&D and relaxes the LeaderElection timings to 30, 20, 4.
@nickperry thank you — that's perfect. For an installation of kube-vip with kubeadm I'm just editing the
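For a plain kubeadm install, the same 30/20/4 relaxation can be applied via kube-vip's leader-election environment variables in its static pod manifest. This is a hedged sketch, not the full manifest — the variable names follow kube-vip's environment-variable configuration, so verify them against the docs for your kube-vip version:

```yaml
# Excerpt of /etc/kubernetes/manifests/kube-vip.yaml (path assumes a
# static pod deployment); only the leader-election env vars are shown.
spec:
  containers:
  - name: kube-vip
    env:
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration    # lease duration: default 15 -> 30
      value: "30"
    - name: vip_renewdeadline    # renew deadline: default 10 -> 20
      value: "20"
    - name: vip_retryperiod      # retry period: default 2 -> 4
      value: "4"
```

Since kubelet watches the static pod directory, saving the edited manifest restarts kube-vip with the relaxed timings.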
@sammcgeown are you seeing the problem when manually deploying with kubeadm on top of existing VMs, rather than using CAPI? Probably quite a useful data point if so. Static pods for etcd, or independently managed etcd?
Yes — Ubuntu 21.04 on Raspberry Pi 4s, with static pods for etcd.
Default timeouts have been improved and there have been no further discussions around this issue for some time.
In some cases, when using kube-vip as a control plane endpoint to bootstrap clusters, resource consumption (especially CPU) spikes on the first control plane machine when the second CP machine tries to join. This leads to a loss of quorum, which ultimately makes the bootstrapping fail.
By relaxing the leader election params, clusters are able to bootstrap. Let's use this issue to track ways to mitigate this.