
CPU spikes on the first control plane machine, when the second machine tries to join #214

Closed
yastij opened this issue Jun 2, 2021 · 12 comments
Labels
bug (Something isn't working), control plane, outside-cluster, STALE

Comments

@yastij
Collaborator

yastij commented Jun 2, 2021

In some cases, when using kube-vip as the control plane endpoint to bootstrap clusters, resource consumption (especially CPU) spikes on the first control plane machine when the second CP machine tries to join. This leads to a loss of quorum, which ultimately makes the bootstrap fail.

By relaxing the leader election params, clusters are able to bootstrap. Let's use this issue to track ways to mitigate this.

@yastij added the bug, control plane and outside-cluster labels on Jun 2, 2021
@nickperry

This is the problem I described here: https://twitter.com/nickwperry/status/1385687098322214919

VMware's recommended workaround of backing off the LeaderElection params from the defaults of 15, 10, 2 to 30, 20, 4 works in our environments. Running oversized control plane nodes (8 vCPU) also mitigates the issue successfully.
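For anyone mapping those three numbers onto kube-vip's configuration, here is a rough sketch of the leader election entries in the kube-vip static pod manifest. The env var names (vip_leaseduration, vip_renewdeadline, vip_retryperiod) are my assumption of how kube-vip exposes these settings, so double-check them against your own generated manifest:

```yaml
# Sketch only: relaxed LeaderElection timings (in seconds) from the workaround above.
# Env var names are assumed; verify them in your own kube-vip manifest.
env:
- name: vip_leaderelection
  value: "true"
- name: vip_leaseduration
  value: "30"   # default 15
- name: vip_renewdeadline
  value: "20"   # default 10
- name: vip_retryperiod
  value: "4"    # default 2
```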

@thebsdbox
Collaborator

I think the first port of call is to start with larger params (we introduce a lot of etcd thrashing when we join a second member to the cluster, causing the API server to fall over); we can then reduce the params once the cluster is stable and enjoy faster failover.

@yastij
Collaborator Author

yastij commented Jun 10, 2021

Another option is to support etcd learner mode in kubeadm and CAPI: https://etcd.io/docs/v3.3/learning/learner/#background cc @fabriziopandini

@thebsdbox
Collaborator

☝️ That seems like a great idea. It would certainly stop the etcd/API server flakiness.

@nickperry

There is an existing RFE for joining as a learner: kubernetes/kubeadm#1793

@fabriziopandini

From the kubeadm side the idea is still on the table, but I don't see this happening soon unless someone picks up the work.
As far as I remember, there were two main issues to be addressed with etcd learner mode:

  • Only one learner was allowed at a time; this implies some coordination for the parallel join scenario.
  • There was no easy/automatic mechanism for promoting learners to actual members.

But my information might be a little bit outdated...

@sammcgeown
Contributor

> This is the problem I described here: https://twitter.com/nickwperry/status/1385687098322214919
>
> VMware's recommended workaround of backing off the LeaderElection params from the defaults of 15, 10, 2 to 30, 20, 4 works in our environments. Running oversized control plane nodes (8 vCPU) also mitigates the issue successfully.

@nickperry could you give me details of the workaround you used for this? I'm struggling to build a cluster and seeing this behaviour. Thanks!

@nickperry

nickperry commented Jun 14, 2021

@sammcgeown the workaround is to make the LeaderElection timings more relaxed. On VMware TKGM 1.3.x you can do this in a YTT overlay file before building your workload cluster: copy the attached overlay file to ~/.tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-overlay.yaml (ensure you remove the .txt suffix).

This workaround was provided by VMware R&D and relaxes the LeaderElection timings to 30, 20, 4.
vsphere-overlay.yaml.txt

@sammcgeown
Contributor

@nickperry thank you, that's perfect. For a kube-vip installation with kubeadm I'm just editing /etc/kubernetes/manifests/kube-vip.yaml to match those settings. It appears to be working for me now!
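For context, the edited static pod manifest ends up looking roughly like this. It is only a sketch under the same assumptions about the env var names, with only the relevant fields shown and a placeholder image tag:

```yaml
# Sketch of /etc/kubernetes/manifests/kube-vip.yaml after relaxing the timings;
# adapt field names and values to whatever your manifest already contains.
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - name: kube-vip
    image: ghcr.io/kube-vip/kube-vip:<tag>   # keep the image your manifest already uses
    env:
    - name: vip_leaseduration
      value: "30"
    - name: vip_renewdeadline
      value: "20"
    - name: vip_retryperiod
      value: "4"
```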

@nickperry

@sammcgeown are you seeing the problem when manually deploying with kubeadm on top of existing VMs, rather than using CAPI? That would be quite a useful data point if so. Are you running static pods for etcd or independently managed etcd?

@sammcgeown
Contributor

Yes - Ubuntu 21.04 on Raspberry Pi 4s, with static pods for etcd.

@thebsdbox
Collaborator

Default timeouts have been improved and there have been no further discussions around this issue for some time.
