switch to DaemonSet to run on all control plane nodes #310
Partially addresses #304, specifically the second issue raised.
Rather than running it as a `Deployment` with `replicas=1`, we run this as a `DaemonSet` with `nodeAffinity` restricting it to control plane nodes. This means that if you have 3 such nodes, you will get 3 copies. The built-in leader election of k8s.io/cloud-provider ensures that only one processes events at a time.
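A minimal sketch of what such a manifest looks like; the resource name, namespace, labels, and image tag below are illustrative assumptions, not necessarily the exact values in this PR:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloud-provider-equinix-metal   # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cloud-provider-equinix-metal
  template:
    metadata:
      labels:
        app: cloud-provider-equinix-metal
    spec:
      # Pin the pods to control plane nodes, so one copy runs per such node.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: Exists
      # Control plane nodes are typically tainted; tolerate the taint so the
      # pods can actually be scheduled there.
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: cloud-provider-equinix-metal
          image: equinix/cloud-provider-equinix-metal:latest   # illustrative tag
```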
If you have multiple copies (as this provides) and one dies (or, of greater concern, the node it is running on dies, taking CPEM with it), then one of the remaining copies takes over quickly and work continues as normal. This includes the important job of letting the apiserver know that the node is gone, which is a CPEM responsibility.
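Extending the sketch above, the leader election comes from the generic controller-manager flags exposed by the k8s.io/cloud-provider framework; whether this PR sets the flag explicitly or relies on the default is an assumption here:

```yaml
      containers:
        - name: cloud-provider-equinix-metal
          image: equinix/cloud-provider-equinix-metal:latest   # illustrative tag
          args:
            # Built-in leader election: only the elected pod processes events;
            # the other copies stand by, ready to take over.
            - --leader-elect=true
```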
Important note: this does not solve the case where the node that dies hosts both CPEM (whether the lone copy before this PR or the current leader after this PR) and the EIP (when using an EIP managed by CPEM for apiserver access). In that case, this change does not help: as the node goes down, so does CPEM, so nothing can switch the EIP to a functioning node. Switching the EIP is CPEM's responsibility, but it is down too. Leader election would help, but leader election depends on access to the k8s apiserver, which depends on the EIP, which points to the node that just went down.
That will need to be addressed via some other solution; see the tracking issue linked at the beginning.
Separately, while touching the deployment templates, this also fixes a deprecated annotation: the annotation previously set on the pod has been deprecated since 1.16, and the replacement field in the podspec has been adopted instead.
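The specific annotation and its replacement are not reproduced above. Assuming (unconfirmed here) that this is the usual critical-pod migration, where the `scheduler.alpha.kubernetes.io/critical-pod` annotation was superseded by `priorityClassName` in the pod spec, the change would look roughly like:

```yaml
# Old form: pod annotation (assumed to be the deprecated annotation referred to).
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""

---
# New form: set a priority class in the pod spec instead.
spec:
  priorityClassName: system-cluster-critical
```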