
switch to DaemonSet to run on all control plane nodes #310

Merged
merged 1 commit on Aug 19, 2022
Conversation

@deitch (Contributor) commented on Aug 19, 2022

Partially addresses #304, specifically the second issue raised.

Rather than running it as a Deployment with replicas=1, we run this as a DaemonSet with nodeAffinity restricting it to control plane nodes. This means that if you have 3 such nodes, you will get 3 copies.
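A minimal sketch of what such a DaemonSet spec looks like (the names and image here are hypothetical, not the exact manifest from this PR; the nodeAffinity key assumes the standard `node-role.kubernetes.io/control-plane` node label, and the toleration assumes the usual control plane taint):

```yaml
# Hypothetical sketch, not the exact manifest in this PR.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloud-provider-equinix-metal
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cloud-provider-equinix-metal
  template:
    metadata:
      labels:
        app: cloud-provider-equinix-metal
    spec:
      # Restrict scheduling to control plane nodes via nodeAffinity.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: Exists
      # Control plane nodes are usually tainted; tolerate the taint
      # so the pods can actually land there.
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      containers:
        - name: cloud-provider-equinix-metal
          image: equinix/cloud-provider-equinix-metal:latest
```

Because a DaemonSet places one pod per matching node, three control plane nodes yield three copies, with no replica count to keep in sync as nodes come and go.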

The built-in leader election of k8s.io/cloud-provider ensures that only one processes events at a time.
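Controllers built on k8s.io/cloud-provider expose the standard `--leader-elect` flag (it defaults to true in cloud-controller-manager); a hedged fragment of the container args, with the binary name assumed rather than taken from this PR:

```yaml
# Fragment only; the actual CPEM command and binary name may differ.
containers:
  - name: cloud-provider-equinix-metal
    command:
      - ./cloud-provider-equinix-metal
      - --leader-elect=true   # only the elected leader processes events
```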

If you have multiple (as this provides), and one dies (of greater concern, the node it is running on dies, taking CPEM with it), then one of the remaining ones will take over quickly, and work will continue as normal. This includes the important part of letting the apiserver know that the node is gone, which is a CPEM responsibility.

Important note: this does not solve the case where the node that dies hosts both CPEM (whether the lone instance before this PR, or the current leader after it) and the EIP (when an EIP managed by CPEM is used for apiserver access). In that case, this change will not help: as the node goes down, so does CPEM, so nothing can switch the EIP to a functioning node. Switching the EIP is CPEM's responsibility, but CPEM is down, too. Leader election would help, but leader election depends on access to the k8s apiserver, which depends on the EIP, which points to the node that just went down.

That will need to be addressed via some other solution; see the tracking issue linked at the beginning.


Separately, while dealing with the deployment templates, this PR also fixes a deprecated annotation. The templates used to contain:

      annotations:
         scheduler.alpha.kubernetes.io/critical-pod: ''

But this annotation has been deprecated since Kubernetes 1.16. Its replacement,

      priorityClassName: system-cluster-critical

as part of the podspec, has been adopted.
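In podspec form, the replacement looks like the following (container name and image are hypothetical, for illustration only):

```yaml
# Podspec fragment: priorityClassName replaces the deprecated
# scheduler.alpha.kubernetes.io/critical-pod annotation.
spec:
  priorityClassName: system-cluster-critical
  containers:
    - name: cloud-provider-equinix-metal
      image: equinix/cloud-provider-equinix-metal:latest
```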

@deitch (Contributor, Author) commented on Aug 19, 2022

Still doing some testing, so hang on with it.

Signed-off-by: Avi Deitcher <avi@deitcher.net>
@deitch (Contributor, Author) commented on Aug 19, 2022

This is good to go. Looking for a review.
