Using Helm controller to initialise cilium #151
Comments
@brandond Any ideas on the above? Our Cilium deployment on Kube-Hetzner is not working correctly, and we need to know what to do ASAP, please. Guidance would be genuinely appreciated 🙏
@PurpleBooth there isn't a good way to add arbitrary tolerations to the controller at the moment, no. Is the NotReady state on the kubelet that's present until the CNI comes up not sufficient for what you're trying to do? Do you need to inject another taint that mirrors the kubelet's CNI deployment status? @mysticaltech your ask seems to be unrelated to what @PurpleBooth is trying to accomplish; I'm not sure what kube-hetzner has to do with k3s or our helm controller.
@brandond We use both! Thanks for the details. Will let @PurpleBooth answer her part.
I ran into what I believe is a variant of the same issue today, but in a different situation. I made a mistake while deploying the configuration for the rke2-cilium chart, which took down the Cilium agent on my single node, and now the helm-install pod cannot run due to the
It would be really helpful to manage the tolerations for the controller and the pods it generates, and to ship sensible defaults in K3s/RKE2 to avoid such "lock out" situations.
The CNI HelmCharts are bootstrap charts, which run with host network and tolerate most things, including the NotReady taint. Are you sure this is the root cause of the failure to recover? helm-controller/pkg/controllers/chart/chart.go Lines 525 to 553 in f9103f6
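To make the toleration question above concrete, here is a minimal Go sketch of how Kubernetes matches tolerations against taints. It is not the helm-controller code referenced above: the `Taint` and `Toleration` types are simplified stand-ins for the `corev1` types, the `bootstrapLike` list is an illustrative assumption rather than the controller's actual list, and the `NoSchedule` effect on the Cilium taint is also assumed.

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for
// corev1.Taint and corev1.Toleration; only the fields used here are kept.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key      string
	Operator string // "Exists" or "Equal"; empty is treated as "Equal"
	Value    string
	Effect   string // empty matches every effect
}

// tolerates follows the documented Kubernetes matching rules:
// effects must match (or the toleration's effect is empty), and the key
// must match either by existence or by key/value equality.
func tolerates(tol Toleration, t Taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Operator == "Exists" {
		// An empty key with Exists matches every taint key.
		return tol.Key == "" || tol.Key == t.Key
	}
	return tol.Key == t.Key && tol.Value == t.Value
}

// untolerated returns the taints that none of the tolerations cover.
func untolerated(tols []Toleration, taints []Taint) []Taint {
	var left []Taint
	for _, t := range taints {
		covered := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				covered = true
				break
			}
		}
		if !covered {
			left = append(left, t)
		}
	}
	return left
}

// bootstrapLike is an ASSUMED, illustrative toleration list for a
// bootstrap job; it is not copied from helm-controller.
var bootstrapLike = []Toleration{
	{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoSchedule"},
	{Key: "CriticalAddonsOnly", Operator: "Exists"},
}

// nodeTaints models a node that is NotReady and also carries the Cilium
// taint discussed in this issue (the effect is an assumption).
var nodeTaints = []Taint{
	{Key: "node.kubernetes.io/not-ready", Effect: "NoSchedule"},
	{Key: "node.cilium.io/agent-not-ready", Effect: "NoSchedule"},
}

func main() {
	for _, t := range untolerated(bootstrapLike, nodeTaints) {
		fmt.Printf("untolerated taint: %s:%s\n", t.Key, t.Effect)
	}
}
```

Under these assumptions, a pod that tolerates the kubelet's NotReady taint by key is still blocked by any additional taint whose key it does not list, unless it carries a keyless `Exists` toleration.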
@brandond I believe so, I see a pending helm install pod like such:
I took a stab at adding support for customizing tolerations: #221
Oh right, I'd forgotten about the original topic of this issue. Why does cilium add a custom
Looks like https://docs.cilium.io/en/stable/installation/taints/ covers their thinking, but I don't agree that it's useful given how RKE2 deploys cilium. I would recommend turning this off with the
cc @thomasferrandiz @rbrtbnfgl in case y'all have notes on how to best pass this through to the cilium subchart, and thoughts on whether or not this is something we should disable by default, given its propensity to break everything else in the cluster if the operator does add this taint. The other option is to go around and add cilium taint tolerations to everything that we need to run when the CNI is not up, but that sounds like a lot of work to support a questionable decision on the part of the CNI maintainer.
@brandond thanks for the recommendation, I will try to add the operator arguments and see if it solves my problem.
@brandond I agree, it would make sense to disable cilium's taint by default. We can change that in the next cilium update PR.
We're using the Helm Controller to initialise cilium, and we'd like to use node.cilium.io/agent-not-ready to prevent scheduling on the nodes before cilium is ready using initial taints. However, this prevents the bootstrap: true flag from working. What is the best way to achieve this?
We'd like to avoid having to restart any unmanaged containers. If needed, I could try submitting a PR allowing the customisation of the taint tolerations, but I don't want to do that if it's not a direction you'd like to go in.
(Kinda hoping there's a cool feature I have missed to avoid this altogether 😄 )