Bootstrap channel DaemonSets have updateStrategy: RollingUpdate #7971
I think that this is a good idea, but we should look more into the implications of doing so. We'd need to make it clear to users that the DaemonSet updates won't take effect on their nodes until they have performed a rolling update. Another situation where this would have helped is #7926. cc: @mikesplain
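For concreteness, a sketch of the kind of command a user would then need to run before an updated DaemonSet actually takes effect (the cluster name is the same placeholder used later in this issue):

```bash
# With an OnDelete strategy, the updated weave pods would only come up as
# kops replaces the nodes (and therefore deletes the old pods), e.g.:
kops rolling-update cluster --name my.example.com --yes
```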
Additionally, we could add logic in …
In the specific case of weave, they guarantee backwards compatibility between adjacent major versions, per https://www.weave.works/docs/net/latest/operational-guide/tasks/#cluster-upgrade. I am less familiar with the other CNI plugins as we don't use them here. It's possible this isn't a suitable approach for all of them.
I would guess that most CNIs make a similar guarantee, but to be safe we could consider making this change on a per-DaemonSet basis. I can add it to the agenda for the next office hours and we can solidify a plan there.
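As a rough illustration of what a per-DaemonSet change would amount to, here is a manual sketch using the weave-net DaemonSet in kube-system as the example; this is not something kops does today, and channels may revert manual edits the next time the addon is updated:

```bash
# Show the update strategy the bootstrap channel currently gives the weave
# DaemonSet (RollingUpdate on affected clusters).
kubectl -n kube-system get daemonset weave-net \
  -o jsonpath='{.spec.updateStrategy.type}{"\n"}'

# Hand-patch it to OnDelete so new pods only start once the old ones are
# deleted, e.g. as nodes are rolled. Sketch of the desired end state only.
kubectl -n kube-system patch daemonset weave-net \
  --type merge -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
```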
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
1. What kops version are you running? The command kops version will display this information.
Version 1.14.1 (git-b7c25f9a9)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"archive", BuildDate:"2019-09-21T09:43:39Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.10", GitCommit:"e3c134023df5dea457638b614ee17ef234dc34a6", GitTreeState:"clean", BuildDate:"2019-07-08T03:40:54Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops update cluster --target cloudformation --out /tmp
using kops 1.14.1, on a cluster using weave for networking that was last updated with kops 1.12.3.
5. What happened after the commands executed?
As soon as the new version of weave was installed, the cluster initiated a rolling restart of the weave pods, which caused a brief but visible network interruption on each node.
6. What did you expect to happen?
I would not expect any weave pods to be replaced until I ask kops to perform a rolling update, given that they are essential for the pods on a node to function correctly.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml
to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
https://gist.github.com/stevenjm/c7eccd8b3fc20c675df61dfa3d6e9ecc
9. Anything else do we need to know?
In the case of weave, this is an explicit setting that appears to have been inherited from upstream. I suspect this makes sense for weave upstream, as they can't assume the cluster is managed in such a way that nodes can easily be replaced, but kops allows us to make this assumption and set updateStrategy to OnDelete. This will let a kops rolling-update replace these pods for us.
For this to work correctly, the node's user data must include a checksum of the bootstrap channel so that kops knows that nodes need to be replaced when weave is updated. See also my suggested approach to fixing #7970.
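A purely illustrative sketch of the checksum idea (hypothetical paths and naming; nothing here is an existing kops feature):

```bash
# Idea: hash the rendered bootstrap-channel addon manifests and bake the
# result into the node user data. When the weave addon changes, the hash
# changes, the user data changes, and `kops rolling-update cluster` would
# report the nodes as NEEDUPDATE, so the new weave pods roll out with the
# node replacement rather than immediately.
ADDON_DIR=/path/to/rendered/bootstrap-channel/addons   # hypothetical location
find "${ADDON_DIR}" -type f -name '*.yaml' -print0 \
  | sort -z \
  | xargs -0 cat \
  | sha256sum \
  | cut -d' ' -f1
```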