
Bootstrap channel DaemonSets have updateStrategy: RollingUpdate #7971

Closed
stevenjm opened this issue Nov 20, 2019 · 11 comments · Fixed by #10167

Comments

@stevenjm
Contributor

stevenjm commented Nov 20, 2019

1. What kops version are you running? The command kops version will display
this information.

Version 1.14.1 (git-b7c25f9a9)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"archive", BuildDate:"2019-09-21T09:43:39Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.10", GitCommit:"e3c134023df5dea457638b614ee17ef234dc34a6", GitTreeState:"clean", BuildDate:"2019-07-08T03:40:54Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops update cluster --target cloudformation --out /tmp using kops 1.14.1, on a cluster using weave for networking that was last updated with kops 1.12.3.

5. What happened after the commands executed?

As soon as the new version of weave was installed, the cluster initiated a rolling restart of the weave pods, which caused a brief but visible network interruption on each node.

6. What did you expect to happen?

I would not expect any weave pods to be replaced until I ask kops to perform a rolling update, given that they are essential for the pods on a node to function correctly.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

https://gist.github.com/stevenjm/c7eccd8b3fc20c675df61dfa3d6e9ecc

9. Anything else we need to know?

In the case of weave, this is an explicit setting that appears to have been inherited from upstream. I suspect this makes sense for weave upstream, as they can't assume the cluster is managed in such a way that nodes can easily be updated in place, but kops allows us to make this assumption and set updateStrategy to OnDelete. This will let a kops rolling-update replace these pods for us.

For this to work correctly, the node's user data must include a checksum of the bootstrap channel so that kops knows that nodes need to be replaced when weave is updated. See also my suggested approach to fixing #7970.
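For illustration, the proposed change to the weave-net DaemonSet manifest would look something like the fragment below (a sketch against the apps/v1 DaemonSet API; the manifest the bootstrap channel actually renders has many more fields):

```yaml
# Sketch only: switch the DaemonSet from RollingUpdate to OnDelete.
# With OnDelete, the controller only creates a replacement pod after the
# old one is deleted -- e.g. when kops rolling-update replaces the node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  updateStrategy:
    type: OnDelete   # instead of the current RollingUpdate
```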

@rifelpet
Member

I think that this is a good idea, but we should look more into the implications of doing so. We'd need to make it clear to users that DaemonSet updates won't take effect on their nodes until they have performed a rolling-update, but in general I think it would be good to have the new pod definitions only used on new nodes. The only concern might be if a CNI upgrade isn't backwards compatible; this would increase the amount of time that the cluster is "split-brained", but I suppose we could deal with that when the time comes.

Another situation where this would have helped is #7926

cc: @mikesplain

@rifelpet
Copy link
Member

Additionally, we could add logic to kops rolling-update that marks nodes as NeedsUpdate not only if they aren't on the latest LaunchConfiguration or LaunchTemplate, but also if the node's DaemonSet pods are not from the latest version of the DaemonSet.
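One way that check could be sketched, assuming we rely on the controller-revision-hash label the DaemonSet controller places on its pods (illustrative only; not how kops currently does this):

```shell
# List each weave pod's node alongside the controller revision it was
# created from; pods whose hash doesn't match the newest ControllerRevision
# would be candidates for marking the node NeedsUpdate.
kubectl -n kube-system get pods -l name=weave-net -o wide -L controller-revision-hash

# The revisions known for the DaemonSet, newest last.
kubectl -n kube-system get controllerrevisions -l name=weave-net --sort-by=.revision
```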

@stevenjm
Contributor Author

> The only concern might be if a CNI upgrade isnt backwards compatible, this would increase the amount of time that the cluster is "split brained" but I suppose we could deal with that when the time comes.

In the specific case of weave, they guarantee backwards compatibility between adjacent major versions, per https://www.weave.works/docs/net/latest/operational-guide/tasks/#cluster-upgrade.

I am less familiar with the other CNI plugins as we don't use them here. It's possible this isn't a suitable approach for all of them.

@rifelpet
Member

I would guess that most CNIs make a similar guarantee, but to be safe we could consider making this change on a per-DaemonSet basis. I can add it to the agenda for the next office hours and we can solidify a plan there.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 23, 2020
@stevenjm
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 11, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 9, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 9, 2020
@stevenjm
Contributor Author

stevenjm commented Aug 5, 2020

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 5, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2020
@rifelpet
Member

rifelpet commented Nov 3, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2020