DaemonSets: add SurgingRollingUpdate strategy #51161

diegs · 2017-08-23T00:27:34Z

What this PR does / why we need it:

Adds a new update strategy for DaemonSets called SurgingRollingUpdate. It is the complement of the existing RollingUpdate strategy: instead of deleting pods up to MaxUnavailable and then waiting for new pods to be created, it creates new pods up to MaxSurge and then cleans up the old pods once the new ones are running successfully.
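
As a rough illustration of the ordering described above, here is a minimal sketch of one planning pass, not the controller code in this PR. The `nodeState` type and the `planSurge` function are hypothetical names introduced for the example, and it assumes a node counts against the surge budget only while its new pod is not yet ready.

```go
package main

import "fmt"

// nodeState is a hypothetical summary of the daemon pods on one node.
type nodeState struct {
	name        string
	hasOldPod   bool
	hasNewPod   bool
	newPodReady bool
}

// planSurge decides, for a single pass, which nodes should get a new pod
// and which nodes can have their old pod deleted.
func planSurge(nodes []nodeState, maxSurge int) (create, remove []string) {
	numSurge := 0
	for _, n := range nodes {
		if n.hasOldPod && n.hasNewPod {
			if n.newPodReady {
				// The new pod is running, so the old one can be cleaned up.
				remove = append(remove, n.name)
			} else {
				// Old and new pod coexist: this node uses surge budget.
				numSurge++
			}
		}
	}
	for _, n := range nodes {
		// Create a new pod only where none exists and budget remains.
		if n.hasOldPod && !n.hasNewPod && numSurge < maxSurge {
			create = append(create, n.name)
			numSurge++
		}
	}
	return create, remove
}

func main() {
	nodes := []nodeState{
		{name: "node-0", hasOldPod: true, hasNewPod: true, newPodReady: true},
		{name: "node-1", hasOldPod: true, hasNewPod: true},
		{name: "node-2", hasOldPod: true},
		{name: "node-3", hasOldPod: true},
	}
	create, remove := planSurge(nodes, 2)
	fmt.Println("create on:", create)
	fmt.Println("delete old on:", remove)
}
```

With a budget of 2 and one node already surging, only one more node receives a new pod in this pass; the rest wait until an old pod is removed.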

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Fixes #48841
Implements kubernetes/enhancements#373

Release note:

DaemonSets: New SurgingRollingUpdate strategy creates new pods up to MaxSurge before deleting old pods.

cc @aaronlevy @lpabon

k8s-ci-robot · 2017-08-23T00:27:42Z

Hi @diegs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-github-robot · 2017-08-23T00:28:20Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: diegs
We suggest the following additional approver: smarterclayton

Assign the PR to them by writing /assign @smarterclayton in a comment when ready.

Associated issue: 48841

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

calebamiles · 2017-08-23T00:28:37Z

/ok-to-test

k8s-ci-robot · 2017-08-23T01:18:06Z

@diegs: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-unit | `e56baff` | link | `/test pull-kubernetes-unit` |
| pull-kubernetes-verify | `e56baff` | link | `/test pull-kubernetes-verify` |
| pull-kubernetes-bazel | `e56baff` | link | `/test pull-kubernetes-bazel` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

luxas · 2017-08-23T08:04:28Z

@kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-apps-pr-reviews
/assign @janetkuo @Kargakis @kow3ns @roberthbailey @luxas

lukemarsden · 2017-08-23T12:00:01Z

This is needed so that SIG-cluster-lifecycle can get self-hosted upgrades working in time for 1.8. Please prioritize reviewing this 😄 ❤️

Thanks!

lpabon

A few documentation comments, but LGTM. Good tests.

lpabon · 2017-08-23T13:21:45Z

pkg/apis/extensions/types.go

+	// of DaemonSet pods at the start of the update (ex: 10%). The absolute number is calculated from
+	// the percentage by rounding up. This cannot be 0. The default value is 1. Example: when this is
+	// set to 30%, at most 30% of the total number of nodes that should be running the daemon pod
+	// (i.e. status.desiredNumberScheduled) can have 2 pods running at any given time. The update


This may need to be reworded, it is a little confusing:

Example: when this is set to 30%, at most 30% of the total number of nodes that should be running the daemon pod (i.e. status.desiredNumberScheduled) can have 2 pods running at any given time...

Where does the 2 pods running calculation come from?
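
One possible reading of the field comment, worked through as a small sketch: the percentage is converted to an absolute node count by rounding up, and each of those nodes may briefly run two pods (the old one plus its replacement). The helper `maxSurgeFromPercent` is a hypothetical name for this example and is not taken from the PR.

```go
package main

import "fmt"

// maxSurgeFromPercent converts a percentage of the desired pod count into
// an absolute number, rounding up as the field comment describes.
// Hypothetical helper for illustration only.
func maxSurgeFromPercent(percent, desiredNumberScheduled int) int {
	return (percent*desiredNumberScheduled + 99) / 100
}

func main() {
	desired := 10
	surge := maxSurgeFromPercent(30, desired)
	// Each surging node runs 2 pods: the old one and the new one.
	fmt.Printf("nodes allowed to run 2 pods: %d of %d\n", surge, desired)
	fmt.Printf("peak pod count during the update: %d\n", desired+surge)
}
```

Under this reading, "2 pods" is a per-node figure rather than a cluster-wide total.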

lpabon · 2017-08-23T13:24:09Z

pkg/apis/extensions/v1beta1/defaults.go

+	if updateStrategy.Type == extensionsv1beta1.SurgingRollingUpdateDaemonSetStrategyType {
+		if updateStrategy.SurgingRollingUpdate == nil {
+			rollingUpdate := extensionsv1beta1.SurgingRollingUpdateDaemonSet{}
+			updateStrategy.SurgingRollingUpdate = &rollingUpdate


Just curious, why set to the variable rollingUpdate which is only used in this scope?
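
For comparison, a sketch of the same defaulting without the named temporary. The types below are hypothetical stand-ins for the real API types, and the string value of the strategy constant is an assumption.

```go
package main

import "fmt"

// Hypothetical stand-ins for the API types in the diff; the real types
// live in the extensions/v1beta1 package and carry more fields.
type SurgingRollingUpdateDaemonSet struct{}

type DaemonSetUpdateStrategy struct {
	Type                 string
	SurgingRollingUpdate *SurgingRollingUpdateDaemonSet
}

// surgingType is an assumed string value for the new strategy type.
const surgingType = "SurgingRollingUpdate"

// setDefaults fills in the surging parameters when they are missing.
func setDefaults(s *DaemonSetUpdateStrategy) {
	if s.Type == surgingType && s.SurgingRollingUpdate == nil {
		// Take the address of the composite literal directly; no named
		// temporary is needed.
		s.SurgingRollingUpdate = &SurgingRollingUpdateDaemonSet{}
	}
}

func main() {
	s := DaemonSetUpdateStrategy{Type: surgingType}
	setDefaults(&s)
	fmt.Println(s.SurgingRollingUpdate != nil)
}
```

Both forms behave the same; the inline literal only avoids a single-use variable.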

lpabon · 2017-08-23T13:35:17Z

pkg/controller/daemon/update.go

+			}
+		}
+
+		if newPod == nil && numSurge < maxSurge {


Please add some comments to these two conditions. The first is pretty simple, but the second could use some explanation.

lpabon · 2017-08-23T13:42:59Z

pkg/controller/daemon/update_test.go

+			pod0 := newPod("pod-0", "node-0", simpleDaemonSetLabel, nil)
+			pod1 := newPod("pod-1", "node-1", simpleDaemonSetLabel, nil)
+			mapping["node-0"] = []*v1.Pod{pod0, pod1}
+			mapping["node-1"] = []*v1.Pod{}


Nice, good test 👍

liggitt · 2017-08-23T13:58:32Z

pkg/apis/extensions/types.go

@@ -413,6 +421,10 @@ const (
 	// Replace the old daemons by new ones using rolling update i.e replace them on each node one after the other.
 	RollingUpdateDaemonSetStrategyType DaemonSetUpdateStrategyType = "RollingUpdate"

+	// Replace the old daemons by new ones using rolling update i.e replace them on each node one
+	// after the other, creating the new pod and then killing the old one.


This strategy wouldn't work well for pods that use resources like host ports, right? The new pod could not bind to the port while the old one was still running, so it would never become healthy (assuming its health check actually reflects whether it is bound and running). That doesn't mean the strategy isn't useful, but the limitation should probably be called out somewhere.

This is not different from using a Rolling deployment with a RWO volume. Worth calling out though.
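
The host-port concern can be reproduced outside Kubernetes. The sketch below is a plain Go demonstration, not code from this PR: it binds a TCP listener and then tries to bind a second one to the same address while the first is still open, which is the situation a surged pod with a host port would be in. The function name `secondBindConflicts` is made up for the example.

```go
package main

import (
	"fmt"
	"net"
)

// secondBindConflicts opens a listener on a free loopback port and then
// attempts a second bind to the same address while the first is still
// open. It reports whether the second bind was rejected.
func secondBindConflicts() (bool, error) {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, fmt.Errorf("first bind failed: %w", err)
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		// The "old pod" still holds the port, so the "new pod" cannot bind.
		return true, nil
	}
	second.Close()
	return false, nil
}

func main() {
	conflict, err := secondBindConflicts()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	fmt.Println("second bind rejected:", conflict)
}
```

A pod whose readiness depends on holding such a port would stay unready for as long as the old pod is running, so the old pod would never be cleaned up.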

liggitt · 2017-08-23T13:59:05Z

pkg/apis/extensions/types.go

@@ -437,6 +449,22 @@ type RollingUpdateDaemonSet struct {
 	MaxUnavailable intstr.IntOrString
 }

+// Spec to control the desired behavior of a daemon set surging rolling update.
+type SurgingRollingUpdateDaemonSet struct {


Deployments put both MaxUnavailable and MaxSurge in the rolling update strategy parameters. Is there a reason not to do the same here?

There was some discussion about that, but sig-apps didn't want to break the current API and behavior, hence this new strategy. Keeping things as stable as possible...

wouldn't current behavior be modeled as a default of 0 for maxsurge?

That was proposed, but shot down at some stage
cc @janetkuo @erictune @roberthbailey

see #48841 as well

One reason for a separate strategy is to minimize the risk of introducing bugs in the existing paths. Also, bearing in mind that the only use case we have today for maxSurge in daemonsets is self-hosting, we decided to go down the separate strategy path.

kow3ns

Is there any design doc associated with what will be implemented here?

erictune · 2017-08-23T19:42:31Z

Prefer to review this in the context of a design doc.

erictune · 2017-08-23T19:42:43Z

See comments on #48841 (comment)

diegs · 2017-08-24T01:10:54Z

Thanks for the reviews and questions. Please see #48841 (comment)

k8s-github-robot · 2017-08-26T08:14:06Z

@diegs PR needs rebase

resouer · 2017-10-23T08:46:03Z

What's the status of this PR? Any future update?

roberthbailey · 2017-10-23T17:29:38Z

The discussion moved to the design proposal. The short version is that this feature has been put on hold and won't be part of the 1.9 release. We may reconsider adding it in the future. See kubernetes/community#977 (comment)

diegs added 3 commits August 21, 2017 22:35

SurgingRollingUpdate: create new DaemonSet strategy.

5d69f7c

SurgingRollingUpdate: run ./hack/update-all.sh.

ea6719b

SurgingRollingUpdate: implement the strategy.

e56baff

k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Aug 23, 2017

k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Aug 23, 2017

diegs changed the title ~~Daemonset surge~~ DaemonSets: add SurgingRollingUpdate strategy Aug 23, 2017

k8s-github-robot assigned liggitt and resouer Aug 23, 2017

k8s-ci-robot removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Aug 23, 2017

diegs mentioned this pull request Aug 23, 2017

DaemonSet to support "add first, then delete" rolling update #48841

Closed

k8s-ci-robot assigned janetkuo, 0xmichalis, kow3ns, luxas and roberthbailey Aug 23, 2017

k8s-ci-robot added the sig/cluster-lifecycle (relevant to SIG Cluster Lifecycle) and sig/apps (relevant to SIG Apps) labels Aug 23, 2017

lpabon approved these changes Aug 23, 2017

liggitt reviewed Aug 23, 2017

kow3ns reviewed Aug 23, 2017

k8s-github-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 26, 2017

roberthbailey removed their assignment Oct 9, 2017

luxas closed this Oct 27, 2017
