Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DaemonSets: add SurgingRollingUpdate strategy #51161

Closed
wants to merge 3 commits into from

Conversation

diegs
Copy link
Contributor

@diegs diegs commented Aug 23, 2017

What this PR does / why we need it:

Adds a new update strategy for DaemonSets called SurgingRollingUpdate. It is the complement of the existing RollingUpdate strategy: instead of deleting pods up to MaxUnavailable and then waiting for new pods to be created, it creates new pods up to MaxSurge and then cleans up the old pods once the new ones are running successfully.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Fixes #48841
Implements kubernetes/enhancements#373

Release note:

DaemonSets: New SurgingRollingUpdate strategy creates new pods up to MaxSurge before deleting old pods.

cc @aaronlevy @lpabon

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 23, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @diegs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 23, 2017
@diegs diegs changed the title Daemonset surge DaemonSets: add SurgingRollingUpdate strategy Aug 23, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: diegs
We suggest the following additional approver: smarterclayton

Assign the PR to them by writing /assign @smarterclayton in a comment when ready.

Associated issue: 48841

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 23, 2017
@calebamiles
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 23, 2017
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Aug 23, 2017

@diegs: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-unit e56baff link /test pull-kubernetes-unit
pull-kubernetes-verify e56baff link /test pull-kubernetes-verify
pull-kubernetes-bazel e56baff link /test pull-kubernetes-bazel

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@luxas
Copy link
Member

luxas commented Aug 23, 2017

@kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-apps-pr-reviews
/assign @janetkuo @Kargakis @kow3ns @roberthbailey @luxas

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Aug 23, 2017
@lukemarsden
Copy link
Contributor

This is needed so that SIG-cluster-lifecycle can get self-hosted upgrades working in time for 1.8. Please prioritize reviewing this 😄 ❤️

Thanks!

Copy link
Contributor

@lpabon lpabon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few documentation comments but LGTM. good tests.

// of DaemonSet pods at the start of the update (ex: 10%). The absolute number is calculated from
// the percentage by rounding up. This cannot be 0. The default value is 1. Example: when this is
// set to 30%, at most 30% of the total number of nodes that should be running the daemon pod
// (i.e. status.desiredNumberScheduled) can have 2 pods running at any given time. The update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need to be reworded, it is a little confusing:

Example: when this is set to 30%, at most 30% of the total number of nodes that
should be running the daemon pod (i.e. status.desiredNumberScheduled) can have
2 pods running at any given time...

Where does the 2 pods running calculation come from?

if updateStrategy.Type == extensionsv1beta1.SurgingRollingUpdateDaemonSetStrategyType {
if updateStrategy.SurgingRollingUpdate == nil {
rollingUpdate := extensionsv1beta1.SurgingRollingUpdateDaemonSet{}
updateStrategy.SurgingRollingUpdate = &rollingUpdate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why set to the variable rollingUpdate which is only used in this scope?

}
}

if newPod == nil && numSurge < maxSurge {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments to these two conditions. The first is pretty simple, but the second could use some explanation.

pod0 := newPod("pod-0", "node-0", simpleDaemonSetLabel, nil)
pod1 := newPod("pod-1", "node-1", simpleDaemonSetLabel, nil)
mapping["node-0"] = []*v1.Pod{pod0, pod1}
mapping["node-1"] = []*v1.Pod{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, good test 👍

@@ -413,6 +421,10 @@ const (
// Replace the old daemons by new ones using rolling update i.e replace them on each node one after the other.
RollingUpdateDaemonSetStrategyType DaemonSetUpdateStrategyType = "RollingUpdate"

// Replace the old daemons by new ones using rolling update i.e replace them on each node one
// after the other, creating the new pod and then killing the old one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this strategy wouldn't work well for pods that make use of resources like host ports, right? (since the new pod would not be able to bind to the port while the old one was still running, and therefore wouldn't ever become healthy, assuming its health check actually indicated whether it was successfully bound and running). that doesn't mean the strategy isn't useful, but that limitation should probably be called out somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not different from using a Rolling deployment with a RWO volume. Worth calling out though.

@@ -437,6 +449,22 @@ type RollingUpdateDaemonSet struct {
MaxUnavailable intstr.IntOrString
}

// Spec to control the desired behavior of a daemon set surging rolling update.
type SurgingRollingUpdateDaemonSet struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deployments put both MaxUnavailable and MaxSurge in the rolling update strategy parameters. is there a reason not to do the same here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some discussion about that, but sig-apps didn't want to break current API and behavior, hence this new strategy. Keeping things as stable as possible...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't current behavior be modeled as a default of 0 for maxsurge?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was proposed, but shot down at some stage
cc @janetkuo @erictune @roberthbailey

see #48841 as well

Copy link
Contributor

@0xmichalis 0xmichalis Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason for a separate strategy is to minimize the risk of introducing bugs in the existing paths. Also, bearing in mind that the only use case we have today for maxSurge in daemonsets is self-hosting, we decided to go down the separate strategy path.

Copy link
Member

@kow3ns kow3ns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any design doc associated with what will be implemented here?

@erictune
Copy link
Member

Prefer to review this in the context of a design doc.

@erictune
Copy link
Member

See comments on #48841 (comment)

@diegs
Copy link
Contributor Author

diegs commented Aug 24, 2017

Thanks for the reviews and questions. Please see #48841 (comment)

@k8s-github-robot
Copy link

@diegs PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 26, 2017
@roberthbailey roberthbailey removed their assignment Oct 9, 2017
@resouer
Copy link
Contributor

resouer commented Oct 23, 2017

What's the status of this PR? Any future update?

@roberthbailey
Copy link
Contributor

The discussion moved to the design proposal. The short version is that this feature has been put on hold and won't be part of the 1.9 release. We may reconsider adding it in the future. See kubernetes/community#977 (comment)

@luxas luxas closed this Oct 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DaemonSet to support "add first, then delete" rolling update