DaemonSets: add SurgingRollingUpdate strategy #51161
Conversation
Hi @diegs. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: diegs. Assign the PR to them by writing /assign in a comment. Associated issue: 48841

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS files. You can indicate your approval by writing /approve in a comment.
/ok-to-test
@diegs: The following tests failed, say /retest to rerun them all:

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@kubernetes/sig-cluster-lifecycle-pr-reviews @kubernetes/sig-apps-pr-reviews
This is needed so that SIG-cluster-lifecycle can get self-hosted upgrades working in time for 1.8. Please prioritize reviewing this 😄 ❤️ Thanks!
A few documentation comments, but LGTM. Good tests.
// of DaemonSet pods at the start of the update (ex: 10%). The absolute number is calculated from
// the percentage by rounding up. This cannot be 0. The default value is 1. Example: when this is
// set to 30%, at most 30% of the total number of nodes that should be running the daemon pod
// (i.e. status.desiredNumberScheduled) can have 2 pods running at any given time. The update
This may need to be reworded, it is a little confusing:

"Example: when this is set to 30%, at most 30% of the total number of nodes that should be running the daemon pod (i.e. status.desiredNumberScheduled) can have 2 pods running at any given time..."

Where does the "2 pods running" calculation come from?
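For context (an editorial illustration, not code from the PR): the "2 pods" is the old and the new daemon pod coexisting on one node while it is being surged, and the absolute surge count is presumably derived with the intstr helper Kubernetes uses elsewhere, matching the round-up wording in the doc comment:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// maxSurge as a percentage, per the doc comment's example.
	maxSurge := intstr.FromString("30%")

	// Total nodes that should run the daemon pod
	// (status.desiredNumberScheduled).
	desired := 10

	// Round up, per "The absolute number is calculated from the
	// percentage by rounding up."
	n, err := intstr.GetValueFromIntOrPercent(&maxSurge, desired, true)
	if err != nil {
		panic(err)
	}

	// Up to n nodes may temporarily run 2 pods (old + new) at once.
	fmt.Println(n) // 3
}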
if updateStrategy.Type == extensionsv1beta1.SurgingRollingUpdateDaemonSetStrategyType {
	if updateStrategy.SurgingRollingUpdate == nil {
		rollingUpdate := extensionsv1beta1.SurgingRollingUpdateDaemonSet{}
		updateStrategy.SurgingRollingUpdate = &rollingUpdate
Just curious, why assign to the variable rollingUpdate, which is only used in this scope?
} | ||
} | ||
|
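On the temporary-variable question: Go can take the address of a composite literal directly, so the intermediate is avoidable. A self-contained sketch with stand-in types (the real types exist only on this PR's branch):

package main

import "fmt"

// Stand-ins for the PR's API types, for illustration only.
type SurgingRollingUpdateDaemonSet struct{}

type DaemonSetUpdateStrategy struct {
	SurgingRollingUpdate *SurgingRollingUpdateDaemonSet
}

func main() {
	var updateStrategy DaemonSetUpdateStrategy
	if updateStrategy.SurgingRollingUpdate == nil {
		// The address of a composite literal can be taken directly,
		// making the diff's intermediate variable unnecessary.
		updateStrategy.SurgingRollingUpdate = &SurgingRollingUpdateDaemonSet{}
	}
	fmt.Println(updateStrategy.SurgingRollingUpdate != nil) // true
}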
if newPod == nil && numSurge < maxSurge {
Please add some comments to these two conditions. The first is pretty simple, but the second could use some explanation.
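As a possible shape for those comments (a hypothetical sketch using the names from the diff; only the second condition is visible in the quote above, and the map-based helper is invented for illustration):

package main

import "fmt"

// pod stands in for *v1.Pod in this illustration.
type pod struct{ name string }

// surgeCandidates mirrors the quoted condition: a node is eligible to
// surge only if it has no new-template pod yet (newPod == nil) and the
// surge budget is not exhausted (numSurge < maxSurge).
func surgeCandidates(newPodByNode map[string]*pod, maxSurge int) []string {
	numSurge := 0
	var nodes []string
	for node, newPod := range newPodByNode {
		if newPod == nil && numSurge < maxSurge {
			// This node gets an extra (new-template) pod alongside
			// its old one, consuming one unit of surge budget.
			nodes = append(nodes, node)
			numSurge++
		}
	}
	return nodes
}

func main() {
	fmt.Println(surgeCandidates(map[string]*pod{"node-0": nil, "node-1": {name: "p"}}, 1))
	// prints [node-0]: only the node without a new pod surges
}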
pod0 := newPod("pod-0", "node-0", simpleDaemonSetLabel, nil)
pod1 := newPod("pod-1", "node-1", simpleDaemonSetLabel, nil)
mapping["node-0"] = []*v1.Pod{pod0, pod1}
mapping["node-1"] = []*v1.Pod{}
Nice, good test 👍
@@ -413,6 +421,10 @@ const (
	// Replace the old daemons by new ones using rolling update, i.e. replace them on each node one after the other.
	RollingUpdateDaemonSetStrategyType DaemonSetUpdateStrategyType = "RollingUpdate"

	// Replace the old daemons by new ones using rolling update, i.e. replace them on each node one
	// after the other, creating the new pod and then killing the old one.
	SurgingRollingUpdateDaemonSetStrategyType DaemonSetUpdateStrategyType = "SurgingRollingUpdate"
this strategy wouldn't work well for pods that make use of resources like host ports, right? (since the new pod would not be able to bind to the port while the old one was still running, and therefore wouldn't ever become healthy, assuming its health check actually indicated whether it was successfully bound and running). that doesn't mean the strategy isn't useful, but that limitation should probably be called out somewhere.
This is no different from using a rolling Deployment with a RWO volume. Worth calling out, though.
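A toy demonstration of the hostPort limitation (plain Go, nothing Kubernetes-specific; port 9100 is an arbitrary choice): the second listener on the same port fails, just as a surged pod could not bind a hostPort still held by the old pod on that node:

package main

import (
	"fmt"
	"net"
)

func main() {
	// The "old pod" binds the host port first.
	old, err := net.Listen("tcp", ":9100")
	if err != nil {
		fmt.Println("old pod failed to bind:", err)
		return
	}
	defer old.Close()

	// The "new pod" tries the same port while the old one is still running.
	if _, err := net.Listen("tcp", ":9100"); err != nil {
		fmt.Println("new pod cannot bind:", err)
	}
}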
@@ -437,6 +449,22 @@ type RollingUpdateDaemonSet struct {
	MaxUnavailable intstr.IntOrString
}

// Spec to control the desired behavior of a daemon set surging rolling update.
type SurgingRollingUpdateDaemonSet struct {
	// The maximum number of DaemonSet pods that can be created above the
	// desired number of pods during the update (see the doc comment quoted above).
	MaxSurge intstr.IntOrString
}
deployments put both MaxUnavailable and MaxSurge in the rolling update strategy parameters. is there a reason not to do the same here?
There was some discussion about that, but sig-apps didn't want to break the current API and behavior, hence this new strategy. Keeping things as stable as possible...
Wouldn't current behavior be modeled as a default of 0 for maxSurge?
That was proposed, but shot down at some stage
cc @janetkuo @erictune @roberthbailey
See #48841 as well.
One reason for a separate strategy is to minimize the risk of introducing bugs in the existing paths. Also, bearing in mind that the only use case we have today for maxSurge in daemonsets is self-hosting, we decided to go down the separate strategy path.
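To make the discussed API shape concrete, an editorial usage sketch (the strategy types are stand-ins mirroring this PR's branch; only the intstr import is real upstream code):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

// Stand-ins mirroring the shape discussed above; the real types live only
// on this PR's branch under extensions/v1beta1.
type SurgingRollingUpdateDaemonSet struct {
	MaxSurge intstr.IntOrString
}

type DaemonSetUpdateStrategy struct {
	Type                 string
	SurgingRollingUpdate *SurgingRollingUpdateDaemonSet // new, separate strategy
}

func main() {
	// The surge knob lives only under the new strategy, so existing
	// RollingUpdate objects and their defaults are untouched.
	strategy := DaemonSetUpdateStrategy{
		Type: "SurgingRollingUpdate",
		SurgingRollingUpdate: &SurgingRollingUpdateDaemonSet{
			MaxSurge: intstr.FromString("30%"),
		},
	}
	fmt.Printf("%+v\n", strategy)
}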
Is there any design doc associated with what will be implemented here?
I'd prefer to review this in the context of a design doc.
See comments on #48841 (comment)
Thanks for the reviews and questions. Please see #48841 (comment)
@diegs PR needs rebase
What's the status of this PR? Any future update?
The discussion moved to the design proposal. The short version is that this feature has been put on hold and won't be part of the 1.9 release. We may reconsider adding it in the future. See kubernetes/community#977 (comment) |
What this PR does / why we need it:
Adds a new update strategy for DaemonSets called SurgingRollingUpdate. It is the complement of the existing RollingUpdate strategy: instead of deleting pods up to MaxUnavailable and then waiting for new pods to be created, it creates new pods up to MaxSurge and then cleans up the old pods once the new ones are running successfully.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): Fixes #48841

Implements kubernetes/enhancements#373

Release note:

cc @aaronlevy @lpabon
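For readers skimming the thread, a schematic sketch (editorial, not the PR's actual controller code) of the create-then-delete flow the description outlines:

package main

import "fmt"

// nodeState tracks one node during a surging rolling update.
type nodeState struct {
	oldPod   bool // node still runs an old-template pod
	newPod   bool // node already runs a new-template pod
	newReady bool // the new pod has become ready
}

// reconcile performs one pass: delete old pods whose replacements are
// ready, and surge new pods onto nodes up to the maxSurge budget.
func reconcile(nodes map[string]*nodeState, maxSurge int) {
	surging := 0
	for _, n := range nodes {
		if n.newPod && !n.newReady {
			surging++ // in-flight surges count against the budget
		}
	}
	for name, n := range nodes {
		switch {
		case n.oldPod && n.newPod && n.newReady:
			// The new pod is running successfully: clean up the old one.
			n.oldPod = false
			fmt.Println("deleting old pod on", name)
		case n.oldPod && !n.newPod && surging < maxSurge:
			// Surge: create the new pod alongside the old one.
			n.newPod = true
			surging++
			fmt.Println("creating new pod on", name)
		}
	}
}

func main() {
	nodes := map[string]*nodeState{
		"node-0": {oldPod: true},
		"node-1": {oldPod: true, newPod: true, newReady: true},
	}
	reconcile(nodes, 1) // deletes the old pod on node-1, surges node-0
}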