-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to surge during rolling update #8313
Conversation
WIP because it depends on #8271 landing first. |
/retest |
/test pull-kops-bazel-test |
/retest |
2 similar comments
/retest |
/retest |
/assign @justinsb |
|
||
if maxConcurrency == 0 { | ||
klog.Infof("Rolling updates for InstanceGroup %s are disabled", r.CloudGroup.InstanceGroup.Name) | ||
return nil | ||
} | ||
|
||
if r.CloudGroup.InstanceGroup.Spec.Role == api.InstanceGroupRoleMaster && maxSurge != 0 { | ||
// Masters are incapable of surging because they rely on registering themselves through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how chatty it would be if users don't set any values, but a warning might be great here if we're going to override user settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should put on the MaxSurge
field's documentation Does not have any effect on instance groups with role "master".
(or "...does not apply to...")
I could add an api validation that throws a field.Forbidden
if MaxSurge
is explicitly set to a non-zero value on an InstanceGroup with role "Master", because that just doesn't make sense.
For the case where the user set a nonzero default MaxSurge
on the Cluster but didn't override that at the InstanceGroup level, I believe the best user experience would be for kops to silently do the right thing. We shouldn't make the user have to explicitly override the value on each of their master InstanceGroups just to get rid of log noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I missed the override point. I agree with either/both of the suggestions, I also don't consider them a blocker to merging this.
I think this looks good - it's a clever idea to track the surge state using a tag on the infrastructure level - that's where we basically came unstuck previously. I'm going to try to think a little bit more about this today (and we should probably talk about it during office hours), but I'm inclined to merge ... particularly if I can satisfy myself it works on non-AWS also :-) |
Other cloud providers have the option of either tagging/detaching like AWS does or temporarily increasing the desired size of the underlying ASG. When I designed out an interface to be agnostic to this implementation choice yet handle all the failure cases, it ended up looking the same as the detach interface in this PR. |
Thanks @johngmyers - this is really great stuff /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johngmyers, justinsb The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
One of my added tests has a flake. I was able to reproduce it once in 100 runs. Will investigate. |
/hold |
/hold cancel |
/retest |
/lgtm |
It'd be great to have some documentation for this feature as well. perhaps in docs/instance_groups.md ? |
Adds a
MaxSurge
field to the cluster/instancegroupRollingUpdate
struct./kind feature
/area rolling-update