Option to surge during rolling update #8313

Merged 15 commits on Mar 4, 2020

Conversation

johngmyers
Member

Adds a MaxSurge field to the cluster/instancegroup RollingUpdate struct.

/kind feature
/area rolling-update
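
As a rough illustration of the description above, the sketch below shows a settings struct with the new field and one way a cluster-level value could be overridden per instance group. The plain `*int` field types, the `resolveMaxSurge` helper, and the `fallback` parameter are assumptions made to keep the example self-contained; they are not taken from the PR's diff.

```go
package main

import "fmt"

// RollingUpdate is a simplified stand-in for the struct named in the
// description. Using *int for both fields is an assumption of this sketch.
type RollingUpdate struct {
	MaxUnavailable *int // nil means "not set at this level"
	MaxSurge       *int // the field this PR adds
}

// resolveMaxSurge is a hypothetical helper: the instance group value wins,
// then the cluster value, then the caller-supplied fallback.
func resolveMaxSurge(cluster, ig *RollingUpdate, fallback int) int {
	if ig != nil && ig.MaxSurge != nil {
		return *ig.MaxSurge
	}
	if cluster != nil && cluster.MaxSurge != nil {
		return *cluster.MaxSurge
	}
	return fallback
}

func intPtr(i int) *int { return &i }

func main() {
	cluster := &RollingUpdate{MaxSurge: intPtr(2)}
	ig := &RollingUpdate{MaxSurge: intPtr(1)}

	fmt.Println(resolveMaxSurge(cluster, ig, 0))  // instance group override applies
	fmt.Println(resolveMaxSurge(cluster, nil, 0)) // cluster-level value applies
	fmt.Println(resolveMaxSurge(nil, nil, 0))     // nothing set, fallback applies
}
```

Pointer fields let "unset" be distinguished from an explicit zero, which matters when a lower level needs to override a higher one with 0.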

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. area/rolling-update cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 11, 2020
@johngmyers
Member Author

WIP because it depends on #8271 landing first.

@johngmyers
Member Author

/retest

@johngmyers
Member Author

/test pull-kops-bazel-test

@johngmyers
Member Author

/retest

2 similar comments
@johngmyers
Member Author

/retest

@hakman
Member

hakman commented Jan 26, 2020

/retest

@johngmyers johngmyers changed the title WIP Option to surge during rolling update Option to surge during rolling update Jan 28, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 28, 2020
@johngmyers
Member Author

/assign @justinsb


if maxConcurrency == 0 {
	klog.Infof("Rolling updates for InstanceGroup %s are disabled", r.CloudGroup.InstanceGroup.Name)
	return nil
}

if r.CloudGroup.InstanceGroup.Spec.Role == api.InstanceGroupRoleMaster && maxSurge != 0 {
	// Masters are incapable of surging because they rely on registering themselves through
Member

Not sure how chatty it would be if users don't set any values, but a warning might be great here if we're going to override user settings.

Member Author

I believe we should add to the MaxSurge field's documentation: "Does not have any effect on instance groups with role 'master'." (or "...does not apply to...")

I could add an API validation that returns a field.Forbidden error if MaxSurge is explicitly set to a non-zero value on an InstanceGroup with role "Master", because that just doesn't make sense.

For the case where the user set a nonzero default MaxSurge on the Cluster but didn't override that at the InstanceGroup level, I believe the best user experience would be for kops to silently do the right thing. We shouldn't make the user have to explicitly override the value on each of their master InstanceGroups just to get rid of log noise.
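
The two behaviours proposed in this comment can be sketched as follows. This is only an illustration under stated assumptions: `Role`, `validateMaxSurge`, `effectiveMaxSurge`, and `errForbidden` are hypothetical names, and the plain error value stands in for the `field.Forbidden` error mentioned above rather than reproducing the real validation API.

```go
package main

import (
	"errors"
	"fmt"
)

// Role is a simplified stand-in for the instance group role type.
type Role string

const (
	RoleMaster Role = "Master"
	RoleNode   Role = "Node"
)

// errForbidden stands in for a field.Forbidden validation error.
var errForbidden = errors.New(`maxSurge: Forbidden: does not apply to instance groups with role "Master"`)

// validateMaxSurge rejects only an explicit non-zero value set directly on a
// master instance group. A nil pointer means the value was not set there.
func validateMaxSurge(role Role, igMaxSurge *int) error {
	if role == RoleMaster && igMaxSurge != nil && *igMaxSurge != 0 {
		return errForbidden
	}
	return nil
}

// effectiveMaxSurge takes the already-resolved value (instance group or
// cluster default) and silently forces it to zero for masters, with no
// warning logged.
func effectiveMaxSurge(role Role, resolved int) int {
	if role == RoleMaster {
		return 0
	}
	return resolved
}

func main() {
	one := 1
	fmt.Println(validateMaxSurge(RoleMaster, &one)) // explicit value on a master is rejected
	fmt.Println(validateMaxSurge(RoleMaster, nil))  // value inherited from the cluster passes
	fmt.Println(effectiveMaxSurge(RoleMaster, 1))   // masters never surge
	fmt.Println(effectiveMaxSurge(RoleNode, 1))     // other roles keep the resolved value
}
```

Splitting validation from resolution keeps the explicit mistake loud while the inherited cluster default stays quiet, which is the user experience argued for above.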

Member

Ah - I missed the override point. I agree with either or both of the suggestions, and I don't consider them a blocker to merging this.

pkg/instancegroups/instancegroups.go (outdated, resolved)
@justinsb
Member

I think this looks good - it's a clever idea to track the surge state using a tag on the infrastructure level - that's where we basically came unstuck previously.

I'm going to try to think a little bit more about this today (and we should probably talk about it during office hours), but I'm inclined to merge ... particularly if I can satisfy myself it works on non-AWS also :-)

@johngmyers
Member Author

Other cloud providers have the option of either tagging/detaching like AWS does or temporarily increasing the desired size of the underlying ASG. When I designed an interface that is agnostic to this implementation choice yet handles all the failure cases, it ended up looking the same as the detach interface in this PR.
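
To make the point about a provider-agnostic interface concrete, here is a toy model in which both strategies sit behind the same method. Everything in it is hypothetical: the `Detacher` interface, the `group` bookkeeping, and both implementations are illustrative and do not reproduce the interface added by this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Detacher is a hypothetical provider-agnostic interface: after a successful
// call, the group is expected to run one more instance than before while the
// named instance keeps running.
type Detacher interface {
	DetachInstance(id string) error
}

// group is a toy model of a provider-managed instance group.
type group struct {
	desired int             // size the provider tries to maintain
	counted map[string]bool // instances counted toward desired
	extra   map[string]bool // running instances no longer counted
}

func newGroup(ids ...string) *group {
	g := &group{desired: len(ids), counted: map[string]bool{}, extra: map[string]bool{}}
	for _, id := range ids {
		g.counted[id] = true
	}
	return g
}

// expectedCapacity is the instance count once the provider has reconciled.
func (g *group) expectedCapacity() int { return g.desired + len(g.extra) }

// tagDetacher models the tag/detach strategy: the instance stops counting
// toward desired, so the provider launches a replacement.
type tagDetacher struct{ g *group }

func (d tagDetacher) DetachInstance(id string) error {
	if !d.g.counted[id] {
		return errors.New("not an attached member: " + id)
	}
	delete(d.g.counted, id)
	d.g.extra[id] = true
	return nil
}

// resizeDetacher models the alternative: temporarily raise the desired size.
type resizeDetacher struct{ g *group }

func (d resizeDetacher) DetachInstance(id string) error {
	if !d.g.counted[id] {
		return errors.New("not an attached member: " + id)
	}
	d.g.desired++
	return nil
}

func main() {
	tagged, resized := newGroup("a", "b"), newGroup("a", "b")
	for name, d := range map[string]Detacher{"tag": tagDetacher{tagged}, "resize": resizeDetacher{resized}} {
		fmt.Println(name, d.DetachInstance("a"))
	}
	fmt.Println(tagged.expectedCapacity(), resized.expectedCapacity())
}
```

In this toy model the tag-based variant records which instance was surged, so a repeated call for the same instance fails instead of surging twice. The resize variant would need extra bookkeeping to get the same property.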

@justinsb
Member

justinsb commented Mar 4, 2020

Thanks @johngmyers - this is really great stuff

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers, justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2020
@johngmyers
Member Author

/retest

@johngmyers
Member Author

One of my added tests has a flake. I was able to reproduce it once in 100 runs. Will investigate.

@johngmyers
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2020
@johngmyers
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2020
@johngmyers
Member Author

/retest

@rifelpet
Member

rifelpet commented Mar 4, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2020
@rifelpet
Member

rifelpet commented Mar 4, 2020

It'd be great to have some documentation for this feature as well, perhaps in docs/instance_groups.md?

@k8s-ci-robot k8s-ci-robot merged commit a5dabf5 into kubernetes:master Mar 4, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Mar 4, 2020
@johngmyers johngmyers deleted the surge branch March 4, 2020 18:23
@johngmyers
Member Author

@rifelpet see #8673

Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/rolling-update
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/feature - Categorizes issue or PR as related to a new feature.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
size/XXL - Denotes a PR that changes 1000+ lines, ignoring generated files.

5 participants