Option to increase concurrency of rolling update within instancegroup #8271

Merged
merged 11 commits into from
Jan 28, 2020

Conversation

johngmyers
Member

Adds a RollingUpdate field to instancegroup and cluster for configuring the rolling update strategy. Starts with MaxUnavailable, mostly cribbed from machine-api's MachineDeployment. Unlike MachineDeployment, permits disabling rolling updates for an instancegroup.

/kind feature
/area rolling-update
Fixes #1718, #7685
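
As an illustration of the description above, here is a minimal sketch of how a setting defined on both the cluster and the instancegroup might be resolved. The `RollingUpdate` struct, the string representation of `MaxUnavailable`, and the `resolveSettings` helper are simplified, hypothetical stand-ins rather than the PR's actual types, and the precedence (instancegroup over cluster, then a default of 1) is an assumption.

```go
package main

import "fmt"

// RollingUpdate is a hypothetical, simplified stand-in for the field added by
// this PR. MaxUnavailable holds a count ("3") or a percentage ("25%");
// nil means "not set at this level".
type RollingUpdate struct {
	MaxUnavailable *string
}

// resolveSettings returns the effective settings, assuming the instancegroup
// value overrides the cluster value and that "1" is the fallback default.
func resolveSettings(cluster, group *RollingUpdate) RollingUpdate {
	def := "1"
	out := RollingUpdate{MaxUnavailable: &def}
	if cluster != nil && cluster.MaxUnavailable != nil {
		out.MaxUnavailable = cluster.MaxUnavailable
	}
	if group != nil && group.MaxUnavailable != nil {
		out.MaxUnavailable = group.MaxUnavailable
	}
	return out
}

func main() {
	clusterVal, groupVal := "25%", "0"
	cluster := &RollingUpdate{MaxUnavailable: &clusterVal}
	group := &RollingUpdate{MaxUnavailable: &groupVal}

	fmt.Println(*resolveSettings(nil, nil).MaxUnavailable)       // neither level set
	fmt.Println(*resolveSettings(cluster, nil).MaxUnavailable)   // cluster level only
	fmt.Println(*resolveSettings(cluster, group).MaxUnavailable) // instancegroup overrides
}
```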

@k8s-ci-robot added the kind/feature, area/rolling-update, cncf-cla: yes, and needs-ok-to-test labels on Jan 4, 2020
@k8s-ci-robot
Contributor

Hi @johngmyers. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the size/XXL label on Jan 4, 2020
@rifelpet
Member

rifelpet commented Jan 5, 2020

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label on Jan 5, 2020
@johngmyers
Member Author

/retest

@johngmyers
Member Author

Looks like my tests are a little flaky. Perhaps a GC is getting in and throwing off the timings. I will investigate.
/test pull-kops-bazel-test

@johngmyers
Member Author

/test pull-kops-verify-boilerplate

@johngmyers
Member Author

/retest

pkg/instancegroups/settings.go
unavailable, err := intstr.GetValueFromIntOrPercent(rollingUpdate.MaxUnavailable, numInstances, false)
if err != nil {
	// If unparseable use the default value
	unavailable = 1
Member

Nit: we might want to log a warning here. Ideally we would have already caught this in validation though.
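
The suggestion above could look roughly like the following sketch. To stay self-contained it does not import `k8s.io/apimachinery`; `valueFromIntOrPercent` is a simplified, hypothetical stand-in for `intstr.GetValueFromIntOrPercent` that takes a plain string and assumes non-negative values, and `resolveMaxUnavailable` is an illustrative name, not a function from the PR.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

// valueFromIntOrPercent is a simplified stand-in for
// intstr.GetValueFromIntOrPercent: "3" yields 3, "25%" yields 25% of total,
// rounded up or down. It assumes non-negative inputs.
func valueFromIntOrPercent(s string, total int, roundUp bool) (int, error) {
	if strings.HasSuffix(s, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(s, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %v", s, err)
		}
		if roundUp {
			return (pct*total + 99) / 100, nil
		}
		return pct * total / 100, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid value %q: %v", s, err)
	}
	return n, nil
}

// resolveMaxUnavailable falls back to the default of 1 when the value cannot
// be parsed, and logs a warning so the fallback is visible to the operator.
func resolveMaxUnavailable(s string, numInstances int) int {
	unavailable, err := valueFromIntOrPercent(s, numInstances, false)
	if err != nil {
		log.Printf("warning: cannot parse maxUnavailable %q, using default of 1: %v", s, err)
		return 1
	}
	return unavailable
}

func main() {
	fmt.Println(resolveMaxUnavailable("25%", 10))   // percentage, rounded down
	fmt.Println(resolveMaxUnavailable("3", 10))     // absolute count
	fmt.Println(resolveMaxUnavailable("bogus", 10)) // unparseable, falls back with a warning
}
```

As the comment notes, validation would ideally reject an unparseable value earlier, so the warning path is a safety net rather than the primary check.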

} {
	t.Run(tc.name, func(t *testing.T) {
		defaultCluster := &kops.RollingUpdate{}
		setFieldValue(defaultCluster, tc.name, tc.defaultValue)
Member

I think I know where this is going... it is usually easier and clearer just to write it out, but ... let's see once the chain lands!

pkg/instancegroups/instancegroups.go (outdated)
} else if rollingUpdateData.CloudOnly {
if maxConcurrency == 0 {
klog.Infof("Rolling updates for InstanceGroup %s are disabled", r.CloudGroup.InstanceGroup.Name)
return nil
Member

Should we return an error here (as we did not update the IG)?

Member Author

We want to continue to update the subsequent instancegroups. The admin specifically wanted to have kops skip this instancegroup, so it isn't an error.
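
The behavior described in this reply can be sketched as a loop that treats a disabled instancegroup as a skip rather than a failure. The `group` type, the `rollingUpdateAll` function, and the `update` callback are hypothetical names for illustration; only the rule that a concurrency of 0 means "skip and keep going" comes from the discussion.

```go
package main

import "fmt"

// group is a hypothetical, minimal view of an instancegroup.
type group struct {
	Name           string
	MaxConcurrency int // 0 means rolling updates are disabled for this group
}

// rollingUpdateAll updates each group in order. A disabled group is recorded
// as skipped and does not stop the loop; a real update failure does.
func rollingUpdateAll(groups []group, update func(group) error) (updated, skipped []string, err error) {
	for _, g := range groups {
		if g.MaxConcurrency == 0 {
			skipped = append(skipped, g.Name)
			continue
		}
		if uerr := update(g); uerr != nil {
			return updated, skipped, fmt.Errorf("updating %s: %w", g.Name, uerr)
		}
		updated = append(updated, g.Name)
	}
	return updated, skipped, nil
}

func main() {
	groups := []group{
		{Name: "masters", MaxConcurrency: 1},
		{Name: "frozen", MaxConcurrency: 0},
		{Name: "nodes", MaxConcurrency: 2},
	}
	updated, skipped, err := rollingUpdateAll(groups, func(g group) error { return nil })
	fmt.Println("updated:", updated, "skipped:", skipped, "err:", err)
}
```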

Member Author

One issue here is the placement. Where it is currently it will validate and taint before returning. Should this be moved up before the initial validation?

Member

I may be misunderstanding, but I think you're saying that maxUnavailable: 0 means "skip this IG for rolling updates". I'm not sure I follow that implication, but I also don't know what maxUnavailable: 0 would mean!

Is maxUnavailable: 0 something you're actually using? Because we could just treat it as a validation error until we have maxSurge support (I guess that maxUnavailable=0 and maxSurge=0 is a validation error on a Deployment?)

I'm not overly worried about the tainting, TBH, given that!
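
The alternative raised in this comment, rejecting `maxUnavailable: 0` at validation time unless surging is possible, might be sketched as follows. `validateRollingUpdate` is a hypothetical helper working on already-resolved integers, and `maxSurge` is not part of this PR; it appears here only because the comment mentions it as possible future support.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRollingUpdate rejects settings under which no instance could ever
// be replaced: nothing may become unavailable and nothing may be surged.
// maxSurge is hypothetical here; this PR only introduces maxUnavailable.
func validateRollingUpdate(maxUnavailable, maxSurge int) error {
	if maxUnavailable < 0 {
		return errors.New("maxUnavailable cannot be negative")
	}
	if maxSurge < 0 {
		return errors.New("maxSurge cannot be negative")
	}
	if maxUnavailable == 0 && maxSurge == 0 {
		return errors.New("maxUnavailable cannot be 0 when maxSurge is 0")
	}
	return nil
}

func main() {
	fmt.Println(validateRollingUpdate(0, 0)) // rejected under this rule
	fmt.Println(validateRollingUpdate(1, 0)) // accepted
	fmt.Println(validateRollingUpdate(0, 1)) // accepted once surging exists
}
```

Under this rule the "skip this instancegroup" meaning discussed above would need a different, explicit switch.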

pkg/instancegroups/instancegroups.go (outdated)
@justinsb
Member

LGTM in general, with a few nits if you agree with them. The only two I'd like to resolve now are whether we should pass the channel in to drainTerminateAndWait, and adding a comment on the sweep block about how it avoids validate calls.
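
For readers unfamiliar with the pattern under discussion, here is a rough sketch of the "pass the channel in" variant: the caller owns a results channel, hands it to each worker, and never has more than `maxConcurrency` workers in flight. `updateConcurrently` and the worker signature are assumptions made for illustration; the PR's real `drainTerminateAndWait` has a different signature and does real draining and termination.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// updateConcurrently starts one worker per instance, keeping at most
// maxConcurrency in flight. Each worker reports its result on the channel
// passed to it. The first error stops new work, but workers already started
// are always waited for.
func updateConcurrently(instances []string, maxConcurrency int, worker func(instance string, done chan<- error)) error {
	if maxConcurrency < 1 {
		return fmt.Errorf("maxConcurrency must be at least 1, got %d", maxConcurrency)
	}
	// Buffered so a finishing worker never blocks on the send.
	done := make(chan error, maxConcurrency)
	inFlight := 0
	var firstErr error
	for _, inst := range instances {
		if inFlight == maxConcurrency {
			// Wait for one slot to free up before starting another worker.
			if err := <-done; err != nil && firstErr == nil {
				firstErr = err
			}
			inFlight--
			if firstErr != nil {
				break
			}
		}
		inFlight++
		go worker(inst, done)
	}
	// Sweep: collect results from everything still running.
	for ; inFlight > 0; inFlight-- {
		if err := <-done; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	var mu sync.Mutex
	running, peak := 0, 0
	// Hypothetical stand-in for drainTerminateAndWait: it only sleeps.
	worker := func(instance string, done chan<- error) {
		mu.Lock()
		running++
		if running > peak {
			peak = running
		}
		mu.Unlock()
		time.Sleep(10 * time.Millisecond)
		mu.Lock()
		running--
		mu.Unlock()
		done <- nil
	}
	err := updateConcurrently([]string{"i-1", "i-2", "i-3", "i-4", "i-5"}, 2, worker)
	fmt.Println("error:", err, "peak concurrency:", peak)
}
```

Passing the channel in keeps the worker free to report at the point it considers itself finished; the alternative is for the caller to wrap a plain `func(string) error` in a goroutine that does the send.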

@johngmyers
Member Author

/test pull-kops-e2e-kubernetes-aws

@justinsb
Member

Thanks for the tweaks @johngmyers - this LGTM. I do wonder if we should just make maxUnavailable: 0 into a validation error for now - are you using it? But anyway we can do that in a separate PR - I don't want to drag this one out any more - thanks for your patience!

/approve
/lgtm

@k8s-ci-robot added the lgtm label on Jan 28, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers, justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Jan 28, 2020
@johngmyers
Member Author

@justinsb maxUnavailable: 0 is intended to address #7685. Do you have an issue with providing that feature, or just this PR's implementation of it? Perhaps we should reopen #7685 and discuss there?

@k8s-ci-robot merged commit e56c507 into kubernetes:master on Jan 28, 2020
@k8s-ci-robot added this to the v1.18 milestone on Jan 28, 2020
@johngmyers deleted the max-unavailable branch on January 28, 2020 04:13
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/rolling-update cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Parallelize and improve rolling updates even more
4 participants