
Dual-stack: make nodeipam compatible with existing single-stack clusters when the dual-stack feature gate becomes enabled by default #90439

Merged

Conversation

Member

@SataQiu SataQiu commented Apr 24, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
Dual-stack: make nodeipam compatible with existing single-stack clusters when the dual-stack feature gate becomes enabled by default.

Which issue(s) this PR fixes:

Fixes #90251

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

dual-stack: make nodeipam compatible with existing single-stack clusters when the dual-stack feature gate becomes enabled by default

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 24, 2020
Member Author

SataQiu commented Apr 24, 2020

/cc @danwinship @aojea @aramase

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 24, 2020
Member

aojea commented Apr 24, 2020

Thanks for the PR, but my personal opinion is to postpone all dual-stack patches that are not addressing specific bugs until the KEP discussion is resolved. This PR does address a bug in the upgrade process, but that is exactly one of the points being discussed in
kubernetes/enhancements#1679 (comment)

Member Author

SataQiu commented Apr 24, 2020

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 24, 2020
@SataQiu SataQiu changed the title Dual-stack: using node-cidr-mask-size as both IPv4 and IPv6 cidr mask when there is only --node-cidr-mask-size flag specified in a dual-stack cluster Dual-stack: make nodeipam compatible with existing single-stack clusters when the dual-stack feature gate becomes enabled by default Apr 24, 2020
@SataQiu SataQiu force-pushed the dual-stack-node-cidr-20200424 branch from a1d9481 to 1ced4a1 Compare April 24, 2020 09:58
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 24, 2020
Contributor

@danwinship danwinship left a comment


The code needs a little bit more reorganizing than this. It was wrong to have separate setNodeCIDRMaskSizes / setNodeCIDRMaskSizesDualStack behavior based on whether the feature gate is set. The distinction in behavior should be based solely on whether the configuration is single-stack or dual-stack.

So looking at serviceCIDR / secondaryServiceCIDR is right, except I think it would make more sense to look at clusterCIDRs instead. (In theory either both should be single stack or both should be dual stack and so it doesn't matter, but anyway, the node CIDRs are allocated out of the cluster CIDRs, so logically it makes more sense to be looking at that.)

For backward compatibility, both with clusters from before the dual-stack code was added and with clusters from after it was added but before this change, the behavior needs to be:

  • if clusterCIDRs contains only an IPv4 CIDR then:
    • you can pass --node-cidr-mask-size regardless of the feature gate setting, without getting any warnings, and that will set nodeCIDRMaskSizeIPv4
    • you can pass --node-cidr-mask-size-ipv4 at least if the feature gate is enabled, and that will set nodeCIDRMaskSizeIPv4. (It would be fine to make it so that you are allowed to use --node-cidr-mask-size-ipv4 when the feature gate is not enabled too...)
    • you get an error if you pass --node-cidr-mask-size-ipv6, because there's no IPv6 cluster CIDR
  • if clusterCIDRs contains only an IPv6 CIDR then exactly as above with "4" and "6" swapped
  • if clusterCIDRs contains both an IPv4 CIDR and an IPv6 CIDR, then you have to pass --node-cidr-mask-size-ipv4 and --node-cidr-mask-size-ipv6, and it's an error to pass --node-cidr-mask-size.
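The validation rules above could be sketched roughly like this. This is a hypothetical, self-contained Go sketch of the rules as described in this comment, not the actual kube-controller-manager code; the flag names are modeled as nullable ints, where nil means "flag not passed":

```go
package main

import (
	"fmt"
	"net"
)

// nodeCIDRMaskFlags models the three CLI flags; a nil pointer means "not set".
// Hypothetical sketch only, not the real kube-controller-manager option struct.
type nodeCIDRMaskFlags struct {
	maskSize     *int // --node-cidr-mask-size
	maskSizeIPv4 *int // --node-cidr-mask-size-ipv4
	maskSizeIPv6 *int // --node-cidr-mask-size-ipv6
}

// validate rejects flag combinations that conflict with the IP families
// present in clusterCIDRs, following the rules in the review comment above.
func validate(clusterCIDRs []string, f nodeCIDRMaskFlags) error {
	var hasV4, hasV6 bool
	for _, c := range clusterCIDRs {
		_, ipnet, err := net.ParseCIDR(c)
		if err != nil {
			return fmt.Errorf("invalid cluster CIDR %q: %v", c, err)
		}
		if ipnet.IP.To4() != nil {
			hasV4 = true
		} else {
			hasV6 = true
		}
	}
	switch {
	case hasV4 && hasV6:
		// Dual-stack: the per-family flags apply; the legacy flag is an error.
		if f.maskSize != nil {
			return fmt.Errorf("--node-cidr-mask-size is not allowed with dual-stack cluster CIDRs")
		}
	case hasV4:
		// Single-stack IPv4: the IPv6 flag has no IPv6 cluster CIDR to apply to.
		if f.maskSizeIPv6 != nil {
			return fmt.Errorf("--node-cidr-mask-size-ipv6 set, but there is no IPv6 cluster CIDR")
		}
	case hasV6:
		// Single-stack IPv6: mirror image of the IPv4 case.
		if f.maskSizeIPv4 != nil {
			return fmt.Errorf("--node-cidr-mask-size-ipv4 set, but there is no IPv4 cluster CIDR")
		}
	}
	return nil
}

func main() {
	size := 24
	err := validate([]string{"10.244.0.0/16", "fd00::/64"}, nodeCIDRMaskFlags{maskSize: &size})
	fmt.Println(err != nil) // true: legacy flag rejected in a dual-stack configuration
}
```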

@SataQiu SataQiu force-pushed the dual-stack-node-cidr-20200424 branch from 1ced4a1 to db71cd4 Compare April 25, 2020 07:21
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 25, 2020
Member Author

SataQiu commented Apr 25, 2020

/test pull-kubernetes-e2e-kind-ipv6

@SataQiu SataQiu force-pushed the dual-stack-node-cidr-20200424 branch from db71cd4 to e518935 Compare April 27, 2020 08:45
Member Author

SataQiu commented May 29, 2020

/test pull-kubernetes-node-e2e
/test pull-kubernetes-e2e-kind-ipv6

Member Author

SataQiu commented Jun 1, 2020

Hi @aojea @danwinship Is it ok with you now?

@SataQiu SataQiu requested a review from aojea June 1, 2020 02:47
Contributor

@danwinship danwinship left a comment


so actually at this point maybe it would make sense to merge setNodeCIDRMaskSizes and getNodeCIDRMaskSizes into a single function?

So when I said that I was thinking that you could just do a single loop over clusterCIDRs and pick the right flag to use to assign the nodeMaskCIDRs value for each one, and it would simplify the code. But I guess that doesn't really work, and the previous organization, where getNodeCIDRMaskSizes does the error checking and setNodeCIDRMaskSizes does the assignment is probably cleaner.
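A minimal sketch of that get/set split, where one function does the error checking and the other does the assignment. The function bodies here are hypothetical (0 means "flag not set" in this toy model); the defaults of /24 for IPv4 and /64 for IPv6 are loosely based on the upstream kube-controller-manager defaults:

```go
package main

import "fmt"

// getNodeCIDRMaskSizes does only the error checking: in a dual-stack
// configuration the legacy --node-cidr-mask-size flag is rejected.
// Hypothetical sketch of the organization discussed above, not the real code.
func getNodeCIDRMaskSizes(dualStack bool, legacy int) error {
	if dualStack && legacy != 0 {
		return fmt.Errorf("--node-cidr-mask-size may not be used with dual-stack cluster CIDRs")
	}
	return nil
}

// setNodeCIDRMaskSizes does only the assignment, assuming validation passed.
func setNodeCIDRMaskSizes(dualStack bool, legacy, v4, v6 int) (maskV4, maskV6 int) {
	// Defaults roughly matching upstream: /24 for IPv4, /64 for IPv6.
	maskV4, maskV6 = 24, 64
	if !dualStack && legacy != 0 {
		// Single-stack: the legacy flag applies to whichever family is in use.
		maskV4, maskV6 = legacy, legacy
	}
	if v4 != 0 {
		maskV4 = v4
	}
	if v6 != 0 {
		maskV6 = v6
	}
	return maskV4, maskV6
}

func main() {
	v4, v6 := setNodeCIDRMaskSizes(false, 25, 0, 0)
	fmt.Println(v4, v6) // 25 25
}
```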

Member

thockin commented Jul 2, 2020

I'm going to approve, but I want @khenidak to decide if this is too big of a rebase pain for his mega changes that are coming.

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SataQiu, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2020
@danwinship
Contributor

This doesn't touch Services at all; it's just about --allocate-node-cidrs. So there should be no conflicts.

…ers when dual-stack feature gate become enabled by default

Signed-off-by: SataQiu <1527062125@qq.com>
@SataQiu SataQiu force-pushed the dual-stack-node-cidr-20200424 branch from b50500b to ec1efc3 Compare July 10, 2020 08:34
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 10, 2020
@@ -156,13 +156,13 @@ func startNodeIpamController(ctx ControllerContext) (http.Handler, bool, error)
}

 	var nodeCIDRMaskSizeIPv4, nodeCIDRMaskSizeIPv6 int
-	if utilfeature.DefaultFeatureGate.Enabled(features.IPv6DualStack) {
+	if dualStack {
@SataQiu, Member Author

dualStack will be true when --cluster-cidr is configured as dual-stack (at least one CIDR from each IP family), regardless of whether the IPv6DualStack feature gate is enabled. So even when the IPv6DualStack feature gate is enabled by default, if the user does not set --cluster-cidr as dual-stack, we will not enter this if branch, and --node-cidr-mask-size remains valid as before.
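The dual-stack detection described here can be sketched as follows. This is a standalone approximation; the real code uses shared Kubernetes helpers rather than this hand-rolled check:

```go
package main

import (
	"fmt"
	"net"
)

// isDualStack reports whether the given CIDR strings contain at least one
// IPv4 and at least one IPv6 CIDR, mirroring (as a rough sketch) how
// dual-stack is detected from --cluster-cidr rather than from the
// IPv6DualStack feature gate.
func isDualStack(cidrs []string) (bool, error) {
	var v4, v6 bool
	for _, c := range cidrs {
		_, ipnet, err := net.ParseCIDR(c)
		if err != nil {
			return false, fmt.Errorf("invalid CIDR %q: %v", c, err)
		}
		if ipnet.IP.To4() != nil {
			v4 = true
		} else {
			v6 = true
		}
	}
	return v4 && v6, nil
}

func main() {
	single, _ := isDualStack([]string{"10.244.0.0/16"})
	dual, _ := isDualStack([]string{"10.244.0.0/16", "fd00::/64"})
	fmt.Println(single, dual) // false true
}
```

With this check, a cluster that enables the feature gate but keeps a single-stack --cluster-cidr never takes the dual-stack branch, which is exactly why the legacy flag keeps working.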

@danwinship
Contributor

oh, well, that's simple.

So this version means that if there were users who were using nodeipam, and who enabled IPv6DualStack but didn't have a dual-stack configuration, then things will break for them again. (They'll need to revert from --node-cidr-mask-size-ipv4 to --node-cidr-mask-size.) Given that this only affects people who enabled an alpha feature gate and then didn't even use that feature, that seems like a reasonable break.

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 10, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2020
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Member Author

SataQiu commented Sep 10, 2020

Hi @danwinship This PR has been waiting for a very long time, can we merge it now?

Member Author

SataQiu commented Sep 10, 2020

@dims Can you help set the milestone label for the PR?

Member

aojea commented Sep 10, 2020

@SataQiu did something happen to this PR? I can't see most of the changes 🤔 , is it my browser?

Member

dims commented Sep 10, 2020

/test pull-kubernetes-e2e-kind-ipv6
/milestone v1.20

@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Sep 10, 2020
Contributor

danwinship commented Sep 10, 2020

did something happen to this PR? I can't see most of the changes 🤔 , is it my browser?

no, it just turned out that the final fix was much simpler than the intermediate versions; the old (current git) code does "if the IPv6DualStack feature gate is disabled, you need to use the old CIDR flags, and if the feature gate is enabled, you need to use the new CIDR flags". Eventually @SataQiu figured out that all we needed to do was change that to "if the --cluster-cidr is single stack, you need to use the old CIDR flags, and if the --cluster-cidr is dual stack, you need to use the new CIDR flags".

In theory we could also allow using --node-cidr-mask-size-ipv4 in single-stack IPv4 clusters and --node-cidr-mask-size-ipv6 in single-stack IPv6 clusters, but we don't need to, and this PR as is fixes the API break.

Member

aojea commented Sep 10, 2020

👍 thanks


@k8s-ci-robot k8s-ci-robot merged commit 56b9a69 into kubernetes:master Sep 11, 2020
Successfully merging this pull request may close these issues.

DualStack: IPv6DualStack feature gate causes kube-controller-manager API break