Make kubelet set alpha.kubernetes.io/provided-node-ip unconditionally #109794

Merged (1 commit) on Jul 14, 2022

@mdbooth (Contributor) commented May 4, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

The easier of two possible fixes for an issue where node addresses flap during an upgrade to an external CCM. This fix causes kubelet to apply the alpha.kubernetes.io/provided-node-ip annotation unconditionally (not only when --cloud-provider=external). This does not exclude a future fix involving the root cause of the issue, which is that kubelet and cloud-controller-manager both attempt to manage node addresses when --cloud-provider is not set to external.
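To make the flapping concrete, here is a minimal Go model built on stated assumptions. `Address`, `cloudAddresses`, `filterByProvidedIP`, `kubeletAddresses`, `ccmAddresses`, and `sameAddresses` are hypothetical names, not kubelet or cloud-controller-manager code. The filtering rule (keep the provided IP, drop other addresses of the same type) is also an assumption, a simplified stand-in for how a provided node IP might be honored. The model shows one thing: if the kubelet filters by its node IP but the CCM sees no annotation, the two controllers compute different address lists and keep overwriting each other. With the annotation, both compute the same list.

```go
package main

import "fmt"

// Address is a simplified stand-in for a Node address entry; it is not
// the real k8s.io/api/core/v1.NodeAddress type.
type Address struct {
	Type string
	IP   string
}

// cloudAddresses is hypothetical data: the cloud reports two internal IPs.
var cloudAddresses = []Address{
	{Type: "InternalIP", IP: "10.0.0.4"},
	{Type: "InternalIP", IP: "10.0.0.5"},
}

// filterByProvidedIP keeps providedIP and drops other addresses of the
// same type. This rule is an assumption made for illustration only.
func filterByProvidedIP(addrs []Address, providedIP string) []Address {
	providedType := ""
	for _, a := range addrs {
		if a.IP == providedIP {
			providedType = a.Type
		}
	}
	if providedType == "" {
		return addrs // provided IP not reported by the cloud: leave as-is
	}
	out := []Address{}
	for _, a := range addrs {
		if a.Type != providedType || a.IP == providedIP {
			out = append(out, a)
		}
	}
	return out
}

// kubeletAddresses models a kubelet started with an explicit node IP.
func kubeletAddresses(nodeIP string) []Address {
	return filterByProvidedIP(cloudAddresses, nodeIP)
}

// ccmAddresses models the external CCM; annotation is the value of
// alpha.kubernetes.io/provided-node-ip, or "" if the annotation is absent.
func ccmAddresses(annotation string) []Address {
	if annotation == "" {
		return cloudAddresses
	}
	return filterByProvidedIP(cloudAddresses, annotation)
}

// sameAddresses reports whether two address lists are identical, in order.
func sameAddresses(a, b []Address) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	nodeIP := "10.0.0.5"
	fmt.Println("agree without annotation:", sameAddresses(kubeletAddresses(nodeIP), ccmAddresses("")))
	fmt.Println("agree with annotation:", sameAddresses(kubeletAddresses(nodeIP), ccmAddresses(nodeIP)))
}
```

Running it reports `false` without the annotation and `true` with it. The annotation does not stop both controllers from writing addresses. It only makes them write the same ones.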

Which issue(s) this PR fixes:

Fixes #109793

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The node annotation `alpha.kubernetes.io/provided-node-ip` is no longer set ONLY when `--cloud-provider=external`. Now, it is set on kubelet startup if the `--cloud-provider` flag is set at all, including the deprecated in-tree providers.
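The change in condition can be sketched as two predicates. This is an illustrative model only. `annotateBefore` and `annotateAfter` are hypothetical, `cloudProvider` stands for the value of `--cloud-provider`, and `nodeIP` stands for an explicitly provided node IP. Requiring a non-empty node IP is an assumption based on the annotation's purpose of recording the provided IP.

```go
package main

import "fmt"

// annotateBefore models the old rule: annotate only with
// --cloud-provider=external. Requiring a provided node IP is an
// assumption of this sketch.
func annotateBefore(cloudProvider, nodeIP string) bool {
	return cloudProvider == "external" && nodeIP != ""
}

// annotateAfter models the new rule from the release note: annotate
// whenever --cloud-provider is set, including in-tree providers.
func annotateAfter(cloudProvider, nodeIP string) bool {
	return cloudProvider != "" && nodeIP != ""
}

func main() {
	// "" = flag unset, "external" = out-of-tree CCM, "aws" = an in-tree provider name.
	for _, p := range []string{"", "external", "aws"} {
		fmt.Printf("--cloud-provider=%q before=%v after=%v\n",
			p, annotateBefore(p, "10.0.0.5"), annotateAfter(p, "10.0.0.5"))
	}
}
```

Only the in-tree case (`aws` is used here purely as an example provider name) changes outcome. The unset and `external` cases behave as before.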

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the release-note-none, kind/bug, size/S, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, and needs-priority labels May 4, 2022
@k8s-ci-robot added the area/kubelet and sig/node labels and removed the do-not-merge/needs-sig label May 4, 2022
@pacoxu added this to Triage in SIG Node PR Triage May 5, 2022
```go
// node controller in the cluster but kubelet is still running
// the in-tree provider. Adding this annotation in all cases
// ensures that while Addresses flap between the competing
// controllers, they at least flap consistently.
```
@elmiko (Contributor) commented May 5, 2022

nice comment, thanks Matt

@nckturner (Contributor) commented May 11, 2022

/assign @nckturner

@nckturner (Contributor) commented May 11, 2022

cc @cheftako

@elmiko (Contributor) left a comment

I am not super familiar with the code here, but this generally makes sense to me
/lgtm

@k8s-ci-robot added the lgtm label May 11, 2022
@nckturner (Contributor) commented May 16, 2022

Just to make future references to this PR slightly easier to read, can you add a note to the "What this PR does / why we need it:" section, i.e.

> The easier of two possible fixes for an issue where node addresses flap during an upgrade to an external CCM. This fix causes kubelet to apply the alpha.kubernetes.io/provided-node-ip annotation unconditionally (not only when --cloud-provider=external). This does not exclude a future fix involving the root cause of the issue, which is that kubelet and cloud-controller-manager both attempt to manage node addresses when --cloud-provider is not set to external.

@nckturner (Contributor) commented May 16, 2022

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label May 16, 2022
@nckturner (Contributor) commented May 16, 2022

/lgtm

@nckturner (Contributor) commented May 16, 2022

@nckturner (Contributor) commented May 16, 2022

/assign @yujuhong

@nckturner (Contributor) commented Jun 8, 2022

ping @yujuhong @kubernetes/sig-node-bugs @kubernetes/sig-node-pr-reviews, who is the correct reviewer/approver from sig-node for this?

@dchen1107 added this to the v1.25 milestone Jun 14, 2022
@k8s-ci-robot added the lgtm label Jun 23, 2022
@aojea (Member) commented Jun 24, 2022

```go
if externalCloudProvider || cloud != nil {
	// set the alpha.kubernetes.io/provided-node-ip annotation
}
```

I'll make this change.

Thanks for your understanding. I'll defer approval to people more familiar with the cloud-provider codebase.

/assign @cheftako @thockin @andrewsykim

@pacoxu moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Jun 27, 2022
@markyjackson-taulia (Member) commented Jun 28, 2022

Hi @mdbooth, my name is Marky Jackson and I am one of the k8s 1.25 bug triage shadows assigned to track this body of work. Just checking in to see if this is still on track for k8s 1.25?

@mdbooth (Contributor, Author) commented Jun 29, 2022

> Hi @mdbooth, my name is Marky Jackson and I am one of the k8s 1.25 bug triage shadows assigned to track this body of work. Just checking in to see if this is still on track for k8s 1.25?

I certainly hope so! Nothing to do from my side that I'm aware of beyond soliciting approvals.

@dchen1107 (Member) commented Jul 1, 2022

Another PR I came across may address a similar issue caused by the external cloud provider migration: #110704. cc @haiqianz

@dchen1107 (Member) left a comment

One nit on the related doc. Otherwise LGTM

pkg/kubelet/nodestatus/setters.go (resolved)
@dchen1107 (Member) commented Jul 1, 2022

/lgtm

@nckturner (Contributor) commented Jul 7, 2022

/lgtm

Who can approve? @dchen1107?

@aojea (Member) commented Jul 10, 2022

I just found that there is a KEP to handle this in-tree/out-of-tree cloud provider migration problem. Can it not be used for this? https://github.com/kubernetes/enhancements/tree/master/keps/sig-cloud-provider/2436-controller-manager-leader-migration

@mdbooth (Contributor, Author) commented Jul 10, 2022

> I just found that there is a KEP to handle this in-tree/out-of-tree cloud provider migration problem. Can it not be used for this? https://github.com/kubernetes/enhancements/tree/master/keps/sig-cloud-provider/2436-controller-manager-leader-migration

Unfortunately not. We're effectively migrating the 'Node controller' from running locally in every kubelet to running centrally in a single Node controller. We'd need a separate leader election for each individual kubelet.

If we wanted to add more annotations, I suppose the kubelet could annotate the Node to indicate who is expected to manage it, and the Node controller could be taught to ignore kubelet-managed Nodes. However, this annotation would have an extremely limited useful lifetime.
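The annotation-based idea in the previous paragraph could look roughly like the sketch below. Everything here is hypothetical: the annotation key `example.com/node-addresses-managed-by`, the `Node` type, and `shouldReconcileAddresses` do not exist in Kubernetes. They only model a central controller skipping Nodes whose kubelet still claims ownership of the addresses.

```go
package main

import "fmt"

// managedByAnnotation is a hypothetical annotation key; no such key
// exists in Kubernetes.
const managedByAnnotation = "example.com/node-addresses-managed-by"

// Node is a minimal stand-in for a Node object.
type Node struct {
	Name        string
	Annotations map[string]string
}

// shouldReconcileAddresses reports whether a central node controller
// should manage this Node's addresses: it skips Nodes whose kubelet has
// claimed ownership. Reading a missing key (or a nil map) yields "".
func shouldReconcileAddresses(n Node) bool {
	return n.Annotations[managedByAnnotation] != "kubelet"
}

func main() {
	nodes := []Node{
		{Name: "old-kubelet", Annotations: map[string]string{managedByAnnotation: "kubelet"}},
		{Name: "new-kubelet", Annotations: map[string]string{}},
	}
	for _, n := range nodes {
		fmt.Printf("%s: reconcile=%v\n", n.Name, shouldReconcileAddresses(n))
	}
}
```

Once every kubelet has been upgraded, no Node carries the claim and the check becomes dead code, which is the short useful lifetime mentioned above.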

@nckturner (Contributor) commented Jul 10, 2022

@aojea This PR involves the kubelet and the CCM (the above KEP refers to a lock between the KCM and the CCM), and it is specific to the period when a migration is taking place: old kubelets are still running and reconciling Node addresses, while the new CCM also begins reconciling them. For the leader migration lock to be used here, the kubelet would need to take that migration lock before managing node addresses. That might be possible, but I think we should make this change regardless: the two approaches are not mutually exclusive, and this is a simple remediation for the address flapping described above.
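A per-Node lock of the kind discussed above can be modeled in a few lines. This is a simplified, in-memory sketch: `addressLocks` and `tryAcquire` are hypothetical, and the lock in KEP 2436 is a leader-election lock shared by the KCM and the CCM, not this map. The sketch only shows the rule that whoever holds a Node's lock manages its addresses while every other controller stands down, which is why one lock per Node (not one per cluster) would be needed.

```go
package main

import (
	"fmt"
	"sync"
)

// addressLocks is a hypothetical in-memory model of a migration lock
// with one holder per Node name. It is not the KEP 2436 mechanism.
type addressLocks struct {
	mu      sync.Mutex
	holders map[string]string // node name -> holder identity
}

// tryAcquire grants the lock for node to id if it is free or already
// held by id, and refuses it otherwise.
func (l *addressLocks) tryAcquire(node, id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if h, ok := l.holders[node]; ok && h != id {
		return false
	}
	l.holders[node] = id
	return true
}

func main() {
	locks := &addressLocks{holders: map[string]string{}}
	// The old kubelet on node-a acquires first, so the CCM must skip node-a.
	fmt.Println(locks.tryAcquire("node-a", "kubelet")) // true
	fmt.Println(locks.tryAcquire("node-a", "ccm"))     // false
	// node-b has no holder yet, so the CCM may manage it.
	fmt.Println(locks.tryAcquire("node-b", "ccm")) // true
}
```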

@dchen1107 (Member) commented Jul 14, 2022

@nckturner, thanks for opening a PR for the doc updates: kubernetes/website#35027

/lgtm
/approve

@k8s-ci-robot (Contributor) commented Jul 14, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, mdbooth


@k8s-ci-robot added the approved label Jul 14, 2022
@k8s-ci-robot merged commit 21149f1 into kubernetes:master Jul 14, 2022
13 of 14 checks passed
SIG Node PR Triage automation moved this from Needs Approver to Done Jul 14, 2022
@sftim (Contributor) commented Jul 14, 2022

@nckturner I recommend adding a changelog entry (not NONE) for this change. The updated behavior is visible to end users.

@aojea (Member) commented Jul 15, 2022

> @nckturner I recommend adding a changelog entry (not NONE) for this change. The updated behavior is visible to end users.

agreed

@nckturner (Contributor) commented Jul 15, 2022

@mdbooth can you add a changelog entry like:

> The node annotation `alpha.kubernetes.io/provided-node-ip` is no longer set ONLY when `--cloud-provider=external`. Now, it is set on kubelet startup if the `--cloud-provider` flag is set at all, including the deprecated in-tree providers.

@aojea (Member) commented Jul 16, 2022

/release-note-edit release-note The node annotation alpha.kubernetes.io/provided-node-ip is no longer set ONLY when `--cloud-provider=external`. Now, it is set on kubelet startup if the `--cloud-provider` flag is set at all, including the deprecated in-tree providers.

@k8s-ci-robot added the release-note label and removed the release-note-none label Jul 19, 2022
Labels
approved, area/kubelet, cncf-cla: yes, kind/bug, lgtm, needs-priority, release-note, sig/network, sig/node, size/L, triage/accepted
Development

Successfully merging this pull request may close these issues.

Enabling external CCM causes Node.Addresses to flap