Don't use strategic merge patch on Node.Status.Addresses #79391

Merged: 2 commits, Jul 3, 2019

Contributor

@danwinship danwinship commented Jun 25, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
On a node with multiple IP addresses of the same type (e.g., multiple InternalIP addresses), when the set of active addresses changed, kubelet would sometimes randomly reorder them. This could cause the node to suddenly declare a different primary node IP, causing various breakage.

This is because the kubelet sync code uses strategic merge patch to update Node.Status, but Node.Status.Addresses is incorrectly annotated with a patchStrategy that only works when there is only a single address of the same type.
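
A toy model can make the failure mode concrete. The sketch below is not the Kubernetes strategic merge patch implementation: `merge_by_key` is a hypothetical helper that assumes list elements are matched on the `type` field alone, and the addresses are made up for illustration.

```python
from typing import Dict, List

Address = Dict[str, str]


def merge_by_key(current: List[Address], patch: List[Address], key: str = "type") -> List[Address]:
    """Toy merge (hypothetical): each patch element overwrites the first
    current element that has the same merge key, or is appended if none does."""
    result = [dict(a) for a in current]
    for item in patch:
        for existing in result:
            if existing[key] == item[key]:
                existing.update(item)
                break
        else:
            result.append(dict(item))
    return result


# Made-up addresses: the node drops 10.0.0.1 and gains 10.0.0.3.
current = [
    {"type": "InternalIP", "address": "10.0.0.1"},
    {"type": "InternalIP", "address": "10.0.0.2"},
]
desired = [
    {"type": "InternalIP", "address": "10.0.0.2"},
    {"type": "InternalIP", "address": "10.0.0.3"},
]

merged = merge_by_key(current, desired)
print(merged)
```

With two InternalIP entries the key cannot tell them apart, so both patch elements land on the first entry. In this toy run the merged list starts with 10.0.0.3 rather than the requested 10.0.0.2, which is the kind of primary-address flip described above.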

Even though the annotation is wrong for the API, fixing it is an API break, particularly since changing it would result in even worse behavior when the clients and servers were different versions. (The client would generate a strategic patch according to one interpretation which the server would then apply according to a different interpretation.)

So this fixes PatchNodeStatus() by having it edit the automatically-generated strategic merge patch to force it to use a "replace" strategy on Status.Addresses.
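
One way to force a replace can be sketched as follows. This is an illustration under stated assumptions, not the code added by this PR: it relies on the strategic merge patch list directive `{"$patch": "replace"}`, it assumes the generated patch carries no `addresses` entry of its own, and `force_replace_addresses` is a hypothetical name.

```python
import json
from typing import Dict, List


def force_replace_addresses(patch_bytes: bytes, addresses: List[Dict[str, str]]) -> bytes:
    """Hypothetical helper: rewrite a generated patch so that status.addresses
    is replaced wholesale instead of being merged element by element."""
    patch = json.loads(patch_bytes)
    status = patch.setdefault("status", {})
    # Full desired list, followed by the directive element that asks the
    # server to replace the existing list rather than merge into it.
    status["addresses"] = [dict(a) for a in addresses] + [{"$patch": "replace"}]
    return json.dumps(patch).encode()


# Made-up inputs: a patch that only touches another status field, plus the
# address list the node should end up with.
generated = b'{"status": {"phase": "Running"}}'
wanted = [
    {"type": "InternalIP", "address": "10.0.0.2"},
    {"type": "InternalIP", "address": "10.0.0.3"},
]

fixed = json.loads(force_replace_addresses(generated, wanted))
print(json.dumps(fixed, indent=2))
```

Because the whole list is sent and replaced, the order chosen by the client is the order stored, regardless of how many entries share a type.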

Which issue(s) this PR fixes:
none, but the newly-added unit test demonstrates the bug. (It fails against the old code.)
(Downstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=1696628 but it's mostly private)

Does this PR introduce a user-facing change?:

Kubelet should now more reliably report the same primary node IP even if the set of node IPs reported by the CloudProvider changes.

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress, kind/bug, do-not-merge/release-note-label-needed, size/M, cncf-cla: yes, needs-sig, needs-priority, area/kubelet, sig/node labels and removed the needs-sig label Jun 25, 2019
@danwinship danwinship force-pushed the nodeaddresses-update-fix branch 2 times, most recently from 68f5b0c to d717cb4 Compare June 26, 2019 00:19
@k8s-ci-robot k8s-ci-robot added the needs-rebase, area/kubeadm, sig/cluster-lifecycle, sig/storage labels Jun 26, 2019
@k8s-ci-robot k8s-ci-robot added the release-note, priority/important-soon labels and removed the needs-rebase, do-not-merge/release-note-label-needed, needs-priority labels Jun 26, 2019
@danwinship danwinship changed the title from "WIP kubelet: Don't use strategic merge patch on NodeStatus" to "Don't use strategic merge patch on NodeStatus" Jun 26, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Jun 26, 2019
@RobertKrawitz
Contributor

/cc @sttts @liggitt

@andrewsykim
Member

> I think I agree with Dan, though - we have a case where a bug exists. If that bug exists in 2 components, they both need to be fixed.

I agree with this too, but would prefer to take a pragmatic approach for the out-of-tree case. Maybe we can isolate a commit that fixes the patch strategy for only the CCM case and backport that to previous versions? Forcing every out-of-tree provider to upgrade to Kubernetes v1.16 will not be easy because of the way things are vendored now (vendoring k8s.io/kubernetes is required).

@andrewsykim
Member

@kfox1111 for the vSphere case, it would only apply if there is more than 1 network interface with the matching prefix.

@thockin
Member

thockin commented Jul 2, 2019 via email

Member

@thockin thockin left a comment

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 2, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, thockin

@k8s-ci-robot k8s-ci-robot added the approved label Jul 2, 2019
@andrewsykim
Member

> Does CCM have this equivalent loop in a new controller? Can we just make a different PR for that?

It shares the nodeutil.PatchNodeStatus function with the kubelet, but yeah, good point, we can duplicate the function locally in the worst-case scenario. Thanks!

@kfox1111

kfox1111 commented Jul 3, 2019

> @kfox1111 for the vSphere case, it would only apply if there is more than 1 network interface with the matching prefix.

Yeah. I see that on my cluster for sure. I'm just trying to understand how this bug will affect me until it's fixed. Is restarting a kubelet (during a crash, say) enough to trigger node update failures due to interface reordering?

How frequently it happens and what triggers it dictate how to respond to it. If it's only when an interface is removed or added, this won't affect any of my production systems as I don't do that. If it's on kubelet restart, then it's fairly serious and I need to tread very lightly as I am not in full control of that... I'd maybe even have to patch it out of band until upstreamed in that case to be safe.

@danwinship
Contributor Author

Restarting won't trigger the bug

@kfox1111

kfox1111 commented Jul 3, 2019

Ok. Thank you.

@k8s-ci-robot k8s-ci-robot merged commit c8cee54 into kubernetes:master Jul 3, 2019
@cheftako
Member

cheftako commented Jul 3, 2019 via email

@NeilW

NeilW commented Aug 21, 2019

I'll add the fault I've just seen in here in case it sheds any light on the discussion.

The Brightbox Cloud Controller occasionally throws the following error in the Node Controller when we have multiple external IPs mapped to a node:

E0821 10:47:34.158656       1 node_controller.go:188] Error patching node with cloud ip addresses = [failed to patch status "{\"status\":{\"$setElementOrder/addresses\":[{\"type\":\"Hostname\"},{\"type\":\"InternalDNS\"},{\"type\":\"InternalIP\"},{\"type\":\"InternalIP\"},{\"type\":\"ExternalIP\"},{\"type\":\"ExternalDNS\"},{\"type\":\"ExternalIP\"},{\"type\":\"ExternalDNS\"}],\"addresses\":[{\"address\":\"109.107.37.240\",\"type\":\"ExternalIP\"},{\"address\":\"109.107.35.7\",\"type\":\"ExternalIP\"},{\"address\":\"cip-8fp4v.gb1.brightbox.com\",\"type\":\"ExternalDNS\"},{\"address\":\"cip-2mse0.gb1.brightbox.com\",\"type\":\"ExternalDNS\"}]}}" for node "srv-tryop": The order in patch list:
[map[address:109.107.37.240 type:ExternalIP] map[address:109.107.35.7 type:ExternalIP] map[address:cip-8fp4v.gb1.brightbox.com type:ExternalDNS] map[address:cip-2mse0.gb1.brightbox.com type:ExternalDNS]]
 doesn't match $setElementOrder list:
[map[type:Hostname] map[type:InternalDNS] map[type:InternalIP] map[type:InternalIP] map[type:ExternalIP] map[type:ExternalDNS] map[type:ExternalIP] map[type:ExternalDNS]]
]

Labels
approved, area/kubelet, cncf-cla: yes, kind/api-change, kind/bug, lgtm, priority/important-soon, release-note, sig/node, size/L