Don't use strategic merge patch on Node.Status.Addresses #79391

Merged
merged 2 commits into kubernetes:master from danwinship:nodeaddresses-update-fix on Jul 3, 2019

Conversation

danwinship commented Jun 25, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
On a node with multiple IP addresses of the same type (e.g., multiple InternalIP addresses), when the set of active addresses changed, kubelet would sometimes randomly reorder them. This could cause the node to suddenly declare a different primary node IP, causing various breakage.

This is because the kubelet sync code uses strategic merge patch to update Node.Status, but Node.Status.Addresses is incorrectly annotated with a patchStrategy that only works correctly when there is at most one address of each type.
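For reference, the field in k8s.io/api/core/v1 carries that annotation as struct tags, roughly like this (tags abbreviated):

```go
// NodeStatus in k8s.io/api/core/v1 (abbreviated): the struct tags tell
// strategic merge patch to merge the list element-wise, keyed by "type".
type NodeStatus struct {
	// ...
	Addresses []NodeAddress `json:"addresses,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
	// ...
}
```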

Even though the annotation is wrong for the API, fixing it is an API break, particularly since changing it would result in even worse behavior when the clients and servers were different versions. (The client would generate a strategic patch according to one interpretation which the server would then apply according to a different interpretation.)

So this fixes PatchNodeStatus() by having it edit the automatically generated strategic merge patch to force it to use a "replace" strategy on Status.Addresses.
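Roughly, the post-processing looks like this. This is a sketch of the approach, not necessarily the exact code merged here; the helper name and the sample patch in main() are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// forceReplaceAddresses rewrites a generated strategic merge patch so that
// status.addresses carries the complete desired address list plus a
// "$patch":"replace" directive, making the apiserver replace the list
// wholesale instead of merging it element-wise by "type".
func forceReplaceAddresses(patchBytes []byte, addresses []v1.NodeAddress) ([]byte, error) {
	var patch map[string]interface{}
	if err := json.Unmarshal(patchBytes, &patch); err != nil {
		return nil, err
	}

	// Re-encode the full desired address list as generic JSON values.
	raw, err := json.Marshal(addresses)
	if err != nil {
		return nil, err
	}
	var addrList []interface{}
	if err := json.Unmarshal(raw, &addrList); err != nil {
		return nil, err
	}
	// The "$patch":"replace" element is the strategic-merge-patch directive
	// for "replace this whole list".
	addrList = append(addrList, map[string]interface{}{"$patch": "replace"})

	status, ok := patch["status"].(map[string]interface{})
	if !ok {
		status = map[string]interface{}{}
		patch["status"] = status
	}
	status["addresses"] = addrList
	// Drop the ordering directive emitted for the merged list; it is
	// ambiguous (and now pointless) once the list is replaced outright.
	delete(status, "$setElementOrder/addresses")

	return json.Marshal(patch)
}

func main() {
	// The kind of patch the generator might emit when a second InternalIP
	// changes (shape is illustrative).
	generated := []byte(`{"status":{"$setElementOrder/addresses":[{"type":"InternalIP"},{"type":"InternalIP"}],"addresses":[{"address":"10.0.0.3","type":"InternalIP"}]}}`)
	fixed, err := forceReplaceAddresses(generated, []v1.NodeAddress{
		{Type: v1.NodeInternalIP, Address: "10.0.0.1"},
		{Type: v1.NodeInternalIP, Address: "10.0.0.3"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(fixed))
}
```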

Which issue(s) this PR fixes:
none, but the newly added unit test demonstrates the bug. (It fails against the old code.)
(Downstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=1696628 but it's mostly private)

Does this PR introduce a user-facing change?:

Kubelet should now more reliably report the same primary node IP even if the set of node IPs reported by the CloudProvider changes.

/priority important-soon

@k8s-ci-robot k8s-ci-robot requested review from dashpole and smarterclayton Jun 25, 2019
@danwinship danwinship force-pushed the danwinship:nodeaddresses-update-fix branch 2 times, most recently from 68f5b0c to d717cb4 Jun 25, 2019
@danwinship danwinship force-pushed the danwinship:nodeaddresses-update-fix branch from d717cb4 to 3064d17 Jun 26, 2019
@danwinship danwinship force-pushed the danwinship:nodeaddresses-update-fix branch from 3064d17 to e22af74 Jun 26, 2019
@danwinship danwinship changed the title from "WIP kubelet: Don't use strategic merge patch on NodeStatus" to "Don't use strategic merge patch on NodeStatus" Jun 26, 2019
@k8s-ci-robot k8s-ci-robot requested review from liggitt and sttts Jun 26, 2019
@danwinship danwinship force-pushed the danwinship:nodeaddresses-update-fix branch from 79fa94b to 05a9634 Jul 2, 2019
kfox1111 commented Jul 2, 2019

What typically causes the set of active addresses to change? Adding or deleting interfaces only? Simply restarting kubelet? Rebooting? Something else?

If you do not have a cloud provider set, then the bug doesn't affect you. (I'll update the release note to clarify that.) If you do have a cloud provider set, then the set of active addresses is up to the cloud provider, but generally corresponds to adding/deleting interfaces, and maybe adding/deleting secondary IPs on those interfaces.

In the case of the vSphere driver, it calls net.Interfaces and filters out anything without a VMware prefix. It returns them in the order net.Interfaces returns them. net.Interfaces calls interfaceTable(0), defined here:
https://golang.org/src/net/interface_linux.go
which uses netlink to grab the list. I don't think netlink provides any guarantee on ordering...
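For illustration, a minimal version of that kind of enumeration (the MAC prefix filter here is just an example, not the provider's exact logic):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

func main() {
	// net.Interfaces returns NICs in whatever order the kernel (netlink on
	// Linux) reports them; no ordering is guaranteed.
	ifaces, err := net.Interfaces()
	if err != nil {
		panic(err)
	}
	for _, iface := range ifaces {
		// Example VMware-style MAC prefix; the real provider's matching
		// rules may differ.
		if !strings.HasPrefix(iface.HardwareAddr.String(), "00:50:56") {
			continue
		}
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		fmt.Println(iface.Name, addrs)
	}
}
```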

So it's possible that simply restarting kubelet would cause the order to change. Is a reordering a problem, or is it only when the number of entries changes?

andrewsykim commented Jul 2, 2019

> I think I agree with Dan, though - we have a case where a bug exists. If that bug exists in 2 components, they both need to be fixed.

I agree with this too, but would prefer to take a pragmatic approach for the out-of-tree case. Maybe we can isolate a commit that fixes the patch strategy for only the CCM case and backport that to previous versions? Forcing every out-of-tree provider to upgrade to v1.16 Kubernetes will not be easy because of the way things are vendored now (vendoring k8s.io/kubernetes is required).

andrewsykim commented Jul 2, 2019

@kfox1111 for the vSphere case, it would only apply if there is more than 1 network interface with the matching prefix.

thockin left a comment

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 2, 2019
k8s-ci-robot commented Jul 2, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

andrewsykim commented Jul 3, 2019

> Does CCM have this equivalent loop in a new controller? Can we just make a different PR for that?

It shares the nodeutil.PatchNodeStatus function with the kubelet, but yeah, good point, we can duplicate the function locally in the worst case. Thanks!

kfox1111 commented Jul 3, 2019

> @kfox1111 for the vSphere case, it would only apply if there is more than 1 network interface with the matching prefix.

Yeah, I see that on my cluster for sure. I'm just trying to understand how this bug will affect me until it's fixed. Is restarting a kubelet (during a crash, say) enough to trigger node update failures due to interface reordering?

Just how frequently it happens and what triggers it dictates how to respond to it. If it's only when an interface is removed or added, this won't affect any of my production systems, as I don't do that. If it's on kubelet restart, then it's fairly serious and I need to tread very lightly, as I am not in full control of that... I'd maybe even have to patch it out of band until the fix is upstreamed, to be safe.

danwinship commented Jul 3, 2019

Restarting won't trigger the bug.

kfox1111 commented Jul 3, 2019

Ok. Thank you.

@k8s-ci-robot k8s-ci-robot merged commit c8cee54 into kubernetes:master Jul 3, 2019
23 checks passed
cla/linuxfoundation danwinship authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.
tide In merge pool.
@danwinship danwinship deleted the danwinship:nodeaddresses-update-fix branch Jul 8, 2019
@liggitt liggitt added this to the v1.16 milestone Aug 6, 2019
NeilW commented Aug 21, 2019

I'll add the fault I've just seen here in case it sheds any light on the discussion.

The Brightbox Cloud Controller occasionally throws up the following error in the Node Controller when we have multiple external IPs mapped to a node:

E0821 10:47:34.158656       1 node_controller.go:188] Error patching node with cloud ip addresses = [failed to patch status "{\"status\":{\"$setElementOrder/addresses\":[{\"type\":\"Hostname\"},{\"type\":\"InternalDNS\"},{\"type\":\"InternalIP\"},{\"type\":\"InternalIP\"},{\"type\":\"ExternalIP\"},{\"type\":\"ExternalDNS\"},{\"type\":\"ExternalIP\"},{\"type\":\"ExternalDNS\"}],\"addresses\":[{\"address\":\"109.107.37.240\",\"type\":\"ExternalIP\"},{\"address\":\"109.107.35.7\",\"type\":\"ExternalIP\"},{\"address\":\"cip-8fp4v.gb1.brightbox.com\",\"type\":\"ExternalDNS\"},{\"address\":\"cip-2mse0.gb1.brightbox.com\",\"type\":\"ExternalDNS\"}]}}" for node "srv-tryop": The order in patch list:
[map[address:109.107.37.240 type:ExternalIP] map[address:109.107.35.7 type:ExternalIP] map[address:cip-8fp4v.gb1.brightbox.com type:ExternalDNS] map[address:cip-2mse0.gb1.brightbox.com type:ExternalDNS]]
 doesn't match $setElementOrder list:
[map[type:Hostname] map[type:InternalDNS] map[type:InternalIP] map[type:InternalIP] map[type:ExternalIP] map[type:ExternalDNS] map[type:ExternalIP] map[type:ExternalDNS]]
]