Don't use strategic merge patch on Node.Status.Addresses #79391
Conversation
I agree with this too, but would prefer to take a pragmatic approach for the out-of-tree case. Maybe we can isolate a commit that fixes the patch strategy for only the CCM case and backport that to previous versions? Forcing every out-of-tree provider to upgrade to v1.16 Kubernetes will not be easy because of the way things are vendored now (vendoring k8s.io/kubernetes is required). |
@kfox1111 for the vSphere case, it would only apply if there is more than 1 network interface with the matching prefix. |
Does CCM have this equivalent loop in a new controller? Can we just make a different PR for that?
…On Tue, Jul 2, 2019 at 4:49 PM Andrew Sy Kim ***@***.***> wrote:
I think I agree with Dan, though - we have a case where a bug exists. If that bug exists in 2 components, they both need to be fixed.
|
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: danwinship, thockin
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
It shares |
Yeah. I see that on my cluster for sure. I'm just trying to understand how this bug will affect me until it's fixed. Is restarting a kubelet (during a crash, say) enough to trigger node update failures due to interface reordering? How frequently it happens, and what triggers it, dictates how to respond to it. If it's only when an interface is removed or added, this won't affect any of my production systems, as I don't do that. If it's on kubelet restart, then it's fairly serious and I need to tread very lightly, as I am not in full control of that... I'd maybe even have to patch it out of band until upstreamed in that case, to be safe. |
Restarting won't trigger the bug |
Ok. Thank you. |
On Tue, Jul 2, 2019 at 8:51 AM Tim Hockin ***@***.***> wrote:
When CCM has been rolled out but some of the nodes are not yet updated?
For the cloud node controller to set the address on the node, the node must have the cloud taint.
The node is given the cloud taint when it is created by the kubelet.
The kubelet will only create the node with the cloud taint when it is started with the external cloud provider.
So I believe we are ok.
|
I'll add the fault I've just seen here in case it sheds any light on the discussion. The Brightbox Cloud Controller occasionally throws the following error in the Node Controller when we have multiple external IPs mapped to a node:
|
What type of PR is this?
/kind bug
What this PR does / why we need it:
On a node with multiple IP addresses of the same type (e.g., multiple InternalIP addresses), when the set of active addresses changed, kubelet would sometimes randomly reorder them. This could cause the node to suddenly declare a different primary node IP, causing various breakage.
This is because the kubelet sync code uses strategic merge patch to update Node.Status, but Node.Status.Addresses is incorrectly annotated with a patchStrategy that only works correctly when there is at most one address of each type.
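For context, here is a paraphrased, simplified excerpt of the field in question (the real definition lives in k8s.io/api/core/v1; the types below are trimmed for illustration):

```go
package v1sketch

// Paraphrased, simplified excerpt of the relevant field; the real definition
// lives in k8s.io/api/core/v1. The patchMergeKey of "type" tells strategic
// merge patch to treat two addresses with the same Type as the same list
// element, which breaks when a node legitimately has, say, two InternalIP
// addresses.
type NodeAddressType string

type NodeAddress struct {
	Type    NodeAddressType `json:"type"`
	Address string          `json:"address"`
}

type NodeStatus struct {
	Addresses []NodeAddress `json:"addresses,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```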
Even though the annotation is wrong for the API, fixing it is an API break, particularly since changing it would result in even worse behavior when the clients and servers were different versions. (The client would generate a strategic patch according to one interpretation which the server would then apply according to a different interpretation.)
So this fixes PatchNodeStatus() by having it edit the automatically-generated strategic merge patch to force it to use a "replace" strategy on Status.Addresses (a rough sketch of the idea is shown after the issue references below).

Which issue(s) this PR fixes:
none, but the newly-added unit test demonstrates the bug. (It fails against the old code.)
(Downstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=1696628 but it's mostly private)
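For readers who want the gist without digging through the diff, here is a minimal sketch of the idea; this is not the exact upstream code, and the function name is illustrative. After the strategic merge patch for the status update is generated, the status.addresses entry is rewritten to carry the full desired address list plus the `$patch: replace` directive, so the API server replaces the list wholesale instead of merging entries by their "type" key.

```go
package nodeutil

import (
	"encoding/json"

	v1 "k8s.io/api/core/v1"
)

// forceReplaceAddresses is an illustrative helper (not the upstream function):
// given the bytes of an already-generated strategic merge patch for a Node,
// it rewrites "status.addresses" to contain the full desired address list
// plus the {"$patch": "replace"} directive, so the server discards the old
// list instead of merging entries by their "type" key.
func forceReplaceAddresses(patchBytes []byte, addresses []v1.NodeAddress) ([]byte, error) {
	var patch map[string]interface{}
	if err := json.Unmarshal(patchBytes, &patch); err != nil {
		return nil, err
	}
	status, ok := patch["status"].(map[string]interface{})
	if !ok || status["addresses"] == nil {
		// The patch doesn't touch addresses; leave it alone.
		return patchBytes, nil
	}
	// Round-trip the desired addresses through JSON so they become plain
	// map[string]interface{} entries that can sit alongside the directive.
	addrBytes, err := json.Marshal(addresses)
	if err != nil {
		return nil, err
	}
	var addrList []interface{}
	if err := json.Unmarshal(addrBytes, &addrList); err != nil {
		return nil, err
	}
	status["addresses"] = append(addrList, map[string]interface{}{"$patch": "replace"})
	return json.Marshal(patch)
}
```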
Does this PR introduce a user-facing change?:
/priority important-soon