fix node lifecycle controller panic when conditionType ready is nil #122874
Conversation
Hi @fusida. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test Can you add a unit test or integration test that covers this bug?
I expect it to not have any issues now.
Sounds like a bug. /triage accepted
/priority important-soon
Can you share which k8s version you got this in? And does this look to be a regression?
It happened in 1.22.15, but the code is still unchanged in 1.28. Patch the node conditions so that `Ready` is missing, then stop the kubelet from reporting its lease; when the KCM sets the node to NotReady it will hit this problem.
/remove-area apiserver
@fusida: Those labels are not set on the issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-area code-generation
/retest
I have removed the second commit.
LGTM label has been added. Git tree hash: 9869f19b0ec8fa8f8a52dec0b39f066aa8a8864e
@aojea done
@@ -623,6 +623,11 @@ func (nc *Controller) doNoExecuteTaintingPass(ctx context.Context) {
 			return false, 50 * time.Millisecond
 		}
 		_, condition := controllerutil.GetNodeCondition(&node.Status, v1.NodeReady)
+		if condition == nil {
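For readers following along, here is a minimal, self-contained sketch of why the guard is needed. `getNodeCondition` is a simplified stand-in for the `controllerutil.GetNodeCondition` helper in the diff, not the controller's actual code:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// getNodeCondition mimics the controllerutil helper: it returns nil
// when the requested condition type is absent from the node status.
func getNodeCondition(status *v1.NodeStatus, conditionType v1.NodeConditionType) (int, *v1.NodeCondition) {
	if status == nil {
		return -1, nil
	}
	for i := range status.Conditions {
		if status.Conditions[i].Type == conditionType {
			return i, &status.Conditions[i]
		}
	}
	return -1, nil
}

func main() {
	// A node whose conditions list was wiped, e.g. by a bad merge patch.
	status := &v1.NodeStatus{Conditions: []v1.NodeCondition{}}

	_, condition := getNodeCondition(status, v1.NodeReady)
	if condition == nil {
		// Without this guard, reading condition.Status below would
		// dereference a nil pointer and panic the controller.
		fmt.Println("Ready condition missing; requeue and retry")
		return
	}
	fmt.Println("Ready condition status:", condition.Status)
}
```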
it's definitely good not to panic, but is retrying correct, or should the absence of the Ready condition be treated as presence with Unknown status?
My understanding is that it should never be nil; it happens because of an external forced action, and another controller in the kubelet should reconcile it. So retrying while waiting for the kubelet (IIRC) seems like a good option.
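For concreteness, the alternative raised in the review question above might look like the following hypothetical sketch. `readyOrUnknown` is an invented name; the PR does not take this approach, it requeues instead:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// readyOrUnknown treats a missing Ready condition as present with
// Unknown status rather than requeueing and waiting for the kubelet.
func readyOrUnknown(status *v1.NodeStatus) v1.NodeCondition {
	for i := range status.Conditions {
		if status.Conditions[i].Type == v1.NodeReady {
			return status.Conditions[i]
		}
	}
	// Absent: synthesize a Ready condition with Unknown status so the
	// tainting pass could treat the node like an unreachable one.
	return v1.NodeCondition{Type: v1.NodeReady, Status: v1.ConditionUnknown}
}
```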
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aojea, fusida, liggitt. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Changelog suggestion:
-fix node lifecycle controller panic when conditionType ready is been patch nil by mistake
+Fixed a panic in the node lifecycle controller when a `Ready` condition was incorrectly patched to be null.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Some network plugins and operators patch the node conditions; if they use a merge patch by mistake, the node conditions may be replaced unexpectedly.
When the node's `.status.conditions` does not contain a `Ready` entry, the node lifecycle controller may panic.
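To illustrate how a merge patch can wipe conditions, here is a hedged sketch assuming the `github.com/evanphx/json-patch` library (which implements the RFC 7386 merge semantics used for JSON merge patches); the condition values are made up:

```go
package main

import (
	"fmt"

	jsonpatch "github.com/evanphx/json-patch"
)

func main() {
	// Original node status: Ready plus one other condition.
	original := []byte(`{"status":{"conditions":[{"type":"Ready","status":"True"},{"type":"MemoryPressure","status":"False"}]}}`)

	// An operator meaning to set just one condition, but using a JSON
	// merge patch. Merge-patch semantics replace lists wholesale, so the
	// whole conditions array is overwritten and Ready disappears.
	patch := []byte(`{"status":{"conditions":[{"type":"NetworkUnavailable","status":"False"}]}}`)

	merged, err := jsonpatch.MergePatch(original, patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(merged))
	// The result contains only NetworkUnavailable; the Ready condition
	// is gone, which is the state that made the controller panic.
}
```

A strategic merge patch, by contrast, merges node conditions by their `type` key, which is why only plain merge patches trigger this bug.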
The kube-controller-manager error:
Which issue(s) this PR fixes:
Fixes #
`condition` may be nil, and dereferencing `condition.Status` will panic:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: