Fixes No ref for container in probes after kubelet restart #84792
Welcome @EricMountain!
Hi @EricMountain. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig node
/assign @vishh
/ok-to-test
Thanks for your work here!
If you know it, can you share more around the original motivation for the container reference manager? In other words, why did we add this manager, instead of generating references on the fly every time?
Are there benefits offered by the container reference manager that we may lose if we get rid of it entirely?
If there may be benefits, an alternative option would be "falling back" to generating the reference on the fly only if it does not exist in the container reference manager after the kubelet restart. If I'm understanding correctly, such a change would require less code change: we would just need to update RefManager.GetRef.
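The fallback suggested above could look roughly like the following. This is a minimal, self-contained sketch and not the kubelet's actual code: ObjectRef, RefManager, and GetRefOrGenerate are simplified, hypothetical stand-ins, and the FieldPath format is only illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// ObjectRef is a simplified, hypothetical stand-in for an object reference.
type ObjectRef struct {
	Kind      string
	Namespace string
	Name      string
	FieldPath string
}

// RefManager is a toy container reference manager: container ID -> reference.
type RefManager struct {
	mu   sync.Mutex
	refs map[string]*ObjectRef
}

// NewRefManager returns an empty manager, as after a kubelet restart.
func NewRefManager() *RefManager {
	return &RefManager{refs: map[string]*ObjectRef{}}
}

// GetRefOrGenerate (hypothetical name) returns the cached reference if one
// exists. Otherwise it builds a reference on the fly from the pod and
// container names, caches it, and returns it.
func (m *RefManager) GetRefOrGenerate(id, namespace, pod, container string) *ObjectRef {
	m.mu.Lock()
	defer m.mu.Unlock()
	if ref, ok := m.refs[id]; ok {
		return ref
	}
	ref := &ObjectRef{
		Kind:      "Pod",
		Namespace: namespace,
		Name:      pod,
		FieldPath: fmt.Sprintf("spec.containers{%s}", container),
	}
	m.refs[id] = ref
	return ref
}

func main() {
	m := NewRefManager() // empty map: nothing was registered before the restart
	ref := m.GetRefOrGenerate("abc123", "default", "web-0", "nginx")
	fmt.Println(ref.Namespace, ref.Name, ref.FieldPath)
}
```

The trade-off in this sketch is that the lookup needs the pod and container names at the call site, which the original GetRef signature may not have available.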
f6b4cb3 to ddbe5ff
/retest
/test pull-kubernetes-e2e-kind
ddbe5ff to a859f2c
Working on understanding the test failure.
a859f2c to 71a28d7
Retest due to failure creating kind cluster.
/test pull-kubernetes-e2e-kind
71a28d7 to 7842644
/test pull-kubernetes-kubemark-e2e-gce-big
/test pull-kubernetes-node-e2e-containerd
/lgtm
Thanks for your work here :) I agree with merging this change for the short-term, and will take a look at your other diff.
/assign @tallclair
Hi @tallclair! Would you take a look at this PR? Thanks!
/priority important-soon
Hello @matt-tyler @tallclair!
/approve
/hold
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: EricMountain, tallclair The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
65fe21a to 4cb28f6
Hello @EricMountain, @tallclair!
/lgtm
What type of PR is this?
/kind bug
What this PR does / why we need it:
After restarting, the kubelet syncs previously running containers. However, it does not populate the container RefManager map (pkg/kubelet/container/container_reference_manager.go).
Any probes that fail against containers that were running before the kubelet restarted do not generate events, and instead log:
So, for instance, kubectl describe pod ... with a container (started before kubelet restart) failing its readiness probe will show:
I.e. a non-ready container, but no matching events. If a container is not ready, it is because it is failing its readiness probe, and we expect corresponding events.
Liveness probe events are similarly not created in such a situation, though the container is restarted. So we see short container existence time and restarts, but not the matching liveness probe failure events.
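The failure mode described above can be modelled in a few lines. This is a toy sketch under stated assumptions, not the kubelet's probe code: refManager, recordProbeFailure, the container ID, and the event text are all hypothetical and exist only to show why an empty reference map silently drops probe events.

```go
package main

import "fmt"

// refManager is a toy reference map: container ID -> reference.
// The reference is simplified to a plain string for illustration.
type refManager map[string]string

// recordProbeFailure (hypothetical) emits an event only when a reference
// for the container is known. With no reference there is nothing to attach
// the event to, so the failure is dropped and ok is false.
func recordProbeFailure(refs refManager, containerID, msg string) (event string, ok bool) {
	ref, found := refs[containerID]
	if !found {
		return "", false
	}
	return fmt.Sprintf("%s: %s", ref, msg), true
}

func main() {
	// Before the restart the container was registered when it was started.
	beforeRestart := refManager{"abc123": "pod/web-0 spec.containers{nginx}"}
	// After the restart the map is empty, although the container still runs.
	afterRestart := refManager{}

	for name, refs := range map[string]refManager{
		"before restart": beforeRestart,
		"after restart":  afterRestart,
	} {
		event, ok := recordProbeFailure(refs, "abc123", "Readiness probe failed")
		fmt.Printf("%s: event recorded=%v %q\n", name, ok, event)
	}
}
```

In this model the container's readiness state still changes, because the probe result is processed either way; only the event is lost.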
Which issue(s) this PR fixes
Fixes the No ref log mentioned in #83336, though probably not the underlying issue.
Special notes for your reviewer:
2 commits:
No ref issue and ensure events are created by using almost the same recordContainerEvent() function as implemented in pkg/kubelet/kuberuntime/kuberuntime_container.go. The difference is that we don't do container ID/name substitution, as the container ID is not present in the messages passed in.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A