kubelet: ensure static pod UIDs are unique #87461
Conversation
/test pull-kubernetes-e2e-gce
/retest Failed tests kinda look like storage mount errors so let's see what happens if I roll the dice again.
Thanks for your PR :) I think this bug has a long history :) Trying to tag the folks who have been most involved in the conversation.
Another stale copy of this issue?
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Any advice on how to move this forward?
/remove-lifecycle stale
can't we just merge it and be done with it?
this is slowly turning into comedy gold
Does it cause all pods to be restarted, or just static pods?
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Sorry I missed your question. I believe it's all pods, because the line that moves resets the hasher, so all source information is missing from the pod hash currently, and when we add that in, the hash will change. /remove-lifecycle stale
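For anyone following along, here is a minimal standalone sketch of the mechanism described above. The helper, manifest string, and node name are illustrative stand-ins, not the kubelet's actual code: the point is that a `Reset` inside the deep-hash helper discards the source info written beforehand, so including that info after the reset changes every digest (and hence every pod UID).

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
)

// Stand-in for the DeepHashObject helper. The Reset at the top is the line
// in question: it discards anything written to the hasher before this call.
func deepHashObject(h hash.Hash, obj interface{}) {
	h.Reset()
	fmt.Fprintf(h, "%#v", obj)
}

func main() {
	manifest := "kube-proxy manifest" // stand-in for a full pod object

	// Current order: source info is written first, then wiped by the Reset.
	withSource := md5.New()
	fmt.Fprintf(withSource, "host:%s", "node-a")
	deepHashObject(withSource, manifest)

	// No source info at all.
	withoutSource := md5.New()
	deepHashObject(withoutSource, manifest)

	// The two digests come out identical, showing the source info never
	// reaches the UID today; writing it after the reset instead would
	// change every pod's UID.
	fmt.Println(hex.EncodeToString(withSource.Sum(nil)))
	fmt.Println(hex.EncodeToString(withoutSource.Sum(nil)))
}
```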
/retest
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bboreham, derekwaynecarr, dims
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/retest Review the full test history for this PR. Silence the bot with an `/lgtm cancel` or `/hold` comment for consistent failures.
Ensure node name and file path are included in the hash which produces the pod UID; otherwise all pods created from the same manifest have the same UID across the cluster. The real author of this code is Yu-Ju Hong <yjhong@google.com>. I am resurrecting an abandoned PR and changed the git author to pass the CLA check. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Force-pushed from 5cac49e to 19de431
Rebased on latest main branch
/retest
@dashpole @derekwaynecarr could you please approve again after the rebase? Tests passed on the last re-run.
/lgtm
This impacts deployments that run the control plane as static pods. /milestone v1.20
Hey there, just wanted to mention that this merged after test freeze and the branch cut. If you'd like this in 1.20, it needs to be cherry-picked, and since the release is tomorrow it probably won't make it until 1.20.1.
Seems like this is going to have to wait until 1.21 then, given that it forces a cluster-wide restart? Also, we can't be sure that no users are (ab)using this bug and treating it as a feature.
/milestone v1.21 |
Milestone indeed: one of the oldest known bugs in Kubernetes has finally been fixed 🎉
What this PR does / why we need it:
Fix the issue where pods like kube-proxy, created from the same static manifest across a cluster, all have the same UID. This breaks the concept of a unique ID and confuses some tools.
The underlying issue is a simple clash of behaviours: the kubelet writes node-specific data into an md5 hasher and then calls `DeepHashObject`, which was changed to reset the hasher before hashing, discarding that data. The fix is to re-order the calls so that `DeepHashObject` goes first and the node name and file path are written afterwards. This does have the side-effect that all pod UIDs will change.
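As a rough sketch of the shape of the fix (simplified stand-ins; the real code hashes the full pod object via `DeepHashObject` and lives in the kubelet's static-pod config source):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
)

// Stand-in for DeepHashObject; note the Reset at the start.
func deepHashObject(h hash.Hash, obj interface{}) {
	h.Reset()
	fmt.Fprintf(h, "%#v", obj)
}

// fixedUID sketches the re-ordered computation: hash the pod first (the
// call that resets the hasher), then append the per-node source info so
// that it survives into the final sum.
func fixedUID(nodeName, file string, pod interface{}) string {
	h := md5.New()
	deepHashObject(h, pod)
	fmt.Fprintf(h, "host:%s", nodeName)
	fmt.Fprintf(h, "file:%s", file)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	manifest := "kube-proxy manifest" // stand-in for a full pod object
	path := "/etc/kubernetes/manifests/kube-proxy.yaml"

	// The same manifest now yields a distinct UID on each node.
	fmt.Println(fixedUID("node-a", path, manifest))
	fmt.Println(fixedUID("node-b", path, manifest))
}
```

Because the node name and file path only enter the hash after the reset, the same manifest produces a different UID per node, which is exactly the uniqueness property the bug was violating.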
Special notes for your reviewer:
This is a re-opening of #43420. It and the similar #57135 were held up and ultimately abandoned over concerns that all pods would get restarted during upgrade.
Since then, several Kubernetes upgrades have restarted all pods, and there is a statement at #84443 that this is expected behaviour, so this should not be a reason to delay the fix.
/kind bug
/sig node