kubelet: ensure static pod UIDs are unique #87461
Conversation
/test pull-kubernetes-e2e-gce
/retest Failed tests kinda look like storage mount errors so let's see what happens if I roll the dice again.
Thanks for your PR :) I think this bug has a long history :) Trying to tag the folks who have been most involved in the conversation.
Another stale copy of this issue?
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Any advice on how to move this forward?
/remove-lifecycle stale
can't we just merge it and be done with it?
this is slowly turning into comedy gold
Does it cause all pods to be restarted, or just static pods?
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Sorry I missed your question. I believe it's all pods, because the line that moves resets the hasher, so all source information is missing from the pod hash currently, and when we add that in, the hash will change. /remove-lifecycle stale
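For anyone following along, here is a minimal standalone sketch of the mechanism described above. The helper, manifest string, and node name are illustrative stand-ins, not the kubelet's actual code: the point is that a `Reset` inside the deep-hash helper discards the source info written beforehand, so including that info after the reset changes every digest (and hence every pod UID).

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
)

// Stand-in for the DeepHashObject helper. The Reset at the top is the line
// in question: it discards anything written to the hasher before this call.
func deepHashObject(h hash.Hash, obj interface{}) {
	h.Reset()
	fmt.Fprintf(h, "%#v", obj)
}

func main() {
	manifest := "kube-proxy manifest" // stand-in for a full pod object

	// Current order: source info is written first, then wiped by the Reset.
	withSource := md5.New()
	fmt.Fprintf(withSource, "host:%s", "node-a")
	deepHashObject(withSource, manifest)

	// No source info at all.
	withoutSource := md5.New()
	deepHashObject(withoutSource, manifest)

	// The two digests come out identical, showing the source info never
	// reaches the UID today; writing it after the reset instead would
	// change every pod's UID.
	fmt.Println(hex.EncodeToString(withSource.Sum(nil)))
	fmt.Println(hex.EncodeToString(withoutSource.Sum(nil)))
}
```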
/retest
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bboreham, derekwaynecarr, dims
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/retest Review the full test history for this PR. Silence the bot with an `/lgtm cancel` or `/hold` comment for consistent failures.
Ensure node name and file path are included in the hash which produces the pod UID; otherwise all pods created from the same manifest have the same UID across the cluster. The real author of this code is Yu-Ju Hong <yjhong@google.com>. I am resurrecting an abandoned PR and changed the git author to pass the CLA check. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Force-pushed from 5cac49e to 19de431
Rebased on latest main branch
/retest
@dashpole @derekwaynecarr could you please approve again after the rebase? Tests passed on the last re-run.
/lgtm
This impacts deployments that run the control plane as static pods. /milestone v1.20
Hey there, just wanted to mention that this merged after test freeze and the branch cut. If you'd like this in 1.20, it needs to be cherry-picked, and since the release is tomorrow it probably won't make it until 1.20.1.
Seems like this is going to have to wait until 1.21 then, given that it forces a cluster-wide restart? Also, we can't be sure that no users are (ab)using this bug and treating it as a feature.
/milestone v1.21 |
Milestone indeed: one of the oldest known bugs in Kubernetes has finally been fixed 🎉
What this PR does / why we need it:
Fix the issue where pods like kube-proxy, created from the same static manifest across a cluster, all have the same UID. This breaks the concept of a unique ID and confuses some tools.
The underlying issue is a simple clash of behaviours: the kubelet writes node-specific data into an md5 hasher and then calls `DeepHashObject`, which was changed to reset the hasher before hashing, discarding that data. The fix is to re-order the calls so that `DeepHashObject` goes first and the node name and file path are written afterwards. This does have the side-effect that all pod UIDs will change.
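As a rough sketch of the shape of the fix (simplified stand-ins; the real code hashes the full pod object via `DeepHashObject` and lives in the kubelet's static-pod config source):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
)

// Stand-in for DeepHashObject; note the Reset at the start.
func deepHashObject(h hash.Hash, obj interface{}) {
	h.Reset()
	fmt.Fprintf(h, "%#v", obj)
}

// fixedUID sketches the re-ordered computation: hash the pod first (the
// call that resets the hasher), then append the per-node source info so
// that it survives into the final sum.
func fixedUID(nodeName, file string, pod interface{}) string {
	h := md5.New()
	deepHashObject(h, pod)
	fmt.Fprintf(h, "host:%s", nodeName)
	fmt.Fprintf(h, "file:%s", file)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	manifest := "kube-proxy manifest" // stand-in for a full pod object
	path := "/etc/kubernetes/manifests/kube-proxy.yaml"

	// The same manifest now yields a distinct UID on each node.
	fmt.Println(fixedUID("node-a", path, manifest))
	fmt.Println(fixedUID("node-b", path, manifest))
}
```

Because the node name and file path only enter the hash after the reset, the same manifest produces a different UID per node, which is exactly the uniqueness property the bug was violating.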
Special notes for your reviewer:
This is a re-opening of #43420. It and the similar #57135 were held up and ultimately abandoned over concerns that all pods would get restarted during upgrade.
Since then, several Kubernetes upgrades have restarted all pods, and there is a statement at #84443 that this is expected behaviour, so this should not be a reason to delay the fix.
/kind bug
/sig node