
JobTrackingWithFinalizers: Orphan pods might not get the finalizer cleared #108645

Closed
alculquicondor opened this issue Mar 10, 2022 · 3 comments · Fixed by #108752
Labels
area/batch kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@alculquicondor
Member

alculquicondor commented Mar 10, 2022

What happened?

I ran into a situation where a pod didn't lose the finalizer, which then blocked the deletion of a namespace.

We do have an integration test for orphan pods:

func TestOrphanPodsFinalizersCleared(t *testing.T) {

but I fear it might not cover all cases.

What did you expect to happen?

The finalizer to always be removed.

How can we reproduce it (as minimally and precisely as possible)?

This is a theory, rather than a concrete set of steps:

  1. Some pods of a job start.
  2. A pod finishes (it would then get a deletion timestamp). Let's assume the job controller doesn't have time to count this pod.
  3. A client deletes the job. The cascading deletion would theoretically delete the pod, but since the pod already has a deletion timestamp, it isn't updated again, so it never gets a chance to enter the work queue for orphan pods (see the sketch after this list).
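
A rough sketch of this theory using client-go (not from the issue; the job name `finalizer-repro`, the busybox image, and the sleep-based timing are assumptions, and the race is timing-dependent, so this may not reproduce reliably):

```go
package main

import (
	"context"
	"fmt"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ctx := context.Background()
	ns := "default"

	// 1. Start a Job whose pod finishes quickly.
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "finalizer-repro"},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "main",
						Image:   "busybox",
						Command: []string{"sh", "-c", "exit 0"},
					}},
				},
			},
		},
	}
	if _, err := client.BatchV1().Jobs(ns).Create(ctx, job, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// 2. Give the pod time to finish, trying to hit the window before the
	//    job controller counts it and removes its tracking finalizer.
	time.Sleep(5 * time.Second)

	// 3. Delete the job; cascading deletion should remove the pods too.
	policy := metav1.DeletePropagationBackground
	if err := client.BatchV1().Jobs(ns).Delete(ctx, "finalizer-repro",
		metav1.DeleteOptions{PropagationPolicy: &policy}); err != nil {
		panic(err)
	}

	// Check whether any leftover pod still carries the job tracking finalizer.
	time.Sleep(30 * time.Second)
	pods, err := client.CoreV1().Pods(ns).List(ctx,
		metav1.ListOptions{LabelSelector: "job-name=finalizer-repro"})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		for _, f := range p.Finalizers {
			if f == "batch.kubernetes.io/job-tracking" {
				fmt.Printf("pod %s still has the job tracking finalizer\n", p.Name)
			}
		}
	}
}
```

If the race is hit, the leftover pod keeps the batch.kubernetes.io/job-tracking finalizer and blocks namespace deletion until the finalizer is removed.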

Anything else we need to know?

No response

Kubernetes version

1.23.4

Cloud provider

GKE

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@alculquicondor alculquicondor added the kind/bug Categorizes issue or PR as related to a bug. label Mar 10, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 10, 2022
@k8s-ci-robot
Contributor

@alculquicondor: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alculquicondor
Member Author

/sig apps
/area batch

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. area/batch and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 10, 2022
@alculquicondor
Member Author

Another theory for a repro: delete pods right before deleting the job.
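
A minimal sketch of this variant (not from the issue), assuming a kubernetes.Interface client and a job named `finalizer-repro`, both hypothetical: delete the job's pods first so they already carry a deletion timestamp, then delete the job before the controller syncs.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// reproDeletePodsFirst deletes a job's pods right before deleting the job
// itself, following the theory in the comment above. The job name and the
// job-name label selector are assumptions for illustration.
func reproDeletePodsFirst(ctx context.Context, client kubernetes.Interface, ns string) error {
	// Delete the pods first; they now have a deletion timestamp but keep the
	// batch.kubernetes.io/job-tracking finalizer until the controller acts.
	if err := client.CoreV1().Pods(ns).DeleteCollection(ctx, metav1.DeleteOptions{},
		metav1.ListOptions{LabelSelector: "job-name=finalizer-repro"}); err != nil {
		return err
	}
	// Immediately delete the job, before the job controller has a chance to sync.
	policy := metav1.DeletePropagationBackground
	return client.BatchV1().Jobs(ns).Delete(ctx, "finalizer-repro",
		metav1.DeleteOptions{PropagationPolicy: &policy})
}
```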
