
JobTrackingWithFinalizers: Orphan pods might not get the finalizer cleared #108645

Closed
alculquicondor opened this issue Mar 10, 2022 · 3 comments · Fixed by #108752
Labels
area/batch kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@alculquicondor
Member

alculquicondor commented Mar 10, 2022

What happened?

I ran into a situation where a pod didn't lose the finalizer, which then blocked the deletion of a namespace.

We do have an integration test for orphan pods:

func TestOrphanPodsFinalizersCleared(t *testing.T) {

but I fear it might not cover all cases.

What did you expect to happen?

The finalizer to always be removed.

How can we reproduce it (as minimally and precisely as possible)?

This is a theory, rather than a concrete set of steps:

  1. Some pods of a job start.
  2. A pod finishes (it would then get a deletion timestamp). Let's assume the job controller doesn't have time to count this pod.
  3. A client deletes the job. The cascading deletion would theoretically delete the pod, but since the pod already has a deletion timestamp, it isn't updated again, so it never gets a chance to enter the work queue for orphan pods (see the sketch after this list).
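
A rough sketch of this theory using client-go (not from the issue; the job name `finalizer-repro`, the busybox image, and the sleep-based timing are assumptions, and the race is timing-dependent, so this may not reproduce reliably):

```go
package main

import (
	"context"
	"fmt"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ctx := context.Background()
	ns := "default"

	// 1. Start a Job whose pod finishes quickly.
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "finalizer-repro"},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "main",
						Image:   "busybox",
						Command: []string{"sh", "-c", "exit 0"},
					}},
				},
			},
		},
	}
	if _, err := client.BatchV1().Jobs(ns).Create(ctx, job, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// 2. Give the pod time to finish, trying to hit the window before the
	//    job controller counts it and removes its tracking finalizer.
	time.Sleep(5 * time.Second)

	// 3. Delete the job; cascading deletion should remove the pods too.
	policy := metav1.DeletePropagationBackground
	if err := client.BatchV1().Jobs(ns).Delete(ctx, "finalizer-repro",
		metav1.DeleteOptions{PropagationPolicy: &policy}); err != nil {
		panic(err)
	}

	// Check whether any leftover pod still carries the job tracking finalizer.
	time.Sleep(30 * time.Second)
	pods, err := client.CoreV1().Pods(ns).List(ctx,
		metav1.ListOptions{LabelSelector: "job-name=finalizer-repro"})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		for _, f := range p.Finalizers {
			if f == "batch.kubernetes.io/job-tracking" {
				fmt.Printf("pod %s still has the job tracking finalizer\n", p.Name)
			}
		}
	}
}
```

If the race is hit, the leftover pod keeps the batch.kubernetes.io/job-tracking finalizer and blocks namespace deletion until the finalizer is removed.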

Anything else we need to know?

No response

Kubernetes version

1.23.4

Cloud provider

GKE

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@alculquicondor alculquicondor added the kind/bug Categorizes issue or PR as related to a bug. label Mar 10, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 10, 2022
@k8s-ci-robot
Contributor

@alculquicondor: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alculquicondor
Member Author

/sig apps
/area batch

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. area/batch and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 10, 2022
@alculquicondor
Member Author

Another theory for a repro: delete pods right before deleting the job.
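
A minimal sketch of this variant (not from the issue), assuming a kubernetes.Interface client and a job named `finalizer-repro`, both hypothetical: delete the job's pods first so they already carry a deletion timestamp, then delete the job before the controller syncs.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// reproDeletePodsFirst deletes a job's pods right before deleting the job
// itself, following the theory in the comment above. The job name and the
// job-name label selector are assumptions for illustration.
func reproDeletePodsFirst(ctx context.Context, client kubernetes.Interface, ns string) error {
	// Delete the pods first; they now have a deletion timestamp but keep the
	// batch.kubernetes.io/job-tracking finalizer until the controller acts.
	if err := client.CoreV1().Pods(ns).DeleteCollection(ctx, metav1.DeleteOptions{},
		metav1.ListOptions{LabelSelector: "job-name=finalizer-repro"}); err != nil {
		return err
	}
	// Immediately delete the job, before the job controller has a chance to sync.
	policy := metav1.DeletePropagationBackground
	return client.BatchV1().Jobs(ns).Delete(ctx, "finalizer-repro",
		metav1.DeleteOptions{PropagationPolicy: &policy})
}
```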
