BackoffLimit does not work for pods in Jobs (Kubernetes 1.19) #101584
Comments
@kolomenkin: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
/sig architecture
@kolomenkin: The label(s) could not be applied. In response to this: /sig architecture
/sig apps
If ErrImagePull happens, the Pod's containers are never created; the back-off only retries the image pull, so the Job's backoffLimit cannot perceive the ImagePullBackOff.
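For illustration (a hypothetical status fragment, not taken from the reporter's cluster), a Pod stuck this way reports a waiting container whose restartCount never grows, so there is no pod failure for the Job controller to count against backoffLimit:

```yaml
# Hypothetical fragment of `kubectl get pod <name> -o yaml` for a Pod whose
# image cannot be pulled. The container is never created, so restartCount
# stays at 0 and nothing is ever counted toward the Job's backoffLimit.
status:
  phase: Pending
  containerStatuses:
  - name: main                      # hypothetical container name
    ready: false
    restartCount: 0                 # never increments: the container never started
    state:
      waiting:
        reason: ImagePullBackOff    # kubelet keeps retrying the pull instead of failing the Pod
        message: Back-off pulling image "registry.example.com/app:missing-tag"
```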
Glanced over. Is this a dup of #87278?
I would say it is not a full duplicate of #87278. I can imagine someone deciding to leave that issue unfixed for Deployments, and I'm not sure whether editing a Deployment's YAML is affected. But I am pointing out similar behavior in the context of CronJob, i.e. we need a Job to succeed or fail on some schedule. It is extremely important when a CronJob is updated (redeployed) with a different, valid image reference: currently the new changes are ignored indefinitely. A sketch of the scenario follows.
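For concreteness, a minimal CronJob of the shape being described might look like this (all names and the image reference are hypothetical; batch/v1beta1 was the current CronJob API on Kubernetes 1.19):

```yaml
# Sketch of the scenario described above; names and image are hypothetical.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cron
spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Forbid      # a new Job is not started while the old one still "runs"
  jobTemplate:
    spec:
      backoffLimit: 6            # expected to fail the Job after ~6 retries; never triggers here
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: main
            image: registry.example.com/app:broken-tag   # pull fails -> ImagePullBackOff forever
```

If the image reference is then fixed in the CronJob, the stuck Job from the old template never fails, so (with concurrencyPolicy: Forbid) no Job with the corrected image is ever created.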
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
Does anyone have a workaround for this?
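One commonly suggested mitigation (my own suggestion, not an official fix from this thread) is to cap the Job's total runtime with spec.activeDeadlineSeconds, which marks the Job Failed even if its Pod never leaves ImagePullBackOff:

```yaml
# Workaround sketch: bound the Job's runtime so it fails even when
# backoffLimit is never reached. Names and values are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  backoffLimit: 6
  activeDeadlineSeconds: 720   # ~12 minutes: the Job is marked Failed
                               # (reason: DeadlineExceeded) once this elapses,
                               # regardless of image-pull back-off state
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example.com/app:broken-tag
```

For the CronJob case, the same field can be set inside jobTemplate.spec, so a stuck run eventually fails and the next scheduled run can pick up the corrected template.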
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue. In response to this: /close
/reopen
@Cuppojoe: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this: /reopen
What happened:
A CronJob's Pod references an image that cannot be pulled (ErrImagePull). The Pod stays in ImagePullBackOff indefinitely: the pull retries are never counted toward the Job's backoffLimit, so the Pod never fails, the Job never finishes, and a new Job from the updated template is never started.
What you expected to happen:
I expect the Pod to use the default backoff limit as described in the documentation:
- Default value = 6
- Delays = 10 sec, 20 sec, 40 sec, 80 sec, 160 sec, and 320 sec
- Total delay is less than 12 minutes
I expect the Pod to fail to start within about 12 minutes of being created.
With restartPolicy: Never, I expect the Job to finish after the Pod fails (in about 12 minutes).
And I expect the scheduler to create a new Job from the updated template when the schedule requires it (currently that never happens).
How to reproduce it (as minimally and precisely as possible):
See "What happened"
Anything else we need to know?:
It seems the counter of back-off pull retries is not incremented. Here is a part of the Pod's event log:
It seems a docker pull is attempted every 12 seconds on average; the kubelet apparently waits only the initial 10 seconds between retries instead of doubling the back-off.
Here is an earlier version of the same Pod's event log:
Related (similar) issues:
Environment:
- Kubernetes version (use kubectl version): 1.19
  Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.6-eks-49a6c0", GitCommit:"49a6c0bf091506e7bafcdb1b142351b69363355a", GitTreeState:"clean", BuildDate:"2020-12-23T22:10:21Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):