Job not failing after backoffLimit is reached #96630
@yacinelazaar: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig apps
This is expected behaviour. The reason you have more than 2 failed pods is that the backoff count can get reset when a pod exits successfully: a pod fails (backoff = 1), then a pod completes (backoff = 0), then a pod fails (backoff = 1), then a pod completes (backoff = 0), and so on.
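Under that explanation, the total number of failed pods recorded on the Job can exceed backoffLimit even though the run of consecutive failures never did. One way to compare the totals is a quick status query, sketched here under the assumption that the Job is named random-job:

```sh
# Hypothetical check of the Job's cumulative counters ("random-job" is a placeholder name).
kubectl get job random-job -o jsonpath='{.status.failed} failed, {.status.succeeded} succeeded{"\n"}'
```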
Yes, but the pods are created successively here. Plus, I have sorted the pods by creationTimestamp, and there are still 5 failing pods after the last completed pod, so backoff = 4, which is still greater than 2.
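For reference, a listing like the one described (pods sorted by creationTimestamp) can be produced with something along these lines; the Job name random-job is an assumption, and Jobs label their pods with job-name=&lt;job-name&gt;:

```sh
# Hypothetical: list this Job's pods, oldest first.
kubectl get pods -l job-name=random-job --sort-by=.metadata.creationTimestamp
```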
Ah yes, I see you have four failed pods in a row
Can you show the full output of
It seems this may be some flaky behaviour that you're seeing
Had a couple of trials with reduced sleep time (3s) and came across this:
The job failed, but I don't see a reason why. I thought I had to have 3 pods failing successively to reach that status, but in this case it just failed after 2. Notice it did not fail after the 4th and 5th. Here is the job description:
As for the job controller, it reports these errors whenever a pod fails:
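The errors referred to here would come from the job controller inside kube-controller-manager. A sketch of how to pull them on a kubeadm-style cluster, assuming the control-plane static pod carries the usual component=kube-controller-manager label:

```sh
# Hypothetical: grep the controller manager logs for job-related errors.
kubectl -n kube-system logs -l component=kube-controller-manager | grep -i job
```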
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
I have the following job running, which randomly fails or succeeds, with completions set to 4 and no parallelism. Notice the backoffLimit here is 2:
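The manifest itself did not survive the formatting here; a minimal sketch of a Job with those settings is shown below. The name, image, and the random-failure command are assumptions, not the reporter's original spec:

```yaml
# Sketch: 4 completions, no parallelism, backoffLimit 2, and pods that
# randomly succeed or fail after sleeping for a while.
apiVersion: batch/v1
kind: Job
metadata:
  name: random-job          # placeholder name
spec:
  completions: 4
  parallelism: 1
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        # Exit 0 or 1 depending on the current second, i.e. roughly at random.
        command: ["sh", "-c", "sleep 20; exit $(( $(date +%s) % 2 ))"]
```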
So the job creates multiple pods and then fails:
But upon checking the created pods, I noticed that the job did not fail after 2 failed retries but after 4 (check the last 5 pods).
What you expected to happen:
The job to fail after 2 failed retries.
How to reproduce it (as minimally and precisely as possible):
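(The original reproduction steps were lost in formatting. A sketch, assuming the manifest above is saved as random-job.yaml:)

```sh
# Hypothetical reproduction: run the Job and observe how many pods fail
# before the Job is marked Failed.
kubectl apply -f random-job.yaml
kubectl get pods -l job-name=random-job --watch
kubectl describe job random-job
```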
Anything else we need to know?:
Environment:
- Kubernetes version (kubectl version): v1.19.3
- OS (cat /etc/os-release): Ubuntu 18.04.4 LTS (Bionic Beaver)
- Kernel (uname -a): Linux master-0 4.15.0-123-generic #126-Ubuntu SMP Wed Oct 21 09:40:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux