[e2e flake] [sig-apps] Job should run a job to completion when tasks sometimes fail and are not locally restarted #59527
Comments
kind/bug |
/priority failing-test |
@mithrav: Reiterating the mentions to trigger a notification: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Pinging for updates. This is still a top cause of flakiness. |
So looking at the logs there are 2 problems here actually:
All in all it looks like the backoff policy is causing both problems. Digging further to get to the actual root cause. |
So it looks like problem 2 is actually caused by problem 1. Since I'm able to reproduce this locally once in a while, I'll try to get to the root cause. |
Automatic merge from submit-queue (batch tested with PRs 60978, 60985). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Backoff only when failed pod shows up

**What this PR does / why we need it**:
Upon introducing the backoff policy we started to delay sync runs for the job when it failed several times before. This leads to failed jobs not reporting status right away in cases that are not related to failed pods, eg. a successful run. This PR ensures the backoff is applied only when `updatePod` receives a failed pod.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59918 #59527

/assign @janetkuo @kow3ns

**Release note**:
```release-note
None
```
#60985 merged, closing. |
Back story in #54904, but this is still the second-highest flake on Velodrome: http://velodrome.k8s.io/dashboard/db/bigquery-metrics?orgId=1