Extend the timeout on initial validation of reboot tests. #14784
@@ -41,6 +41,9 @@ const (

	// How long pods have to be "ready" after the reboot.
	rebootPodReadyAgainTimeout = 5 * time.Minute

	// How long pods have to be "ready" initially before a reboot.
	rebootPodReadyInitialTimeout = 2 * time.Minute
I'd be more inclined to increase podReadyBeforeTimeout (currently 20 sec) to, say, 40 sec, for two reasons:
- I can't see any reason why the pod startup time for these pods should be any different than for other pods.
- Increasing the timeout 6x, from 20 sec to 2 minutes, seems excessive.
Reply:
- Done
- Because if we make it long, we won't (hopefully) have to tune this again. As with other timeouts of this style, since we are polling, most of the time we won't burn that much time waiting for the condition to become true.
LGTM once comment addressed.
Comment addressed. Though gce-reboot is running closer to 50% flaky, so I'm ok w/ marking it closed after 5 clean runs... --brendan
Unit, integration and GCE e2e test build/test passed for commit f626b6ed38afec866d606f6c5d69f119772881d3.
Labelling this PR as size/XS
LGTM. As discussed, this will cause the minimum failure time for the test suite to be about 100 x 2 min (3.5 hours), but we can adjust the timeout down again as necessary in one central place.
Unit, integration and GCE e2e test build/test passed for commit 09337d1.
For better or worse, reboot tests complete in around 30 min right now. With this change, a failed or flaky test could stop merges for 7 hours (a failed run followed by a success)? Ick...
NVM. Forgot that Jenkins lets you configure a build time limit, so there's that extra control knob (currently at 90 min). Should I go ahead and merge this manually?
Yup
Extend the timeout on initial validation of reboot tests.
Manually merging this PR.
An attempt to improve #14772
All flakes that I've seen recently are waiting for a pod to transition from (Running, ready=false) to (Running, ready=true).
No reason to not just extend the timeout.