Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend the timeout on initial validation of reboot tests. #14784

Merged
merged 1 commit into from
Sep 30, 2015

Conversation

brendandburns
Copy link
Contributor

An attempt to improve #14772

All flakes that I've seen recently are waiting for a pod to transition from (Running, ready=false) to (Running, read=true)

No reason to not just extend the timeout.

@brendandburns
Copy link
Contributor Author

@fabioy @quinton-hoole

@@ -41,6 +41,9 @@ const (

// How long pods have to be "ready" after the reboot.
rebootPodReadyAgainTimeout = 5 * time.Minute

// How long pods have to be "ready" initially before a reboot.
rebootPodReadyInitialTimeout = 2 * time.Minute
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be more inclined to increase podReadyBeforeTimeout (currently 20 sec) to, say 40 sec, for two reasons:

  1. I can't see any reason why the pod startup time for these pods should be any different than other pods?
  2. increasing the timeout 6x from 20 sec to 2 minutes seems excessive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Done
  2. Because if we make it long, we won't (hopefully) have to tune this again. As with other timeouts of this style, since we are polling, most of the time we won't burn that much time waiting for the condition to become true.

@ghost
Copy link

ghost commented Sep 30, 2015

LGTM once comment addressed.
I'd suggest removing the "fixes" in the description, and closing #14772 only once we've seen 20 successive runs of this test (given that flakyness is only in the 10% range). I'll do that now and update description accordingly.

@ghost ghost mentioned this pull request Sep 30, 2015
@brendandburns
Copy link
Contributor Author

Comment addressed. Though gce-reboot is running closer to 50% flaky, so I'm ok w/ marking it closed after 5 clean runs...

--brendan

@k8s-bot
Copy link

k8s-bot commented Sep 30, 2015

Unit, integration and GCE e2e test build/test passed for commit f626b6ed38afec866d606f6c5d69f119772881d3.

@k8s-github-robot k8s-github-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 30, 2015
@k8s-github-robot
Copy link

Labelling this PR as size/XS

@ghost
Copy link

ghost commented Sep 30, 2015

LGTM. As discussed, this will cause the minimum failure time for the test suite to be about 100 x 2 min (3.5 hours), but we can adjust the timeout down again as necessary in one central place.

@ghost ghost added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 30, 2015
@k8s-bot
Copy link

k8s-bot commented Sep 30, 2015

Unit, integration and GCE e2e test build/test passed for commit 09337d1.

@fabioy
Copy link
Contributor

fabioy commented Sep 30, 2015

For better or worse, reboot tests complete in around ~30 min right now. With this change, a failed or flaky test could stop merges for 7 hours (a failed run followed by a success)? Ick...

@fabioy
Copy link
Contributor

fabioy commented Sep 30, 2015

NVM. Forgot that Jenkins let you configure build time limit, so there's that extra control knob (currently at 90 min).

Should I go ahead and merge this manually?

@ghost
Copy link

ghost commented Sep 30, 2015

Yup
On Sep 29, 2015 6:08 PM, "Fabio Yeon" notifications@github.com wrote:

NVM. Forgot that Jenkins let you configure build time limit, so there's
that extra control knob (currently at 90 min).

Should I go ahead and merge this manually?


Reply to this email directly or view it on GitHub
#14784 (comment)
.

fabioy added a commit that referenced this pull request Sep 30, 2015
Extend the timeout on initial validation of reboot tests.
@fabioy fabioy merged commit 23759c8 into kubernetes:master Sep 30, 2015
@fabioy
Copy link
Contributor

fabioy commented Sep 30, 2015

Manually merging this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants