Reboot e2e test timeout because of slow docker startup #9349
Comments
I swear I searched for the error message and found nothing. I think the reboot test involves a whole lot of stuff, and there's a good chance all three bugs are actually different, so I'm retitling.
One way to speed up docker's startup is removing all containers before the reboot, for testing purposes. There is not much we can do at the node level. A rough sketch of that cleanup is below.
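Here is a minimal sketch of what that pre-reboot cleanup could look like, assuming the docker CLI is available on the node's PATH; the helper name and how it would be wired into the reboot test are hypothetical:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// removeAllContainers force-removes every container (running or exited) so the
// docker daemon has less state to recover after the node reboots. This is only
// meant as a test-time workaround for slow docker startup, not production use.
func removeAllContainers() error {
	// List all container IDs, including stopped ones.
	out, err := exec.Command("docker", "ps", "-aq").Output()
	if err != nil {
		return fmt.Errorf("listing containers: %v", err)
	}
	ids := strings.Fields(string(out))
	if len(ids) == 0 {
		return nil // nothing to clean up
	}
	// Force-remove them all in one call.
	args := append([]string{"rm", "-f"}, ids...)
	if out, err := exec.Command("docker", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("removing containers: %v (%s)", err, out)
	}
	return nil
}

func main() {
	if err := removeAllContainers(); err != nil {
		fmt.Println(err)
	}
}
```

The tradeoff is that this hides the real cost of recovering container state after a reboot, so it would only tell us whether docker startup time is the bottleneck, not fix the underlying slowness.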
The reboot test shows no failures in the last 30 runs due to this bug. I'm de-prioritizing back to P2, and if it continues not to be flaky, I'm going to kick it out of 1.0.
@jszczepkowski - can you please take a look?
FYI, I think reboot is part of the tests we're skipping (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/hack/jenkins/e2e.sh#L69), so if we don't think reboot is important/stable enough to run regularly, this probably isn't a 1.0 bug.
@quinton-hoole @fabioy Do we think this one is now fixed via #14772, or is there more to do?
We're actually still seeing occasional timeouts since #14772, so let's leave this one open to track those. e.g.
These are still flaking occasionally:
I think a lot of that is addressed in #19189 (comment).
... except that both of those failures were during the TCP problems we saw yesterday, where a bunch of other tests were also failing.
@dchen1107 Why did you assign this to me?
Closing in favor of #19189.
Agree with the dupe. I don't think these were actually networking problems, or nodes not coming up. The cases I saw last week and at the start of this week were because cluster addons/static pods weren't coming up. I might be wrong, or have taken a biased sampling of the wrong logs.
I've seen this fail on occasion on local e2e runs. It looks like what happens is:
Here's a gist of a segment of my logs:
https://gist.github.com/bprashanth/c90bd10b359851a57cb1
and here are the describe events for the pod that the test complained about (elasticsearch-logging-v1-7w3d6):
https://gist.github.com/bprashanth/f380c2e80e5740c37520
@mforbes (just from git blame on the test; unsure if you care)