
GKE tests timeout #69891

Closed
mortent opened this Issue Oct 16, 2018 · 10 comments

mortent commented Oct 16, 2018

msau42 commented Oct 16, 2018

Never mind, ignore this.

msau42 commented Oct 16, 2018

I think I found the issue.

This test case is stopping kubelet:

I1016 07:04:10.561] [sig-storage] In-tree Volumes [Driver: gcepd] [Testpattern: Pre-provisioned PV (default fs)] subPath
I1016 07:04:10.561]   should unmount if pod is gracefully deleted while kubelet is down [Disruptive][Slow]
I1016 07:04:10.561]   /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/testsuites/subpath.go:313

I1016 07:04:42.313] Oct 16 07:04:42.313: INFO: ssh prow@130.211.223.164:22: command:   sudo systemctl stop kubelet
I1016 07:04:42.313] Oct 16 07:04:42.313: INFO: ssh prow@130.211.223.164:22: stdout:    ""
I1016 07:04:42.313] Oct 16 07:04:42.313: INFO: ssh prow@130.211.223.164:22: stderr:    ""
I1016 07:04:42.313] Oct 16 07:04:42.313: INFO: ssh prow@130.211.223.164:22: exit code: 0
I1016 07:04:42.314] Oct 16 07:04:42.313: INFO: Waiting up to 1m0s for node gke-e2e-7208-4a777-default-pool-8048d34c-tg31 condition Ready to be false
I1016 07:04:42.316] Oct 16 07:04:42.315: INFO: Condition Ready of node gke-e2e-7208-4a777-default-pool-8048d34c-tg31 is true instead of false. Reason: KubeletReady, message: kubelet is posting ready status. AppArmor enabled

But because of #69786, the Node lifecycle controller never updated the node ready status, so the test failed. However, the test doesn't seem to have gone through its recovery procedure to restart kubelet.

So all the subsequent tests that have pods scheduled to this node will fail.

msau42 commented Oct 16, 2018

/assign @jingxu97

AishSundar commented Oct 17, 2018

@jingxu97 any update on this issue?

AishSundar commented Oct 17, 2018

@wangzhen127 @msau42 now that #69786 is fixed, will that fix this timeout as well?

msau42 commented Oct 17, 2018

Yes, with #69786 fixed, the condition that causes this test to fail without running its cleanup will no longer be triggered.

#69944 fixes the test cleanup.

AishSundar commented Oct 18, 2018

#69944 is merged, and from the looks of it the GKE jobs seem to be passing (at least on master, which had a recent run). Thanks @jingxu97 and @msau42 for the investigation and fix.

https://k8s-testgrid.appspot.com/sig-release-master-blocking#gke-cos-master-serial

@jberkus or @mortent to close this issue once upgrade jobs turn green as well

jberkus commented Oct 18, 2018

Now that we're not timing out, we're seeing some other failures. I'll wait for one more consistent run, and then close this and open a new issue for the new failures.

jberkus commented Oct 23, 2018

/close


k8s-ci-robot commented Oct 23, 2018

@jberkus: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
