Reboot e2e test made more robust by using nohup. #9117

jszczepkowski · 2015-06-02T13:46:42Z

Reboot e2e test made more robust by using nohup in ssh commands. Fixes #9062. Follow-up of #8784.

k8s-bot · 2015-06-02T13:51:30Z

EXPERIMENTAL JENKINS PR BUILDER: e2e build succeeded.

ghost · 2015-06-02T14:00:50Z

What are the 'sleep 10's for?

jszczepkowski · 2015-06-02T21:37:49Z

To prevent a race: one process is waiting for the SSH connection to finish, while the other is doing something nasty to the machine. So I wanted to give the first process some time to finish the session.

jszczepkowski · 2015-06-08T12:23:03Z

@quinton-hoole PTAL

ghost · 2015-06-08T15:51:47Z

You should rather explicitly wait on the condition that you require to be true (e.g. the previous SSH session closing, or whatever). And fail with an explicit error if that does not occur within the timeout (e.g. 10 sec). Otherwise you're inviting both flakiness and long delays in e2e tests, because we'll never know quite how much time to sleep, and it will inevitably be extended every time we see flakiness.

ghost · 2015-06-08T15:55:41Z

In fact, looking a bit deeper, rebootNode() already does the checks for you. So I don't believe that you need to sleep at all.

https://github.com/jszczepkowski/kubernetes/blob/e2e-nodes/test/e2e/reboot.go#L217

jszczepkowski · 2015-06-09T07:36:58Z

The check in rebootNode verifies that pods are running. My sleep is not related to pods, but to the SSH session. The bash command launched via nohup may start immediately, before the SSH session is closed. But we don't want the reboot/network partition to start before the SSH session is closed, so that the session closes cleanly (we have a check for this in rebootNode). So the sleep makes this race much less likely.

Making this condition explicit would be really complicated and would obscure the test (we would need some additional synchronization abstractions). I think the sleep is good enough.
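To illustrate the approach described above, here is a sketch of how a disruptive command might be wrapped so that it detaches from the SSH session and starts only after a delay. The function name `delayedNohupCmd` is hypothetical, and the exact command string used in reboot.go may differ; this only shows the nohup-plus-sleep idea.

```go
package main

import "fmt"

// delayedNohupCmd wraps cmd so that it runs in the background, detached from
// the SSH session via nohup, and only starts after delaySeconds. The delay
// gives the SSH session time to close cleanly before the machine is disrupted.
// (Hypothetical helper; cmd must not contain single quotes.)
func delayedNohupCmd(cmd string, delaySeconds int) string {
	return fmt.Sprintf("nohup sh -c 'sleep %d && %s' >/dev/null 2>&1 &", delaySeconds, cmd)
}

func main() {
	fmt.Println(delayedNohupCmd("sudo reboot", 10))
	// prints: nohup sh -c 'sleep 10 && sudo reboot' >/dev/null 2>&1 &
}
```

Redirecting output and backgrounding with `&` lets the SSH command return immediately, while the `sleep` keeps the disruptive step from racing with the session teardown.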

ghost · 2015-06-10T00:46:34Z

Fair enough, but please put the above explanation in as a comment in the code for the benefit of the next person who has to come along and debug. Thanks. Then LGTM.

k8s-bot · 2015-06-10T14:11:34Z

EXPERIMENTAL JENKINS PR BUILDER: e2e build succeeded.

jszczepkowski · 2015-06-10T14:20:40Z

I've added the comments. The PR should be ready now.

googlebot added the cla: yes label Jun 2, 2015

jszczepkowski assigned ghost Jun 2, 2015

ghost added the area/test-infra label Jun 4, 2015

bgrant0607 mentioned this pull request Jun 5, 2015

Reboot e2e test timeout because of slow docker startup #9349

Closed

Reboot e2e test made more robust by using nohup.

346b847

Reboot e2e test made more robust by using nohup in ssh commands. Fixes kubernetes#9062. Follow-up of kubernetes#8784.

jszczepkowski force-pushed the e2e-nodes branch from 30a029c to 346b847 Compare June 10, 2015 14:04

ghost added the lgtm label and removed the cla: yes and area/test-infra labels Jun 10, 2015

ArtfulCoder added a commit that referenced this pull request Jun 11, 2015

Merge pull request #9117 from jszczepkowski/e2e-nodes

237b968

Reboot e2e test made more robust by using nohup.

ArtfulCoder merged commit 237b968 into kubernetes:master Jun 11, 2015

ghost added cla: yes area/test-infra labels Jun 11, 2015

jszczepkowski unassigned ghost Aug 12, 2015
