
Probing container should *not* be restarted with a /healthz http liveness probe [Conformance] {E2eNode Suite} #59150

Closed
ravisantoshgudimetla opened this issue Jan 31, 2018 · 5 comments · Fixed by #59218
Labels: kind/flake (Categorizes issue or PR as related to a flaky test), sig/node (Categorizes an issue or PR as relevant to SIG Node)

Comments

@ravisantoshgudimetla (Contributor)

We noticed a few months ago that this test flaked again (openshift/origin#12072 (comment)).

Opening this issue again to start a discussion on the test.

Following are my observations after going through the history and logs:

  • We are bringing up an nginx container and checking whether <container_IP:80> is available.
  • While this is not a bad idea, many things can go wrong that keep nginx from reaching a running state. Some are related to the underlying infrastructure, over which we have no control; others are related to the network plugin being used, etc. Some were mentioned in [k8s.io] Probing container should *not* be restarted with a /healthz http liveness probe [Conformance] {E2eNode Suite} #30714 (comment) as well.
    For example, the latest failure was dial tcp 10.128.1.69:80: getsockopt: no route to host, which could also be related to the network setup.
  • This makes the test non-deterministic and therefore flaky.

Since the goal of this test is to check that a healthy container is not restarted because of its liveness probe, would it make sense to use a liveness command that checks for the existence of a particular directory (or some other marker) created during container startup, instead of an HTTP GET?
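For illustration only, here is a minimal sketch of the two probe styles using the k8s.io/api/core/v1 types. It assumes a recent API version where the embedded handler field is named ProbeHandler (older releases call it Handler); the /tmp/started marker path is a hypothetical example, not what the e2e test uses.

```go
// Sketch of the two probe styles under discussion. Assumes a recent
// k8s.io/api where the embedded handler field is named ProbeHandler
// (older releases call it Handler). The /tmp/started path is hypothetical.
package probes

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// httpLiveness mirrors the flaky pattern: liveness depends on the network
// path to <container_IP:80> being healthy.
var httpLiveness = &corev1.Probe{
	ProbeHandler: corev1.ProbeHandler{
		HTTPGet: &corev1.HTTPGetAction{
			Path: "/",
			Port: intstr.FromInt(80),
		},
	},
	InitialDelaySeconds: 15,
	FailureThreshold:    3,
}

// execLiveness checks for a marker created at container startup, so it does
// not depend on the node's network setup.
var execLiveness = &corev1.Probe{
	ProbeHandler: corev1.ProbeHandler{
		Exec: &corev1.ExecAction{
			Command: []string{"test", "-e", "/tmp/started"},
		},
	},
	InitialDelaySeconds: 15,
	FailureThreshold:    3,
}
```

The exec variant fails only when the container itself is broken, which matches the property the test is actually asserting.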

/sig node
cc @Random-Liu

@k8s-ci-robot added the sig/node label on Jan 31, 2018
@tianshapjq (Contributor)

Not sure if this is helpful, but to my understanding this is technically already supported, as described in "configure liveness and readiness probes". You just have to figure out which probe you need most, either httpGet or exec.

@ravisantoshgudimetla (Contributor, Author)

@tianshapjq It is not about configuring which probe to use. The problem is that the test case flakes, and I believe it flakes because of infrastructure-related issues. I raised this issue to make sure everyone is OK with switching from httpGet to exec.

@ravisantoshgudimetla (Contributor, Author) commented Feb 1, 2018

https://github.com/kubernetes/kubernetes/blob/master/test/e2e/common/container_probe.go#L114 - I just noticed that we already have a test that uses an exec liveness probe. Do we still need the HTTP test? If not, can we delete it or increase its timeouts?
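For reference, the property both variants assert is simply that a healthy container's restart count stays at zero over the observation window. Below is a simplified approximation using plain client-go; it is not the actual e2e framework helper, and the function name, namespace, and pod name are placeholders supplied by the caller.

```go
// Sketch: poll a pod and fail if any container has been restarted. This is a
// simplified stand-in for the e2e framework's liveness helper, written against
// plain client-go; namespace and pod name are supplied by the caller.
package probes

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func expectNoRestarts(ctx context.Context, c kubernetes.Interface, ns, pod string, observe time.Duration) error {
	deadline := time.Now().Add(observe)
	for time.Now().Before(deadline) {
		p, err := c.CoreV1().Pods(ns).Get(ctx, pod, metav1.GetOptions{})
		if err != nil {
			return err
		}
		for _, cs := range p.Status.ContainerStatuses {
			if cs.RestartCount > 0 {
				return fmt.Errorf("container %q restarted %d times", cs.Name, cs.RestartCount)
			}
		}
		time.Sleep(10 * time.Second)
	}
	return nil
}
```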

@fejta (Contributor) commented Feb 8, 2018

/kind flake

@k8s-ci-robot added the kind/flake label on Feb 8, 2018
@fejta (Contributor) commented Feb 8, 2018

@kubernetes/sig-node-test-failures

k8s-github-robot pushed a commit that referenced this issue on Feb 28, 2018
Automatic merge from submit-queue (batch tested with PRs 60342, 60505, 59218, 52900, 60486). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Increase failureThresholds for failing HTTP liveness test

**What this PR does / why we need it**:
Removes the test from e2e that relies on HTTP liveness as a measure of whether the container is good or bad. While this is not a bad idea, we cannot rely on this test, because HTTP liveness depends on the network/infrastructure, etc., over which we sometimes have no control. Increasing the timeout may be an option, but it may not be ideal for all cloud providers and types of hardware (a rough sketch of that tuning follows this PR description).

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59150 

**Special notes for your reviewer**:
I have stated the reasons in issue #59150. We have seen this test flaking recently in openshift/origin#12072

**Release note**:

```release-note
NONE
```
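For completeness, the tuning route the PR title refers to (keeping the HTTP probe but tolerating transient failures) would look roughly like the sketch below. The values are illustrative, not the ones chosen in the PR, and the same ProbeHandler naming caveat as the earlier sketch applies.

```go
// Sketch: a more forgiving HTTP liveness probe, so a brief network hiccup does
// not immediately trigger a restart. Values are illustrative, not taken from
// the PR; in older k8s.io/api versions ProbeHandler is named Handler.
package probes

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

var tolerantHTTPLiveness = &corev1.Probe{
	ProbeHandler: corev1.ProbeHandler{
		HTTPGet: &corev1.HTTPGetAction{Path: "/", Port: intstr.FromInt(80)},
	},
	TimeoutSeconds:   5,  // each GET has 5s before it counts as a failure
	PeriodSeconds:    10, // probe every 10s
	FailureThreshold: 5,  // restart only after 5 consecutive failures (~50s)
}
```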