New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Probing container should *not* be restarted with a /healthz http liveness probe [Conformance] {E2eNode Suite} #59150
Comments
configure liveness and readiness probes |
@tianshapjq It is not about configuring which probe to use. The problem is with test case flaking out and it flakes for what I believe could be an infrastructure related issue. I raised this issue to ensure that everyone is ok switching to exec from httpGet. |
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/common/container_probe.go#L114 - I just noticed that we already have a test which does exec liveness probe test. Do we still need the http test? If not, can we delete it or increase timeouts? |
/kind flake |
@kubernetes/sig-node-test-failures |
Automatic merge from submit-queue (batch tested with PRs 60342, 60505, 59218, 52900, 60486). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Increase failureThresholds for failing HTTP liveness test **What this PR does / why we need it**: Removes test from e2e which relies on HTTP liveness as a measure to tell if the container is good or bad. While this is not a bad idea, we cannot rely on this test as HTTP liveness relies on network/infrastructure etc on which sometimes we have no control over. While increasing the timeout may be an option it may not be ideal for all cloud providers/type of hardware etc. **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes #59150 **Special notes for your reviewer**: I have stated reasons in the issue #59150. We have seen that this test is flaking recently in openshift/origin#12072 **Release note**: ```release-note NONE ```
We have noticed few months ago that this test flaked again(openshift/origin#12072 (comment))
Opening this issue again to start dicussion on the test.
Following are my observations after going through history and logs :
For example, the latest one failed with dial tcp 10.128.1.69:80: getsockopt: no route to host. This could be related to network setup as well.
Since the goal this test is to check that a good container won't restart based on liveness probe, would it make sense to add a liveness command that checks for the existence of a particular directory that was created during container start up or something else instead of a HTTP GET.
/sig node
cc @Random-Liu
The text was updated successfully, but these errors were encountered: