
Extended.[k8s.io] Probing container should *not* be restarted with a /healthz http liveness probe [Conformance] #12072

Closed
bparees opened this issue Nov 30, 2016 · 18 comments
Labels: component/kubernetes, kind/test-flake, lifecycle/rotten, priority/P1

@bparees commented Nov 30, 2016

```
• Failure [138.845 seconds]
[k8s.io] Probing container
/data/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:793
  should *not* be restarted with a /healthz http liveness probe [Conformance] [It]
  /data/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/common/container_probe.go:233

  Nov 29 23:16:31.438: pod e2e-tests-container-probe-dk0xc/liveness-http - expected number of restarts: 0, found restarts: 1

  /data/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/common/container_probe.go:373
```

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/8991/
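
For reference, the test attaches an HTTP GET liveness probe against /healthz and then asserts that the container's restart count stays at zero. A minimal sketch of that style of probe, using the k8s.io/api/core/v1 types the e2e suite is built on (the port and timing values here are illustrative, not the test's exact fixture):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// The restart-count assertion fails whenever the kubelet decides the
	// probe failed, even if the container was healthy the whole time.
	probe := &v1.Probe{
		InitialDelaySeconds: 15, // illustrative values, not the test's settings
		TimeoutSeconds:      5,
	}
	// HTTPGet is promoted from the embedded handler struct (Handler in older
	// releases, ProbeHandler in newer ones), so plain field assignment works
	// across client versions.
	probe.HTTPGet = &v1.HTTPGetAction{
		Path: "/healthz",
		Port: intstr.FromInt(8080), // hypothetical port
	}
	fmt.Printf("liveness probe: %+v\n", probe)
}
```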

@derekwaynecarr

Same as kubernetes/kubernetes#28084

@smarterclayton

I disabled this on origin_gce due to flaking

@gabemontero

It was noted again upstream in kubernetes/kubernetes#30714

Saw it on our end again in #13577

@bparees commented Aug 31, 2017

The referenced issue above has been closed, so I am raising the priority of this issue.

@smarterclayton commented Oct 10, 2017 via email

@sjenning

@ravisantoshgudimetla PTAL

@sjenning removed their assignment Jan 22, 2018
@ravisantoshgudimetla

I have been trying to reproduce this locally, but without luck. The main idea of this test is to check that a healthy container is not restarted. Looking at the code and the history of this issue, here are my observations.

Since the goal of this test is to check that a healthy container is not restarted based on its liveness probe, would it make sense to use a liveness command that checks for the existence of a particular directory (or something similar) instead of an HTTP GET? A sketch of that alternative is below.
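
A minimal sketch of the suggested alternative, assuming an exec-based probe built from the same k8s.io/api/core/v1 types (the command and path are hypothetical):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	// Hypothetical exec-based liveness probe: run a command inside the
	// container that checks for a directory, so the probe result no longer
	// depends on the network path between the kubelet and the container.
	probe := &v1.Probe{
		InitialDelaySeconds: 15,
		PeriodSeconds:       10,
		FailureThreshold:    3,
	}
	// Exec is promoted from the embedded handler struct, so assignment
	// works across client versions.
	probe.Exec = &v1.ExecAction{
		Command: []string{"test", "-d", "/tmp/healthy"}, // hypothetical path
	}
	fmt.Printf("liveness probe: %+v\n", probe)
}
```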

@stevekuznetsov

@ravisantoshgudimetla do you feel like you can deliver the change to the liveness probe to move away from HTTP?

@ravisantoshgudimetla commented Feb 1, 2018

Flaked again: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_conformance_gce/15413/console

@stevekuznetsov - I created an upstream issue yesterday. I think it would be better to delete this test. I have not received any feedback on it yet, but I will create a PR and see if upstream is OK with it.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Feb 28, 2018
Automatic merge from submit-queue (batch tested with PRs 60342, 60505, 59218, 52900, 60486). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Increase failureThresholds for failing HTTP liveness test

**What this PR does / why we need it**:
Removes the e2e test that relies on an HTTP liveness probe as a measure of whether a container is good or bad. While this is not a bad idea in itself, we cannot rely on this test because HTTP liveness depends on the network, infrastructure, and similar factors that we sometimes have no control over. Increasing the timeout may be an option, but it may not be ideal for every cloud provider or type of hardware. (The probe timing knobs involved are sketched after this quoted PR description.)

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59150 

**Special notes for your reviewer**:
I have stated my reasons in issue #59150. We have seen this test flaking recently in openshift/origin#12072.

**Release note**:

```release-note
NONE
```
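
For context on the PR title: the failureThreshold and periodSeconds fields on the same v1.Probe type control how long consecutive probe failures are tolerated before the kubelet restarts the container. A minimal sketch of that arithmetic, with illustrative values (not the defaults or the test's actual settings):

```go
package main

import "fmt"

// Illustrative values only; the real defaults and the values the test
// used may differ.
const (
	periodSeconds    = 10 // how often the kubelet runs the probe
	failureThreshold = 3  // consecutive failures before a restart
)

func main() {
	// Roughly, consecutive probe failures are tolerated for
	// periodSeconds * failureThreshold seconds before a restart, so raising
	// failureThreshold widens the window a slow network can hide in.
	fmt.Printf("tolerance window: ~%ds\n", periodSeconds*failureThreshold)
}
```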
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label May 2, 2018
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 2, 2018
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
