add test confirming service network functions from openshift-apiserver pod #25291

deads2k · 2020-07-16T17:56:04Z

This adds checks to be sure we don't experience outages on the service network. We noticed "connection refused" errors from the openshift-apiserver to the kube-apiserver, but only through the service network. Direct access via the node IP (the kube-apiservers are host-network) continued without any failures.

/hold

Need to hold until the openshift-apiserver network checks land. The choice of Fail versus Flake will be determined based on failure rates.

sanchezl · 2020-07-17T03:06:42Z

test/extended/networking/service_network_consistent.go

+
+		failures := []string{}
+		for _, check := range connectivityChecks.Items {
+			if !strings.Contains(check.Name, "kube-service") {


Suggested change

-			if !strings.Contains(check.Name, "kube-service") {
+			if !strings.Contains(check.Name, "kubernetes-service") {

danwinship · 2020-07-17T13:11:08Z

test/extended/networking/service_network_consistent.go

+			}
+		}
+		if len(failures) > 0 {
+			g.Fail(fmt.Sprintf("the KUBERNETES_SERVICE_HOST:KUBERNETES_SERVICE_PORT was inaccessible via the service network IP (compare against kube-apiserver direct endpoint access):\n%v", strings.Join(failures, "\n")))


KUBERNETES_SERVICE_HOST is the service network IP. (ie, this is basically saying "172.30.0.1 was inaccessible via 172.30.0.1").

Also, a random developer looking at test failures will have no idea what "(compare against kube-apiserver direct endpoint access)" is supposed to mean. Is that an instruction to the reader? If so, where is the reader supposed to get that other information, and why can't the test case make the comparison itself?

Also, you can use g.Errorf(...) instead of g.Fail(fmt.Sprintf(...))

> Also, a random developer looking at test failures will have no idea what "(compare against kube-apiserver direct endpoint access)" is supposed to mean. Is that an instruction to the reader? If so, where is the reader supposed to get that other information, and why can't the test case make the comparison itself?

I'll see about adding more information. Basically, we are seeing cases today where direct access to a kube-apiserver via the node IP works fine, but access via 172.30.0.1 fails. So we know the kube-apiservers (all of them) are accepting connections, but 172.30.0.1 shows repeated "connection refused" errors.

So I want to be sure no one bounces these as "well, the kube-apiserver is returning connection refused". In every case we've seen so far, the kube-apiserver is provably functioning and handling connections from the exact same pods, but cannot be accessed via the service network on a node that is reporting the network as ready.
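As a rough illustration of the comparison described above, the following sketch probes one backend through two addresses and reports which path failed. It is not the code from this PR: `probe`, `comparePaths`, and `simulate` are hypothetical names, and `simulate` uses local listeners as stand-ins for the node IP (direct endpoint) and 172.30.0.1 (service network) so it can run without a cluster.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probe attempts a single TCP connection to addr and returns any dial error.
func probe(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// comparePaths probes the service-network address and the direct endpoint
// address and describes which of the two paths, if any, failed.
func comparePaths(serviceAddr, directAddr string, timeout time.Duration) string {
	serviceErr := probe(serviceAddr, timeout)
	directErr := probe(directAddr, timeout)
	switch {
	case serviceErr == nil && directErr == nil:
		return "both paths reachable"
	case serviceErr != nil && directErr == nil:
		return fmt.Sprintf("service network path failed while direct endpoint worked: %v", serviceErr)
	case serviceErr == nil && directErr != nil:
		return fmt.Sprintf("direct endpoint failed while service network path worked: %v", directErr)
	default:
		return fmt.Sprintf("both paths failed: service=%v direct=%v", serviceErr, directErr)
	}
}

// simulate stands in for a cluster (assumption for illustration only): a live
// local listener plays the direct endpoint, and a port whose listener has
// been closed plays the service IP that refuses connections.
func simulate() (string, error) {
	direct, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer direct.Close()

	closed, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	refusedAddr := closed.Addr().String()
	closed.Close() // nothing listens on this port any more

	return comparePaths(refusedAddr, direct.Addr().String(), time.Second), nil
}

func main() {
	out, err := simulate()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println(out)
}
```

A message built this way states outright that the direct endpoint was healthy, so a reader does not have to look up that information elsewhere.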

danwinship · 2020-07-17T13:17:55Z

test/extended/networking/service_network_consistent.go

+			}
+		}
+		if len(failures) > 0 {
+			g.Fail(fmt.Sprintf("the `oc -n openshift-kube-apiserver get services/apiserver` was inaccessible via the service network IP (compare against kube-apiserver direct endpoint access):\n%v", strings.Join(failures, "\n")))


It would not be clear to me from the output of these two tests what the difference between them is (what does -n openshift-kube-apiserver services/apiserver point to, if not the kube-apiserver?)

One is maintained by the kube-apiserver directly. The kube-apiserver directly writes into the endpoints resource. This is 172.30.0.1.

The other is used by the service monitor. It is a real service maintained by the service/endpoints controller. This has a different IP.

deads2k · 2020-07-20T13:24:33Z

/test all

deads2k · 2020-07-20T17:59:40Z

This now has checks on the actual kube-apiserver endpoints to avoid false positives.

/hold cancel
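One way to picture the false-positive guard mentioned above: report service-network failures only when the direct kube-apiserver endpoint checks were clean. This is a minimal sketch under assumptions, not the PR's implementation; `checkResult` and `serviceNetworkFailures` are hypothetical, simplified stand-ins for the real connectivity check types.

```go
package main

import "fmt"

// checkResult is a hypothetical, simplified record of one connectivity check:
// its name and the failure messages it recorded.
type checkResult struct {
	Name     string
	Failures []string
}

// serviceNetworkFailures returns the failures recorded by the service-network
// checks, but only if every direct endpoint check was clean. If a direct
// endpoint also failed, the backend itself may have been down, so nothing is
// attributed to the service network.
func serviceNetworkFailures(serviceChecks, endpointChecks []checkResult) []string {
	for _, ep := range endpointChecks {
		if len(ep.Failures) > 0 {
			return nil
		}
	}
	var out []string
	for _, sc := range serviceChecks {
		for _, f := range sc.Failures {
			out = append(out, fmt.Sprintf("%s: %s", sc.Name, f))
		}
	}
	return out
}

func main() {
	service := []checkResult{{Name: "kubernetes-service", Failures: []string{"connection refused"}}}
	endpoints := []checkResult{{Name: "endpoint-a"}, {Name: "endpoint-b"}}
	for _, f := range serviceNetworkFailures(service, endpoints) {
		fmt.Println(f)
	}
}
```

The sketch drops all service-network failures if any endpoint check failed, which is deliberately conservative; a real test might instead correlate failures by time window.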

deads2k · 2020-07-20T20:25:39Z

The test ran.

/test all

deads2k · 2020-07-20T22:49:37Z

/retest

deads2k · 2020-07-21T18:26:22Z

/test all

deads2k · 2020-07-22T17:08:59Z

/test all

deads2k · 2020-08-10T12:28:25Z

/test all

deads2k · 2020-08-10T18:51:44Z

/retest

deads2k · 2020-08-11T13:16:58Z

/test all

now that Luis fixed the names

sanchezl

/lgtm

Won't run cleanly until openshift/cluster-kube-apiserver-operator#928 is merged.

openshift-ci-robot · 2020-08-11T17:45:06Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, sanchezl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [deads2k]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2020-08-11T19:47:37Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-08-11T20:21:45Z

New changes are detected. LGTM label has been removed.

deads2k · 2020-08-12T13:31:41Z

/retest


openshift-ci-robot · 2020-09-02T20:43:44Z

@deads2k: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-csi | `d228c92` | link | `/test e2e-aws-csi` |
| ci/prow/e2e-cmd | `d228c92` | link | `/test e2e-cmd` |
| ci/prow/e2e-gcp-upgrade | `d228c92` | link | `/test e2e-gcp-upgrade` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-merge-robot · 2020-10-23T03:55:46Z

@deads2k: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-agnostic-cmd | `d228c92` | link | `/test e2e-agnostic-cmd` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-bot · 2021-01-21T18:45:04Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2021-02-20T19:02:25Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2021-03-22T23:20:30Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot · 2021-03-22T23:20:41Z

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jul 16, 2020

openshift-ci-robot requested review from danwinship and smarterclayton July 16, 2020 17:56

openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jul 16, 2020

sanchezl approved these changes Jul 17, 2020

danwinship reviewed Jul 17, 2020

deads2k force-pushed the service-network-down branch from 887b17c to cf0dcdb July 20, 2020 15:33

openshift-ci-robot added the vendor-update label (touching vendor dir or related files) Jul 20, 2020

deads2k force-pushed the service-network-down branch 2 times, most recently from 7384056 to d563b59 July 20, 2020 17:47

openshift-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jul 20, 2020

openshift-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jul 23, 2020

deads2k force-pushed the service-network-down branch from d563b59 to ce1bc0d August 7, 2020 19:28

openshift-ci-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 7, 2020

sanchezl approved these changes Aug 11, 2020

openshift-ci-robot assigned sanchezl Aug 11, 2020

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Aug 11, 2020

deads2k force-pushed the service-network-down branch from ce1bc0d to e8b9115 August 11, 2020 20:21

openshift-ci-robot removed the lgtm label (indicates that a PR is ready to be merged) Aug 11, 2020

deads2k force-pushed the service-network-down branch from e8b9115 to 3cdf0de August 12, 2020 15:26

deads2k added 4 commits September 2, 2020 14:38

add test confirming service network functions from openshift-apiserver pod

009df1c

bump(openshift/client-go)

1537092

confirm load balancers do not experience outages

1d6d61b

see if I can make a fake test summary

d228c92

deads2k force-pushed the service-network-down branch from 3cdf0de to d228c92 September 2, 2020 18:52

openshift-ci-robot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) Jan 21, 2021

openshift-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) Feb 20, 2021

openshift-ci-robot closed this Mar 22, 2021
