
refactor and improve CRD publishing e2e tests in an HA setup #90452

Closed

Conversation

p0lyn0mial
Contributor

What type of PR is this?


/kind flake

What this PR does / why we need it: makes the CRD publishing e2e tests more stable, especially in an HA setup, by checking all instances of the API server before running the actual test (sketched below).

Which issue(s) this PR fixes: In our environment with multiple masters, these tests are very unstable. It turned out that the tests didn't guarantee that all instances had observed the same version of the spec before running the actual tests.

Fixes #

Special notes for your reviewer: we need to build an image with socat, and I need to refactor the portforward.go tests to reuse the RunKubectlPortForward function.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
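For readers of this description, a minimal sketch of the stabilization idea follows. It is not code from this PR: it checks discovery rather than the published OpenAPI document to stay short, and it assumes the caller already has one rest.Config per API server instance (for example via the proxy pod and port-forward discussed in the review below).

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// waitForGroupVersionOnAllServers polls every API server instance until each
// one serves the given group/version in discovery, i.e. every replica has
// observed the CRD before the actual assertions run. Illustrative sketch only.
func waitForGroupVersionOnAllServers(configs []*rest.Config, groupVersion string) error {
	for _, cfg := range configs {
		dc, err := discovery.NewDiscoveryClientForConfig(cfg)
		if err != nil {
			return err
		}
		// Retry until this particular instance has caught up.
		if err := wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
			if _, err := dc.ServerResourcesForGroupVersion(groupVersion); err != nil {
				return false, nil // not published on this instance yet
			}
			return true, nil
		}); err != nil {
			return err
		}
	}
	return nil
}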


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 24, 2020
@k8s-ci-robot k8s-ci-robot added area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 24, 2020
@p0lyn0mial
Contributor Author

/assign @roycaihw @liggitt @sttts


// setupAPIServersProxyPodAndPortForward is a convenience helper that creates and runs a pod that proxies connections to the API servers.
// It also uses kubectl port-forward to route local connections to that pod.
func setupAPIServersProxyPodAndPortForward(f *framework.Framework) ([]int, func(), error) {
Member

it's unclear to me why running this code inside a pod would improve the stability of this test... something running in a pod can have the same issue reaching an arbitrary API server in an HA environment, and this seems to introduce additional ways the test can fail (issues with the pod, with the kubectl port-forward, etc.)

Contributor Author

it will improve stability because the pod exposes a proxy to all instances (the list is taken from the default kubernetes endpoints resource), and we check the spec from every replica before running the actual tests.

before, we checked a public endpoint that was behind a load balancer, so we didn't know whether we had reached all replicas.

}
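As an illustration of the proxy-pod idea described above (not code from this PR), the local ports returned by a helper like setupAPIServersProxyPodAndPortForward could be turned into per-replica clients roughly like this; rest.CopyConfig is real client-go API, but the TLS handling is only noted as an assumption.

import (
	"fmt"

	"k8s.io/client-go/rest"
)

// configsForLocalPorts builds one rest.Config per locally forwarded port,
// each pointing at a single API server replica behind the proxy pod.
func configsForLocalPorts(base *rest.Config, localPorts []int) []*rest.Config {
	configs := make([]*rest.Config, 0, len(localPorts))
	for _, port := range localPorts {
		cfg := rest.CopyConfig(base)
		cfg.Host = fmt.Sprintf("https://127.0.0.1:%d", port)
		// Assumption: the serving certificate is valid for the forwarded
		// address, or the caller adjusts cfg.TLSClientConfig.ServerName.
		configs = append(configs, cfg)
	}
	return configs
}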

func getAllAPIServersEndpoint(c k8sclientset.Interface) ([]string, error) {
	eps, err := c.CoreV1().Endpoints(metav1.NamespaceDefault).Get(context.TODO(), "kubernetes", metav1.GetOptions{})
Member

is going directly to the kubernetes endpoints like this guaranteed to be valid in all conformant environments (remember, this is modifying a conformance test)? I vaguely remember there being different configurations for the kubernetes service (e.g. #13978)

Contributor Author

It seems that the endpoints for the kubernetes service are populated by the EndpointReconciler. From what I understand, it should be okay, since the endpoints are filled in by the kube API servers unless the endpoint reconciler was turned off. Could you please double-check that?
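For context, a guess at how the rest of getAllAPIServersEndpoint could look under that assumption (the endpoint reconciler keeps the kubernetes endpoints in the default namespace up to date); the actual implementation in this PR may differ.

import (
	"context"
	"net"
	"strconv"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	k8sclientset "k8s.io/client-go/kubernetes"
)

func getAllAPIServersEndpoint(c k8sclientset.Interface) ([]string, error) {
	eps, err := c.CoreV1().Endpoints(metav1.NamespaceDefault).Get(context.TODO(), "kubernetes", metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	// Collect every advertised address:port pair, one per API server instance.
	var servers []string
	for _, subset := range eps.Subsets {
		for _, port := range subset.Ports {
			for _, addr := range subset.Addresses {
				servers = append(servers, net.JoinHostPort(addr.IP, strconv.Itoa(int(port.Port))))
			}
		}
	}
	return servers, nil
}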

Contributor Author

@liggitt have you had time to verify that?

Member

Running without the endpoint reconciler is a supported option (--endpoint-reconciler-type=none), as is using a node-port for the kubernetes master service (--kubernetes-service-node-port=...). Switching a conformance test to go directly to endpoints instead of through the kubernetes service doesn't seem proper.

Contributor

I would expect there to be a shim in our test framework that correctly waits for HA configurations (an HA configuration should be able to be shown to be conformant) by getting API endpoints appropriately depending on what the config is, i.e. something like h := framework.NewAPIEndpointHelper(client); h.WaitForAll(func(<gets a client config>) (bool, error) { do a test with the client to see if x is true }) that handles that.

Łukasz, do you think you would be willing to articulate a version of the helper that works in each case Jordan mentioned and hides this behavior when it's not possible (if no endpoints are set, the test may also need a way to clarify that)?

Everyone is instead doing this poorly, incorrectly, and inconsistently in the existing code - having a correct way to at least do this for the apiserver is good. I don't think it's acceptable for us to punt on HA apiservers, which are the recommended prod configuration of kube, when it comes to proving they are conformant in a reasonable way (and medium-length sleeps may not be sufficient). We definitely paper over this in other components (stuff running in deployments sized two) - so it would be nice if the underlying code could be adapted to something like the OpenShift ingress tests, or the upgrade tests that verify the data. We still have a need for probabilistic checking (i.e. if we're trying to verify an SLB directly).
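Purely as a sketch of the shape being asked for here: every name below (APIEndpointHelper, WaitForAll) is hypothetical and does not exist in the e2e framework, and the constructor that discovers per-instance configs (kubernetes endpoints, node-port, or falling back to a single LB address) is deliberately left out, since it would have to handle each of the cases Jordan listed.

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/rest"
)

// APIEndpointHelper hides how individual API server instances are discovered
// and lets a test wait for a condition to hold on every one of them.
type APIEndpointHelper struct {
	configs []*rest.Config // one per reachable API server instance
}

// WaitForAll polls the condition against each instance until it returns true
// everywhere, or the timeout expires.
func (h *APIEndpointHelper) WaitForAll(timeout time.Duration, condition func(*rest.Config) (bool, error)) error {
	for _, cfg := range h.configs {
		cfg := cfg
		if err := wait.PollImmediate(time.Second, timeout, func() (bool, error) {
			return condition(cfg)
		}); err != nil {
			return fmt.Errorf("condition not met on %s: %v", cfg.Host, err)
		}
	}
	return nil
}

A test would then call something like h.WaitForAll(2*time.Minute, func(cfg *rest.Config) (bool, error) { /* check the published spec via cfg */ }) instead of hand-rolling the loop.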

Contributor Author

Łukasz, do you think you would be willing to articulate a version of the helper that works in each case Jordan mentioned and hides this behavior when it's not possible (if no endpoints are set, the test may also need a way to clarify that)?

@smarterclayton sure, I can start with a version that understands endpoints. Many thanks.

Contributor Author

xref: #100585

p0lyn0mial added a commit to p0lyn0mial/origin that referenced this pull request Apr 27, 2020
p0lyn0mial added a commit to p0lyn0mial/origin that referenced this pull request Apr 28, 2020
p0lyn0mial added a commit to p0lyn0mial/origin that referenced this pull request Apr 28, 2020
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Apr 28, 2020
Origin-commit: b7f9cc11026358f9eb96e08aa287d44373148d7c
p0lyn0mial added a commit to p0lyn0mial/origin that referenced this pull request Apr 29, 2020
p0lyn0mial added a commit to p0lyn0mial/origin that referenced this pull request Apr 29, 2020
p0lyn0mial added a commit to p0lyn0mial/origin that referenced this pull request Apr 29, 2020
p0lyn0mial added a commit to p0lyn0mial/kubernetes that referenced this pull request Apr 29, 2020
Labels
area/test
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/flake - Categorizes issue or PR as related to a flaky test.
lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-priority - Indicates a PR lacks a `priority/foo` label and requires one.
needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
release-note-none - Denotes a PR that doesn't merit a release note.
sig/api-machinery - Categorizes an issue or PR as relevant to SIG API Machinery.
sig/testing - Categorizes an issue or PR as relevant to SIG Testing.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
triage/accepted - Indicates an issue or PR is ready to be actively worked on.