refactor and improve CRD publishing e2e tests in an HA setup #90452
Conversation
// setupAPIServersProxyPodAndPortForward is a convenience method that creates and runs a pod that proxies connections to the API servers.
// It also uses kubectl port-forward to route local connections to that pod.
func setupAPIServersProxyPodAndPortForward(f *framework.Framework) ([]int, func(), error) {
it's unclear to me why running this code inside a pod would improve stability of this test... something running in a pod can have the same issue reaching an arbitrary API server in an HA environment, and this seems to introduce additional ways the test can fail (issues with the pod, with the kubectl port-forward, etc)
it will improve stability because the pod exposes a proxy to all instances (the list is taken from the default endpoint resource) and we check the spec from every replica before running the actual tests.
Before, we checked a public endpoint that was behind a LB - so we didn't know whether we had reached all replicas.
}

func getAllAPIServersEndpoint(c k8sclientset.Interface) ([]string, error) {
	eps, err := c.CoreV1().Endpoints(metav1.NamespaceDefault).Get(context.TODO(), "kubernetes", metav1.GetOptions{})
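The diff shows only the start of getAllAPIServersEndpoint. The address-flattening it implies can be sketched with plain stdlib types; endpointSubset, endpointAddress, and endpointPort below are simplified stand-ins for the corev1.Endpoints shape, not the real API types, and flattenEndpoints is a hypothetical name for illustration:

```go
package main

import "fmt"

// Simplified stand-ins for the corev1.Endpoints shape the real function reads.
type endpointAddress struct{ IP string }
type endpointPort struct{ Port int32 }
type endpointSubset struct {
	Addresses []endpointAddress
	Ports     []endpointPort
}

// flattenEndpoints builds "host:port" strings for every address/port pair,
// mirroring what a helper like getAllAPIServersEndpoint would return for
// the "kubernetes" Endpoints object in the default namespace.
func flattenEndpoints(subsets []endpointSubset) []string {
	var out []string
	for _, s := range subsets {
		for _, a := range s.Addresses {
			for _, p := range s.Ports {
				out = append(out, fmt.Sprintf("%s:%d", a.IP, p.Port))
			}
		}
	}
	return out
}

func main() {
	subsets := []endpointSubset{{
		Addresses: []endpointAddress{{IP: "10.0.0.1"}, {IP: "10.0.0.2"}},
		Ports:     []endpointPort{{Port: 6443}},
	}}
	fmt.Println(flattenEndpoints(subsets)) // prints: [10.0.0.1:6443 10.0.0.2:6443]
}
```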
is going directly to the kubernetes endpoints like this guaranteed to be valid in all conformant environments (remember this is modifying a conformance test)? I vaguely remember different configurations for the kubernetes service (e.g. #13978)
It seems that the endpoints for the kubernetes service are populated by the EndpointReconciler. From what I understand it should be okay, as the endpoints are filled in by the kube API servers unless the endpoint reconciler was turned off. Could you please double-check that?
@liggitt have you had time to verify that?
Running without the endpoint reconciler is a supported option (--endpoint-reconciler-type=none), as is using a node-port for the kubernetes master service (--kubernetes-service-node-port=...). Switching a conformance test to go directly to endpoints instead of through the kubernetes service doesn't seem proper.
I would expect there to be a shim in our test framework that correctly waits for HA configurations (an HA configuration should be able to be shown to be conformant) by getting API endpoints appropriately depending on what the config is, i.e. h := framework.NewAPIEndpointHelper(client); h.WaitForAll(func(<gets a client config>) (bool, error) { do a test with the client to see if x is true }) that handles that.
Łukasz, do you think you would be willing to articulate a version of the helper that works in each case jordan mentioned and hides this behavior when it's not possible (if no endpoints are set, the test may also need a way to clarify that)?
Everyone is instead doing this poorly, incorrectly, and inconsistently in the existing code - having a correct way to at least do this for the apiserver is good. I don't think it's acceptable for us to punt on HA apiservers, which is the recommended prod configuration of kube, to prove that they are conformant in a reasonable way (and medium-length sleeps may not be sufficient). We definitely paper over this in other components (stuff running in deployments of size two) - so it would be nice if the underlying code could be adapted to something like the OpenShift ingress tests, or the upgrade tests that verify the data. We still have a need for probabilistic checking (i.e. if we're trying to verify an SLB directly).
Łukasz, do you think you would be willing to articulate a version of the helper that works in each case jordan mentioned and hides this behavior when it's not possible (if no endpoints are set, the test may also need a way to clarify that)?
@smarterclayton sure, I can start with a version that understands endpoints. Many thanks.
xref: #100585
Origin-commit: b7f9cc11026358f9eb96e08aa287d44373148d7c
What type of PR is this?
/kind flake
What this PR does / why we need it: makes the CRD publishing e2e tests more stable, especially in an HA setup, by checking all instances of the API server before running the actual test.
Which issue(s) this PR fixes: In our environment with multiple masters these tests are very unstable. It turned out that the tests didn't guarantee that all instances had observed the same version of the spec before running the actual tests.
Fixes #
Special notes for your reviewer: we need to build an image with socat, and I need to refactor the portforward.go tests to reuse the RunKubectlPortForward function.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: