e2e: fix router metrics test flake on AWS #24085

Merged 1 commit on Nov 4, 2019

ironcladlou

On AWS, the default router speaks PROXY protocol. The fix in
#24075 switched some router tests to
(correctly) speak to the router directly. However, that fix did not update the client
code to speak PROXY to the router on AWS. The tests still sometimes pass on AWS
by coincidence: other non-test clients send traffic to the router through the
LB, so router stats occasionally match the test expectations.

Fix the tests so that test clients talking to routers use PROXY protocol on AWS.
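
As a rough sketch of the approach, the test client can append curl's `--haproxy-protocol` flag only when the target router expects PROXY. In the snippet below, `curlArgs` and the other curl flags are assumptions made for illustration; only the `proxy` parameter and the `--haproxy-protocol` flag come from this PR's diff.

```go
package main

// curlArgs assembles curl arguments for a status-code probe against a router.
// Hypothetical helper: the name and the -s/-o/-w/-H flags are illustrative.
func curlArgs(url, host string, proxy bool) []string {
	args := []string{
		"-s",              // silence progress output
		"-o", "/dev/null", // discard the response body
		"-w", "%{http_code}", // print only the HTTP status code
		"-H", "Host: " + host, // route by host header
	}
	if proxy {
		// Send a PROXY protocol v1 header before the request, as a
		// router behind a PROXY-enabled load balancer expects.
		args = append(args, "--haproxy-protocol")
	}
	return append(args, url)
}
```

Callers that talk to a custom router pod, which does not expect PROXY, would pass `proxy` as `false`.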

@openshift-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Nov 4, 2019
@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 4, 2019
@ironcladlou

/test e2e-aws

@Miciah

Miciah commented Nov 4, 2019

/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Nov 4, 2019
@smarterclayton

/retest

@ironcladlou

ironcladlou commented Nov 4, 2019

The GCP errors are something else:

Flaky tests:

[Feature:Prometheus][Conformance] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog [Suite:openshift/conformance/parallel/minimal]
[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[Feature:Builds][Conformance] oc new-app  should succeed with a --name of 58 characters [Suite:openshift/conformance/parallel/minimal]

The AWS failure (https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24085/pull-ci-openshift-origin-master-e2e-aws/13324) shows a bug in my fix here: the weighted.go test shouldn't use PROXY, because that test uses a custom router pod.

I've fixed that bug.

/retest

@openshift-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) and removed the lgtm and size/M labels on Nov 4, 2019
@ironcladlou

/test e2e-aws

@Miciah

Miciah commented Nov 4, 2019

/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Nov 4, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, Miciah


func expectRouteStatusCodeRepeatedExec(ns, execPodName, url, host string, statusCode int, times int, proxy bool) error {
	var extraArgs []string
	if proxy {
		extraArgs = append(extraArgs, "--haproxy-protocol")
Contributor

This argument isn't in the version of curl bundled in UBI...

Contributor Author

It's so old though... curl/curl@6baeb6d

In my dev clusters, the image that gets used is gcr.io/kubernetes-e2e-test-images/agnhost:2.6 which has curl 7.61.1 (x86_64-alpine-linux-musl) libcurl/7.61.1 LibreSSL/2.5.5 zlib/1.2.11 libssh2/1.8.2
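
Since the concern is whether the exec pod's curl is new enough, a test could guard on the version before passing the flag. The sketch below is one way to do that, not something this PR implements. `supportsHAProxyFlag` is a hypothetical helper, and the 7.60.0 cutoff is an assumption based on my reading of the curl docs for `--haproxy-protocol`; it should be checked against the curl changelog.

```go
package main

import (
	"fmt"
	"strings"
)

// supportsHAProxyFlag inspects the first line of `curl --version` output and
// reports whether the version is at least 7.60.0 (assumed cutoff, see above).
// Hypothetical helper for illustration only.
func supportsHAProxyFlag(versionOutput string) (bool, error) {
	fields := strings.Fields(versionOutput)
	if len(fields) < 2 || fields[0] != "curl" {
		return false, fmt.Errorf("unrecognized curl version output: %q", versionOutput)
	}
	var major, minor, patch int
	if _, err := fmt.Sscanf(fields[1], "%d.%d.%d", &major, &minor, &patch); err != nil {
		return false, fmt.Errorf("parsing curl version %q: %w", fields[1], err)
	}
	if major != 7 {
		// Anything newer than the 7.x series qualifies; older does not.
		return major > 7, nil
	}
	return minor >= 60, nil
}
```

Fed the version string quoted above (`curl 7.61.1 (x86_64-alpine-linux-musl) ...`), this reports support; an older 7.x release below 7.60 would not.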


@smarterclayton

/retest

How did AWS pass then?

@hexfusion

/test e2e-gcp-upgrade

@smarterclayton

Force merging to unblock queues
