Bug 1986575: Add e2e test cases for haproxy timeout api fields, and reject negative timeout values #644

Merged

Conversation

rfredette
Contributor

This PR adds 2 tests to the e2e suite:

TestHAProxyTimeouts:

  • Sets all 6 new timeout fields and verifies that each value is reflected in the router deployment's environment.
  • Most of the values are durations that would be invalid for HAProxy if converted directly from Duration to string. For example, 90 * time.Second stringifies as "1m30s", but HAProxy requires a single time unit, such as "90s". This confirms that the durationToHAProxyTimespec function works as intended.
  • One timeout value is set higher than the maximum HAProxy allows, to confirm that the timeout is clipped to HAProxy's maximum.

TestHAProxyTimeoutsRejection:

  • Sets all 6 new timeout fields to values that the API allows but that are not valid timeout periods, such as 0 or negative durations, and verifies that none of the environment variables are set, which forces the router pods to use the default timeouts.

In addition to the tests, this PR changes the operator to reject timeout values that are zero or less, instead of rejecting only zero-length timeouts. The router image falls back to the default timeouts if negative ones are set, but rejecting them before they reach the deployment environment makes the behavior more obvious to the user.

@openshift-ci openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 17, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2021

@rfredette: This pull request references Bugzilla bug 1986575, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1986575: Add e2e test cases for haproxy timeout api fields, and reject negative timeout values

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2021
@rfredette
Contributor Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2021

@rfredette: An error was encountered querying GitHub for users with public email (hongli@redhat.com) for bug 1986575 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again.\"\n}\n"

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rfredette
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 17, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2021

@rfredette: This pull request references Bugzilla bug 1986575, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from lihongan August 17, 2021 19:06
@@ -521,22 +521,22 @@ func desiredRouterDeployment(ci *operatorv1.IngressController, ingressController
}
env = append(env, corev1.EnvVar{Name: RouterHAProxyThreadsEnvName, Value: strconv.Itoa(threads)})

-if ci.Spec.TuningOptions.ClientTimeout != nil && ci.Spec.TuningOptions.ClientTimeout.Duration != 0*time.Second {
+if ci.Spec.TuningOptions.ClientTimeout != nil && ci.Spec.TuningOptions.ClientTimeout.Duration > 0*time.Second {
Contributor

What if Spec.TuningOptions is nil? Is that possible?

Contributor

> What if Spec.TuningOptions is nil? Is that possible?

It isn't possible. Spec.TuningOptions is not a pointer type.

t.Errorf("expected %s = %q, got %q", envVar.Name, tlsInspectDelayOutput, envVar.Value)
}
}
}
Contributor

It would be best to actually see these show up in the haproxy.config file with the values that you specified. You would need to scan a router pod's haproxy.config for the timeout values that should be there.

Contributor Author

Good point. I will add checks for the values in haproxy's config.
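
One possible shape for such a check, assuming the contents of haproxy.config have already been read from the router pod into a string: the helper directiveValues below is hypothetical and not part of the PR, and it simply collects the value that follows each occurrence of a directive.

```go
package main

import (
	"bufio"
	"strings"
)

// directiveValues (hypothetical helper) returns the text following each line
// that starts with directive, e.g. "timeout client" or
// "tcp-request inspect-delay". Leading indentation is ignored.
func directiveValues(config, directive string) []string {
	var values []string
	scanner := bufio.NewScanner(strings.NewReader(config))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// The trailing space keeps "timeout client" from matching
		// "timeout client-fin".
		if strings.HasPrefix(line, directive+" ") {
			value := strings.TrimSpace(strings.TrimPrefix(line, directive))
			values = append(values, value)
		}
	}
	return values
}
```

A test could then compare the returned values against the timeouts it configured, and fail if a directive appears an unexpected number of times.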

Comment on lines +48 to +59
conditions := []operatorv1.OperatorCondition{
{Type: operatorv1.IngressControllerAvailableConditionType, Status: operatorv1.ConditionTrue},
{Type: operatorv1.LoadBalancerManagedIngressConditionType, Status: operatorv1.ConditionFalse},
{Type: operatorv1.DNSManagedIngressConditionType, Status: operatorv1.ConditionFalse},
}
Contributor

You can use IngressControllerAvailableConditionType.

Comment on lines 136 to 146
case "ROUTER_DEFAULT_CLIENT_TIMEOUT":
fallthrough
case "ROUTER_CLIENT_FIN_TIMEOUT":
fallthrough
case "ROUTER_DEFAULT_SERVER_TIMEOUT":
fallthrough
case "ROUTER_DEFAULT_SERVER_FIN_TIMEOUT":
fallthrough
case "ROUTER_DEFAULT_TUNNEL_TIMEOUT":
fallthrough
case "ROUTER_INSPECT_DELAY":
Contributor

@Miciah Miciah Aug 23, 2021

Suggested change
-case "ROUTER_DEFAULT_CLIENT_TIMEOUT":
-    fallthrough
-case "ROUTER_CLIENT_FIN_TIMEOUT":
-    fallthrough
-case "ROUTER_DEFAULT_SERVER_TIMEOUT":
-    fallthrough
-case "ROUTER_DEFAULT_SERVER_FIN_TIMEOUT":
-    fallthrough
-case "ROUTER_DEFAULT_TUNNEL_TIMEOUT":
-    fallthrough
-case "ROUTER_INSPECT_DELAY":
+case "ROUTER_DEFAULT_CLIENT_TIMEOUT", "ROUTER_CLIENT_FIN_TIMEOUT", "ROUTER_DEFAULT_SERVER_TIMEOUT", "ROUTER_DEFAULT_SERVER_FIN_TIMEOUT", "ROUTER_DEFAULT_TUNNEL_TIMEOUT", "ROUTER_INSPECT_DELAY":

Edit: Ignore this suggestion if you're replacing this logic with a check for the settings in haproxy.config.

Contributor

@Miciah Miciah left a comment

There are a few spots where we can tighten up the code or stop the test early when later steps depend on an earlier step that has failed. Otherwise, looks good. I look forward to using podExec in other E2E tests!

return false, nil
})
if pollErr != nil {
t.Errorf("Router pod %s failed to become ready: %v", routerPod.Name, pollErr)
Contributor

Suggested change
-t.Errorf("Router pod %s failed to become ready: %v", routerPod.Name, pollErr)
+t.Fatalf("Router pod %s failed to become ready: %v", routerPod.Name, pollErr)

Comment on lines +196 to +284
t.Errorf("Error executing %s: %v", strings.Join(cmd, " "), err)
t.Errorf("stderr: %v", stderr)
Contributor

Suggested change
-t.Errorf("Error executing %s: %v", strings.Join(cmd, " "), err)
-t.Errorf("stderr: %v", stderr)
+t.Errorf("Error executing %s: %v", strings.Join(cmd, " "), err)
+t.Errorf("stderr: %v", stderr)
+continue

Comment on lines 216 to 316
}
values := strings.Split(strings.TrimSpace(stdout.String()), "\n")
// tcp-request inspect-delay is set in 2 places, but both should match
if len(values) != 2 {
t.Errorf("Expected 2 instances of \"tcp-request inspect-delay\", got %v", len(values))
}
inspectDelayDefault := "5s"
if strings.TrimSpace(values[0]) != inspectDelayDefault ||
strings.TrimSpace(values[1]) != inspectDelayDefault {
t.Errorf("Expected value for \"tcp-request inspect-delay\" to be %v, got %v", []string{"5s", "5s"}, values)
}
Contributor

Suggested change
-}
-values := strings.Split(strings.TrimSpace(stdout.String()), "\n")
-// tcp-request inspect-delay is set in 2 places, but both should match
-if len(values) != 2 {
-    t.Errorf("Expected 2 instances of \"tcp-request inspect-delay\", got %v", len(values))
-}
-inspectDelayDefault := "5s"
-if strings.TrimSpace(values[0]) != inspectDelayDefault ||
-    strings.TrimSpace(values[1]) != inspectDelayDefault {
-    t.Errorf("Expected value for \"tcp-request inspect-delay\" to be %v, got %v", []string{"5s", "5s"}, values)
-}
+} else {
+    values := strings.Split(strings.TrimSpace(stdout.String()), "\n")
+    // tcp-request inspect-delay is set in 2 places, but both should match
+    if len(values) != 2 {
+        t.Errorf("Expected 2 instances of \"tcp-request inspect-delay\", got %v", len(values))
+    } else {
+        inspectDelayDefault := "5s"
+        if strings.TrimSpace(values[0]) != inspectDelayDefault ||
+            strings.TrimSpace(values[1]) != inspectDelayDefault {
+            t.Errorf(`Expected value for "tcp-request inspect-delay" to be %v, got %v`, []string{"5s", "5s"}, values)
+        }
+    }
+}
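
The point of the nested else is that values[0] and values[1] are only indexed once the length is known to be 2. The same guard can be sketched as a standalone function; checkInspectDelay is a hypothetical helper, not code from this PR, and it returns an error instead of calling t.Errorf so it can run outside a test.

```go
package main

import (
	"fmt"
	"strings"
)

// checkInspectDelay (hypothetical) validates command output that should hold
// exactly two "tcp-request inspect-delay" values, both equal to want.
// It returns early on a length mismatch so no out-of-range index can occur.
func checkInspectDelay(output, want string) error {
	values := strings.Split(strings.TrimSpace(output), "\n")
	if len(values) != 2 {
		return fmt.Errorf("expected 2 instances of \"tcp-request inspect-delay\", got %d", len(values))
	}
	for _, v := range values {
		if strings.TrimSpace(v) != want {
			return fmt.Errorf("expected \"tcp-request inspect-delay\" to be %q, got %v", want, values)
		}
	}
	return nil
}
```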

test/e2e/util.go Outdated
Comment on lines 219 to 226
err = exec.Stream(remotecommand.StreamOptions{
Stdout: stdout,
Stderr: stderr,
})
if err != nil {
return err
}
return nil
Contributor

Suggested change
-err = exec.Stream(remotecommand.StreamOptions{
-    Stdout: stdout,
-    Stderr: stderr,
-})
-if err != nil {
-    return err
-}
-return nil
+return exec.Stream(remotecommand.StreamOptions{
+    Stdout: stdout,
+    Stderr: stderr,
+})

Contributor

@Miciah Miciah left a comment

Sorry, just noticed a couple more small things that should be addressed.

}
pollErr := wait.PollImmediate(2*time.Second, 5*time.Minute, func() (bool, error) {
if err := kclient.List(context.TODO(), podList, client.InNamespace(deployment.Namespace), client.MatchingLabels(labels)); err != nil {
t.Errorf("failed to list pods for ingress controllers %s: %v", ic.Name, err)
Contributor

Using t.Errorf isn't quite the right thing to do here because it won't terminate the polling loop, but it will cause the test to fail even if the polling loop succeeds on the next try. Better to log and return to retry immediately:

Suggested change
-t.Errorf("failed to list pods for ingress controllers %s: %v", ic.Name, err)
+t.Logf("failed to list pods for ingress controllers %s: %v", ic.Name, err)
+return false, nil

t.Errorf("failed to list pods for ingress controllers %s: %v", ic.Name, err)
}

routerPod = podList.Items[0]
Contributor

The List could have successfully listed 0 pods:

Suggested change
-routerPod = podList.Items[0]
+if len(podList.Items) == 0 {
+    t.Logf("failed to find any pods for ingress controllers %s", ic.Name)
+    return false, nil
+}
+routerPod = podList.Items[0]

Comment on lines 235 to 250
t.Errorf("failed to list pods for ingress controllers %s: %v", ic.Name, err)
}

routerPod = podList.Items[0]
Contributor

My previous two comments apply here as well.

…gative timeout values

HAProxy won't accept timeouts with negative values, so if one makes it
into the ingresscontroller config, discard it and use the default value
@rfredette
Contributor Author

Thanks @Miciah! I've incorporated your suggested changes.

@Miciah
Contributor

Miciah commented Aug 25, 2021

/lgtm
No remaining issues from my side.
/hold
@candita, I'm adding a hold in case you want to give this another look now that Ryan has added the haproxy.config checks.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 25, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 25, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah, rfredette

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Miciah
Contributor

Miciah commented Aug 27, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2021
@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

22 similar comments

@openshift-ci
Contributor

openshift-ci bot commented Aug 28, 2021

@rfredette: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-single-node 855e6de link /test e2e-aws-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

3 similar comments

@openshift-merge-robot openshift-merge-robot merged commit fbd8fcf into openshift:master Aug 28, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 28, 2021

@rfredette: All pull requests linked via external trackers have merged:

Bugzilla bug 1986575 has been moved to the MODIFIED state.

In response to this:

Bug 1986575: Add e2e test cases for haproxy timeout api fields, and reject negative timeout values

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

5 participants