Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1915079: Disable canary route churn by default #525

Merged

Conversation

sgreene570
Copy link
Contributor

canary: Disable canary route rotation by default

Forcing the router to reload every 5 minutes for canary testing has a
heavy performance impact for larger clusters. By default, disable the
canary route rotation logic. Canary route rotation can be enabled for
internal use or debugging by setting the new
"ingress.operator.openshift.io/rotate-canary-route" annotation to
"true" on the default ingress controller.

pkg/operator/controller/canary/controller.go:
Add a mutex to the canary controller's reconciler type
to make the new reconciler enableCanaryRouteRotation field
go-routine safe. Set this field to true when the default ingress
controller has the new canary route rotation annotation
set to true. Use this field in the canary check polling loop
to determine if the canary route endpoint should be rotated.


test/e2e: Add canary route rotation annotation test

test/e2e/canary_test.go: Add an e2e test to verify
that annotating the default ingress controller with the
canary route rotation annotation works as intended.

@openshift-ci-robot
Copy link
Contributor

@sgreene570: This pull request references Bugzilla bug 1915079, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1915079: Disable canary route churn by default

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 11, 2021
@sgreene570
Copy link
Contributor Author

must-gather failed
/test e2e-aws-operator

test/e2e/canary_test.go Outdated Show resolved Hide resolved
@sgreene570
Copy link
Contributor Author

/retest

@sgreene570
Copy link
Contributor Author

/test e2e-aws-operator

@sgreene570
Copy link
Contributor Author

/test e2e-aws

@sgreene570
Copy link
Contributor Author

/test e2e-aws-operator

Copy link
Contributor

@Miciah Miciah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold
I made a few small suggestions, but feel free to release the hold and let it merge as is (assuming CI starts passing).

pkg/operator/controller/canary/controller.go Outdated Show resolved Hide resolved
pkg/operator/controller/canary/controller.go Outdated Show resolved Hide resolved
pkg/operator/controller/canary/controller.go Outdated Show resolved Hide resolved
pkg/operator/controller/canary/controller.go Outdated Show resolved Hide resolved
test/e2e/canary_test.go Outdated Show resolved Hide resolved
test/e2e/canary_test.go Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 13, 2021
@Miciah
Copy link
Contributor

Miciah commented Jan 13, 2021

/retest

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 13, 2021
@sgreene570
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@sgreene570
Copy link
Contributor Author

sgreene570 commented Jan 13, 2021

@Miciah I pushed fixes for your suggestions despite CI being green since recently merged changes to this repo would have triggered retests anyways.

PTAL when you can.

@@ -44,6 +45,12 @@ const (
// canaryCheckFailureCount is how many successive failing canary checks should
// be observed before the default ingress controller goes degraded.
canaryCheckFailureCount = 5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a user-facing annotation, should this annoation be documented elsewhere? Also, please add that the user must set the value as true or false, i.e. the annotation implies false if its value is not set to true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enhanced the go-doc per your request.
We don't expect users to want to opt-in to this functionality given the performance impact on the default cluster routes, so I think the documentation here is sufficient. This annotation would be useful for debugging a customer case via bugzilla or our support engineers, or for internal debugging / testing.

// from the default ingress controller.
ic := &operatorv1.IngressController{}
if err := r.client.Get(context.TODO(), request.NamespacedName, ic); err != nil {
errors = append(errors, fmt.Errorf("failed to get ingress controller %s: %v", ic.Name, err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ic.Name necessarily set if the get failed? (Doesn't really matter much given that there's only one ingresscontroller that it could be.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im going to assume it does not get set......fixed.
Thanks!

test/e2e/canary_test.go: Add an e2e test to verify
that annotating the default ingress controller with the
canary route rotation annotation works as intended.
Forcing the router to reload every 5 minutes for canary testing has a
heavy performance impact for larger clusters. By default, disable the
canary route rotation logic. Canary route rotation can be enabled for
internal use or debugging by setting the new
"ingress.operator.openshift.io/rotate-canary-route" annotation to
"true" on the default ingress controller.

pkg/operator/controller/canary/controller.go:
Add a mutex to the canary controller's reconciler type
to make the new reconciler `enableCanaryRouteRotation` field
go-routine safe. Set this field to true when the default ingress
controller has the new  canary route rotation annotation
set to true. Use this field in the canary check polling loop
to determine if the canary route endpoint should be rotated.
@Miciah
Copy link
Contributor

Miciah commented Jan 13, 2021

Latest changes look good. Thanks!
/lgtm
/hold
in case @frobware wants to review.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 13, 2021
Copy link
Contributor

@frobware frobware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Minor nits/observations but nothing to hold LGTM for.

}

func setDefaultIngressControllerRotationAnnotation(t *testing.T, val bool) error {
t.Helper()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary given we only appear to access t for logging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miciah suggested adding this:

You can call t.Helper() at the start of this function to cause any log output that this function generates to have the caller's line number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was only useful when you want to fail or error. No big deal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log messages will have line numbers, right? The line numbers of the call sites are more specific than line numbers in the helper.

}
// If the canaryRoute and updatedCanaryRoute do not have the same targetPort,
// then the canary route rotation annotation is working.
if cmp.Equal(canaryRoute.Spec.Port.TargetPort, updatedCanaryRoute.Spec.Port.TargetPort) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: collapse the multiple returns to the result of the Equal expression.

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware, Miciah, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Miciah,frobware,sgreene570]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sgreene570
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@sgreene570
Copy link
Contributor Author

hitting VPC limits

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@sgreene570
Copy link
Contributor Author

/test e2e-aws-operator

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

11 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 3bb8ed5 into openshift:master Jan 14, 2021
@openshift-ci-robot
Copy link
Contributor

@sgreene570: All pull requests linked via external trackers have merged:

Bugzilla bug 1915079 has been moved to the MODIFIED state.

In response to this:

Bug 1915079: Disable canary route churn by default

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants