Bug 1905100: Add "ingress.operator.openshift.io/hard-stop-after" annotation #522

frobware · 2021-01-08T13:10:44Z

Annotating either an ingresscontroller or the ingress config with this
new annotation redeploys the router, which configures haproxy to emit
the "hard-stop-after" global option.

An ingresscontroller with a valid annotation set will override
ingresses.config/cluster (if set).

Examples:

Annotating the ingress config:

$ oc annotate ingresses.config/cluster ingress.operator.openshift.io/hard-stop-after=1h

Annotating the "default" ingresscontroller:

$ oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/hard-stop-after=30m

openshift-ci-robot · 2021-01-08T13:10:50Z

@frobware: This pull request references Bugzilla bug 1905100, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.7.0) matches configured target release for branch (4.7.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1905100: Add "ingress.operator.openshift.io/hard-stop-after" annotation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

frobware · 2021-01-08T13:11:24Z

/hold

Will push some e2e tests.

Miciah

Just some very minor suggestions. Overall, it looks good.

pkg/operator/controller/ingress/deployment.go

pkg/operator/controller/ingress/deployment_test.go

frobware · 2021-01-11T16:53:22Z

/hold cancel

I pushed some tests. This is ready for a review.

test/e2e/hard_stop_after_test.go

sgreene570 · 2021-01-11T19:02:28Z

pkg/operator/controller/ingress/deployment.go

@@ -623,6 +649,12 @@ func desiredRouterDeployment(ci *operatorv1.IngressController, ingressController
 		env = append(env, corev1.EnvVar{Name: RouterDisableHTTP2EnvName, Value: "true"})
 	}

+	if enabled, value := HardStopAfterIsEnabled(ci, ingressConfig); enabled {
+		if _, err := time.ParseDuration(value); err != nil {
+			env = append(env, corev1.EnvVar{Name: RouterHardStopAfterEnvName, Value: value})


time.ParseDuration can't parse values that specify days (e.g. 1d). ~~We don't expect users to be setting values that high, right?~~ We don't expect folks to set a value using days, do we? (In the past this was an issue with the route timeout annotation.)

Also, if we fail to parse the annotation, should we log that?
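
For reference, a minimal sketch of the limitation described above: time.ParseDuration has no day unit, so "1d" is rejected while "24h" is accepted. The helper name parses is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// parses is a hypothetical helper that reports whether
// time.ParseDuration accepts the given string.
func parses(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

func main() {
	// "h" is the largest unit time.ParseDuration understands; "d" is not a valid unit.
	for _, s := range []string{"30m", "1h", "24h", "1d", ""} {
		fmt.Printf("%q parses: %v\n", s, parses(s))
	}
}
```

A user who wants a day-long value would have to express it in hours, for example 24h.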

I think the logic is inverted here: We should use value only if time.ParseDuration(value) returns a nil error value.

Perhaps HardStopAfterIsEnabledByAnnotation should perform the validation. What should happen if ingresses.config/cluster has a valid annotation and then the user adds an invalid annotation to the ingresscontroller? I think the first annotation should remain in effect, but with the current logic, the invalid annotation on the ingresscontroller causes the valid annotation on the ingress.config to be ignored.
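
A rough sketch of the behaviour suggested above, assuming validation happens where the annotation is read: an invalid value on the ingresscontroller is treated as if it were absent, so a valid value on ingresses.config/cluster stays in effect. The function names validHardStopAfter and effectiveHardStopAfter are hypothetical and the annotations are modelled as plain maps; this is not the code in the PR.

```go
package main

import (
	"fmt"
	"time"
)

// Annotation key taken from the PR description.
const hardStopAfterAnnotation = "ingress.operator.openshift.io/hard-stop-after"

// validHardStopAfter is a hypothetical helper: it returns the annotation
// value only when the key is present and time.ParseDuration accepts it.
func validHardStopAfter(annotations map[string]string) (string, bool) {
	value, ok := annotations[hardStopAfterAnnotation]
	if !ok {
		return "", false
	}
	if _, err := time.ParseDuration(value); err != nil {
		return "", false
	}
	return value, true
}

// effectiveHardStopAfter (hypothetical) prefers a valid ingresscontroller
// annotation and otherwise falls back to a valid ingress config annotation.
func effectiveHardStopAfter(icAnnotations, configAnnotations map[string]string) (string, bool) {
	if value, ok := validHardStopAfter(icAnnotations); ok {
		return value, true
	}
	return validHardStopAfter(configAnnotations)
}

func main() {
	config := map[string]string{hardStopAfterAnnotation: "1h"}
	ic := map[string]string{hardStopAfterAnnotation: "1d"} // invalid: no day unit
	value, enabled := effectiveHardStopAfter(ic, config)
	fmt.Println(value, enabled)
}
```

With this shape the caller only appends the environment variable when the second return value is true, which also avoids the inverted error check.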

Done and added a test; though I will add an additional test that explicitly sets "" as a timeout value - just need to rearrange the test a bit now.

pkg/operator/controller/ingress/deployment.go

sgreene570 · 2021-01-11T22:51:07Z

test/e2e/hard_stop_after_test.go

+	}
+
+	// Cleanup
+	if err := clearHardStopAfterDurationForIngressConfig(t, kclient, hardStopAfterRetryTimeout, ingressConfig); err != nil {


These cleanups should probably be wrapped in a defer block and moved above any t.Fatalf cases, right?
I think this applies to multiple test cases in this file.

This doesn't hold true in terms of moving them above any t.Fatal calls. It's only at the places that say // cleanup that you could make them into a defer block so I don't know what we would gain. I will remove the comments. If you want these to be rock solid and have no effect on any other test then the better thing to do would be to test against their own controller, config, deployment - but they will take longer to run.

So if test assertions fail, then we don't need to revert the default ingress controller back to its original state?
For example: https://github.com/openshift/cluster-ingress-operator/pull/522/files#diff-62e23c6a05e53db8113412f4b1533e9f9d4479ac083164f750ded87106d4a78aR59
Just trying to understand how cleanups work for failing tests in the hard stop e2e tests.

The http2 tests seem to follow a similar pattern to this PR, so there's precedent. So maybe we should double-check any possible defer improvements in a follow-up for all e2e tests in general.

It's not clear to me what a defer improvement would look like. The test has failed. CI will eventually fail, so there's nothing that would aid getting a green run. Mutating the same objects in multiple tests doesn't help things.

Should defer look like:

defer func() {
	if err := setHardStopAfterDurationForIngressController(t, kclient, hardStopAfterRetryTimeout, "", ic); err != nil {
		t.Fatalf("failed to clear hard-stop-after on ingresscontroller: %v", err)
	}
	if err := waitForHardStopAfterUnsetInAllComponents(kclient, hardStopAfterRetryTimeout, ic, ingressConfig, routerDeployment); err != nil {
		t.Fatalf("some component still has hard-stop-after set: %v", err)
	}
}()
if err := hardStopAfterTestIngressController(t, kclient, ic, routerDeployment, (300 * time.Minute).String()); err != nil {
	t.Fatalf("test assertions failed: %v", err)
}

Or?

defer func() {
	if err := waitForHardStopAfterUnsetInAllComponents(kclient, hardStopAfterRetryTimeout, ic, ingressConfig, routerDeployment); err != nil {
		t.Fatalf("some component still has hard-stop-after set: %v", err)
	}
}()
defer func() {
	if err := setHardStopAfterDurationForIngressController(t, kclient, hardStopAfterRetryTimeout, "", ic); err != nil {
		t.Fatalf("failed to clear hard-stop-after on ingresscontroller: %v", err)
	}
}()
if err := hardStopAfterTestIngressController(t, kclient, ic, routerDeployment, (300 * time.Minute).String()); err != nil {
	t.Fatalf("test assertions failed: %v", err)
}

given that deferred calls run in LIFO order.

FTR, TIL that fatal arranges for all defers to be called. Thank you @sgreene570 .

I think the former (grouping all cleanup code into one defer block) would make the most sense (if we decide we want to add any defers at all).
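
To make the two points in this thread concrete, here is a small standalone sketch, not taken from the PR. It relies on the documented behaviour that t.FailNow (and therefore t.Fatalf) stops the test by calling runtime.Goexit, which runs the goroutine's deferred calls, and that deferred calls run in LIFO order. The function name cleanupOrder and the step labels are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// cleanupOrder is a hypothetical stand-in for a test body that registers two
// separate deferred cleanups and then fails fatally. It returns the order in
// which the cleanups actually ran.
func cleanupOrder() []string {
	var order []string
	done := make(chan struct{})
	go func() {
		defer close(done) // deferred first, so it runs last
		defer func() { order = append(order, "wait for unset") }()
		defer func() { order = append(order, "clear annotation") }()
		// Stands in for t.Fatalf: Goexit terminates the goroutine
		// after running all of its deferred calls.
		runtime.Goexit()
	}()
	<-done
	return order
}

func main() {
	fmt.Println(cleanupOrder())
}
```

The cleanup deferred last ("clear annotation") runs first, which is why the two-defer variant has to register the wait before the clear. A single defer block sidesteps the ordering question entirely.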

frobware · 2021-01-12T09:00:53Z

/retest

sgreene570 · 2021-01-12T14:55:28Z

test/e2e/hard_stop_after_test.go

+		t.Fatalf("failed to clear hard-stop-after on ingresscontroller: %v", err)
+	}
+
+	// Deployment should reveet back to config value


nit: reveet -> revert

sgreene570 · 2021-01-12T15:05:59Z

e2e-aws-operator passed
/lgtm in the interest of time.

frobware · 2021-01-12T15:18:28Z

/hold

Need to redo TestRouteHardStopAfterTestZeroLengthDuration as it doesn't exercise setting a "" value, it just clears the annotation by virtue of calling setHardStopAfterDurationForIngressController which is not really the same thing.
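
As an illustration of why the two cases differ, here is a minimal sketch that models annotations as a plain Go map: a key that was removed and a key explicitly set to "" are distinguishable with the two-value map lookup. The function annotationState is hypothetical and is not one of the test helpers in this PR.

```go
package main

import "fmt"

// Annotation key taken from the PR description.
const hardStopAfterAnnotation = "ingress.operator.openshift.io/hard-stop-after"

// annotationState (hypothetical) distinguishes a missing annotation from an
// annotation that is present but set to the empty string.
func annotationState(annotations map[string]string) string {
	value, present := annotations[hardStopAfterAnnotation]
	switch {
	case !present:
		return "absent"
	case value == "":
		return "empty"
	default:
		return "set"
	}
}

func main() {
	fmt.Println(annotationState(map[string]string{}))                            // annotation cleared
	fmt.Println(annotationState(map[string]string{hardStopAfterAnnotation: ""})) // explicit "" value
	fmt.Println(annotationState(map[string]string{hardStopAfterAnnotation: "1h"}))
}
```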

frobware · 2021-01-12T16:34:56Z

/hold cancel

@sgreene570 I addressed #522 (comment). PTAL.

sgreene570 · 2021-01-12T16:56:20Z

Looks good.
/lgtm

openshift-ci-robot · 2021-01-12T16:56:38Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [frobware,sgreene570]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sgreene570 · 2021-01-12T17:59:21Z

operator_test.go:402: failed waiting deployment openshift-ingress-operator/default to scale to 3: failed to achieve expected replicas, last observed: 2
/test e2e-aws-operator

frobware · 2021-01-12T18:00:53Z

/test e2e-aws

openshift-bot · 2021-01-12T18:49:09Z

/retest