Bug 1905100: Add "ingress.operator.openshift.io/hard-stop-after" annotation #522

Merged

frobware
Contributor

@frobware frobware commented Jan 8, 2021

Annotating either an ingresscontroller or the ingress config with this
new annotation will redeploy the router, which configures haproxy
to emit the haproxy "hard-stop-after" global option.

An ingresscontroller with a valid annotation set will override
ingresses.config/cluster (if set).

Examples:

Annotating the ingress config:

$ oc annotate ingresses.config/cluster ingress.operator.openshift.io/hard-stop-after=1h

Annotating the "default" ingresscontroller:

$ oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/hard-stop-after=30m

@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-urgent and bugzilla/valid-bug labels Jan 8, 2021
@openshift-ci-robot
Contributor

@frobware: This pull request references Bugzilla bug 1905100, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1905100: Add "ingress.operator.openshift.io/hard-stop-after" annotation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved label Jan 8, 2021
@frobware
Contributor Author

frobware commented Jan 8, 2021

/hold

Will push some e2e tests.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold label Jan 8, 2021
@frobware frobware force-pushed the add-no-hard-stop-after-option branch 2 times, most recently from 07f5f0a to d33069a January 8, 2021 13:23
@frobware frobware force-pushed the add-no-hard-stop-after-option branch 2 times, most recently from 9bc4ee7 to d2d7144 January 8, 2021 17:35
Contributor

@Miciah Miciah left a comment

Just some very minor suggestions. Overall, it looks good.

pkg/operator/controller/ingress/deployment.go (outdated)
pkg/operator/controller/ingress/deployment.go (outdated)
pkg/operator/controller/ingress/deployment.go (outdated)
@frobware frobware force-pushed the add-no-hard-stop-after-option branch from d2d7144 to 7b7327f January 8, 2021 18:12
@frobware
Contributor Author

/hold cancel

I pushed some tests. This is ready for a review.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold label Jan 11, 2021
test/e2e/hard_stop_after_test.go (outdated)
test/e2e/hard_stop_after_test.go
test/e2e/hard_stop_after_test.go (outdated)
@frobware frobware force-pushed the add-no-hard-stop-after-option branch from 8f657e8 to af74ad7 January 11, 2021 18:54
@@ -623,6 +649,12 @@ func desiredRouterDeployment(ci *operatorv1.IngressController, ingressController
env = append(env, corev1.EnvVar{Name: RouterDisableHTTP2EnvName, Value: "true"})
}

if enabled, value := HardStopAfterIsEnabled(ci, ingressConfig); enabled {
if _, err := time.ParseDuration(value); err != nil {
env = append(env, corev1.EnvVar{Name: RouterHardStopAfterEnvName, Value: value})
Contributor

time.ParseDuration can't parse values that specify days (i.e., 1d). We don't expect users to set values that high, or to use days as the unit, do we? (In the past, this was an issue with the route timeout annotation.)
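For reference, a small standalone check of that limitation: Go's time.ParseDuration only understands units from "ns" up to "h", so a one-day value has to be written as 24h. The helper name `parses` is illustrative only and is not part of the operator.

```go
package main

import (
	"fmt"
	"time"
)

// parses reports whether time.ParseDuration accepts s.
func parses(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

func main() {
	// The largest unit time.ParseDuration accepts is "h"; "d" is rejected.
	for _, s := range []string{"1d", "24h", "30m", "1h30m"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			fmt.Printf("%-6s -> error: %v\n", s, err)
			continue
		}
		fmt.Printf("%-6s -> %v\n", s, d)
	}
}
```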

Contributor

Also, if we fail to parse the annotation, should we log that?

Contributor

I think the logic is inverted here: we should use value only if time.ParseDuration(value) returns a nil error.
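A minimal sketch of the corrected check, assuming the earlier suggestion to log invalid values is also adopted. `envVar` stands in for corev1.EnvVar so the sketch has no Kubernetes dependency, and `routerHardStopAfterEnvName` is a hypothetical placeholder for the operator's RouterHardStopAfterEnvName constant, whose real value is not shown here.

```go
package main

import (
	"log"
	"time"
)

// envVar is a stand-in for corev1.EnvVar (assumption for this sketch).
type envVar struct {
	Name  string
	Value string
}

// routerHardStopAfterEnvName is a hypothetical placeholder value.
const routerHardStopAfterEnvName = "HYPOTHETICAL_HARD_STOP_AFTER"

// appendHardStopAfter appends the env var only when the feature is enabled
// AND value parses as a Go duration; invalid values are logged and skipped.
func appendHardStopAfter(env []envVar, enabled bool, value string) []envVar {
	if !enabled {
		return env
	}
	if _, err := time.ParseDuration(value); err != nil {
		log.Printf("ignoring invalid hard-stop-after value %q: %v", value, err)
		return env
	}
	return append(env, envVar{Name: routerHardStopAfterEnvName, Value: value})
}

func main() {
	env := appendHardStopAfter(nil, true, "30m")
	env = appendHardStopAfter(env, true, "1d") // logged and skipped
	log.Printf("env: %v", env)
}
```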

Contributor

Perhaps HardStopAfterIsEnabledByAnnotation should perform the validation. What should happen if ingresses.config/cluster has a valid annotation and then the user adds an invalid annotation to the ingresscontroller? I think the first annotation should remain in effect, but with the current logic, the invalid annotation on the ingresscontroller causes the valid annotation on the ingress.config to be ignored.
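One way to express the precedence described here, as a sketch only: validate each annotation independently, prefer a valid ingresscontroller value, and otherwise fall back to a valid ingresses.config/cluster value. The function names are hypothetical (they are not the operator's HardStopAfterIsEnabledByAnnotation), and annotations are modeled as plain string maps.

```go
package main

import (
	"fmt"
	"time"
)

// hardStopAfterAnnotation is the annotation key introduced by this PR.
const hardStopAfterAnnotation = "ingress.operator.openshift.io/hard-stop-after"

// validHardStopAfter returns the annotation value and true only if the key
// is present and its value parses as a Go duration.
func validHardStopAfter(annotations map[string]string) (string, bool) {
	v, ok := annotations[hardStopAfterAnnotation]
	if !ok {
		return "", false
	}
	if _, err := time.ParseDuration(v); err != nil {
		return "", false
	}
	return v, true
}

// resolveHardStopAfter prefers a valid ingresscontroller annotation and falls
// back to a valid cluster-wide one, so an invalid controller value does not
// mask a valid ingresses.config/cluster value.
func resolveHardStopAfter(controller, config map[string]string) (string, bool) {
	if v, ok := validHardStopAfter(controller); ok {
		return v, true
	}
	return validHardStopAfter(config)
}

func main() {
	config := map[string]string{hardStopAfterAnnotation: "1h"}
	controller := map[string]string{hardStopAfterAnnotation: "bogus"}
	v, ok := resolveHardStopAfter(controller, config)
	fmt.Println(v, ok) // prints "1h true"
}
```

Run as-is, the invalid "bogus" controller value is skipped and the cluster-wide 1h remains in effect.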

Contributor Author

Done, and added a test. I will also add a test that explicitly sets "" as the timeout value; I just need to rearrange the test a bit first.

@frobware frobware force-pushed the add-no-hard-stop-after-option branch from af74ad7 to 4386845 January 11, 2021 20:59
@frobware frobware force-pushed the add-no-hard-stop-after-option branch from c16ac2f to 3e1c30d January 11, 2021 21:35
}

// Cleanup
if err := clearHardStopAfterDurationForIngressConfig(t, kclient, hardStopAfterRetryTimeout, ingressConfig); err != nil {
Contributor

These cleanups should probably be wrapped in a defer block and moved above any t.Fatalf calls, right?
I think this applies to multiple test cases in this file.

Contributor Author

This doesn't hold true in terms of moving them above any t.Fatal calls. The only places where you could turn them into a defer block are the ones marked // cleanup, so I don't know what we would gain. I will remove the comments. If you want these tests to be rock solid and have no effect on any other test, the better approach would be to test against their own controller, config, and deployment, but they would take longer to run.

Contributor

So if test assertions fail, we don't need to revert the default ingress controller to its original state?
ie https://github.com/openshift/cluster-ingress-operator/pull/522/files#diff-62e23c6a05e53db8113412f4b1533e9f9d4479ac083164f750ded87106d4a78aR59
Just trying to understand how cleanups work for failing tests in the hard stop e2e tests.

Contributor

The http2 tests seem to follow a pattern similar to this PR's, so there's precedent. So maybe we should double-check possible defer improvements in a follow-up for all e2e tests in general.

Contributor Author

defer improvements i

It's not clear to me what a defer improvement would look like. The test has failed. CI will eventually fail, so there's nothing that would aid getting a green run. Mutating the same objects in multiple tests doesn't help things.

Should the defer look like this:

	defer func() {
		if err := setHardStopAfterDurationForIngressController(t, kclient, hardStopAfterRetryTimeout, "", ic); err != nil {
			t.Fatalf("failed to clear hard-stop-after on ingresscontroller: %v", err)
		}
		if err := waitForHardStopAfterUnsetInAllComponents(kclient, hardStopAfterRetryTimeout, ic, ingressConfig, routerDeployment); err != nil {
			t.Fatalf("some component still has hard-stop-after set: %v", err)
		}
	}()

	if err := hardStopAfterTestIngressController(t, kclient, ic, routerDeployment, (300 * time.Minute).String()); err != nil {
		t.Fatalf("test assertions failed: %v", err)
	}

Or?

	defer func() {
		if err := waitForHardStopAfterUnsetInAllComponents(kclient, hardStopAfterRetryTimeout, ic, ingressConfig, routerDeployment); err != nil {
			t.Fatalf("some component still has hard-stop-after set: %v", err)
		}
	}()

	defer func() {
		if err := setHardStopAfterDurationForIngressController(t, kclient, hardStopAfterRetryTimeout, "", ic); err != nil {
			t.Fatalf("failed to clear hard-stop-after on ingresscontroller: %v", err)
		}
	}()

	if err := hardStopAfterTestIngressController(t, kclient, ic, routerDeployment, (300 * time.Minute).String()); err != nil {
		t.Fatalf("test assertions failed: %v", err)
	}

given that deferred calls run in LIFO order.
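A tiny standalone illustration (not the PR's test code) of why the second form works: deferred calls run last-in, first-out, so registering the "wait" step before the "clear" step makes the clear run first. The step names are just labels, and the named result lets the deferred closures record their order after the return statement.

```go
package main

import "fmt"

// cleanupOrder records the order in which its deferred calls run.
func cleanupOrder() (order []string) {
	// Registered first, so it runs last.
	defer func() { order = append(order, "wait for hard-stop-after to be unset") }()
	// Registered second, so it runs first.
	defer func() { order = append(order, "clear annotation") }()
	return order
}

func main() {
	fmt.Println(cleanupOrder())
}
```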

Contributor Author

FTR, TIL that t.Fatal arranges for all deferred calls to run. Thank you, @sgreene570.
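This is documented for testing.T.FailNow, which t.Fatal and t.Fatalf call: it stops the goroutine with runtime.Goexit, and Goexit runs all deferred calls before the goroutine exits. The sketch below is a hypothetical helper, not test code from this PR; it calls runtime.Goexit directly in a goroutine to show that the deferred cleanup still runs.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// goexitRunsDefers mimics a test goroutine that "fails" via runtime.Goexit
// (as t.Fatalf does through FailNow) and reports which deferred calls ran.
func goexitRunsDefers() []string {
	var order []string
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done() // registered first, so it runs last
		defer func() { order = append(order, "cleanup") }()
		runtime.Goexit() // stands in for t.Fatalf; nothing after this line runs
	}()
	wg.Wait() // wg.Done runs after the cleanup append, so order is safe to read
	return order
}

func main() {
	fmt.Println(goexitRunsDefers())
}
```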

Contributor

I think the former (grouping all cleanup code into one defer block) would make the most sense (if we decide we want to add any defers at all).

@frobware
Contributor Author

/retest

@frobware frobware force-pushed the add-no-hard-stop-after-option branch from 7c92728 to f1a580d January 12, 2021 12:29
t.Fatalf("failed to clear hard-stop-after on ingresscontroller: %v", err)
}

// Deployment should reveet back to config value
Contributor

nit: reveet -> revert

@sgreene570
Contributor

e2e-aws-operator passed
/lgtm in the interest of time.

@frobware
Contributor Author

/hold

Need to redo TestRouteHardStopAfterTestZeroLengthDuration: it doesn't exercise setting a "" value. It just clears the annotation by calling setHardStopAfterDurationForIngressController, which is not really the same thing.
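The distinction matters because an annotation key can be present with an empty value, which is not the same as the key being absent. In Go, the comma-ok map lookup tells the two cases apart; the `describe` helper below is illustrative only and is not one of the PR's e2e helpers.

```go
package main

import "fmt"

const hardStopAfterAnnotation = "ingress.operator.openshift.io/hard-stop-after"

// describe distinguishes an absent annotation from one explicitly set to "".
func describe(annotations map[string]string) string {
	v, ok := annotations[hardStopAfterAnnotation]
	switch {
	case !ok:
		return "absent"
	case v == "":
		return "set to empty string"
	default:
		return "set to " + v
	}
}

func main() {
	fmt.Println(describe(map[string]string{}))
	fmt.Println(describe(map[string]string{hardStopAfterAnnotation: ""}))
	fmt.Println(describe(map[string]string{hardStopAfterAnnotation: "30m"}))
}
```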

@frobware frobware force-pushed the add-no-hard-stop-after-option branch from f1a580d to 31ee8ab January 12, 2021 16:33
@frobware
Contributor Author

/hold cancel

@sgreene570 I addressed #522 (comment). PTAL.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold label Jan 12, 2021
@sgreene570
Contributor

Looks good.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm label Jan 12, 2021
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [frobware,sgreene570]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sgreene570
Contributor

operator_test.go:402: failed waiting deployment openshift-ingress-operator/default to scale to 3: failed to achieve expected replicas, last observed: 2
/test e2e-aws-operator

@frobware
Contributor Author

/test e2e-aws

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@frobware
Contributor Author

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

7 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 54e2164 into openshift:master Jan 13, 2021
@openshift-ci-robot
Contributor

@frobware: All pull requests linked via external trackers have merged:

Bugzilla bug 1905100 has been moved to the MODIFIED state.

In response to this:

Bug 1905100: Add "ingress.operator.openshift.io/hard-stop-after" annotation

@frobware
Contributor Author

/cherry-pick release-4.6

@openshift-cherrypick-robot

@frobware: #522 failed to apply on top of branch "release-4.6":

Applying: Add "ingress.operator.openshift.io/hard-stop-after" annotation
Using index info to reconstruct a base tree...
M	pkg/operator/controller/ingress/deployment.go
M	pkg/operator/controller/ingress/deployment_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/operator/controller/ingress/deployment_test.go
CONFLICT (content): Merge conflict in pkg/operator/controller/ingress/deployment_test.go
Auto-merging pkg/operator/controller/ingress/deployment.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Add "ingress.operator.openshift.io/hard-stop-after" annotation
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.6

@frobware
Contributor Author

Manual backport here: #535

Labels
approved, bugzilla/severity-urgent, bugzilla/valid-bug, lgtm
7 participants