destroy/gcp: bubble up errors after 5 minutes #3749
Conversation
This change allows errors to become visible to users running at a log level of warn or higher. It continues to log errors encountered while deleting cloud resources at DEBUG, while escalating them to WARN once every 5 minutes.
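For context, the escalation described here can be captured in a small per-key tracker. The sketch below is illustrative only: the diff in this PR exposes just `errorTracker`, `suppressWarning`, `item.key`, and `o.Logger`, so the map field, the `suppressDuration` constant, and the exact logging format are assumptions rather than the PR's actual code.

```go
package gcp

import (
	"time"

	"github.com/sirupsen/logrus"
)

// errorTracker remembers when a warning was last emitted for each resource
// key, so repeated failures are logged at DEBUG and escalated to WARN at
// most once per interval.
type errorTracker struct {
	history map[string]time.Time
}

// suppressDuration is the escalation interval discussed in this PR.
const suppressDuration = 5 * time.Minute

// suppressWarning logs err at DEBUG unless suppressDuration has elapsed
// since the last WARN for this key, in which case it logs at WARN and
// resets the key's timer. (Sketch only; names beyond those in the diff
// are assumed.)
func (t *errorTracker) suppressWarning(key string, err error, logger logrus.FieldLogger) {
	if t.history == nil {
		t.history = map[string]time.Time{}
	}
	if last, seen := t.history[key]; !seen || time.Since(last) >= suppressDuration {
		t.history[key] = time.Now()
		logger.Warnf("%s: %v", key, err)
		return
	}
	logger.Debugf("%s: %v", key, err)
}
```

Each resource handler would then call `suppressWarning` with its item's key whenever a delete fails, as in the diff excerpt quoted in the review below.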
/retest
for _, item := range items {
	err := o.deleteAddress(item)
	if err != nil {
		errs = append(errs, err)
		o.errorTracker.suppressWarning(item.key, err, o.Logger)
	}
}
why do we have to suppress the warning in each handler instead of handling it at the caller sites?
We don't know the "item.key" at the caller sites... so we have no way of knowing what object the error is related to. We can't use the error itself (as we found in AWS), because the error messages are sometimes dynamic and some errors will never bubble up. That is why we have to know which object the error is related to. As a result, we aren't really bubbling up a specific error. Instead, we are bubbling up "this object has seen errors".
What would it take to bubble up the right error messages so that we can suppress the errors, if we have to, at a higher level instead of inside each handler?
We could refactor such that we have a single list of items to delete by `SelfLink`, which includes `type`, and loop through those with a switch based on type that calls the appropriate delete function. The delete function could then just return an error and the main loop could handle the suppression. We would have to expand our discovery phase to populate that list. That sounds like a lot of work.
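As a rough sketch of that proposed shape (illustrative only: `cloudItem`, `deleteItems`, and the per-type delete helpers are hypothetical names, and it assumes the existing uninstaller type with the `errorTracker` and `Logger` fields visible in this PR):

```go
package gcp // sketch: meant to slot into the existing GCP destroy package

// cloudItem is a hypothetical flat record produced by discovery: enough to
// identify a resource (its SelfLink) and to dispatch its delete (its type).
type cloudItem struct {
	typeName string
	selfLink string
}

// deleteItems sketches the refactor: one loop owns dispatch and the
// warning-suppression policy, while each (hypothetical) delete helper only
// returns an error.
func (o *ClusterUninstaller) deleteItems(items []cloudItem) {
	for _, item := range items {
		var err error
		switch item.typeName {
		case "address":
			err = o.deleteAddressByLink(item.selfLink)
		case "forwardingRule":
			err = o.deleteForwardingRuleByLink(item.selfLink)
		// ... one case per resource type populated by discovery
		}
		if err != nil {
			// Suppression happens in exactly one place, at the caller.
			o.errorTracker.suppressWarning(item.selfLink, err, o.Logger)
		}
	}
}
```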
Why would we have to include new discovery? The delete knows which item was being deleted; would adding context about the item being deleted when returning the error not be enough?
The way it was before, we were returning aggregated errors as a single error. Breaking that down into individual components would require significant work as well.
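For illustration, the pre-change shape is roughly the standard aggregation pattern: per-item failures are folded into one error value, so the caller cannot recover which resource a sub-error belongs to without parsing message text. A minimal, self-contained example (the resource names and quota error are made up, and whether the installer uses `utilerrors.NewAggregate` specifically is an assumption here):

```go
package main

import (
	"errors"
	"fmt"

	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

var errQuota = errors.New("quota exceeded")

// deleteAll folds per-item failures into a single aggregate error; the
// item identity survives only inside the formatted message strings.
func deleteAll() error {
	errs := []error{
		fmt.Errorf("failed to delete address addr-1: %w", errQuota),
		fmt.Errorf("failed to delete forwarding rule fw-1: %w", errQuota),
	}
	return utilerrors.NewAggregate(errs)
}

func main() {
	// Prints one combined message; there is no structured way to get back
	// to addr-1 or fw-1 from the returned error.
	fmt.Println(deleteAll())
}
```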
Just as a way to think it through: how would we go about changing the code so that we didn't have to suppress individually and could actually do this handling at a higher level, outside these runners? That way we can shape a follow-up to improve it.
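One possible shape for such a follow-up, sketched below with hypothetical names: handlers wrap each failure in a small typed error carrying the item key, and a higher-level loop unwraps it to drive the suppression. It assumes the existing `errorTracker` and `Logger` fields from this PR; `itemError` and `handleDeleteErrors` are invented for illustration.

```go
package gcp // sketch: meant to slot into the existing GCP destroy package

import "errors"

// itemError is a hypothetical wrapper that attaches the affected item's
// key to a failure, so a caller outside the handler can decide how to log
// or suppress it.
type itemError struct {
	key string
	err error
}

func (e *itemError) Error() string { return e.key + ": " + e.err.Error() }
func (e *itemError) Unwrap() error { return e.err }

// handleDeleteErrors shows how a higher-level loop could own the
// suppression policy: it unwraps each returned error to recover the item
// key instead of each handler calling suppressWarning itself.
func (o *ClusterUninstaller) handleDeleteErrors(errs []error) {
	for _, err := range errs {
		var ie *itemError
		if errors.As(err, &ie) {
			o.errorTracker.suppressWarning(ie.key, ie.err, o.Logger)
			continue
		}
		// Errors without item context fall back to DEBUG, as before.
		o.Logger.Debug(err)
	}
}
```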
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: abhinavdahiya
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest Please review the full test history for this PR and help us cut down flakes.
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes.
@jstuever: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest Please review the full test history for this PR and help us cut down flakes.