
destroy: return aws resources that could not be deleted #3772

Closed

Conversation

@staebler (Contributor) commented Jun 18, 2020

The RunWithContext method has been modified to return a slice of ARNs that could not be destroyed. Only the ARNs of the first-level resources will be returned. For example, when deleting a VPC, the uninstaller will first delete other resources that use the VPC. Any of those resources that are blocked from being deleted but are not tagged for the cluster will not show up in the list of blocked resources. However, the first-level VPC will show up as blocked in this case.

This will be used by Hive to expose to the user the resources that cannot be deleted so that the user can take action on those resources.

https://issues.redhat.com/browse/CO-973
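
The shape of the change, as a minimal sketch (hypothetical type and ARN; the real implementation lives in pkg/destroy/aws):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// uninstaller is a hypothetical stand-in for the installer's AWS destroyer.
type uninstaller struct{}

// RunWithContext deletes the tagged cluster resources and, with this change,
// returns the ARNs of the first-level resources that could not be deleted.
func (u *uninstaller) RunWithContext(ctx context.Context) ([]string, error) {
	// Deletion loop elided. Suppose the VPC delete was blocked by an
	// untagged dependent: only the first-level VPC ARN is reported.
	blocked := []string{"arn:aws:ec2:us-east-1:123456789012:vpc/vpc-0abc123"}
	return blocked, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
	defer cancel()

	u := &uninstaller{}
	blocked, err := u.RunWithContext(ctx)
	if err != nil {
		fmt.Println("destroy failed:", err)
		return
	}
	// Hive can surface these ARNs so the user can act on them.
	for _, arn := range blocked {
		fmt.Println("could not delete:", arn)
	}
}
```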

@staebler (Contributor, Author)

/hold

This builds on #3765.

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Jun 18, 2020
@staebler force-pushed the aws_destroy_return_blockers branch from 7d5fcb4 to 9465ece on June 18, 2020 12:40
@staebler (Contributor, Author)

As suggested in #3765 (comment), the destroyer will now collect, but not attempt to delete, the undeleted tagged resources when the context expires while trying to delete the EC2 instances.
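
In sketch form (illustrative Go, not the actual diff; terminate is a hypothetical stand-in for the per-instance EC2 call):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// terminate is a hypothetical stand-in for the per-instance EC2 delete call.
func terminate(arn string) error {
	return errors.New("DependencyViolation") // pretend AWS refused
}

// terminateInstances collects, but does not attempt to delete, whatever is
// left when the context expires, matching the behaviour described above.
func terminateInstances(ctx context.Context, arns []string) (remaining []string) {
	for i, arn := range arns {
		select {
		case <-ctx.Done():
			// Context expired: record the instances we never got to.
			return append(remaining, arns[i:]...)
		default:
		}
		if err := terminate(arn); err != nil {
			remaining = append(remaining, arn)
		}
	}
	return remaining
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate an already-expired context
	left := terminateInstances(ctx, []string{"arn:aws:ec2:us-east-1:123456789012:instance/i-0abc"})
	fmt.Println("not deleted:", left)
}
```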

@staebler (Contributor, Author)

/hold cancel

#3765 has merged.

@openshift-ci-robot removed the do-not-merge/hold label on Jun 19, 2020
@sdodson (Member) commented Jul 16, 2020

/assign @patrickdillon @jstuever
PTAL

@gregsheremeta (Contributor)

Hi, we really need someone to take a look at this. @patrickdillon @jstuever do you have some cycles to review?

@sdodson @abhinavdahiya can you assign other reviewers if @patrickdillon and @jstuever can't get to it?

@jstuever (Contributor) commented Aug 7, 2020

/cc @abhinavdahiya
I invested some time earlier in this PR trying to understand the changes; there is a lot going on. I believe it may change behaviour in terms of what is printed as "Deleted". I think someone who is more familiar with the AWS destroy code should review it.

@jstuever (Contributor)

I'll take another look at this after code freeze; sorry for the delay.

@abhinavdahiya (Contributor) left a comment

I am not in favor of the second level of tracking being added here for remaining resources, i.e. we need to delete the instance (which we identified from its tag), but that usually requires us to also clean up the instance's IAM profile, so we also add that profile resource to the tracker.

I think the best way to provide the most information is to just track the tagged resources when describing to users what remains; our discovery is best effort, and a lot of the time there are other resources blocking deletion of the tagged resources that only AWS knows about or surfaces in its error messages.

So telling the user that the VPC deletion is left is all the information the user requires to go ahead and delete the VPC in case we cannot.
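
The suggestion, sketched (hypothetical types; not code from this PR): on a failed delete, record the tagged resource itself rather than descending into its untagged dependents.

```go
package main

import "fmt"

// resource is a hypothetical tagged cluster resource.
type resource struct {
	arn string
}

// deleteTagged attempts each tagged resource once and reports only the
// tagged (first-level) resources that remain; AWS's own error message is
// left to name the untagged blocker.
func deleteTagged(resources []resource, del func(resource) error) []string {
	var remaining []string
	for _, r := range resources {
		if err := del(r); err != nil {
			remaining = append(remaining, r.arn)
		}
	}
	return remaining
}

func main() {
	vpc := resource{arn: "arn:aws:ec2:us-east-1:123456789012:vpc/vpc-0abc123"}
	left := deleteTagged([]resource{vpc}, func(resource) error {
		return fmt.Errorf("DependencyViolation: the vpc has dependencies")
	})
	fmt.Println("remaining:", left)
}
```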

@staebler (Contributor, Author) commented Oct 8, 2020

> I am not in favor of the second level of tracking being added here for remaining resources, i.e. we need to delete the instance (which we identified from its tag), but that usually requires us to also clean up the instance's IAM profile, so we also add that profile resource to the tracker.

Thanks for the review, @abhinavdahiya. I can live with only returning the tagged resources that could not be deleted rather than also including the next-level resources. I will rework this PR with that in mind.

@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jstuever after the PR has been reviewed.
You can assign the PR to them by writing /assign @jstuever in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@staebler (Contributor, Author)


@abhinavdahiya I've rewritten the changes. The logic should be much easier to follow now.

nextTagClients = append(nextTagClients, tagClient)
} else {
o.Logger.Debugf("no deletions from %s, removing client", tagClientNames[tagClient])
instancesToDelete, err := o.findEC2Instances(ctx, ec2Client, deleted)
@staebler (Contributor, Author)

I would like to add a comment about why EC2 instances are special and need to be in the terminated state first before attempting to delete other resources. Can someone shed light on this for me?

Contributor

This commit message should provide context on why we stop instances before anything else. cf69c1e

@staebler (Contributor, Author)

Great! Thanks.
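
For readers without that commit handy: broadly, running instances keep holding dependent resources (network interfaces, instance profiles) until they reach the terminated state, so the destroyer terminates and waits before deleting anything else. With aws-sdk-go, that wait might look like the following sketch (made-up instance ID; not the PR's code):

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// The session carries the region; clients created from it inherit it.
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	client := ec2.New(sess)
	ctx := context.Background()

	ids := []*string{aws.String("i-0123456789abcdef0")} // hypothetical instance ID

	// Ask AWS to terminate the instances...
	if _, err := client.TerminateInstancesWithContext(ctx, &ec2.TerminateInstancesInput{
		InstanceIds: ids,
	}); err != nil {
		log.Fatal(err)
	}

	// ...and wait until they are actually terminated, since dependent
	// resources cannot be deleted while instances still hold them.
	if err := client.WaitUntilInstanceTerminatedWithContext(ctx, &ec2.DescribeInstancesInput{
		InstanceIds: ids,
	}); err != nil {
		log.Fatal(err)
	}
	log.Println("instances terminated; safe to delete the remaining resources")
}
```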

tagClientStack := append([]*resourcegroupstaggingapi.ResourceGroupsTaggingAPI(nil), tagClients...)

// Terminate EC2 instances.
ec2Client := ec2.New(awsSession)
Contributor

I think this needs a region argument; otherwise this might not function correctly..?

Something like aws.NewConfig().WithRegion(region).

@staebler (Contributor, Author)

The current code in master does not specify the region. The awsSession should already be configured with the region.

deleted, err := terminateEC2InstancesByTags(ctx, ec2.New(awsSession), iamClient, o.Filters, o.Logger)
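
For reference, both patterns work in aws-sdk-go: a client inherits the session's region, and an explicit per-client override is also possible (a sketch, not this PR's code):

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// What the PR relies on: the session is already configured with the
	// region, so clients created from it inherit it.
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	_ = ec2.New(sess)

	// The reviewer's alternative: pass an explicit per-client override.
	_ = ec2.New(sess, aws.NewConfig().WithRegion("us-west-2"))
}
```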

@abhinavdahiya (Contributor)

The change looks good at a high level, but the destroy isn't completing currently; see e2e-aws exceeding the 4-hour timeout with no deprovision logs.

So I think there is something missing here. Maybe https://github.com/openshift/installer/pull/3772/files#discussion_r504111660?

@openshift-ci-robot (Contributor)

@staebler: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-crc 5db7fb7 link /test e2e-crc
ci/prow/e2e-ovirt 5db7fb7 link /test e2e-ovirt
ci/prow/e2e-libvirt 5db7fb7 link /test e2e-libvirt
ci/prow/e2e-aws 5db7fb7 link /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@staebler (Contributor, Author)

> The change looks good at a high level, but the destroy isn't completing currently; see e2e-aws exceeding the 4-hour timeout with no deprovision logs.
>
> So I think there is something missing here. Maybe https://github.com/openshift/installer/pull/3772/files#discussion_r504111660?

The tests are timing out running e2e-aws-gather-core-dump, before any attempts to deprovision.

@staebler (Contributor, Author)

>> The change looks good at a high level, but the destroy isn't completing currently; see e2e-aws exceeding the 4-hour timeout with no deprovision logs.
>> So I think there is something missing here. Maybe https://github.com/openshift/installer/pull/3772/files#discussion_r504111660?
>
> The tests are timing out running e2e-aws-gather-core-dump, before any attempts to deprovision.

I think that it is a problem with this PR in particular and not the code in general. This may be because this PR is quite old and is missing some changes that have been made to the tests in the past couple of months. I copied the branch and opened a new PR with the same code (#4270), and the e2e-aws test passes. I am going to close this PR in favor of the new one.

/close

@openshift-ci-robot (Contributor)

@staebler: Closed this PR.

In response to this:

> The change looks good at a high level, but the destroy isn't completing currently; see e2e-aws exceeding the 4-hour timeout with no deprovision logs.
> So I think there is something missing here. Maybe https://github.com/openshift/installer/pull/3772/files#discussion_r504111660?
>
> The tests are timing out running e2e-aws-gather-core-dump, before any attempts to deprovision.
>
> I think that it is a problem with this PR in particular and not the code in general. This may be because this PR is quite old and is missing some changes that have been made to the tests in the past couple of months. I copied the branch and opened a new PR with the same code (#4270), and the e2e-aws test passes. I am going to close this PR in favor of the new one.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
