-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
destroy: return aws resources that could not be deleted #3772
Conversation
/hold This builds on #3765. |
7d5fcb4
to
9465ece
Compare
As suggested in #3765 (comment), the destroyer will now collect--but not attempt to delete--the un-deleted tagged resources when the context expires while trying to delete the EC2 instances. |
ed89e9c
to
464da6b
Compare
/hold cancel #3765 has merged. |
/assign @patrickdillon @jstuever |
Hi, we really need someone to take a look at this. @patrickdillon @jstuever do you have some cycles to review? @sdodson @abhinavdahiya can you assign other reviewers if @patrickdillon and @jstuever can't get to it? |
/cc @abhinavdahiya |
I'll take another look at this after code freeze; sorry for the delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not in favor of the second level of tracking that is being added here for remaining,
i.e need to delete the instance (that we identified from tag) but that usually requires us to also clean up instance iam profile so we also add that profile resource to the tracker.
I think the best way to provide most information is to just tracker the tagged resources for users to describe remaining, our discovery is best effort and a lot of the times there are other resources that block deletion of the tagged resources that only AWS knows or provides in the error message.
So telling the user VPC deletion is left is all the information that is required for user to go ahead and delete the VPC in case we cannot.
Thanks for the review, @abhinavdahiya. I can live with only returning the tagged resources that could not be deleted rather than also including the next-level resources. I will rework this PR with that in mind. |
464da6b
to
5a09a63
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@abhinavdahiya I've re-written the changes. The logic should be much easier to follow now. |
nextTagClients = append(nextTagClients, tagClient) | ||
} else { | ||
o.Logger.Debugf("no deletions from %s, removing client", tagClientNames[tagClient]) | ||
instancesToDelete, err := o.findEC2Instances(ctx, ec2Client, deleted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to add a comment about why EC2 instances are special and need to be in the terminated state first before attempting to delete other resources. Can someone shed light on this for me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit message should provide context on why we stop instances before anything else. cf69c1e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks.
tagClientStack := append([]*resourcegroupstaggingapi.ResourceGroupsTaggingAPI(nil), tagClients...) | ||
|
||
// Terminate EC2 instances. | ||
ec2Client := ec2.New(awsSession) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a region argument otherwise this might not function correctly..?
aws.NewConfig().WithRegion(region)
something like that..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code in master does not specify the region. The awsSession
should already be configured with the region.
installer/pkg/destroy/aws/aws.go
Line 168 in d91c40b
deleted, err := terminateEC2InstancesByTags(ctx, ec2.New(awsSession), iamClient, o.Filters, o.Logger) |
The change looks good in at high level, the destroy isn't completing currently see e2e-aws exceeding the 4 hour timeout with no deprovision logs. So i think there is something missing here. maybe https://github.com/openshift/installer/pull/3772/files#discussion_r504111660 ?? |
The `RunWithContext` method has been modified to return a slice of ARNs that could not be destroyed. Only the ARNs of the first-level resources will be returned. For example, when deleting a VPC, the uninstaller will first delete other resources that use the VPC. Any of those resources that are blocked from being deleted but are not tagged for the cluster will not show up in the list of blocked resources. However, the first-level VPC will show up as blocked in this case. This will be used by Hive to expose to the user the resources that cannot be deleted so that the user can take action on those resources. https://issues.redhat.com/browse/CO-973
5a09a63
to
5db7fb7
Compare
@staebler: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
The tests are timing out running e2e-aws-gather-core-dump, before any attempts to deprovision. |
I think that it is a problem with this PR in particular and not the code in general. This may be because this PR is quite old and is maybe not getting some changes that have been made to the tests in the past couple months. I copied the branch and opened a new PR with the same code (#4270), and the e2e-aws test passes. I am going to close this PR in favor of the new one. /close |
@staebler: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The
RunWithContext
method has been modified to return a slice of ARNs that could not be destroyed. Only the ARNs of the first-level resources will be returned. For example, when deleting a VPC, the uninstaller will first delete other resources that use the VPC. Any of those resources that are blocked from being deleted but are not tagged for the cluster will not show up in the list of blocked resources. However, the first-level VPC will show up as blocked in this case.This will be used by Hive to expose to the user the resources that cannot be deleted so that the user can take action on those resources.
https://issues.redhat.com/browse/CO-973