CORS-4058: Migrate AWS Destroy to SDK v2 #9939
openshift-merge-bot[bot] merged 12 commits into openshift:main
Conversation
- The bulk of the changes are to the ec2helpers file. All of the SDK v1 imports are removed except for session, as this one is ingrained in too many files currently.

pkg/destroy/aws/aws.go
- Add clients for ELB, ELBv2, and IAM to the cluster-removal struct. Even though these changes are mainly to ec2helpers, the other clients were required for certain operations.
- The rest of the file updates alter the ARN import to come from AWS SDK v2.
- Remove/change all imports from AWS SDK v1 to v2.

pkg/destroy/aws/errors.go, pkg/destroy/aws/ec2helpers.go
- Remove the error checking/formatting function from ec2helpers and put the function in the errors.go file.
- Remove all SDK v1 imports from the ELB helpers.
- Update the Route53, S3, and EFS services to SDK v2. This slowly removes the requirement for the AWS session.
- This caused updates to other packages such as aws/config, credentials, stscreds, and a list of AWS internal packages.
… clients in destroyer.
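For context on the ARN change mentioned above: in SDK v2 the ARN helpers live in github.com/aws/aws-sdk-go-v2/aws/arn, and arn.Parse splits the well-known arn:partition:service:region:account-id:resource layout. A stdlib-only sketch of that split (illustrative only, not the SDK's implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// parseARN splits an ARN of the form
// arn:partition:service:region:account-id:resource
// into its six colon-separated sections, mimicking the shape of the
// data the SDK v2 arn.Parse helper extracts.
func parseARN(s string) ([]string, error) {
	parts := strings.SplitN(s, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" {
		return nil, fmt.Errorf("invalid ARN: %q", s)
	}
	return parts, nil
}

func main() {
	parts, err := parseARN("arn:aws:iam::123456789012:role/destroyer")
	fmt.Println(err == nil, parts[2], parts[4]) // true iam 123456789012
}
```

The real arn.Parse returns a typed arn.ARN struct (Partition, Service, Region, AccountID, Resource) rather than a slice.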
@barbacbd: This pull request references CORS-4058, which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.21." or "openshift-4.21.", but it targets "4.20.0" instead. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/cc @patrickdillon
pkg/destroy/aws/aws.go
Outdated
```go
cfg, err := configv2.LoadDefaultConfig(context.TODO(), configv2.WithRegion(region))
if err != nil {
	return nil, fmt.Errorf("failed loading default config: %w", err)
}
return createResourceTaggingClientWithConfig(cfg, region, endpoints), nil
```
For the v2 config, we should get it from GetConfigWithOptions(ctx, configv2.WithRegion(region)).
This has the common settings (e.g. maxRetries) that we probably want to have.
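The value of routing all config loading through one helper is the functional-options pattern: common defaults (such as retries) are prepended before the caller's options, so they apply everywhere yet remain overridable. A minimal stdlib-only sketch of that pattern (Options, WithRegion, and the retry default here are illustrative stand-ins, not the SDK's actual types):

```go
package main

import "fmt"

// Options mirrors the shape of the SDK's LoadOptions: a struct mutated
// by a slice of functional options.
type Options struct {
	Region     string
	MaxRetries int
}

func WithRegion(r string) func(*Options)    { return func(o *Options) { o.Region = r } }
func WithMaxRetries(n int) func(*Options)   { return func(o *Options) { o.MaxRetries = n } }

// getConfigWithOptions prepends common defaults (e.g. retries) before
// caller-supplied options, so every client shares the same baseline but
// callers can still override it.
func getConfigWithOptions(optFns ...func(*Options)) Options {
	opts := Options{}
	common := []func(*Options){WithMaxRetries(5)}
	for _, fn := range append(common, optFns...) {
		fn(&opts)
	}
	return opts
}

func main() {
	cfg := getConfigWithOptions(WithRegion("us-east-1"))
	fmt.Println(cfg.Region, cfg.MaxRetries) // us-east-1 5
}
```

Because caller options run after the common ones, getConfigWithOptions(WithMaxRetries(2)) would still override the default.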
Force-pushed 13ecc48 to 831afca
pkg/destroy/aws:
- Alter the function name from HandleErrorCode to handleErrorCode. The initial thought was that this function could be used in other areas of the code, but it will remain in destroy for now.

pkg/destroy/aws/shared.go:
- Remove the session import and its uses in the file.
- Remove session from the imports. Add the agent handler to the configurations.
Force-pushed 831afca to 4fcea32
/test gofmt
tthvo left a comment
Looks good overall to me 🚀 !
I had some more questions and suggestions 🤔 I think there are a few more places where we can refactor:
- Move client constructs to the common client file pkg/asset/installconfig/aws/clients.go
- Move error handling and error constants to a common file pkg/asset/installconfig/aws/awserrors.go

These can be handled in CORS-4078 as a final touch :D
```diff
-var awsErr awserr.Error
-ok := errors.As(err, &awsErr)
-if ok && awsErr.Code() == resourcegroupstaggingapi.ErrorCodeInvalidParameterException {
+if strings.Contains(HandleErrorCode(err), "InvalidParameter") {
```
I think this way of error checking (i.e. parsing error code) and the one above (i.e. errors.As to a specific error type) is equivalent.
Though, should we use the same way for consistency?
It seems previously we were checking the error code in both places.
@tthvo I believe that this is already equivalent. HandleErrorCode will run errors.As and return the code.
pkg/destroy/aws/aws.go
Outdated
```go
route53Client, err := awssession.NewRoute53Client(ctx, awssession.EndpointOptions{
	Region:    region,
	Endpoints: metadata.AWS.ServiceEndpoints,
}, "") // FIXME: Do we need an ARN here?
```
AFAIK, we don't need to set the hostedZoneRole arn here for this common route53 client as it is used for deleting "owned" hosted zone.
However, we do need to recreate the client with the role arn in file: pkg/destroy/aws/shared.go for "shared" hosted zone. See lines below:
That block was deleted and we need to add that logic back.
```go
publicZoneClient := route53.New(session)
privateZoneClient := route53.New(session)
if o.HostedZoneRole != "" {
	creds := stscreds.NewCredentials(session, o.HostedZoneRole)
	privateZoneClient = route53.New(session, &aws.Config{Credentials: creds})
	logger.Infof("Assuming role %s to destroy records in private hosted zone", o.HostedZoneRole)
}
```
Unfortunately, this block is still needed. If not, the handler will use the common Route53 and completely ignore the hosted zone role.
Thus, we need to recreate the route53 clients here:
- publicZoneClient: use the common Route53 client.
- privateZoneClient: if HostedZoneRole is specified, create a separate client with AWS STS.
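A hedged sketch of how that deleted v1 block might translate to SDK v2 (the function name, parameters, and wiring are assumptions, not the PR's actual code): stscreds.NewAssumeRoleProvider replaces v1's stscreds.NewCredentials, and the role-bearing client is built from a copy of the shared config.

```go
import (
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	"github.com/aws/aws-sdk-go-v2/service/route53"
	"github.com/aws/aws-sdk-go-v2/service/sts"
	"github.com/sirupsen/logrus"
)

// newZoneClients is an illustrative helper, not the installer's API.
// It returns the common client for "owned" zones and, when a hosted
// zone role is set, a second client that assumes that role for the
// "shared" private zone.
func newZoneClients(cfg aws.Config, hostedZoneRole string, logger logrus.FieldLogger) (publicZone, privateZone *route53.Client) {
	publicZone = route53.NewFromConfig(cfg)
	privateZone = publicZone
	if hostedZoneRole != "" {
		// Assume the hosted-zone role via STS; CredentialsCache
		// refreshes the temporary credentials automatically.
		provider := stscreds.NewAssumeRoleProvider(sts.NewFromConfig(cfg), hostedZoneRole)
		roleCfg := cfg.Copy()
		roleCfg.Credentials = aws.NewCredentialsCache(provider)
		privateZone = route53.NewFromConfig(roleCfg)
		logger.Infof("Assuming role %s to destroy records in private hosted zone", hostedZoneRole)
	}
	return publicZone, privateZone
}
```

Copying the config before swapping credentials keeps the common client untouched, mirroring the v1 pattern of passing an override &aws.Config to route53.New.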
/test e2e-aws-ovn-custom-iam-profile e2e-aws-ovn-public-ipv4-pool e2e-aws-ovn-upi e2e-aws-ovn-heterogeneous
Set a destroy User-Agent. Clean up pointer references to use the AWS SDK.
tthvo left a comment
/lgtm
Nice, we still have a discussion open for #9939 (comment). But this looks good!
I'm not sure about the other CI jobs--we can check back when they finish running--but I have some concerns about the destroy in the aws custom dns job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/9939/pull-ci-openshift-installer-main-e2e-aws-custom-dns-techpreview/1968339873618202624/artifacts/e2e-aws-custom-dns-techpreview/ipi-deprovision-deprovision/artifacts/
The destroy finished just now I think: time="2025-09-17T18:08:54Z" level=info msg=Deleted id=vpc-0d44eceb3e9433e2f resourceType=vpc
These logs come from the endpoint resolver for AWS SDK v2. They have been so noisy. Let me remove them in #9907
Oh, yeah I think I actually just lost it in the error messages. You're right, no issues with destroy!
Ah, that will be a separate PR, so this LGTM. Thanks! /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: patrickdillon. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cc @yunjiang29
/test e2e-aws-ovn
/test e2e-aws-ovn-custom-iam-profile e2e-aws-ovn-public-ipv4-pool e2e-aws-ovn-upi e2e-aws-ovn-heterogeneous
/retest-required
@barbacbd: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/verified by ci/prow/e2e-aws-ovn
@barbacbd: This PR has been marked as verified by ci/prow/e2e-aws-ovn. In response to this:
Merged bad1b52 into openshift:main
no-jira: Revert "Merge pull request #9939 from barbacbd/CORS-4058-release-4.21"
```go
	resourcegroupstaggingapi.New(awsSession, aws.NewConfig().WithRegion(endpoints.UsGovWest1RegionID)))
case endpointUSGovEast1, endpointUSGovWest1:
	if o.Region != endpointUSGovWest1 {
		tagClient, err := createResourceTaggingClient(endpointUSGovWest1, o.endpoints)
```
Is this correct? When the region is not endpointUSGovWest1, we still create a resource tagging client for us-gov-west-1? If so, could you please add a comment explaining why?
I actually have little context, but that block has always been there. For example, in release 4.19:
installer/pkg/destroy/aws/aws.go
Lines 166 to 170 in 7e30b7d
Git blame returns PR #4042. IIUC, for the government cloud partition, there are route53 resources that are only available in us-gov-west-1, and those will have to be cleaned up.
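That GovCloud special case can be sketched as a small helper (illustrative only; the installer inlines this logic in the switch statement quoted above):

```go
package main

import (
	"fmt"
	"strings"
)

// taggingRegions sketches the special case described above: in the
// GovCloud partition, Route 53 resources live only in us-gov-west-1,
// so a tagging client for that region is always needed even when the
// cluster region differs.
func taggingRegions(region string) []string {
	const usGovWest1 = "us-gov-west-1"
	regions := []string{region}
	if strings.HasPrefix(region, "us-gov-") && region != usGovWest1 {
		regions = append(regions, usGovWest1)
	}
	return regions
}

func main() {
	fmt.Println(taggingRegions("us-gov-east-1")) // [us-gov-east-1 us-gov-west-1]
	fmt.Println(taggingRegions("us-east-1"))     // [us-east-1]
}
```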
Reintroduce the changes for the AWS Destroy code.
The big changes between this and the original PR #9736: