Skip to content

CORS-4078: centralize v2 aws client constructions #9882

Merged
openshift-merge-bot[bot] merged 3 commits intoopenshift:mainfrom
tthvo:CORS-4078
Aug 15, 2025
Merged

CORS-4078: centralize v2 aws client constructions #9882
openshift-merge-bot[bot] merged 3 commits intoopenshift:mainfrom
tthvo:CORS-4078

Conversation

@tthvo
Copy link
Member

@tthvo tthvo commented Aug 12, 2025

Notes: I realized the changes are becoming way too large for review so I will split into them into 2 parts. This PR focuses on the config and client constructs.

CORS-4078 Part 1

Following session pattern in v1 sdk, we centralize the construction of v2 sdk aws.Config and API clients for better reusability.

To remain backwards compatible, I included the following:

  • New config and client constructs are in sessionv2.go while we gradually migrate parts of the installer code.
  • EndpointResolver v1 is kept around to resolve partition ID of a region. In v2, the partition ID is not available.
  • Service ID in v1 is converted to v2 equivalent to be used internally by the installer. This means the users can continue to use existing values while we migrate the sdk.

CORS-4078 Part 2

The following will be done in part 2 (in a follow-up PR):

  • Centralize the aws error checking logics (e.g. error code, error message constants refactoring)
  • Check custom-endpoint compatibility with CAPA and OCP components. That is whether we can put duplicate endpoints with both service ID v1 and v2 to those components 🤔

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 12, 2025

@tthvo: This pull request references CORS-4078 which is a valid jira issue.

Details

In response to this:

Notes: I realized the changes are becoming way too large for review so I will split into them into 2 parts. This PR focuses on the config and client constructs.

CORS-4078 Part 1

Following session pattern in v1 sdk, we centralize the construction of v2 sdk aws.Config and API clients for better reusability.

To remain backwards compatible, I included the following:

  • New config and client constructs are in sessionv2.go while we gradually migrate parts of the installer code.
  • EndpointResolver v1 is kept around to resolve partition ID of a region. In v2, the partition ID is not available.
  • Service ID in v1 is converted to v2 equivalent to be used internally by the installer. This means the users can continue to use existing values while we migrate the sdk.

CORS-4078 Part 2

The following will be done in part 2 (in a follow-up PR):

  • Centralize the aws error checking logics (e.g. error code, error message constants refactoring)
  • Check custom-endpoint compatibility with CAPA and OCP components. That is whether we can put duplicate endpoints with both service ID v1 and v2 to those components 🤔

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 12, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 12, 2025

@tthvo: This pull request references CORS-4078 which is a valid jira issue.

Details

In response to this:

Notes: I realized the changes are becoming way too large for review so I will split into them into 2 parts. This PR focuses on the config and client constructs.

CORS-4078 Part 1

Following session pattern in v1 sdk, we centralize the construction of v2 sdk aws.Config and API clients for better reusability.

To remain backwards compatible, I included the following:

  • New config and client constructs are in sessionv2.go while we gradually migrate parts of the installer code.
  • EndpointResolver v1 is kept around to resolve partition ID of a region. In v2, the partition ID is not available.
  • Service ID in v1 is converted to v2 equivalent to be used internally by the installer. This means the users can continue to use existing values while we migrate the sdk.

CORS-4078 Part 2

The following will be done in part 2 (in a follow-up PR):

  • Centralize the aws error checking logics (e.g. error code, error message constants refactoring)
  • Check custom-endpoint compatibility with CAPA and OCP components. That is whether we can put duplicate endpoints with both service ID v1 and v2 to those components 🤔

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from mtulio and patrickdillon August 12, 2025 22:19
@tthvo
Copy link
Member Author

tthvo commented Aug 12, 2025

/label platform/aws
/hold

Hold for thoroughly reviewing and checking these changes 😅 as they tend to break...

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. platform/aws labels Aug 12, 2025
@tthvo
Copy link
Member Author

tthvo commented Aug 12, 2025

/cc @barbacbd

@tthvo
Copy link
Member Author

tthvo commented Aug 14, 2025

/unhold

The changes seem OK from the results of CI jobs...

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 14, 2025
tthvo added 3 commits August 14, 2025 10:49
Following session pattern in v1 sdk, we centralize the construction of
v2 sdk aws.Config and API clients for better reusability.

To remain backwards compatible, the following is in place:
- New config and client constructs are in sessionv2.go  while we
  gradually migrate parts of the installer code.
- EndpointResolver v1 is kept around to resolve partition ID of a
  region.
- Service ID in v1 is converted to v2 equivalent internally. The user
  can continue to use existing values while we migrate the sdk.

Notes:
- Default service endpoints for CN regions are not handled by the v2
  endpoint resolver.
The following services are added:
- github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2
- github.com/aws/aws-sdk-go-v2/service/elasticloadbalancing
- github.com/aws/aws-sdk-go-v2/service/resourcegroupstaggingapi
@barbacbd
Copy link
Contributor

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: barbacbd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2025
@sadasu
Copy link
Contributor

sadasu commented Aug 14, 2025

/test e2e-aws-ovn

@sadasu
Copy link
Contributor

sadasu commented Aug 14, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 15, 2025

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-heterogeneous 9b89713 link false /test e2e-aws-ovn-heterogeneous
ci/prow/e2e-azure-ovn-resourcegroup 9b89713 link false /test e2e-azure-ovn-resourcegroup
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones 9b89713 link false /test e2e-aws-ovn-shared-vpc-edge-zones
ci/prow/e2e-aws-ovn-edge-zones 9b89713 link false /test e2e-aws-ovn-edge-zones
ci/prow/e2e-vsphere-host-groups-ovn-techpreview 9b89713 link false /test e2e-vsphere-host-groups-ovn-techpreview
ci/prow/e2e-aws-byo-subnet-role-security-groups 9b89713 link false /test e2e-aws-byo-subnet-role-security-groups
ci/prow/okd-scos-e2e-aws-ovn 9b89713 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-ovn-imdsv2 9b89713 link false /test e2e-aws-ovn-imdsv2
ci/prow/e2e-aws-ovn-shared-vpc-custom-security-groups 9b89713 link false /test e2e-aws-ovn-shared-vpc-custom-security-groups
ci/prow/e2e-vsphere-externallb-ovn 9b89713 link false /test e2e-vsphere-externallb-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 565c043 into openshift:main Aug 15, 2025
19 of 29 checks passed
@tthvo tthvo deleted the CORS-4078 branch August 15, 2025 06:44
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-baremetal-installer
This PR has been included in build ose-baremetal-installer-container-v4.20.0-202508150614.p0.g565c043.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer
This PR has been included in build ose-installer-container-v4.20.0-202508150614.p0.g565c043.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-artifacts
This PR has been included in build ose-installer-artifacts-container-v4.20.0-202508150614.p0.g565c043.assembly.stream.el9.
All builds following this will include this PR.

Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion for the future implementation, also I notice that the endpoint options contains configuration for FIPS endpoints. Good find. We will need to make sure that is set accordingly

Comment on lines +87 to +90
ec2Client, err := awsconfig.NewEC2Client(ctx, awsconfig.EndpointOptions{
Region: installConfig.Config.Platform.AWS.Region,
Endpoints: installConfig.Config.Platform.AWS.ServiceEndpoints,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned that you were still open to suggestions despite this having merged: I would suggest creating a function on metadata that can create or get a client without passing EndpointOptions. Similar to the current getSession:

sess, err := meta.Session(ctx)

we would have something that is just:

ec2Client, err := meta.EC2Client(ctx)

This will be beneficial whenever we need to construct new clients in the asset package (which is the most common use case), we know it will be correct, and users don't need to worry about correctly configuring endpoints or fips

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, sounds good! We do actually already have that function (but I am not using it here).

func (m *Metadata) EC2Client(ctx context.Context) (*ec2.Client, error) {
if m.ec2Client == nil {
ec2Client, err := NewEC2Client(ctx, EndpointOptions{
Region: m.Region,
Endpoints: m.Services,
})
if err != nil {
return nil, fmt.Errorf("failed to create EC2 client: %w", err)
}
m.ec2Client = ec2Client
}
return m.ec2Client, nil
}

I think what I am going to do is to create a "client set" in the metadata with all AWS clients, not just EC2 for easy use. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I will replace these separately-created clients with the call to metadata in a follow-up PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. platform/aws

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants