
[OCPCLOUD-1159] Validate unknown regions using AWS API #32

Merged


RadekManak
Contributor

@RadekManak RadekManak commented Apr 12, 2022

When a region cannot be validated locally against the vendored aws/endpoints/defaults.go, call the AWS DescribeRegions API
to get the list of regions and validate the requested region against it.

This allows new AWS regions to work on older versions of OpenShift
without having to backport AWS SDK.
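A minimal sketch of the fallback described above. It assumes a hypothetical `regionDescriber` interface in place of the real EC2 client (where the call is DescribeRegions) and a hard-coded `knownRegions` set standing in for the vendored endpoints list. Neither name comes from the PR.

```go
package main

import "fmt"

// regionDescriber is a hypothetical stand-in for the EC2 client; in the real
// code this role is played by the AWS DescribeRegions API call.
type regionDescriber interface {
	DescribeRegionNames() ([]string, error)
}

// knownRegions is an illustrative subset standing in for the regions compiled
// into the vendored aws/endpoints/defaults.go.
var knownRegions = map[string]bool{
	"us-east-1": true,
	"eu-west-1": true,
}

// validateRegion accepts a region the vendored SDK already knows without any
// API call; otherwise it asks AWS for the current region list, so regions
// added after the SDK was vendored still validate.
func validateRegion(region string, api regionDescriber) error {
	if knownRegions[region] {
		return nil
	}
	names, err := api.DescribeRegionNames()
	if err != nil {
		return fmt.Errorf("describing regions: %w", err)
	}
	for _, name := range names {
		if name == region {
			return nil
		}
	}
	return fmt.Errorf("region %q is not a known AWS region", region)
}

// fakeDescriber returns a fixed region list and counts how often it is called.
type fakeDescriber struct {
	regions []string
	calls   int
}

func (f *fakeDescriber) DescribeRegionNames() ([]string, error) {
	f.calls++
	return f.regions, nil
}
```

Regions found in the local list never trigger an API call; only names the vendored SDK does not know fall through to the API.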

Requires openshift/machine-api-operator#1007 to merge first

@RadekManak RadekManak changed the title Validate unknown regions using AWS API [OCPCLOUD-1159] Validate unknown regions using AWS API Apr 12, 2022
@RadekManak RadekManak force-pushed the validate_unknown_regions branch 2 times, most recently from 68fc5b4 to c92179b on April 12, 2022 11:20
pkg/client/client.go (outdated, resolved)
pkg/client/client.go (resolved)
pkg/client/client.go (outdated, resolved)
pkg/client/client.go (outdated, resolved)
Comment on lines 211 to 232
type describeRegionsData struct {
	err                   error
	describeRegionsOutput *ec2.DescribeRegionsOutput
	lastUpdated           time.Time
}
Contributor

Any data we are storing like this needs to have a mutex on it. Perhaps instead we can create a cache structure that is owned by the reconciler? That way it is a property of something rather than a global variable. A RegionCache would then become an argument to the validated client.
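A minimal sketch of that ownership change, assuming hypothetical `actuator` and `validatedClient` types: the cache carries its own mutex, lives on the actuator, and is passed to each client instead of sitting in a package-level variable. Region names are stored as plain strings rather than `*ec2.DescribeRegionsOutput` to keep the sketch dependency-free.

```go
package main

import (
	"sync"
	"time"
)

// describeRegionsData mirrors the cached entry from the PR, with plain region
// names instead of *ec2.DescribeRegionsOutput.
type describeRegionsData struct {
	err         error
	regionNames []string
	lastUpdated time.Time
}

// regionCache replaces the package-level global; mu guards data, which is
// keyed by access key ID.
type regionCache struct {
	mu   sync.Mutex
	data map[string]describeRegionsData
}

func newRegionCache() *regionCache {
	return &regionCache{data: make(map[string]describeRegionsData)}
}

// get returns the cached entry for accessKeyID, if any, under the mutex.
func (c *regionCache) get(accessKeyID string) (describeRegionsData, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	d, ok := c.data[accessKeyID]
	return d, ok
}

// actuator (hypothetical) owns the cache for its whole lifetime.
type actuator struct {
	regionCache *regionCache
}

// validatedClient (hypothetical) receives the cache as an argument.
type validatedClient struct {
	region string
	cache  *regionCache
}

func newValidatedClient(region string, cache *regionCache) *validatedClient {
	return &validatedClient{region: region, cache: cache}
}
```

Every client built by one actuator shares that actuator's cache, and because entries are keyed by access key ID, rotated credentials simply start a new entry.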

Contributor Author

I used a global variable here because it reduced the number of places that required changes. I've made a separate commit that moves the cache ownership to the actuator. We can merge both or just the first one, depending on what you prefer.

@RadekManak RadekManak force-pushed the validate_unknown_regions branch 3 times, most recently from 10bc5e9 to d47ad7f on April 21, 2022 08:22
Contributor

@JoelSpeed JoelSpeed left a comment

I think the improvements here are good; I've left some additional feedback.

@@ -33,6 +33,8 @@ type machineScopeParams struct {
machine *machinev1beta1.Machine
// api server controller runtime client for the openshift-config-managed namespace
configManagedClient runtimeclient.Client
// accessKeyID (string) to *DescribeRegionsData map
Contributor

Nice, that should mean that if the credentials change we get a new cache, which makes sense. Good idea!

Contributor Author

Contributor

Oh, past me is cleverer than present me, I don't remember saying that 😂

pkg/client/client.go (outdated, resolved)
pkg/client/client.go (resolved)
return nil, err
}

regionsCache.describeRegionsOutput = describeRegionsOutput
Contributor

I wonder if we should make the region cache a struct with a mutex in it so that we can ensure there isn't concurrent access across threads. You could even make it an interface:

type RegionCache interface {
  GetData(string) RegionCacheData
  SetData(string, RegionCacheData)
}

Then the implementation itself would handle the concurrency by using an RWMutex, taking the read lock in GetData and the write lock in SetData. WDYT?

Contributor Author

Do we need to handle concurrent access? The machines are reconciled sequentially. Is this just future-proofing in case we decide to reconcile multiple MachineSets at once?

Contributor

To a degree it's future-proofing, but MaxConcurrentReconciles could easily be configured above 1 in the controller manager, so this could become a problem in short order.

pkg/client/client.go (outdated, resolved)
	defer c.mutex.Unlock()
	c.data[accessKeyID] = data
}

Contributor Author

Looking at this now, I don't think this is what we want. We want to block for the entire duration of the cachedAWSDescribeRegions call.

@RadekManak RadekManak force-pushed the validate_unknown_regions branch 2 times, most recently from dee462c to be2e1d0 on May 3, 2022 16:15
pkg/actuators/machine/machine_scope.go (outdated, resolved)
return nil, err
}

c.mutex.Lock()
Contributor

We can improve the mutex usage here if you wanted. At the moment you are performing all reads under a write lock, you could obtain the write lock only once you know you need to write.

My suggestion would be to do the read from the map in a separate helper function that uses RLock, and then take the write lock in this function only once you determine you need to update it.

Contributor Author

Locking during read here is better because we avoid multiple API requests.

With separate locks, if multiple threads enter at once and there is no data in the cache, they would all wait at the write lock and query the API one by one.

With a single lock, all threads wait for the first thread to fetch the data and then just read it.

Contributor

There's a pattern for this that we use with locks: first you read and decide whether you need to write. If you do, you wait until you hold the write lock, then check again whether you still need to write, and release the lock if you no longer do.

But I think we can follow up on this later.

pkg/client/client.go (resolved)
@RadekManak
Contributor Author

Just comment changes from the latest review.

@JoelSpeed
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2022
Member

@damdo damdo left a comment

/lgtm

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 10, 2022
@RadekManak
Contributor Author

/retest-required

@JoelSpeed
Contributor

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2022
@openshift-ci
Contributor

openshift-ci bot commented May 10, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2022
@openshift-ci
Contributor

openshift-ci bot commented May 10, 2022

@RadekManak: all tests passed!


@openshift-merge-robot openshift-merge-robot merged commit 8ee1581 into openshift:main May 10, 2022
@RadekManak
Contributor Author

/cherry-pick release-4.10

@openshift-cherrypick-robot

@RadekManak: #32 failed to apply on top of branch "release-4.10":

Applying: Validate unknown regions using AWS API
Using index info to reconstruct a base tree...
A	pkg/actuators/awsplacementgroup/controller.go
A	pkg/actuators/awsplacementgroup/controller_test.go
M	pkg/actuators/machine/actuator.go
M	pkg/actuators/machine/actuator_test.go
M	pkg/actuators/machine/controller_test.go
M	pkg/actuators/machine/machine_scope.go
M	pkg/actuators/machine/machine_scope_test.go
M	pkg/actuators/machine/reconciler_test.go
M	pkg/client/client.go
M	pkg/client/mock/client_generated.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/client/mock/client_generated.go
Auto-merging pkg/client/client.go
Auto-merging pkg/actuators/machine/reconciler_test.go
CONFLICT (content): Merge conflict in pkg/actuators/machine/reconciler_test.go
Auto-merging pkg/actuators/machine/machine_scope_test.go
Auto-merging pkg/actuators/machine/machine_scope.go
Auto-merging pkg/actuators/machine/controller_test.go
Auto-merging pkg/actuators/machine/actuator_test.go
Auto-merging pkg/actuators/machine/actuator.go
CONFLICT (modify/delete): pkg/actuators/awsplacementgroup/controller_test.go deleted in HEAD and modified in Validate unknown regions using AWS API. Version Validate unknown regions using AWS API of pkg/actuators/awsplacementgroup/controller_test.go left in tree.
CONFLICT (modify/delete): pkg/actuators/awsplacementgroup/controller.go deleted in HEAD and modified in Validate unknown regions using AWS API. Version Validate unknown regions using AWS API of pkg/actuators/awsplacementgroup/controller.go left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Validate unknown regions using AWS API
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.10


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
6 participants