[OCPCLOUD-1159] Validate unknown regions using AWS API #32
Conversation
Force-pushed 68fc5b4 to c92179b
pkg/client/client.go
Outdated
type describeRegionsData struct {
	err                   error
	describeRegionsOutput *ec2.DescribeRegionsOutput
	lastUpdated           time.Time
}
Any data we store like this needs to have a mutex on it. Perhaps instead we can create a cache structure that is owned by the reconciler? That way it is a property of something rather than a global variable. A RegionCache would then become an argument to the validated client.
I used a global variable here because it reduced the number of places that required changes. I've made a separate commit that moves the cache ownership to the actuator. We can merge both, or just the first one, depending on which you prefer.
Force-pushed 10bc5e9 to d47ad7f
I think the improvements here are good; I've left some additional feedback.
@@ -33,6 +33,8 @@ type machineScopeParams struct {
	machine *machinev1beta1.Machine
	// api server controller runtime client for the openshift-config-managed namespace
	configManagedClient runtimeclient.Client
	// accessKeyID (string) to *DescribeRegionsData map
Nice, that should mean that if the credentials change we get a new cache, which makes sense. Good idea!
Of course, it was your idea :D
https://github.com/openshift/machine-api-provider-aws/pull/32/files#r848333870
Oh, past me is cleverer than present me, I don't remember saying that 😂
pkg/client/client.go
Outdated
	return nil, err
}

regionsCache.describeRegionsOutput = describeRegionsOutput
I wonder if we should make the region cache a struct with a mutex in it, so that we can ensure there isn't concurrent access across threads. You could even make it an interface:
type RegionCache interface {
GetData(string) RegionCacheData
SetData(string, RegionCacheData)
}
Then the implementation itself would handle the concurrency by using an RWMutex in Get and Set respectively. WDYT?
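A minimal sketch of what such an interface-backed cache could look like (type and field names are illustrative, and this GetData returns an extra found-flag that the suggested interface omits):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RegionCacheData is a stand-in for the cached DescribeRegions result.
type RegionCacheData struct {
	Regions     []string
	LastUpdated time.Time
}

// regionCache guards its map with an RWMutex so concurrent
// reconciles can read in parallel while writes stay exclusive.
type regionCache struct {
	mu   sync.RWMutex
	data map[string]RegionCacheData
}

func newRegionCache() *regionCache {
	return &regionCache{data: map[string]RegionCacheData{}}
}

// GetData takes the read lock, so many readers can proceed at once.
func (c *regionCache) GetData(accessKeyID string) (RegionCacheData, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	d, ok := c.data[accessKeyID]
	return d, ok
}

// SetData takes the write lock, excluding readers and other writers.
func (c *regionCache) SetData(accessKeyID string, d RegionCacheData) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[accessKeyID] = d
}

func main() {
	c := newRegionCache()
	c.SetData("key", RegionCacheData{Regions: []string{"us-east-1"}})
	d, ok := c.GetData("key")
	fmt.Println(ok, d.Regions[0]) // prints: true us-east-1
}
```

Keying the map by accessKeyID also means a credentials change naturally produces a fresh cache entry, matching the discussion above.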
Do we need to handle concurrent access? The machines are reconciled sequentially. Is this just future proofing in case we decide to reconcile multiple MachineSets at once?
To a degree it's future-proofing, but since MaxConcurrentReconciles is something that can easily be configured within the controller manager to be more than 1, it's very easy for this to become a problem in short order.
Force-pushed d47ad7f to af6f69d
pkg/client/client.go
Outdated
	defer c.mutex.Unlock()
	c.data[accessKeyID] = data
}
Looking at this now, I don't think this is what we want. We want to block for the entire duration of the cachedAWSDescribeRegions call.
Force-pushed dee462c to be2e1d0
	return nil, err
}

c.mutex.Lock()
We could improve the mutex usage here if you wanted. At the moment you are performing all reads under a write lock; you could obtain the write lock only once you know you need to write. My suggestion would be to do the read from the map in a separate helper function that uses RLock, then take the write lock in this function only when you determine you need to update the map.
Locking during the read here is better because we avoid multiple API requests. With separate locks, if multiple threads enter at once and there is no data in the cache, they would all wait at the write lock and query the API one by one. With a single lock, all threads wait for the first thread to fetch the data and then just read it.
There's a pattern we use for this with locks: first you read and decide whether you need to write; if you do, you wait until you hold the write lock, then check again whether you still need to write, and release the lock if you no longer do. But I think we can follow up on this later.
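The read-then-recheck pattern described above could look roughly like this (illustrative names; the fetch callback stands in for the API call):

```go
package main

import (
	"fmt"
	"sync"
)

// regionStore maps a key (e.g. an access key ID) to a regions list.
type regionStore struct {
	mu      sync.RWMutex
	regions map[string][]string
}

// lookup tries a cheap read under RLock first; only if the key is
// missing does it take the write lock, and then it re-checks the map,
// because another goroutine may have filled the entry while this one
// was waiting for the lock.
func (s *regionStore) lookup(key string, fetch func() []string) []string {
	s.mu.RLock()
	if r, ok := s.regions[key]; ok {
		s.mu.RUnlock()
		return r
	}
	s.mu.RUnlock()

	s.mu.Lock()
	defer s.mu.Unlock()
	if r, ok := s.regions[key]; ok { // re-check under the write lock
		return r
	}
	r := fetch()
	s.regions[key] = r
	return r
}

func main() {
	s := &regionStore{regions: map[string][]string{}}
	r := s.lookup("key", func() []string { return []string{"ap-south-1"} })
	fmt.Println(r[0]) // prints: ap-south-1
}
```

Compared with the single-lock version, this keeps the common cache-hit path cheap, at the cost of possibly letting a few racing goroutines each call fetch before the first write lands.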
Force-pushed be2e1d0 to 5ed2366
Only the comments changed since the latest review.
/lgtm
/lgtm
Force-pushed 5ed2366 to 7a469ba
/retest-required
Force-pushed 7a469ba to bde4f0b
When a region cannot be validated locally from the vendored aws/endpoints/defaults.go, call the AWS DescribeRegions API to get the list of regions and validate the requested region against it. This allows new AWS regions to work on older versions of OpenShift without having to backport the AWS SDK.
Force-pushed bde4f0b to ffe14b2
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: JoelSpeed. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
@RadekManak: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/cherry-pick release-4.10
@RadekManak: #32 failed to apply on top of branch "release-4.10":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
When a region cannot be validated locally from the vendored aws/endpoints/defaults.go, call the AWS DescribeRegions API to get the list of regions and validate the requested region against it. This allows new AWS regions to work on older versions of OpenShift without having to backport the AWS SDK.
Requires openshift/machine-api-operator#1007 to merge first.