fix: avoid recreate vmss cache in race condition #2589
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andyzhangx
pkg/provider/azure_vmss_cache.go (outdated)
// lock and try find cacheKey from cache again, refresh cache if still not found
lockKey := cacheKey + "/search"
ss.lockMap.LockEntry(lockKey)
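The lock-then-recheck pattern in the snippet above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `keyLocks`, `cacheSet`, `getOrCreate`, and `countBuilds` are hypothetical names, the cached value is a plain string rather than a real cache object, and the per-key lock is a simplified stand-in for whatever `ss.lockMap` does.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// keyLocks is a hypothetical per-key mutex table (an assumption of this sketch).
type keyLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyLocks() *keyLocks {
	return &keyLocks{locks: make(map[string]*sync.Mutex)}
}

// lock acquires the mutex for key and returns the function that releases it.
func (k *keyLocks) lock(key string) (unlock func()) {
	k.mu.Lock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	k.mu.Unlock()
	m.Lock()
	return m.Unlock
}

// cacheSet is a hypothetical container mapping a cache key to a cached value.
type cacheSet struct {
	entries sync.Map // cacheKey -> string (placeholder for a real cache object)
	locks   *keyLocks
}

// getOrCreate returns the entry for key, building it at most once even when
// many goroutines miss at the same time.
func (c *cacheSet) getOrCreate(key string, build func() (string, error)) (string, error) {
	// Fast path: no locking when the entry already exists.
	if v, ok := c.entries.Load(key); ok {
		return v.(string), nil
	}
	unlock := c.locks.lock(key + "/search")
	defer unlock()
	// Re-check under the lock: another goroutine may have stored the entry
	// while this one was waiting.
	if v, ok := c.entries.Load(key); ok {
		return v.(string), nil
	}
	v, err := build()
	if err != nil {
		return "", err
	}
	c.entries.Store(key, v)
	return v, nil
}

// countBuilds starts n concurrent lookups of one key and reports how many
// times the build function actually ran.
func countBuilds(n int) int64 {
	c := &cacheSet{locks: newKeyLocks()}
	var builds int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.getOrCreate("rg/vmss", func() (string, error) {
				atomic.AddInt64(&builds, 1)
				return "cache", nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&builds)
}
```

Without the second `Load` under the lock, every goroutine that missed on the fast path would still build its own entry once it acquired the lock, which is the duplicate-creation behavior this PR describes.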
Have you considered using singleflight to suppress duplicate calls rather than adding a new lock?
@edreed Using singleflight would require refactoring the vmss cache, whereas adding a new lock is a small and safe change. Let's roll out this fix first.
It shouldn't require a refactor for use in this context, e.g.
v, err, _ := ss.group.Do(cacheKey, func() (interface{}, error) {
	cache, err := ss.newVMSSVirtualMachinesCache(resourceGroup, vmssName, cacheKey)
	if err != nil {
		return nil, err
	}
	ss.vmssVMCache.Store(cacheKey, cache)
	return cache, nil
})
if err != nil {
	return "", nil, err
}
cache := v.(*azcache.TimedCache)
return cacheKey, cache, nil
where ss.group is a singleflight.Group.
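To show what the suggested call suppresses, here is a stdlib-only sketch of the single-flight idea. It is a simplified, hypothetical stand-in for `singleflight.Group` (from `golang.org/x/sync/singleflight`), not that package's implementation: `flightGroup`, `getOrBuild`, and `countFlightBuilds` are illustrative names, and the cached value is a plain string. The re-check of the cache inside the function passed to `Do` is an addition of this sketch, so that a caller arriving just after a flight finishes does not rebuild the entry.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight invocation for a key.
type call struct {
	wg  sync.WaitGroup
	val interface{}
	err error
}

// flightGroup is a simplified, hypothetical stand-in for singleflight.Group.
type flightGroup struct {
	mu sync.Mutex
	m  map[string]*call
}

// Do runs fn once per key at a time. Callers that arrive while fn is in
// flight wait for it and share its result; shared reports that case.
func (g *flightGroup) Do(key string, fn func() (interface{}, error)) (v interface{}, err error, shared bool) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err, true
	}
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	return c.val, c.err, false
}

// getOrBuild returns the cached value for key, building it through the
// flight group on a miss.
func getOrBuild(g *flightGroup, cache *sync.Map, key string, build func() (string, error)) (string, error) {
	if v, ok := cache.Load(key); ok {
		return v.(string), nil
	}
	v, err, _ := g.Do(key, func() (interface{}, error) {
		// Assumption of this sketch: re-check inside the flight so a late
		// caller reuses the entry stored by the previous flight.
		if v, ok := cache.Load(key); ok {
			return v, nil
		}
		s, err := build()
		if err != nil {
			return nil, err
		}
		cache.Store(key, s)
		return s, nil
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

// countFlightBuilds starts n concurrent lookups of one key and reports how
// many times the (deliberately slow) build function ran.
func countFlightBuilds(n int) int64 {
	var g flightGroup
	var cache sync.Map
	var builds int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = getOrBuild(&g, &cache, "rg/vmss", func() (string, error) {
				atomic.AddInt64(&builds, 1)
				time.Sleep(10 * time.Millisecond) // simulate a slow list call
				return "cache", nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&builds)
}
```

Compared with a dedicated lock, the waiting callers here receive the first caller's result directly instead of re-reading the cache after acquiring the lock.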
Thanks, the PR now uses singleflight.Group. I think this PR could fix the vmss list storm: previously there was no lock, so many new vmss caches were created simultaneously, which caused the storm.
44d94d0 to cda9e11
/test pull-cloud-provider-azure-e2e-ccm-capz
/lgtm
/cherrypick release-1.25
@MartinForReal: once the present PR merges, I will cherry-pick it on top of release-1.25 in a new PR and assign it to you.
/cherrypick release-1.24
@MartinForReal: once the present PR merges, I will cherry-pick it on top of release-1.24 in a new PR and assign it to you.
/cherrypick release-1.23
@MartinForReal: once the present PR merges, I will cherry-pick it on top of release-1.23 in a new PR and assign it to you.
/cherrypick release-1.1
@MartinForReal: once the present PR merges, I will cherry-pick it on top of release-1.1 in a new PR and assign it to you.
@MartinForReal: #2589 failed to apply on top of branch "release-1.25":
@MartinForReal: #2589 failed to apply on top of branch "release-1.24":
@MartinForReal: #2589 failed to apply on top of branch "release-1.23":
@MartinForReal: #2589 failed to apply on top of branch "release-1.1":
What type of PR is this?
/kind bug
What this PR does / why we need it:
fix: avoid recreate vmss cache in race condition
In a race condition, there can be duplicate getVMSSVMCache calls, which end up creating multiple vmss cache objects and ultimately cause a VMSS list storm. Related logs:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: