fix detach azure disk issue due to dirty cache #71495
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/priority important-soon
/milestone v1.13
Added to v1.13 since this is a critical bug fix.
/lgtm
@andyzhangx @feiskyer is this an issue introduced recently in 1.13 that we are trying to address with this fix? Are there CI tests to verify the goodness of the fix? We are very late in the 1.13 release cycle and I would like to avoid any code churn unless it is a critical urgent issue introduced in 1.13. Can this wait until 1.13.1? /hold
Hi @AishSundar, yes, we are going to fix the issue in v1.13.0, and the code change stays within the azure directory. It's important that this fix land in v1.13.0, thanks.
/test pull-kubernetes-integration
/hold cancel
…1495-upstream-release-1.12 Automated cherry pick of #71495: fix detach azure disk issue by cleaning vm cache
…1495-upstream-release-1.10 Automated cherry pick of #71495: fix detach azure disk issue by cleaning vm cache
```go
// Invalidate the cache right after updating
key := buildVmssCacheKey(nodeResourceGroup, ss.makeVmssVMName(ssName, instanceID))
defer ss.vmssVMCache.Delete(key)
```
Here is the PR for this dirty cache issue. As written, this PR cleans the cache after every update, whether the update succeeds or not. I can move this cache-cleaning operation (`defer ss.vmssVMCache.Delete(key)`) so that it runs only when an error happens; would that solve your issue @khenidak? I can build a hot-fix image if you want to try this fix.
cc @feiskyer
What type of PR is this?
/kind bug
What this PR does / why we need it:
fix detach azure disk issue by cleaning up the vm cache right after every update-VM operation in disk attach/detach
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #71453
Special notes for your reviewer:
When detaching lots of disks on one node, detach failures may happen. Previously we used a backoff mechanism to retry 6 times, which was not a good solution; now we rely on the k8s controller to detach the disk volume, and it ultimately depends on the `DisksAreAttached` func to decide whether a disk is attached or not: kubernetes/pkg/volume/azure_dd/attacher.go, Line 135 in 5dfe48b.
Unfortunately, the `DisksAreAttached` func is invoked right after a `DetachDiskByName` failure, which has already made the vm cache dirty. This PR makes sure the vm cache is cleaned after every update-VM operation in disk attach/detach. It will also fix strange issues such as a detach-disk failure caused by the previous attach-disk failure.
The invocation chain for using `vmCache` in the `DisksAreAttached` func:
Release note:
/sig azure
/assign @feiskyer
cc @khenidak @brendandburns
@antoineco