Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
fix: detach azure disk issue using dangling error #81266
What type of PR is this?
What this PR does / why we need it:
I think this PR could fix most of the disk attach issue due to detach disk not succeed(could be due to many reasons), this PR could do the recover automatically, I will cherry pick the fix to all old releases.
Actually this PR fixed two issues:
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
@andyzhangx just to double check, the volume will only be detached if another pod and comes around and tries to use it. If no pod is attempting to use it, the volume will remain attached to the old node. Is that what happened?
If volume is still not detaching even if another pod tries to use it - that will be kinda weird. Can we get controller-manager logs when this happened?
@gnufied here is the useful logging, detach volume is skipped:
It's due to
At this time, the volume is in use by node#B.
And if I hack the above code by comment out the
And you could find all controller manager logs here: https://gist.githubusercontent.com/andyzhangx/6d6460ad8802f78016b0b4cbce292399/raw/3ea7142feadaf8f7bb9da7c7e75016ce7559771d/controller-manager.log
This condition will automatically go away after
I think this is a bug that you have discovered. When I made #78595 change - to add dangling volumes as uncertain, we stopped reporting the volume to node's status and hence it is failing. But this looks like a general problem with detaching volumes which are in uncertain state, I will open a PR to fix this. This needs better coverage to prevent regression too.
@andyzhangx actually part of my comment was not true, following error log:
Does not stop volume from being detached - https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/reconciler/reconciler.go#L208 the volume should still be detached..
@gnufied thanks a lot for your help! Finally I figured out that this is another issue in detaching disk, actually detach disk operation happens, which it could not find that disk in the list, this PR fixed these two issues.
So this PR is ready for code review.
[APPROVALNOTIFIER] This PR is APPROVED
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing