Vsphere Cloud Provider: failed to detach volume from shutdown node #75342
Comments
/sig vmware |
@kubernetes/sig-vmware-bugs |
@ksandermann: Reiterating the mentions to trigger a notification: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
It is even worse if you drain a node for an upgrade and then delete the VM: the disks that are still attached are deleted along with the VM. This is not a recent issue; it has been like this since 1.8.x or 1.9.x. |
There's a KEP opened for this kubernetes/enhancements#719 /priority important-soon |
@yastij I see, thanks for the reference! |
The design is still under discussion; this will be landing in 1.15. |
@ksandermann Do you also see an issue where the NotReady node never gets cleaned up? I'm seeing the issue you mentioned, but the powered down node stays in the NotReady state and never goes away. I'm wondering if it has anything to do with the fact that the pod with the volume attached still "exists" on the downed node (though in an "Unknown" state), and these two issues are somehow tied to each other. |
I didn't test that case, so I can't say anything about it. |
@ksandermann Yeah, I would expect a node that is stuck in NotReady for some amount of time to be eventually removed from the cluster. Other cloud providers in different clouds have this behavior. |
isMultiAttachForbidden returns true when the PV is set with accessMode 'ReadWriteOnce', so the volume cannot be attached to a second node until it has been detached from the first. |
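For illustration (not from the original comment; the claim and volume names below are placeholders), this is one way to check the access mode that triggers that behavior:

```sh
# ReadWriteOnce means the volume may only be attached to one node at a time,
# which is why the attach/detach controller refuses a second attach.
kubectl get pvc my-claim -o jsonpath='{.spec.accessModes}'

# The same field on the bound PV:
kubectl get pv pvc-0a1b2c3d -o jsonpath='{.spec.accessModes}'
```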
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
I'm also seeing this on 1.18.6. Does anyone know any workarounds for this? FWIW, I do not see any "NotReady" nodes. I simply scaled the worker nodes down using the TKG CLI; most pods got rescheduled successfully, but others did not. How can I manually detach a volume? EDIT: I think I figured it out thanks to the info here: kubernetes-sigs/vsphere-csi-driver#221 (comment)
note: you may also need to ensure that no pods are running and holding onto the volume, so scale down your deployment. |
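A rough sketch of that manual cleanup, assuming the CSI driver path from the linked issue; every object name here is a placeholder:

```sh
# 1. Make sure no workload is still trying to use the volume.
kubectl scale deployment my-app --replicas=0

# 2. Find the VolumeAttachment that still references the removed or powered-off node.
kubectl get volumeattachments

# 3. Delete it so the volume can be attached to a healthy node.
kubectl delete volumeattachment csi-<hash>

# If the object hangs on a finalizer, clearing the finalizer is the usual last
# resort (use with care).
kubectl patch volumeattachment csi-<hash> --type=merge -p '{"metadata":{"finalizers":null}}'

# 4. Scale the workload back up.
kubectl scale deployment my-app --replicas=1
```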
/assign @andrewsykim |
This issue is labeled with priority/important-soon but has not been updated recently, so it should be re-triaged. You can: confirm that it is still relevant with /triage accepted (org members only), deprioritize it, or close it with /close. For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What happened:
I have a pod with a pv attached to it running on node1.
When I shut down node1 to simulate a node failure, Kubernetes detects the unhealthy node within the configured timeframe and tries to re-schedule the pod on node2 after the --pod-eviction-timeout.
When trying to start the pod on node2, the PV cannot be attached because it is still attached to the shut-down node1:
Also, the pod on the shutdown node does not get deleted.
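A few commands that can confirm the state described above (pod and node names are examples, not taken from the report):

```sh
# The replacement pod on node2 typically sits in ContainerCreating while the
# volume is still attached to the powered-off node1.
kubectl get pods -o wide

# The pod events should show the failed attach.
kubectl describe pod my-pod

# The shut-down node stays NotReady and still lists the volume as attached.
kubectl get node node1 -o jsonpath='{.status.volumesAttached}'
```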
What you expected to happen:
As documented here:
The disk should be detached from the shutdown node and attached to the new node that the pod is scheduled on.
How to reproduce it (as minimally and precisely as possible):
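The report leaves the exact steps implicit; the following is a minimal sketch based on the description above (the StorageClass, all names, and the vSphere power-off step are assumptions):

```sh
# 1. Create a PVC and a single-replica Deployment that mounts it
#    (assumes a default vSphere StorageClass is configured).
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
spec:
  replicas: 1
  selector:
    matchLabels: {app: test-app}
  template:
    metadata:
      labels: {app: test-app}
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: test-claim
EOF

# 2. Note which node the pod landed on, then power that VM off in vSphere.
kubectl get pods -l app=test-app -o wide

# 3. After the pod-eviction timeout, the replacement pod is scheduled on another
#    node but stays pending because the disk is still attached to the powered-off VM.
kubectl get pods -l app=test-app -o wide
kubectl describe pods -l app=test-app
```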
Anything else we need to know?:
Also, kube-controller-manager does not log anything about this failure.
Detaching the volume and attaching it to another node works fine, as long as all nodes are healthy.
Force-deletion of the pod on the shutdown node also does nothing.
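For reference, the force deletion mentioned above is of this form (the pod name is a placeholder):

```sh
kubectl delete pod my-pod --grace-period=0 --force
```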
Environment:
Kubernetes version (kubectl version): 1.12.5
OS (cat /etc/os-release): Ubuntu 16.04
Kernel (uname -a): Linux k8s-dev-master3 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Thanks in advance! :)