Node VM failure doesn't automatically recreate a pod with attached PV #80040
Comments
/sig vmware
Out of curiosity, what 6 min timeout are you referring to?
@rhockenbury The k8s timeout to force unmount the PV.
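For context on how this shows up: during that window the replacement pod usually reports attach failures because the volume is still attached to the failed node. An illustrative check (pod, namespace, and volume names below are hypothetical, not from this report):

```
# Describe the replacement pod while the old node still holds the attachment
# (hypothetical pod/namespace names).
kubectl describe pod web-0 -n demo

# The events section typically contains a warning along these lines:
#   Warning  FailedAttachVolume  ...  Multi-Attach error for volume "pvc-..."
#            Volume is already exclusively attached to one node and can't be attached to another
```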
I was able to reproduce this issue as well by doing the following:
This looks a lot like an issue filed before: #71829
This is a known issue across all cloud providers at the moment. @yastij is working on a KEP right now that addresses this: kubernetes/enhancements#1116. However, the problem is a bit thorny, as it requires coordination from a number of different components (controller manager, scheduler, and the kubelet), and errors can lead to data corruption in certain situations. I'll let @yastij comment further if there's anything else to add.
This is intended to be fixed by kubernetes/enhancements#1116. The current KEP reflects part of the solution, but we still need to update it to make workloads opt in to this behavior.
/priority important-soon
/assign @SandeepPissay
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
@SandeepPissay Are you still looking at this?
No, I'm not. |
@andrewsykim do you know anyone who can look at this? Do we know if it's still a problem?
I can confirm that this is very much still an issue as of Kubernetes 1.19 |
Referencing a PR (kubernetes-sigs/vsphere-csi-driver#529) that addresses a similar issue with the vSphere CSI driver.
What happened:
After a deployment is created with a PVC, the node hosting the pod shuts down. After the 6-minute timeout, a replacement pod is created, but it cannot come up because the volume is still attached to the terminating pod on the shut-down node. As a workaround, the volume is only detached once the original pod is force deleted.
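For illustration, the manual workaround described above is to force delete the pod that is stuck Terminating on the shut-down node so the volume can be detached and reattached to the replacement pod; a minimal sketch with hypothetical pod and namespace names:

```
# Force delete the pod stuck on the shut-down node (hypothetical names).
# --grace-period=0 --force skips graceful deletion, so only use it when the
# node is genuinely gone; otherwise there is a risk of data corruption.
kubectl delete pod web-0 -n demo --grace-period=0 --force

# Watch the replacement pod start once the volume has been detached from
# the failed node and attached to the new node.
kubectl get pods -n demo -w
```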
What you expected to happen:
According to the Node VM Failure scenario at https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/high-availability.html, the recovery mechanism is completely automatic. In reality, it requires manual intervention to force delete a pod.
How to reproduce it (as minimally and precisely as possible):
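A minimal sketch of one way to reproduce, consistent with the description above (namespace, image, and object names are illustrative assumptions):

```
# 1. Create a PVC and a single-replica Deployment that mounts it
#    (hypothetical names; the cluster's default StorageClass is assumed).
kubectl create namespace demo
kubectl apply -n demo -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: demo-pvc
EOF

# 2. Find the node running the pod, then power off that node VM from vCenter
#    (outside of Kubernetes).
kubectl get pod -n demo -o wide

# 3. After the node goes NotReady and the ~6 minute timeout passes, the
#    replacement pod stays in ContainerCreating with FailedAttachVolume
#    events until the old pod is force deleted.
kubectl get pods -n demo -w
```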
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): v1.12.7-gke.19
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a):

/sig vmware