kubelet leaves orphaned volume and pod stuck on Terminating state #85834

Open
Pingan2017 opened this issue Dec 3, 2019 · 1 comment
@Pingan2017 Pingan2017 commented Dec 3, 2019

What happened:
Deploy a pod with an NFS volume. While the NFS server and kubelet are both down, delete the pod.
Then bring kubelet back up: the pod is stuck in the Terminating state. That is expected, because the NFS server is still not working.
But even after the NFS server is restored, the pod stays stuck in Terminating and never recovers.

What you expected to happen:
When kubelet and the NFS server are restored, the volume should be unmounted correctly and the pod should be deleted.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy a pod with an NFS volume.
  2. Stop kubelet and the NFS server.
  3. Delete the pod.
  4. Start kubelet. The desired state of world populator (dsp) adds the pod to the desired state of world (dsw), and reconciler.reconcile() first tries to attach and mount the NFS volume. Because the NFS server is down, the mount fails.
  5. When the containers in the pod have stopped, dsp removes the pod and volume from dsw, and reconciler.reconcile() tries to unmount and detach the NFS volume. The unmount is skipped because the mount failed.
    The pod and volume are now gone from both dsw and the actual state of world (asw).
  6. The pod is stuck in the Terminating state. Even if the NFS server is restored at this point, the pod stays in Terminating and the NFS volume is never unmounted.
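
The state changes in steps 4 to 6 can be traced with a toy model. This is only a sketch, not kubelet code: the `kubelet` struct, its fields, and the volume name `nfs-vol` are hypothetical, and it assumes that the mount created before kubelet stopped is still present on the host while the in-memory asw starts out empty after the restart.

```go
package main

import "fmt"

// kubelet is a toy model of the volume manager state. All names are
// hypothetical; this is not the real kubelet code.
type kubelet struct {
	nfsUp  bool            // is the NFS server reachable?
	onDisk map[string]bool // mount points left on the host (assumed to survive the restart)
	dsw    map[string]bool // desired state of world
	asw    map[string]bool // actual state of world (in memory, empty after a restart)
}

// reconcile mounts volumes that are desired but not recorded as mounted, and
// unmounts volumes that are recorded as mounted but no longer desired.
// Both operations only succeed while the NFS server is reachable.
func (k *kubelet) reconcile() {
	for v := range k.dsw {
		if !k.asw[v] && k.nfsUp {
			k.asw[v] = true
			k.onDisk[v] = true
		}
	}
	for v := range k.asw {
		if !k.dsw[v] && k.nfsUp {
			delete(k.asw, v)
			delete(k.onDisk, v)
		}
	}
}

// reproduce replays steps 1 to 6 of the report on the toy model.
func reproduce() *kubelet {
	// Steps 1-3: the pod ran with its volume mounted, kubelet and NFS were
	// stopped, and the pod was deleted.
	k := &kubelet{
		nfsUp:  false,
		onDisk: map[string]bool{"nfs-vol": true},
		dsw:    map[string]bool{},
		asw:    map[string]bool{},
	}
	// Step 4: the pod's volume enters dsw; the mount fails because NFS is down.
	k.dsw["nfs-vol"] = true
	k.reconcile()
	// Step 5: containers have stopped, so the volume leaves dsw. asw has no
	// entry for it, so there is nothing to unmount.
	delete(k.dsw, "nfs-vol")
	k.reconcile()
	// Step 6: NFS is restored, but the reconciler has no record of the mount.
	k.nfsUp = true
	k.reconcile()
	return k
}

func main() {
	k := reproduce()
	// prints "in dsw: false in asw: false on disk: true"
	fmt.Println("in dsw:", k.dsw["nfs-vol"], "in asw:", k.asw["nfs-vol"], "on disk:", k.onDisk["nfs-vol"])
}
```

In this model the mount stays on the host while neither dsw nor asw knows about it, which matches the orphaned volume described in the steps.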

Anything else we need to know?:
The NFS volume is never unmounted because reconciler.syncStates() runs only once, when kubelet starts. After that, there is no chance to add the NFS volume to the actual state of world.

This logic was changed in PR #58177.
@jingxu97 What was the reason for changing the logic to run only once?
IMO, if reconciler.syncStates() ran periodically, the volume would eventually be unmounted.
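
A rough sketch of the difference between a run-once and a periodic sync, using a toy model rather than the real reconciler. Everything here is hypothetical (the `state` type, `simulate`, the volume name `nfs-vol`). It also assumes that the sync step skips volumes that are still in dsw and that the earlier mount is still present on the host; neither detail is stated in this report.

```go
package main

import "fmt"

// state is a toy model of the volume manager; not the real kubelet types.
type state struct {
	nfsUp  bool
	onDisk map[string]bool // mounts present on the host
	dsw    map[string]bool // desired state of world
	asw    map[string]bool // actual state of world
}

// syncStates rebuilds asw entries from mounts found on disk.
// Assumption of this sketch: volumes still in dsw are skipped.
func (s *state) syncStates() {
	for v := range s.onDisk {
		if !s.dsw[v] {
			s.asw[v] = true
		}
	}
}

// reconcile unmounts volumes recorded in asw but absent from dsw. Mounting is
// left out because it always fails in this scenario while NFS is down.
func (s *state) reconcile() {
	for v := range s.asw {
		if !s.dsw[v] && s.nfsUp {
			delete(s.asw, v)
			delete(s.onDisk, v)
		}
	}
}

// simulate replays the timeline from the report and returns true if the
// mount is still on disk at the end.
func simulate(periodic bool) bool {
	s := &state{
		onDisk: map[string]bool{"nfs-vol": true},
		dsw:    map[string]bool{"nfs-vol": true}, // deleted pod, containers not yet stopped
		asw:    map[string]bool{},
	}
	// One event per reconciler loop iteration after kubelet starts.
	events := []func(){
		func() {}, // kubelet start: NFS down, pod still in dsw
		func() { delete(s.dsw, "nfs-vol") }, // containers stopped
		func() { s.nfsUp = true },           // NFS restored
	}
	for i, ev := range events {
		ev()
		if i == 0 || periodic { // run-once syncs only on the first iteration
			s.syncStates()
		}
		s.reconcile()
	}
	return s.onDisk["nfs-vol"]
}

func main() {
	fmt.Println("run once:", simulate(false)) // prints "run once: true"
	fmt.Println("periodic:", simulate(true))  // prints "periodic: false"
}
```

With the run-once variant the mount is never picked up again; with the periodic variant a later sync puts it into asw and the next reconcile pass unmounts it once NFS is back. The sketch says nothing about why PR #58177 made the sync run once, or whether a periodic sync would be safe in the real reconciler.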

/sig storage

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others: