
vSphere cloud provider unable to detachVolume if node is deleted but not vSphere vm #71829

Closed
maplain opened this issue Dec 7, 2018 · 11 comments

Comments

@maplain

commented Dec 7, 2018

/kind bug

What happened:
We had a pod with persistent storage. We drained the node first, then deleted the node through kubectl, but the worker VM remained in vSphere. The PV was not attached to the node that the Pod migrated to. As a result, the rescheduled StatefulSet pod is stuck in a Pending state forever.

What you expected to happen:
The PV is detached from the old VM successfully, so that the StatefulSet pod can be recreated on another host instead of staying Pending forever.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy a StorageClass using this YAML:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ci-storage
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
  2. Deploy a StatefulSet using this YAML:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: sample-statefulset
spec:
  serviceName: "pv-ex-svc"
  replicas: 3
  template:
    metadata:
      labels:
        app: pv-ex
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: pv-ex
        image: ${PATH_TO_STATEFUL_SET}
        ports:
        - containerPort: 3000
        volumeMounts:
        - name: www
          mountPath: /usr/share/pv-ex
        env:
        - name: LOG_FILE
          value: /usr/share/pv-ex/log.txt
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: ci-storage
      resources:
        requests:
          storage: 1Gi
  3. Run:
  kubectl --kubeconfig xxx drain "${node_name}" --grace-period 10 --force --delete-local-data  --ignore-daemonsets
  4. Run:
kubectl --kubeconfig xxx delete node "${node_name}" --ignore-not-found
  5. Wait for 6 minutes.
  6. Observe the following logs in kube-controller-manager:
I1207 04:12:50.864000       1 node_status_updater.go:68] Could not update node status. Failed to find node "997186e0-4c8c-42ba-8bc0-624d67d76c13" in NodeInformer cache. Error: 'node "997186e0-4c8c-42ba-8bc0-624d67d76c13" not found'
W1207 04:12:50.864336       1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-135ea502-f926-11e8-b7a5-005056a39e82" (UniqueName: "kubernetes.io/vsphere-volume/[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-135ea502-f926-11e8-b7a5-005056a39e82.vmdk") on node "997186e0-4c8c-42ba-8bc0-624d67d76c13" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
E1207 04:12:50.864403       1 nodemanager.go:282] Error "No VM found" node info for node "997186e0-4c8c-42ba-8bc0-624d67d76c13" not found
E1207 04:12:50.864715       1 vsphere.go:550] Cannot find node "997186e0-4c8c-42ba-8bc0-624d67d76c13" in cache. Node not found!!!
E1207 04:12:50.864983       1 attacher.go:259] Error checking if volume ("[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-135ea502-f926-11e8-b7a5-005056a39e82.vmdk") is already attached to current node ("997186e0-4c8c-42ba-8bc0-624d67d76c13"). Will continue and try detach anyway. err=No VM found
E1207 04:12:50.865103       1 nodemanager.go:282] Error "No VM found" node info for node "997186e0-4c8c-42ba-8bc0-624d67d76c13" not found
E1207 04:12:50.865161       1 vsphere.go:550] Cannot find node "997186e0-4c8c-42ba-8bc0-624d67d76c13" in cache. Node not found!!!
I1207 04:12:50.865417       1 vsphere.go:869] Node "997186e0-4c8c-42ba-8bc0-624d67d76c13" does not exist, disk [iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-135ea502-f926-11e8-b7a5-005056a39e82.vmdk is already detached from node.
I1207 04:12:50.865592       1 operation_generator.go:419] DetachVolume.Detach succeeded for volume "pvc-135ea502-f926-11e8-b7a5-005056a39e82" (UniqueName: "kubernetes.io/vsphere-volume/[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-135ea502-f926-11e8-b7a5-005056a39e82.vmdk") on node "997186e0-4c8c-42ba-8bc0-624d67d76c13"
E1207 04:13:13.760273       1 virtualmachine.go:137] Failed to attach the disk with storagePolicy: "" on VM: "". err - Failed to add disk 'scsi1:0'.

Anything else we need to know?:
It's caused by:

  1. VCP removes the node from its cache when it receives a node delete event:
    https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/vsphere/vsphere.go#L1295
  2. VCP depends on that cache to find the node in its DetachDisk method:
    https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/vsphere/vsphere.go#L910
  3. If there is no node information in the cache, VCP does not check with vCenter to refresh the cache; it directly asserts that the node is gone:
    https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/vsphere/nodemanager.go#L232
  4. In DetachDisk, VCP assumes that if the node cannot be found, the disk has already been detached successfully:
    https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/vsphere/vsphere.go#L202

Therefore, even though the VM is still in vCenter with the PV attached, DetachDisk cannot actually finish its job because of a cache miss. And whenever DetachDisk is called, whether invoked forcibly by the attach/detach reconciler after the timeout or through the normal flow, it fools the controller into thinking the PV was detached successfully. This results in a Pending StatefulSet pod on another host.
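
For illustration only, here is a minimal Go sketch of that failure mode. The names (nodeCache, detachDisk, vmStillInVCenter) are hypothetical and do not match the real vSphere cloud provider source; the point is that a cache miss is reported as a successful detach without ever consulting vCenter.

// Hypothetical, simplified sketch of the failure path described above.
package main

import "fmt"

// nodeCache stands in for the node manager's in-memory map, which is pruned
// on every Kubernetes node-delete event.
var nodeCache = map[string]string{}

// vmStillInVCenter models the real state: the VM still exists in vCenter even
// though the Kubernetes Node object was deleted.
func vmStillInVCenter(nodeName string) bool { return true }

// detachDisk mirrors the problematic control flow: a cache miss is treated as
// "node is gone", so the detach is reported successful without ever querying
// vCenter or actually detaching the disk.
func detachDisk(volPath, nodeName string) error {
	if _, ok := nodeCache[nodeName]; !ok {
		// Bug: no fallback lookup against vCenter happens here.
		fmt.Printf("node %q not in cache, assuming disk %q is already detached\n", nodeName, volPath)
		return nil // reported as success even though the disk is still attached
	}
	// ... a real detach against vCenter would happen here ...
	return nil
}

func main() {
	node := "997186e0-4c8c-42ba-8bc0-624d67d76c13"
	delete(nodeCache, node) // the node-delete event prunes the cache

	err := detachDisk("[iscsi-ds-0] kubevols/pvc.vmdk", node)
	fmt.Printf("detach err=%v; VM still in vCenter: %v\n", err, vmStillInVCenter(node))
}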

Environment:

  • Kubernetes version (use kubectl version):
kubo@jumper:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-11-26T14:31:35Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    vSphere
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
Linux 56870dca-33af-48ca-919e-3bec31e14ed4 4.15.0-39-generic #42~16.04.1-Ubuntu SMP Wed Oct 24 17:09:54 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: PKS
  • Others:


@maplain (Author)

commented Dec 7, 2018

/sig vmware

@maplain (Author)

commented Dec 7, 2018

@gnufied (Member)

commented Dec 7, 2018

This is a known problem. The vSphere cloud provider needs the UUID from the node object for a successful detach, and if you delete the node object, detach will no longer work. I opened a PR to fix this a while back - #69691 (based on #63413) - which stores the UUID in a ConfigMap, so even if the node object is deleted, we can use the ConfigMap to retrieve it and detach the volume.

We ended up not merging that PR because I originally opened it to fix the issue of detaching volumes from shut-down nodes, and we fixed that issue in a different way. But I think it may be worth revisiting this problem, because it seems incorrect that one can't detach volumes if the node object is deleted somehow. :(

Unfortunately, the vSphere cloud provider is unique in that sense. On AWS or GCE one can describe the volume and get the node's ID from the volume itself, but I have been told there is no such API call in vSphere, and hence we must save the node's UUID somewhere else.
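
As a rough illustration of that ConfigMap idea (not the actual implementation in #69691), here is a hedged Go sketch using client-go. The ConfigMap name vsphere-node-uuids, its namespace, and the node-name-to-UUID key layout are all assumptions made up for this example.

// Minimal sketch: look up a deleted node's vSphere UUID from a persisted
// ConfigMap so a detach can still be resolved. Names are hypothetical.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	// Hypothetical ConfigMap mapping node name -> vSphere VM UUID.
	cm, err := client.CoreV1().ConfigMaps("kube-system").Get(ctx, "vsphere-node-uuids", metav1.GetOptions{})
	if err != nil {
		log.Fatalf("cannot read node UUID map: %v", err)
	}

	nodeName := "997186e0-4c8c-42ba-8bc0-624d67d76c13"
	uuid, ok := cm.Data[nodeName]
	if !ok {
		log.Fatalf("no recorded UUID for deleted node %q", nodeName)
	}
	// With the UUID in hand, the provider could look the VM up in vCenter and
	// issue the detach even though the Node object no longer exists.
	fmt.Printf("would detach from VM with UUID %s\n", uuid)
}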

@maplain (Author)

commented Dec 7, 2018

@gnufied thanks for getting back and sharing your thoughts on this issue!

I'm aware of your fix #70291, but it cannot handle the situation where the k8s node is removed manually. In that case, VCP will receive the node delete event and inevitably clean up its cache, and the host itself is not necessarily powered off.

VCP has a RediscoverNode method, but as far as I know it doesn't necessarily rediscover the node on vCenter if the node doesn't exist in the cache in the first place.
Keeping node information in VCP in sync with the k8s node object sounds kind of odd, because k8s depends on VCP to understand vSphere-specific details in situations like this.

@dougm probably has something to share regarding the vSphere API...

@gnufied (Member)

commented Dec 7, 2018

@maplain I wasn't talking about that fix. I was talking about #69691, which will fix this issue.

@SandeepPissay (Contributor)

commented Dec 10, 2018

@maplain

We drained the node first. Then, we deleted the node through kubectl,

How long did you wait before deleting the node? Did you wait until Kubernetes rescheduled the pods from the drained node to some other node? Did you not see Kubernetes/VCP detach the volumes from the drained node and attach them to the new (scheduled) node before you manually deleted the node?

In step 3 (where you are draining the node), is the grace period in seconds? If yes, then 10 seconds is probably too short. Did you try a higher grace period like 60s, 300s, or 600s? If not, could you let me know after you try this? The key point is that if you remove the node before Kubernetes has successfully rescheduled the pods, you will hit this issue (VCP unable to detach disks).

PR #69691 is another solution we can consider to get around this detach issue.

@mordebites

commented Dec 12, 2018

We have the same problem.
We are changing our script to wait for the volumes to detach, but just now it took 50 minutes to process a single worker in a 3-worker cluster.
We have the logs and can share them with you if you need them.

@fejta-bot

commented Mar 12, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot

commented Apr 11, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@fejta-bot

commented May 11, 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

commented May 11, 2019

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
