vSphere unable to detachVolume if node VM has been deleted #61707

Closed
bstick12 opened this issue Mar 26, 2018 · 3 comments · Fixed by #62220

@bstick12

commented Mar 26, 2018

/kind bug

What happened:

We had a pod with persistent storage. We drained the node and then deleted the node VM. The PV was not attached to the node that the pod migrated to. The kube-controller-manager logs contained the following:

I0323 13:21:54.954344    6021 reconciler.go:231] attacherDetacher.DetachVolume started for volume "nil" (UniqueName: "kubernetes.io/vsphere-volume/[***] kubevols/kubernetes-dynamic-pvc-7ece5266-2e9a-11e8-b4d2-005056b91f7a.vmdk") on node "21a524c0-7812-4f2f-a886-1e93b80e9881" 
I0323 13:21:54.959794    6021 operation_generator.go:1165] Verified volume is safe to detach for volume "nil" (UniqueName: "kubernetes.io/vsphere-volume/[***] kubevols/kubernetes-dynamic-pvc-7ece5266-2e9a-11e8-b4d2-005056b91f7a.vmdk") on node "21a524c0-7812-4f2f-a886-1e93b80e9881" 
E0323 13:21:54.968942    6021 datacenter.go:78] Unable to find VM by UUID. VM UUID: 21a524c0-7812-4f2f-a886-1e93b80e9881
E0323 13:21:54.968960    6021 nodemanager.go:275] Error "No VM found" node info for node "21a524c0-7812-4f2f-a886-1e93b80e9881" not found
E0323 13:21:54.968970    6021 vsphere.go:475] Cannot find node "21a524c0-7812-4f2f-a886-1e93b80e9881" in cache. Node not found!!!
E0323 13:21:54.968980    6021 attacher.go:260] Error checking if volume ("[***] kubevols/kubernetes-dynamic-pvc-7ece5266-2e9a-11e8-b4d2-005056b91f7a.vmdk") is already attached to current node ("21a524c0-7812-4f2f-a886-1e93b80e9881"). Will continue and try detach anyway. err=No VM found
E0323 13:21:54.976573    6021 datacenter.go:78] Unable to find VM by UUID. VM UUID: 21a524c0-7812-4f2f-a886-1e93b80e9881
E0323 13:21:54.976589    6021 nodemanager.go:275] Error "No VM found" node info for node "21a524c0-7812-4f2f-a886-1e93b80e9881" not found
E0323 13:21:54.976597    6021 vsphere.go:475] Cannot find node "21a524c0-7812-4f2f-a886-1e93b80e9881" in cache. Node not found!!!
E0323 13:21:54.976607    6021 attacher.go:274] Error detaching volume "[***] kubevols/kubernetes-dynamic-pvc-7ece5266-2e9a-11e8-b4d2-005056b91f7a.vmdk": No VM found

What you expected to happen:

The PV can be attached to the node that the pod migrated to.

How to reproduce it (as minimally and precisely as possible):

  • Create a 2 node cluster on vSphere
  • Deploy a pod with persistent storage attached
  • Drain the node
  • Delete the node VM immediately after draining

Anything else we need to know?:

The issue seems to be in the call to getVSphereInstance. It returns an error if the node doesn't exist, which prevents the logic on lines 804-813 (the vclib.ErrNoVMFound check below) from being executed.

vsi, err := vs.getVSphereInstance(nodeName)
if err != nil {
	return err
}
// Ensure client is logged in and session is valid
err = vsi.conn.Connect(ctx)
if err != nil {
	return err
}
vm, err := vs.getVMFromNodeName(ctx, nodeName)
if err != nil {
	// If node doesn't exist, disk is already detached from node.
	if err == vclib.ErrNoVMFound {
		glog.Infof("Node %q does not exist, disk %s is already detached from node.", convertToString(nodeName), volPath)
		return nil
	}
	glog.Errorf("Failed to get VM object for node: %q. err: +%v", convertToString(nodeName), err)
	return err
}
err = vm.DetachDisk(ctx, volPath)
if err != nil {
	glog.Errorf("Failed to detach disk: %s for node: %s. err: +%v", volPath, convertToString(nodeName), err)
	return err
}
return nil

A restart of the kube-controller-manager resolved the issue.

The following e2e test is not triggered in vSphere environments, although it might expose the issue:

descr: "node is deleted",

Environment:

  • Kubernetes version (use kubectl version): 1.9.5
  • Cloud provider or hardware configuration: vSphere
  • OS (e.g. from /etc/os-release): ubuntu
  • Kernel (e.g. uname -a):
  • Install tools: CFCR
  • Others:

/sig storage
/sig vmware

@divyenpatel

Member

commented Mar 26, 2018

@kubernetes/vmware

@afong94

commented Apr 2, 2018

More logs:

I0402 22:38:04.202213    5762 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe" (UniqueName: "kubernetes.io/vsphere-volume/[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe.vmdk") on node "5cf159b4-2e79-4427-b88d-e0f63f09f9e1"
I0402 22:38:04.204334    5762 operation_generator.go:1165] Verified volume is safe to detach for volume "pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe" (UniqueName: "kubernetes.io/vsphere-volume/[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe.vmdk") on node "5cf159b4-2e79-4427-b88d-e0f63f09f9e1"
E0402 22:38:04.250547    5762 datacenter.go:78] Unable to find VM by UUID. VM UUID: 5cf159b4-2e79-4427-b88d-e0f63f09f9e1
E0402 22:38:04.250594    5762 nodemanager.go:275] Error "No VM found" node info for node "5cf159b4-2e79-4427-b88d-e0f63f09f9e1" not found
E0402 22:38:04.250605    5762 vsphere.go:475] Cannot find node "5cf159b4-2e79-4427-b88d-e0f63f09f9e1" in cache. Node not found!!!
E0402 22:38:04.250622    5762 attacher.go:260] Error checking if volume ("[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe.vmdk") is already attached to current node ("5cf159b4-2e79-4427-b88d-e0f63f09f9e1"). Will continue and try detach anyway. err=No VM found
E0402 22:38:04.273874    5762 datacenter.go:78] Unable to find VM by UUID. VM UUID: 5cf159b4-2e79-4427-b88d-e0f63f09f9e1
E0402 22:38:04.273944    5762 nodemanager.go:275] Error "No VM found" node info for node "5cf159b4-2e79-4427-b88d-e0f63f09f9e1" not found
E0402 22:38:04.273957    5762 vsphere.go:475] Cannot find node "5cf159b4-2e79-4427-b88d-e0f63f09f9e1" in cache. Node not found!!!
E0402 22:38:04.273971    5762 attacher.go:274] Error detaching volume "[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe.vmdk": No VM found
E0402 22:38:04.274117    5762 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/vsphere-volume/[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe.vmdk\"" failed. No retries permitted until 2018-04-02 22:38:04.774057358 +0000 UTC m=+39.368451717 (durationBeforeRetry 500ms). Error: "DetachVolume.Detach failed for volume \"pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe\" (UniqueName: \"kubernetes.io/vsphere-volume/[iscsi-ds-0] kubevols/kubernetes-dynamic-pvc-ce8fc3a9-36bd-11e8-ab44-005056a2a4fe.vmdk\") on node \"5cf159b4-2e79-4427-b88d-e0f63f09f9e1\" : No VM found"

@abrarshivani

Member

commented Apr 12, 2018

/assign @abrarshivani

k8s-github-robot pushed a commit that referenced this issue Apr 18, 2018
Kubernetes Submit Queue
Merge pull request #62220 from vmware/detach_bug_fix
Automatic merge from submit-queue (batch tested with PRs 62568, 62220, 62743, 62751, 62753). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

[vSphere Cloud Provider] Fix detach disk when VM is not found

**What this PR does / why we need it**:
When a VM is deleted from the VC inventory and a detach request is issued, detach returns an error because the VM cannot be found. In this scenario, detach should return no error. This PR fixes this.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61707.

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
@kubernetes/vmware