
Err: No vSphere disk ID/Name found #178

Closed
Elegant996 opened this issue Apr 3, 2019 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


Elegant996 commented Apr 3, 2019

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
The volume failed to attach to the pod after the pod was restarted.

What you expected to happen:
The volume should reattach successfully, since the PersistentVolumeClaim, the PersistentVolume, and the backing vmdk all still exist.

How to reproduce it (as minimally and precisely as possible):
After a grace period (a day or two?), scale a Deployment/StatefulSet that has a volume attached through the CSI driver down to 0 replicas. Scale it back up to its original value; the CSI driver should then fail to attach the volume to the new pod(s):

Warning FailedAttachVolume 5s (x4 over 6m12s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-2e07359a-4f90-11e9-a939-000c29616bad" : rpc error: code = Internal desc = WhichVCandDCByFCDId(927ef76f-2312-4a4e-b634-8fbb13134462) failed. Err: No vSphere disk ID/Name found

Please note that this is independent of the node the pod previously resided on. The disk does not appear to be detached from the node in vCenter either, which seems problematic.
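For reference, the scaling steps boil down to something like this (a rough sketch; web and app=web are placeholders for my actual workload):

# Scale the workload down and wait for its pod(s) to terminate.
kubectl scale statefulset web --replicas=0
kubectl get pods -l app=web -w
# Scale back up; the new pod(s) hang in ContainerCreating with the
# FailedAttachVolume event shown above.
kubectl scale statefulset web --replicas=1
kubectl describe pod web-0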

Anything else we need to know?:
csi-attacher logs:

I0403 05:37:54.655419 1 controller.go:173] Started VA processing "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655541 1 csi_handler.go:93] CSIHandler: processing VA "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655565 1 csi_handler.go:120] Attaching "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655915 1 csi_handler.go:259] Starting attach operation for "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.656107 1 csi_handler.go:388] Saving attach error to "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.660047 1 controller.go:139] Ignoring VolumeAttachment "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63" change
I0403 05:37:54.660383 1 csi_handler.go:398] Saved attach error to "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.660413 1 csi_handler.go:103] Error processing "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63": failed to attach: persistentvolume "pvc-7df261c8-4c50-11e9-a573-000c29616bad" not found
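In case it helps with debugging, the objects named in the log can be inspected directly (just a sketch; the long name is the VolumeAttachment from the log):

# Inspect the VolumeAttachment the csi-attacher is processing.
kubectl get volumeattachment csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63 -o yaml
# Check whether the PV named in the "not found" error actually exists.
kubectl get pv pvc-7df261c8-4c50-11e9-a573-000c29616bad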

vsphere-csi-controller logs:

I0403 05:43:21.955579 1 datacenter.go:615] DoesFirstClassDiskExist(927ef76f-2312-4a4e-b634-8fbb13134462): NOT FOUND
E0403 05:43:21.955607 1 search.go:329] Error while looking for FCD= in vc=10.0.10.25 and datacenter=homelab: No vSphere disk ID/Name found
time="2019-04-03T05:43:21Z" level=error msg="WhichVCandDCByFCDId(927ef76f-2312-4a4e-b634-8fbb13134462) failed. Err: No vSphere disk ID/Name found"

Environment:

  • vsphere-cloud-controller-manager version: v0.1.1 and master
  • OS (e.g. from /etc/os-release): Fedora 28
  • Kernel (e.g. uname -a): 4.18.8-200.fc28.x86_64
  • Install tools:
  • Others:
@k8s-ci-robot added the kind/bug label on Apr 3, 2019
@codenrhoden
Contributor

Hi @Elegant996, thanks for the report.

Just to clarify, you had an existing Deployment that was working as expected, then scaled it down to 0 and then back up?

And you noticed that when scaling to 0, the existing PV was not detached from the node? Just trying to get a clearer picture for when I try to recreate.


Elegant996 commented Apr 3, 2019

Hi @codenrhoden,

That is correct; this happens with both my Deployments and StatefulSets when scaling. The PVC and PV are still marked as Bound in kubectl.

vCenter does not appear to detach the volumes after scaling down, and the driver no longer seems able to determine the associated vmdk. This occurs for roughly 90% of my pods.

One StatefulSet (replicas: 1) survived: the CSI driver was able to determine the correct vmdk, but for whatever reason it was unable to detach it from the node/VM, despite being given administrator credentials for testing. Once the affected vmdk was forcibly detached using vCenter, the driver quickly attached it to the pod's current node and mounted the volume without issue.
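For anyone trying to reproduce, the equivalent check from the CLI would be something like this (a sketch using govc; the node VM name is a placeholder):

# List the devices still attached to the old node's VM; the stale disk shows up here.
govc device.ls -vm <node-vm-name> | grep disk
# Show details (backing file, etc.) for the VM's devices.
govc device.info -vm <node-vm-name>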

In another scenario with a StatefulSet (replicas: 2), only one replica survived, while the other produced the warnings/errors above.

This will become an issue down the road when upgrading nodes, as draining them will render most, if not all, of the pods unusable.
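For example, something as routine as draining a node for an upgrade would push every volume on it through the same detach/attach path (node name is a placeholder):

# Evict pods from a node prior to an upgrade; their CSI volumes must then
# detach from this node and reattach wherever the pods are rescheduled.
kubectl drain <node-name> --ignore-daemonsets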

Thanks!

@codenrhoden
Contributor

This issue has been migrated to the new CSI repo.
