
Err: No vSphere disk ID/Name found #178

Closed
Elegant996 opened this issue Apr 3, 2019 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


Elegant996 commented Apr 3, 2019

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
The volume failed to attach to the pod after the pod was restarted.

What you expected to happen:
The volume should reattach successfully, since the PersistentVolumeClaim, the PersistentVolume, and the backing vmdk all still exist.

How to reproduce it (as minimally and precisely as possible):
After a grace period (a day or two?), scale a Deployment/StatefulSet that has a volume attached through the CSI driver down to 0 replicas. Scale it back up to its original value; the CSI driver should then fail to attach the volume to the new pod(s):

Warning FailedAttachVolume 5s (x4 over 6m12s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-2e07359a-4f90-11e9-a939-000c29616bad" : rpc error: code = Internal desc = WhichVCandDCByFCDId(927ef76f-2312-4a4e-b634-8fbb13134462) failed. Err: No vSphere disk ID/Name found

Please note that this is independent of the node the pod previously resided on. The disk does not appear to be detached from the node in vCenter either, which seems problematic.
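For reference, the scaling steps boil down to something like this (a rough sketch; web and app=web are placeholders for my actual workload):

# Scale the workload down and wait for its pod(s) to terminate.
kubectl scale statefulset web --replicas=0
kubectl get pods -l app=web -w
# Scale back up; the new pod(s) hang in ContainerCreating with the
# FailedAttachVolume event shown above.
kubectl scale statefulset web --replicas=1
kubectl describe pod web-0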

Anything else we need to know?:
csi-attacher logs:

I0403 05:37:54.655419 1 controller.go:173] Started VA processing "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655541 1 csi_handler.go:93] CSIHandler: processing VA "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655565 1 csi_handler.go:120] Attaching "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655915 1 csi_handler.go:259] Starting attach operation for "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.656107 1 csi_handler.go:388] Saving attach error to "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.660047 1 controller.go:139] Ignoring VolumeAttachment "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63" change
I0403 05:37:54.660383 1 csi_handler.go:398] Saved attach error to "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.660413 1 csi_handler.go:103] Error processing "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63": failed to attach: persistentvolume "pvc-7df261c8-4c50-11e9-a573-000c29616bad" not found
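In case it helps with debugging, the objects named in the log can be inspected directly (just a sketch; the long name is the VolumeAttachment from the log):

# Inspect the VolumeAttachment the csi-attacher is processing.
kubectl get volumeattachment csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63 -o yaml
# Check whether the PV named in the "not found" error actually exists.
kubectl get pv pvc-7df261c8-4c50-11e9-a573-000c29616bad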

vsphere-csi-controller logs:

I0403 05:43:21.955579 1 datacenter.go:615] DoesFirstClassDiskExist(927ef76f-2312-4a4e-b634-8fbb13134462): NOT FOUND
E0403 05:43:21.955607 1 search.go:329] Error while looking for FCD= in vc=10.0.10.25 and datacenter=homelab: No vSphere disk ID/Name found
time="2019-04-03T05:43:21Z" level=error msg="WhichVCandDCByFCDId(927ef76f-2312-4a4e-b634-8fbb13134462) failed. Err: No vSphere disk ID/Name found"

Environment:

  • vsphere-cloud-controller-manager version: v0.1.1 and master
  • OS (e.g. from /etc/os-release): Fedora 28
  • Kernel (e.g. uname -a): 4.18.8-200.fc28.x86_64
  • Install tools:
  • Others:
@k8s-ci-robot added the kind/bug label on Apr 3, 2019
@codenrhoden
Contributor

Hi @Elegant996, thanks for the report.

Just to clarify, you had an existing Deployment that was working as expected, then scaled it down to 0 and then back up?

And you noticed that when scaling to 0, the existing PV was not detached from the node? Just trying to get a clearer picture for when I try to recreate.


Elegant996 commented Apr 3, 2019

Hi @codenrhoden,

That is correct; this happens with both my Deployments and StatefulSets when scaling. The PVC and PV are still marked as Bound in kubectl.

vCenter does not appear to detach the volumes after scaling down, and the driver no longer seems able to determine the associated vmdk. This occurs for roughly 90% of my pods.

One StatefulSet (replicas: 1) survived: the CSI driver was able to determine the correct vmdk, but for whatever reason it was unable to detach it from the node/VM, despite being given administrator credentials for testing. Once the affected vmdk was forcibly detached using vCenter, the driver quickly attached it to the pod's current node and mounted the volume without issue.
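For anyone trying to reproduce, the equivalent check from the CLI would be something like this (a sketch using govc; the node VM name is a placeholder):

# List the devices still attached to the old node's VM; the stale disk shows up here.
govc device.ls -vm <node-vm-name> | grep disk
# Show details (backing file, etc.) for the VM's devices.
govc device.info -vm <node-vm-name>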

In another scenario with a StatefulSet (replicas: 2), only one replica survived, while the other produced the warnings/errors above.

This will become an issue down the road when upgrading nodes, as draining them will render most, if not all, of the pods unusable.
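For example, something as routine as draining a node for an upgrade would push every volume on it through the same detach/attach path (node name is a placeholder):

# Evict pods from a node prior to an upgrade; their CSI volumes must then
# detach from this node and reattach wherever the pods are rescheduled.
kubectl drain <node-name> --ignore-daemonsets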

Thanks!

@codenrhoden
Contributor

This issue has been migrated to the new CSI repo.
