
CSI migration: failed to attach volume #1447

Closed
jsafrane opened this issue Dec 16, 2021 · 1 comment · Fixed by #1451
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jsafrane
Contributor

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
A PV provisioned before CSI migration was enabled fails to be attached after the migration is enabled:

rpc error: code = Internal desc = failed to get VolumeID from volumeMigrationService for volumePath: "[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk"

What you expected to happen:
The volume is attached and the Pod can be used.

How to reproduce it (as minimally and precisely as possible):

  1. Dynamically provision a PV without CSI migration enabled. I got PV:

    apiVersion: v1
    kind: PersistentVolume
    metadata: [snip]
    spec:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 500Mi
      claimRef: [snip]
      storageClassName: thin
      volumeMode: Filesystem
      vsphereVolume:
        fsType: ext4
        volumePath: '[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk'
  2. Enable the CSI migration (and wait for everything to restart).

  3. Use the PV in a Pod.

Anything else we need to know?:
ControllerPublish logs from the CSI driver:

{"level":"info","time":"2021-12-15T14:20:54.458672303Z","caller":"vanilla/controller.go:951","msg":"ControllerPublishVolume: called with args {VolumeId:[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk NodeId:jsafrane-ws95r-worker-s5qtp VolumeCapability:mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.458742439Z","caller":"migration/migration.go:209","msg":"Could not retrieve VolumeID from cache for Volume Path: \"[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk\". Registering Volume with CNS","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.459270966Z","caller":"migration/migration.go:472","msg":"Registering volume: \"[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk\" using backingDiskURLPath :\"https://vcenter.sddc-44-236-21-251.vmwarevmc.com/folder/5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vm?dcPath=SDDC-Datacenter&dsName=WorkloadDatastore\"","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.725811103Z","caller":"volume/manager.go:504","msg":"CreateVolume: VolumeName: \"36ae26fa-5db2-11ec-b5c5-005056ace6ee\", opId: \"15d958c0\"","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.72587704Z","caller":"volume/util.go:364","msg":"Extract vimfault type: +types.CnsFault  vimFault: +{<nil> CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\n} Fault: &{DynamicData:{} Fault:{BaseMethodFault:<nil> Reason:CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\n} LocalizedMessage:CnsFault error: CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\n} from resp: +&{{} {{} } 0xc000b19340}","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}

The syncer container has very similar logs (note that this is from a different test, with a different PV!):

{"level":"warn","time":"2021-12-16T12:36:15.249750765Z","caller":"migration/migration.go:478","msg":"failed to register volume \"[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-xnrl5-dynamic-pvc-ca891f97-8576-4a31-b834-1747ddc80f59.vmdk\" with createSpec: &{{} c170cff8-5e6c-11ec-a295-005056ac9d57 BLOCK [] {{} {{} KUBERNETES jsafrane-xnrl5 LDAP.VMC.CI.OPENSHIFT.ORG\\jsafrane VANILLA } [] [{{} KUBERNETES jsafrane-xnrl5 jsafrane@ldap.vmc.ci.openshift.org VANILLA }]} 0xc000dce150 [] <nil> <nil>}. error: failed to create volume with fault: \"(*types.LocalizedMethodFault)(0xc0006853e0)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\\n  Reason: (string) (len=71) \\\"CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\\\\n\\\"\\n },\\n LocalizedMessage: (string) (len=87) \\\"CnsFault error: CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\\\\n\\\"\\n})\\n\""}

It is quite possible I have misconfigured something, but an InvalidArgument error without any indication of what is wrong does not help at all.

Note that I am able to dynamically provision an in-tree PV after CSI migration is enabled and use that PV in a Pod without any issues. The PV looks like:

  vsphereVolume:
    fsType: ext4
    volumePath: '[WorkloadDatastore] 8b577f5f-ea23-1e5e-bac9-0693938623e5/_0002/_00d0/_0136/37d4b232633f4af79b4955aac64f9288.vmdk'

Notice the _0002/_00d0 components in the volume path. They are not present in PVs provisioned by the in-tree volume plugin. I'm not sure whether this is related to the issue.

Environment:

  • csi-vsphere version: 2.4.0
  • vsphere-cloud-controller-manager version:
  • Kubernetes version: 1.23
  • vSphere version: 7.0.2
  • OS (e.g. from /etc/os-release): RHEL 8
  • Install tools: OpenShift
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2021
@jsafrane
Contributor Author

Found it: strings.Trim trims too much here:

vmdkPath := strings.TrimSpace(strings.Trim(volumeSpec.VolumePath, datastoreFullPath))

Trim takes its second argument as a set of characters, so if my datastore name is [WorkloadDatastore], it trims any character from that name from both ends of VolumePath. I.e. from a path ending in .vmdk it cuts the trailing d and k, and a file with a .vm suffix is then missing on the CNS side.

BTW, returning InvalidArgument without any description of what is wrong is unhelpful; can you fix the CNS side of things? And unit tests would help a lot too.
