
CSI migration: failed to attach volume #1447

Closed
jsafrane opened this issue Dec 16, 2021 · 1 comment · Fixed by #1451
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jsafrane
Contributor

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
A PV provisioned before CSI migration was enabled fails to be attached after the migration is enabled:

rpc error: code = Internal desc = failed to get VolumeID from volumeMigrationService for volumePath: "[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk"

What you expected to happen:
The volume is attached and the Pod can be used.

How to reproduce it (as minimally and precisely as possible):

  1. Dynamically provision a PV without CSI migration enabled. I got PV:

    apiVersion: v1
    kind: PersistentVolume
    metadata: [snip]
    spec:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 500Mi
      claimRef: [snip]
      storageClassName: thin
      volumeMode: Filesystem
      vsphereVolume:
        fsType: ext4
        volumePath: '[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk'
  2. Enable the CSI migration (and wait for everything to restart).

  3. Use the PV in a Pod.

Anything else we need to know?:
ControllerPublish logs from the CSI driver:

{"level":"info","time":"2021-12-15T14:20:54.458672303Z","caller":"vanilla/controller.go:951","msg":"ControllerPublishVolume: called with args {VolumeId:[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk NodeId:jsafrane-ws95r-worker-s5qtp VolumeCapability:mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.458742439Z","caller":"migration/migration.go:209","msg":"Could not retrieve VolumeID from cache for Volume Path: \"[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk\". Registering Volume with CNS","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.459270966Z","caller":"migration/migration.go:472","msg":"Registering volume: \"[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vmdk\" using backingDiskURLPath :\"https://vcenter.sddc-44-236-21-251.vmwarevmc.com/folder/5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-ws95r-dynamic-pvc-e54172ad-ba68-4876-8d24-a1a1fc5c855c.vm?dcPath=SDDC-Datacenter&dsName=WorkloadDatastore\"","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.725811103Z","caller":"volume/manager.go:504","msg":"CreateVolume: VolumeName: \"36ae26fa-5db2-11ec-b5c5-005056ace6ee\", opId: \"15d958c0\"","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}
{"level":"info","time":"2021-12-15T14:20:54.72587704Z","caller":"volume/util.go:364","msg":"Extract vimfault type: +types.CnsFault  vimFault: +{<nil> CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\n} Fault: &{DynamicData:{} Fault:{BaseMethodFault:<nil> Reason:CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\n} LocalizedMessage:CnsFault error: CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\n} from resp: +&{{} {{} } 0xc000b19340}","TraceId":"4d3141f2-3a38-495a-826e-ff76ea4a10c9"}

The syncer container has very similar logs (note that this is from a different test, with a different PV!):

{"level":"warn","time":"2021-12-16T12:36:15.249750765Z","caller":"migration/migration.go:478","msg":"failed to register volume \"[WorkloadDatastore] 5137595f-7ce3-e95a-5c03-06d835dea807/jsafrane-xnrl5-dynamic-pvc-ca891f97-8576-4a31-b834-1747ddc80f59.vmdk\" with createSpec: &{{} c170cff8-5e6c-11ec-a295-005056ac9d57 BLOCK [] {{} {{} KUBERNETES jsafrane-xnrl5 LDAP.VMC.CI.OPENSHIFT.ORG\\jsafrane VANILLA } [] [{{} KUBERNETES jsafrane-xnrl5 jsafrane@ldap.vmc.ci.openshift.org VANILLA }]} 0xc000dce150 [] <nil> <nil>}. error: failed to create volume with fault: \"(*types.LocalizedMethodFault)(0xc0006853e0)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\\n  Reason: (string) (len=71) \\\"CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\\\\n\\\"\\n },\\n LocalizedMessage: (string) (len=87) \\\"CnsFault error: CNS: Failed to register disk: Fault cause: vmodl.fault.InvalidArgument\\\\n\\\"\\n})\\n\""}

It is quite possible I have misconfigured something, but an InvalidArgument error without any indication of what is wrong does not help at all.

Note that I am able to dynamically provision an in-tree PV after CSI migration is enabled and use that PV in a Pod without any issues. The PV looks like:

  vsphereVolume:
    fsType: ext4
    volumePath: '[WorkloadDatastore] 8b577f5f-ea23-1e5e-bac9-0693938623e5/_0002/_00d0/_0136/37d4b232633f4af79b4955aac64f9288.vmdk'

Notice the _0002/_00d0 components in the volume path. They are not present in PVs provisioned by the in-tree volume plugin. I'm not sure whether this is related to the issue.

Environment:

  • csi-vsphere version: 2.4.0
  • vsphere-cloud-controller-manager version:
  • Kubernetes version: 1.23
  • vSphere version: 7.0.2
  • OS (e.g. from /etc/os-release): RHEL 8
  • Install tools: OpenShift
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2021
@jsafrane
Contributor Author

Found it: strings.Trim trims too much here:

vmdkPath := strings.TrimSpace(strings.Trim(volumeSpec.VolumePath, datastoreFullPath))

Trim takes its second argument as a set of characters, so if my datastore name is [WorkloadDatastore], it trims any character from that name from both ends of VolumePath. I.e. from a path ending in .vmdk it cuts the trailing d and k, and a file with a .vm suffix is then missing on the CNS side.

BTW, returning InvalidArgument without any description of what is wrong is unhelpful; can you fix the CNS side of things? And unit tests would help a lot too.
