Unable to mount volumes on custom hardened and node driver vSphere cluster using CSI charts. #35173

Closed
vivek-shilimkar opened this issue Oct 19, 2021 · 14 comments
Labels
area/vsphere feature/charts-vsphere kind/bug Issues that are defects reported by users or that we know have reached a real release priority/1 team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support team/infracloud

Comments

@vivek-shilimkar
Member

vivek-shilimkar commented Oct 19, 2021

Rancher Server Setup

  • Rancher version: v2.6.1
  • Installation option (Helm Chart):

Information about the Cluster

  • Kubernetes version: v1.18.20
  • Cluster Type (Local/Downstream): Downstream
    • Custom hardened cluster. Used this RKE template for provisioning, with cloud_provider set to external.

Describe the bug

  • Redis/CockroachDB pods fail to become active/running.
  • CSI provisions the PVs according to the PVC specs, but the PVs fail to mount on the vSphere cluster nodes.
  • The pods' recent events show the following error:
    Unable to attach or mount volumes: unmounted volumes=[vol0], unattached volumes=[vol0 default-token-87m75]: timed out waiting for the condition
  • The vsphere-csi-node log in the kube-system namespace also shows an error. The log is attached:
    vsphere-csi-node.log

Steps To Reproduce

  • Provision a hardened vSphere cluster (v1.18.20) with RKE v1.3.1 on an ESXi 7+ environment.
  • Install Rancher v2.6.1 on it using Helm.
  • Follow the hardening guide to provision a downstream hardened custom vSphere cluster (v1.18.20) with this RKE template, with cloud_provider set to external.
  • Install the vSphere CPI and CSI charts.
  • Install a CockroachDB and/or Redis pod with storage set to 'Create Persistent Volume Claim'.

Expected Result

Redis/CockroachDB pods become active in the Rancher UI with a persistent volume attached.

@vivek-shilimkar vivek-shilimkar added this to the v2.6.2 milestone Oct 19, 2021
@samkulkarni20 samkulkarni20 changed the title Unable to attach or mount volumes on custom hardened cluster using CSI charts. Unable to mount volumes on custom vSphere hardened cluster using CSI charts. Oct 19, 2021
@sowmyav27 sowmyav27 modified the milestones: v2.6.2, v2.6.3 Oct 19, 2021
@sowmyav27 sowmyav27 added kind/bug Issues that are defects reported by users or that we know have reached a real release area/vsphere labels Oct 19, 2021
@deniseschannon deniseschannon added release-note Note this issue in the milestone's release notes feature/charts-vsphere [zube]: To Triage and removed [zube]: Next Up labels Nov 19, 2021
@deniseschannon deniseschannon modified the milestones: v2.6.3, v2.6.4 Nov 22, 2021
@MKlimuszka MKlimuszka modified the milestones: v2.6.4, v2.6.x Dec 2, 2021
@deniseschannon deniseschannon removed this from the v2.6.x milestone Dec 5, 2021
@SheilaghM SheilaghM modified the milestone: v2.6.4 Jan 11, 2022
@doflamingo721 doflamingo721 self-assigned this Jan 14, 2022
@vivek-shilimkar
Member Author

vivek-shilimkar commented Jan 24, 2023

@J-tt Okay. As per the latest analysis mentioned here, the node driver cluster is working fine. The issue is active only on the hardened cluster.

@snasovich
Collaborator

@vivek-shilimkar, any chance you could retest it on 2.7.5, since it's been a while since the last update?
FYI @daviswill2 @Sahota1225

@vivek-shilimkar
Member Author

vivek-shilimkar commented Jul 17, 2023

@snasovich @Sahota1225 @daviswill2
I re-tested the volume mount issue on a custom hardened cluster, and it is still active on Rancher v2.7.5. I tested two combinations:

  • k8s v1.23.16:
    • CPI version: 102.1.0+up1.5.1
    • CSI version: 102.0.0+up2.6.2
  • k8s v1.26.6:
    • CPI version: 102.1.0+up1.5.1
    • CSI version: 102.0.0+up3.0.1

Issue is still active.

@jiaqiluo
Member

According to the CSI documentation, the disk.EnableUUID parameter must be enabled on each node (VM) participating in a Kubernetes cluster on vSphere.

We can see that disk.enableUUID=TRUE is set in the node template created via the Rancher UI.

nodeTemplate - Vsphere node
apiVersion: management.cattle.io/v3
kind: NodeTemplate
metadata:
  annotations:
    field.cattle.io/creatorId: u-pxem7w2jtj
    ownerBindingsCreated: "true"
  creationTimestamp: "2023-07-18T18:43:24Z"
  generateName: nt-
  generation: 1
  labels:
    cattle.io/creator: norman
  name: nt-b9h97
  namespace: cattle-global-nt
  resourceVersion: "22882658"
  uid: d64a3c95-95f7-4d96-a519-1bb2ebdbe728
spec:
  cloudCredentialName: cattle-global-data:cc-xrzhm
  displayName: jack-test-vsphere-1
  driver: vmwarevsphere
  engineInstallURL: https://releases.rancher.com/install-docker/24.0.sh
  engineRegistryMirror: []
  useInternalIpAddress: true
vmwarevsphereConfig:
  boot2dockerUrl: ""
  cfgparam:
  - disk.enableUUID=TRUE
  cloneFrom: ""
  cloudConfig: '#cloud-config'
  cloudinit: ""
  contentLibrary: ""
  cpuCount: "2"
  creationType: template
  customAttribute: []
  datacenter: ""
  datastore: ""
  datastoreCluster: ""
  diskSize: "20000"
  folder: ""
  hostsystem: ""
  memorySize: "2048"
  network: []
  os: linux
  pool: ""
  sshPassword: tcuser
  sshPort: "22"
  sshUser: docker
  sshUserGroup: staff
  tag: []
  vappIpallocationpolicy: ""
  vappIpprotocol: ""
  vappProperty: []
  vappTransport: ""
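
As a rough illustration of the check above, here is a minimal Python sketch that looks for the setting in a cfgparam list like the one in this node template. The function name has_enable_uuid is hypothetical, and the sketch assumes that the key is matched case-insensitively and that both TRUE and 1 count as enabled; it is not part of Rancher or the CSI driver.

```python
def has_enable_uuid(cfgparam):
    """Return True if a cfgparam list enables disk.enableUUID.

    Assumptions (not confirmed by the thread): the key is compared
    case-insensitively, and "TRUE" or "1" both mean enabled.
    """
    for entry in cfgparam:
        key, sep, value = entry.partition("=")
        if not sep:
            # Skip entries that are not in key=value form.
            continue
        if key.strip().lower() == "disk.enableuuid":
            # The first matching entry decides the result.
            return value.strip().lower() in ("true", "1")
    return False


# cfgparam list as it appears in the node template above.
node_template_cfgparam = ["disk.enableUUID=TRUE"]
print(has_enable_uuid(node_template_cfgparam))
```

For custom node clusters there is no node template, so the same parameter has to be set on each VM directly.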

@vivek-shilimkar can you confirm whether the disk.EnableUUID parameter was set on the VMs when you tested custom node clusters? If not, can you give it a try and share the findings?

@vivek-shilimkar
Member Author

@jiaqiluo Thanks for pointing that out.
If the disk.EnableUUID=TRUE parameter is set on the VMs, the volume mounts successfully. Hence, closing this issue.
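
For anyone hitting the same symptom on custom nodes, a quick sanity check can be run from inside the guest. The Python sketch below lists the disk identifier links under /dev/disk/by-id. It rests on an assumption not verified in this thread: that a Linux VM without disk.EnableUUID typically exposes no wwn- or scsi- entries there for its virtual disks. The function name scsi_disk_ids is hypothetical.

```python
import os


def scsi_disk_ids(by_id_dir="/dev/disk/by-id"):
    """List by-id entries that look like SCSI/WWN disk identifiers."""
    try:
        names = os.listdir(by_id_dir)
    except FileNotFoundError:
        # The directory can be missing when no disk identifiers were found.
        return []
    # Keep only the link names that start with a wwn- or scsi- prefix.
    return sorted(n for n in names if n.startswith(("wwn-", "scsi-")))


if __name__ == "__main__":
    ids = scsi_disk_ids()
    if ids:
        print("\n".join(ids))
    else:
        # Assumption: an empty result hints that disk.EnableUUID is unset.
        print("no wwn-/scsi- entries found; check disk.EnableUUID on the VM")
```

An empty result is only a hint, not proof; the VM's advanced configuration in vSphere remains the authoritative place to confirm the parameter.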

@snasovich
Collaborator

Removing the milestone, as no fix was applied; this was just a misunderstanding of the test setup.

@snasovich snasovich removed this from the 2023-Q3-v2.7x milestone Jul 19, 2023
@snasovich snasovich removed the release-note Note this issue in the milestone's release notes label Jul 19, 2023
@zube zube bot removed the [zube]: Done label Oct 17, 2023