Support for out-of-tree vSphere Cloud Provider Interface (CPI) and Cloud Storage Interface (CSI) #23357

Closed
dnoland1 opened this issue Oct 10, 2019 · 25 comments
Labels: area/storage, internal, kind/feature

@dnoland1
Contributor

dnoland1 commented Oct 10, 2019

What kind of request is this (question/bug/enhancement/feature request):
Feature request

Description:
There is a new out-of-tree vSphere Cloud Provider Interface (CPI); see https://cloud-provider-vsphere.sigs.k8s.io/. In the k8s 1.20 timeframe, the in-tree cloud providers will be removed and everyone will need to use out-of-tree cloud providers.

The Rancher UI could be enhanced to allow users to select "vSphere" as a Cloud Provider option when creating a custom cluster. Users could enter configuration information for vSphere, such as the virtual center IP, port, username, password, network, etc. When deploying the cluster, Rancher would automatically create the ConfigMap, Service, DaemonSet, etc. needed for the vsphere-cloud-controller-manager workload.

Support for the vSphere Container Storage Interface (CSI) driver would be great to include as part of this feature request.

This issue may also be relevant: #20131

gz#6513
gz#9676
gz#12549
gz#12592
gz#14500

@dnoland1 changed the title from "Support for out-of-tree vSphere Cloud Provider" to "Support for out-of-tree vSphere Cloud Provider Interface (CPI) and Cloud Storage Interface (CSI)" Oct 17, 2019
@terafirmanz

As part of the move to the cloud controller manager, could Rancher be extended to run the cloud controller manager as part of its own deployment, in-tree? All k8s clusters would then talk to Rancher, and Rancher would proxy the requests to the cloud provider.

This would help with clusters that don't have direct connectivity to the cloud provider. It would also make cluster deployment simpler, as Rancher can auto-configure the cluster to point to itself and use stored cloud credentials to proxy any requests.

@axeal added the area/storage, internal, and kind/feature labels Oct 28, 2019
@Tejeev

Tejeev commented Oct 1, 2020

Our customer would find the following features useful, and it appears they are not available in the in-tree providers:

  • VM-independent volumes
  • provisioning from multiple datastores

I couldn't pin down a date or version for the CPI, but it looks like both will phase out around k8s v1.21? This may not be tomorrow, but the feeling is that we'll blink and it'll be around the corner. It would be good to have an idea when support for this will be in Rancher so customers can have some confidence that they'll have time to shake it down in the field before it's required.

@Tejeev

Tejeev commented Oct 8, 2020

@cloudnautique Looks like the (other) customer we spoke of is on OpenStack and needs CPI and CSI as well.

@dnoland1
Contributor Author

This could be useful for those looking for a community-contributed Helm chart to deploy vSphere CSI/CPI: https://github.com/stefanvangastel/vsphere-cpi-csi-helm

@davidnuzik
Contributor

Set milestone, assigned, etc as per Denise.

@Tejeev

Tejeev commented Dec 15, 2020

I just wanted to make sure it wasn't lost that I was told to use this issue to note the other providers that are needed, rather than new issues. It is for that reason that I noted OpenStack CPI and CSI here. @cloudnautique please let me know if this changes and we need to open more issues.

@Tejeev changed the title from "Support for out-of-tree vSphere Cloud Provider Interface (CPI) and Cloud Storage Interface (CSI)" to "Support for out-of-tree vSphere and OpenStack Cloud Provider Interface (CPI) and Cloud Storage Interface (CSI)" Dec 15, 2020
@sowmyav27 self-assigned this Jan 21, 2021
@Tejeev

Tejeev commented Feb 4, 2021

@mrajashree
Contributor

Migration works using the steps from the doc linked above and the chart we're adding to the Rancher catalog. However, due to an existing bug in the vSphere CSI driver, it only works for volumes provisioned using a certain cloud-config format. This issue explains the bug in detail: kubernetes-sigs/vsphere-csi-driver#628

Rancher issue to track it: #31105

@mitchellmaler

@deniseschannon
One thing we have noticed using the "external" cloud provider is that it taints all nodes with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule. Deploying the CSI app/chart using Continuous Delivery fails because the fleet agent is stuck Pending after the cluster is brought up. For net-new clusters that want to add the CSI app, the fleet agent and the jobs it launches will need tolerations; otherwise this app cannot be installed.
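
To illustrate why a pod without a matching toleration stays Pending on these nodes, here is a minimal Python sketch of the standard Kubernetes taint/toleration matching rules. The function names (tolerates, pod_can_schedule) are hypothetical helpers for this illustration, not part of fleet or Rancher, and the pod specs are plain dicts rather than real API objects.

```python
# Taint applied to nodes when the cluster's cloud provider is "external".
UNINITIALIZED_TAINT = {
    "key": "node.cloudprovider.kubernetes.io/uninitialized",
    "value": "true",
    "effect": "NoSchedule",
}


def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint."""
    # An empty effect on the toleration matches every taint effect.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False

    key = toleration.get("key")
    operator = toleration.get("operator", "Equal")
    if not key:
        # An empty key is only meaningful with Exists and matches any key.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    # Operator Equal: the values must match as well.
    return toleration.get("value", "") == taint.get("value", "")


def pod_can_schedule(pod_spec: dict, taint: dict = UNINITIALIZED_TAINT) -> bool:
    """Return True if any toleration in the pod spec matches the taint."""
    return any(tolerates(t, taint) for t in pod_spec.get("tolerations", []))


if __name__ == "__main__":
    agent_without_toleration = {"containers": []}
    agent_with_toleration = {
        "tolerations": [
            {
                "key": "node.cloudprovider.kubernetes.io/uninitialized",
                "operator": "Equal",
                "value": "true",
                "effect": "NoSchedule",
            }
        ]
    }
    print(pod_can_schedule(agent_without_toleration))  # False
    print(pod_can_schedule(agent_with_toleration))     # True
```

A workload that must run before the CPI has initialized the nodes would need a toleration like the second example in its pod spec.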

@bmdepesa
Member

@mitchellmaler can you provide some details on what your workflow looks like from the perspective of creating the cluster and deploying the external cloud provider charts? Are you using fleet to deploy the cpi/csi apps?

@mitchellmaler

We use the Rancher Terraform provider to bring up the vSphere Rancher-provisioned cluster with the cloud provider in the RKE config set to external, which starts a clean cluster with all nodes tainted. We then deploy the vSphere CPI/CSI first (using manifests or the helm CLI) so that it registers and untaints the nodes. After that, all other add-ons are installed using apps v2 (fleet agent helm operation), since they can now be scheduled. It would be great to deploy CSI/CPI using apps v2 as well, but because all nodes are tainted before these charts can be deployed, it causes workflow issues: fleet does not tolerate those taints. I haven't yet considered using the old apps to deploy this, which might serve as a temporary workaround in our workflow if they can run while tolerating the taints (will have to give this a try).

A few other potential issues from the PR rancher/helm3-charts#53:

  1. The resizer container is added by default to the CSI deployment without a way to disable it. It is only supported on vSphere 7+, so there needs to be a way to toggle it off. (https://github.com/cormachogan/vsphere-csi-helmchart/blob/master/charts/vsphere-csi/values.yaml#L78)
  2. I am not sure whether the CSI node volumes are going to cause an issue, since they point to the host path "/var/lib/kubelet" whereas RKE nodes use the path "/opt/rke/var/lib/kubelet".

@mitchellmaler

mitchellmaler commented Feb 18, 2021

I guess for 2 that depends on the OS you use, such as a CoreOS variant or RancherOS. Really, the prefix path should be something that can be added to the host path values.

@bmdepesa
Member

bmdepesa commented Mar 2, 2021

rancher/rancher:v2.5.6-rc6
rancher/rancher:master-head 465bb3b

Tested fresh installs, migration, and volume expansion on a vSphere 7.0 environment:

Fresh install:

  • Set cluster's cloud provider to External
  • Nodes are tainted with uninitialized
  • Install vSphere CPI chart
  • kubectl describe nodes | grep "ProviderID" shows the provider IDs correctly and the above taint is removed
  • Install vSphere CSI chart
  • Use the created storage class to create a PV/PVC and attach to a workload
  • Volume is provisioned successfully
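
The ProviderID and taint check in the fresh-install steps above can also be scripted. The following is a minimal sketch that inspects the JSON produced by kubectl get nodes -o json; the helper name uninitialized_nodes and the sample node data (names and provider ID value) are made up for illustration.

```python
import json

UNINITIALIZED_KEY = "node.cloudprovider.kubernetes.io/uninitialized"


def uninitialized_nodes(node_list: dict) -> list:
    """Return names of nodes that lack a providerID or still carry the taint."""
    pending = []
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        taints = spec.get("taints") or []
        tainted = any(t.get("key") == UNINITIALIZED_KEY for t in taints)
        if not spec.get("providerID") or tainted:
            pending.append(node["metadata"]["name"])
    return pending


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "worker-1"},
      "spec": {"providerID": "vsphere://example-uuid-1"}
    },
    {
      "metadata": {"name": "worker-2"},
      "spec": {
        "taints": [
          {
            "key": "node.cloudprovider.kubernetes.io/uninitialized",
            "value": "true",
            "effect": "NoSchedule"
          }
        ]
      }
    }
  ]
}
""")

if __name__ == "__main__":
    print(uninitialized_nodes(SAMPLE))  # ['worker-2']
```

An empty result would indicate that the CPI has initialized every node, which is the point at which the CSI chart is installed in the steps above.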

Volume Expansion:

  • Follow the above steps first
  • Remove the workload using the PV (leaving the PV in-place)
  • Edit the PV to have a larger size
  • See the size reflected in the vSphere volume (Cloud Native Storage)
  • Attach the PV to a workload
  • Data persists and the new size is reflected

Migration:

  • Create a cluster using the in-tree cloud provider
  • Provision a volume using a storage class, attach to a workload
  • Create some data in the volume
  • Follow the steps from the docs PR "vSphere out-of-tree CPI+CSI docs" (docs#3000):
    • Taint the nodes with the uninitialized taint
    • Install the CPI chart, check for ProviderID as in fresh install case, and install CSI chart
    • Edit the cluster to add the feature-gates to kubelet and kube-controller and set drain to true with force and delete local data; Save the updates
    • Cluster is updated and nodes are drained
  • When the cluster becomes active, check for migrations with: kubectl get cnsvspherevolumemigrations
  • Check that the PV and PVC have the pv.kubernetes.io/migrated-to: csi.vsphere.vmware.com annotation
  • Check the volume attached to the workload and see the data persist
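
The annotation check from the migration steps above can be sketched in Python as follows. This assumes the PV and PVC objects have already been loaded as dicts (for example from kubectl get pv -o json); the helper names is_migrated and unmigrated and the sample objects are hypothetical.

```python
MIGRATED_TO = "pv.kubernetes.io/migrated-to"
VSPHERE_CSI = "csi.vsphere.vmware.com"


def is_migrated(obj: dict) -> bool:
    """Return True if a PV or PVC carries the migrated-to annotation for vSphere CSI."""
    annotations = obj.get("metadata", {}).get("annotations") or {}
    return annotations.get(MIGRATED_TO) == VSPHERE_CSI


def unmigrated(objects: list) -> list:
    """Return the names of objects that are missing the annotation."""
    return [o["metadata"]["name"] for o in objects if not is_migrated(o)]


if __name__ == "__main__":
    # Made-up objects for illustration only.
    volumes = [
        {"metadata": {"name": "pv-a", "annotations": {MIGRATED_TO: VSPHERE_CSI}}},
        {"metadata": {"name": "pv-b", "annotations": {}}},
    ]
    print(unmigrated(volumes))  # ['pv-b']
```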

Our QA vSphere 6.7 environment is not at 6.7U3 and therefore does not support the out-of-tree cloud provider, so I was not able to validate it on that environment. However, I was able to see the issue mentioned above:

the resizer container is added by default to csi deployment without a way to disable it. This is only supported on vsphere 7+ and there needs to be a way to toggle it off. (https://github.com/cormachogan/vsphere-csi-helmchart/blob/master/charts/vsphere-csi/values.yaml#L78)

I have logged an issue for this here: #31550

@mitchellmaler

@bmdepesa how will it deal with the host paths on RKE-created nodes that don't use the standard /var/lib/kubelet paths and have a prefix set?

That host path is hard-coded in the templates: https://github.com/rancher/helm3-charts/blob/429ac83cdb31a87be2d434ac463148cbe9988bc2/charts/vsphere-csi/v2.1.0/templates/vsphere-csi-node-ds.yaml#L122

Flatcar, CoreOS, RancherOS, etc. all use the /opt/rke/var/lib/kubelet paths instead of the standard paths. https://github.com/rancher/rke/blob/master/hosts/hosts.go#L60

The chart should have a way to provide a prefix to all the host paths for those OS types or just for users who set a prefix in their RKE config.
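
As a rough illustration of the requested behavior, the sketch below prepends an optional prefix to a standard host path. The function name with_prefix is hypothetical and this is not how the chart templates are written; it only shows the path arithmetic.

```python
import posixpath


def with_prefix(prefix: str, host_path: str) -> str:
    """Prepend an optional prefix (e.g. /opt/rke) to an absolute host path."""
    if not prefix:
        # No prefix configured: keep the standard path.
        return host_path
    # Strip slashes so joining never produces "//" or discards the prefix.
    return posixpath.join("/", prefix.strip("/"), host_path.lstrip("/"))


if __name__ == "__main__":
    print(with_prefix("", "/var/lib/kubelet"))          # /var/lib/kubelet
    print(with_prefix("/opt/rke", "/var/lib/kubelet"))  # /opt/rke/var/lib/kubelet
```

With an empty prefix the standard path is unchanged, so a default of no prefix would keep existing deployments working.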

@bmdepesa
Member

bmdepesa commented Mar 3, 2021

Thanks @mitchellmaler

I was able to confirm this behavior on RancherOS where the csi-node workload is failing to deploy due to invalid mount paths. We will be making a change to the chart to support adding the prefix.

@bmdepesa
Member

bmdepesa commented Mar 4, 2021

rancher/rancher:v2.5.6-rc7
rancher-vsphere-cpi:1.0.000-rc01
rancher-vsphere-csi:2.1.000-rc01

We've moved the charts to the cluster explorer feature charts so they will be bundled in airgap installs, and mirrored all images.

As part of the chart refactoring we exposed the prefix path to /var/lib/kubelet.

  • Deployed a RancherOS cluster in vSphere with cloud provider set to External
  • Deployed vsphere-cpi chart
  • Deployed vsphere-csi chart with prefix path set to /opt/rke
  • Both charts deploy successfully
  • Attached a vsphere-csi provisioned volume

rancher/docs#3090
