
Install CSI driver by default in preparation of CSI Migration #10777

Closed
1 of 3 tasks
Jiawei0227 opened this issue Feb 9, 2021 · 23 comments
Labels
  • kind/feature Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


Jiawei0227 commented Feb 9, 2021

1. Describe IN DETAIL the feature/behavior/change you would like to see.

CSI Migration is a Kubernetes feature that, when turned on, redirects in-tree plugin traffic to the corresponding CSI driver. It has been Beta in k8s since v1.17, but is not enabled by default.

Recently, we decided to push this feature forward, and according to our plan it will be turned on by default in v1.22 for many plugins.

It would be good if kOps could prepare for this upcoming change. Specifically, kOps should deploy the corresponding CSI driver by default for each cloud, since the driver is a prerequisite for CSI Migration to work (a minimal sketch for the AWS case follows the list below):

  • AWS - AWS EBS CSI Driver
  • GCP - GCE PD CSI Driver
  • Azure - Azuredisk/Azurefile CSI Driver
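As a rough illustration for the AWS case, a kOps cluster spec could enable the EBS CSI driver as a managed addon. This is only a sketch: the `spec.cloudConfig.awsEBSCSIDriver` field name and the cluster name are assumptions here, so check the kOps docs for the release you run.

```yaml
# Illustrative kOps Cluster spec fragment (field names assumed, verify against
# the kOps docs): deploy the AWS EBS CSI driver so it is present before
# CSI Migration is switched on.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local        # hypothetical cluster name
spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true              # deploy the EBS CSI driver as a managed addon
```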

2. Background

/kind feature
/cc @msau42

@olemarkus
Member

Can you clarify which feature you refer to? There are

  • CSIMigrationAWS, which will use the CSI driver if installed but fall back otherwise, and hence does not strictly need the driver. This is the flag that is currently in beta, but disabled.
  • CSIMigrationAWSComplete, which strictly requires the driver and is currently in alpha.

We have support for the AWS CSI driver, for which we set the latter flag. This is also used by the aws-cloud-provider e2e testing. I doubt we will enable the addon in 1.21, though; we recently discussed this and decided not to. We may extend our e2e grid to include testing with this enabled. kOps 1.21 is many months away, so there should be ample time to reconsider later.

Azure cloud support is still being developed. Since Azure support is alpha, we may just go directly to using the CSI driver. I think that makes sense.

GCP support does not include CSI yet. It is unclear when that will be added.

@Jiawei0227
Author

I am talking about CSIMigrationAWS/CSIMigrationGCE/CSIMigrationAzure.

which will use the CSI driver if installed, but fallback otherwise

This might not be true. If CSIMigrationAWS is enabled on both kube-controller-manager and kubelet, then the CSI driver is a requirement; otherwise the in-tree plugin will not work. This flag is currently in beta but not on by default. We are planning to turn it on by default in 1.22.
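As a hedged sketch of what "enabled on both components" could look like in a kOps cluster spec, assuming kOps exposes feature gates via `featureGates` maps on both kubelet and kube-controller-manager (illustrative only):

```yaml
# Illustrative kOps Cluster spec fragment (field names assumed): the
# CSIMigrationAWS gate must be enabled on both kube-controller-manager and
# kubelet for migration to take effect; once it is, the EBS CSI driver must
# be installed or volume operations will fail.
spec:
  kubeControllerManager:
    featureGates:
      CSIMigrationAWS: "true"
  kubelet:
    featureGates:
      CSIMigrationAWS: "true"
```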

CSIMigrationAWSComplete, which strictly requires the driver and is currently in alpha.

CSIMigrationAWSComplete has been replaced with a new flag called InTreePluginAWSUnregister. This new flag serves the same purpose as CSIMigrationAWSComplete except that it is decoupled from migration, so if you do not want to support the in-tree plugin at all, you can turn it on without migration. This flag is still in alpha and will remain in alpha.

We have support for the AWS CSI driver, where we will set the latter flag

The CSIMigrationAWSComplete flag will only work when CSIMigrationAWS is turned on; otherwise it is a no-op. Does that mean the current e2e testing enables both flags? If so, it seems good. I think we just need to make sure the CSI driver is installed for AWS-based clusters after 1.21, and then we are good.

Azure cloud support is still being developed. Since Azure is alpha it may be we just go directly to using the CSI driver. I think that makes sense.

Sounds right to me!

GCP does not support CSI yet. Unclear when that will be added.

Is anyone working on GCP? I think it would be good to let them know that it should install the GCE PD CSI Driver.

@olemarkus
Member

Thanks for the clarifications. No, only CSIMigrationAWSComplete was set, so I guess this is a no-op then. We will add CSIMigrationAWS as well if the addon is installed. I think turning it on by default for 1.22 should be fine.

We'll bring this up in our office hours and decide then what to do about the other cloud providers.

@olemarkus olemarkus added kind/feature Categorizes issue or PR as related to a new feature. kind/office-hours labels Feb 10, 2021
@Jiawei0227
Author

No only CSIMigrationAWSComplete was set

Yep, this is exactly why we replaced CSIMigrationAWSComplete with InTreePluginAWSUnregister. Now you only need to enable InTreePluginAWSUnregister and the in-tree plugin will simply not be registered, so you do not need to support in-tree at all. Installing the AWS EBS driver and using CSI directly should then be good.
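Putting those pieces together, a minimal sketch of the "skip in-tree entirely" setup in a kOps cluster spec might look like the fragment below. The field names are the same assumptions as in the earlier sketches, and whether the gate needs to be set on kubelet as well is an assumption worth verifying.

```yaml
# Illustrative kOps Cluster spec fragment (field names assumed): unregister the
# in-tree AWS plugin and rely entirely on the installed EBS CSI driver.
spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true                       # driver must be installed
  kubeControllerManager:
    featureGates:
      InTreePluginAWSUnregister: "true"
  kubelet:
    featureGates:
      InTreePluginAWSUnregister: "true"   # assumed to be needed on kubelet too
```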

@olemarkus
Member

InTreePluginAWSUnregister will be introduced in k8s 1.21? So until then we still need to go with CSIMigrationAWSComplete if we want to unregister the in-tree plugin.

@Jiawei0227
Author

InTreePluginAWSUnregister will be introduced in 1.21.

@johngmyers
Member

Per office hours, this will block 1.21.

@johngmyers
Member

Per kOps office hours, the remaining work does not block 1.21.

@olemarkus
Member

@Jiawei0227 I was looking for an update on CSI migration, but it looks like many of the linked issues have gone stale. Is this still targeting 1.22?

@Jiawei0227
Author

AWS EBS, GCE PD, and Azuredisk are still targeting being turned on by default in 1.22.

@johngmyers johngmyers modified the milestones: v1.21, v1.22 May 27, 2021
@olemarkus
Member

@Jiawei0227

After migrating to the AWS EBS CSI driver by default, we see this test consistently flaking:
[sig-storage] Volume limits should verify that all nodes have volume limits

https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits hints that volume limits are now in a different location (CSINode instead of Node), but I would assume the tests are aware of this.

It is unclear if this is a bug with kOps, the test, or the CSI driver. Would you know anything about this?

@Jiawei0227
Author

To support volume limits, the AWS CSI driver needs to advertise them in its NodeGetInfoResponse.

https://kubernetes-csi.github.io/docs/volume-limits.html
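For reference, when the driver reports a max-volumes limit in NodeGetInfoResponse, it should end up on the node's CSINode object rather than on the Node, roughly like this (node name, instance ID, and the count value below are purely illustrative):

```yaml
# Illustrative CSINode object: the per-node attach limit advertised by the
# EBS CSI driver appears under spec.drivers[].allocatable.count.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-1-23.ec2.internal      # example node name
spec:
  drivers:
    - name: ebs.csi.aws.com
      nodeID: i-0123456789abcdef0      # example instance ID
      allocatable:
        count: 25                      # driver-reported volume limit
      topologyKeys:
        - topology.ebs.csi.aws.com/zone
```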

I am not very familiar with the EBS driver. @wongma7, do you have any insights from running the migration tests?
/cc @msau42

@johngmyers
Member

I don't think GCP and Azure block 1.22.

Should we split them out into separate tickets and close this?

@olemarkus
Member

I suggest we just remove this from the milestone and use this one as an umbrella issue.

@olemarkus
Member

kubernetes/kubernetes#104670: Azure migration is now being enabled by default.

@olemarkus olemarkus modified the milestones: v1.22, v1.23 Oct 1, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 30, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 29, 2022
@olemarkus
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 29, 2022
@cmotta2016

Hey guys, we need some help here.
We're experiencing a bug after installing CSI: #13197
Can you help us?

@olemarkus olemarkus removed this from the v1.23 milestone Mar 6, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
