
Install CSI driver by default in preparation of CSI Migration #10777

Closed
1 of 3 tasks
Jiawei0227 opened this issue Feb 9, 2021 · 23 comments
Labels
  • kind/feature Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


Jiawei0227 commented Feb 9, 2021

1. Describe IN DETAIL the feature/behavior/change you would like to see.

CSI Migration is a Kubernetes feature that, when turned on, redirects in-tree plugin traffic to the corresponding CSI driver. It has been Beta in k8s since v1.17, but is not enabled by default.

Recently, we decided to push this feature forward, and according to our plan it will be turned on by default in v1.22 for many plugins.

It would be good if kOps could prepare for this upcoming change. Specifically, kOps should deploy the corresponding CSI driver by default for each cloud, since the driver is a prerequisite for CSI Migration to work (a minimal sketch for the AWS case follows the list below):

  • AWS - AWS EBS CSI Driver
  • GCP - GCE PD CSI Driver
  • Azure - Azuredisk/Azurefile CSI Driver
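As a rough illustration for the AWS case, a kOps cluster spec could enable the EBS CSI driver as a managed addon. This is only a sketch: the `spec.cloudConfig.awsEBSCSIDriver` field name and the cluster name are assumptions here, so check the kOps docs for the release you run.

```yaml
# Illustrative kOps Cluster spec fragment (field names assumed, verify against
# the kOps docs): deploy the AWS EBS CSI driver so it is present before
# CSI Migration is switched on.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local        # hypothetical cluster name
spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true              # deploy the EBS CSI driver as a managed addon
```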

2. Background

/kind feature
/cc @msau42

@olemarkus
Member

Can you clarify which feature you refer to? There are

  • CSIMigrationAWS, which will use the CSI driver if installed but fall back otherwise, and hence does not strictly need the driver. This is the flag that is currently in beta, but disabled.
  • CSIMigrationAWSComplete, which strictly requires the driver and is currently in alpha.

We have support for the AWS CSI driver, for which we set the latter flag. This is also used by the aws-cloud-provider e2e testing. I doubt we will enable the addon in 1.21, though; we recently discussed this and decided not to. We may extend our e2e grid to include testing with this enabled. kOps 1.21 is many months away, so there should be ample time to reconsider later.

Azure cloud support is still being developed. Since Azure support is alpha, we may just go directly to using the CSI driver. I think that makes sense.

GCP support does not include CSI yet. It is unclear when that will be added.

@Jiawei0227
Author

I am talking about CSIMigrationAWS/CSIMigrationGCE/CSIMigrationAzure.

which will use the CSI driver if installed, but fallback otherwise

This might not be true. If CSIMigrationAWS is enabled on both kube-controller-manager and kubelet, then the CSI driver is a requirement; otherwise the in-tree plugin will not work. This flag is currently in beta but not on by default. We are planning to turn it on by default in 1.22.
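As a hedged sketch of what "enabled on both components" could look like in a kOps cluster spec, assuming kOps exposes feature gates via `featureGates` maps on both kubelet and kube-controller-manager (illustrative only):

```yaml
# Illustrative kOps Cluster spec fragment (field names assumed): the
# CSIMigrationAWS gate must be enabled on both kube-controller-manager and
# kubelet for migration to take effect; once it is, the EBS CSI driver must
# be installed or volume operations will fail.
spec:
  kubeControllerManager:
    featureGates:
      CSIMigrationAWS: "true"
  kubelet:
    featureGates:
      CSIMigrationAWS: "true"
```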

CSIMigrationAWSComplete, which strictly requires the driver and is currently in alpha.

CSIMigrationAWSComplete has been replaced with a new flag called InTreePluginAWSUnregister. This new flag serves the same purpose as CSIMigrationAWSComplete except that it is decoupled from migration, so if you do not want to support the in-tree plugin at all, you can turn it on without migration. This flag is still in alpha and will remain in alpha.

We have support for the AWS CSI driver, where we will set the latter flag

The CSIMigrationAWSComplete flag will only work when CSIMigrationAWS is turned on; otherwise it is a no-op. Does that mean the current e2e testing enables both flags? If so, it seems good. I think we just need to make sure the CSI driver is installed for AWS-based clusters after 1.21, and then we are good.

Azure cloud support is still being developed. Since Azure is alpha it may be we just go directly to using the CSI driver. I think that makes sense.

Sounds right to me!

GCP does not support CSI yet. Unclear when that will be added.

Is anyone working on GCP? I think it would be good to let them know that it should install the GCE PD CSI Driver.

@olemarkus
Member

Thanks for the clarifications. No, only CSIMigrationAWSComplete was set, so I guess this is a no-op then. We will add CSIMigrationAWS as well if the addon is installed. I think turning it on by default for 1.22 should be fine.

We'll bring this up in our office hours and decide then what to do about the other cloud providers.

@olemarkus olemarkus added kind/feature Categorizes issue or PR as related to a new feature. kind/office-hours labels Feb 10, 2021
@Jiawei0227
Author

No only CSIMigrationAWSComplete was set

Yep, this is exactly why we replaced CSIMigrationAWSComplete with InTreePluginAWSUnregister. Now you only need to enable InTreePluginAWSUnregister and the in-tree plugin will simply not be registered, so you do not need to support in-tree at all. Installing the AWS EBS driver and using CSI directly should then be good.
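Putting those pieces together, a minimal sketch of the "skip in-tree entirely" setup in a kOps cluster spec might look like the fragment below. The field names are the same assumptions as in the earlier sketches, and whether the gate needs to be set on kubelet as well is an assumption worth verifying.

```yaml
# Illustrative kOps Cluster spec fragment (field names assumed): unregister the
# in-tree AWS plugin and rely entirely on the installed EBS CSI driver.
spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true                       # driver must be installed
  kubeControllerManager:
    featureGates:
      InTreePluginAWSUnregister: "true"
  kubelet:
    featureGates:
      InTreePluginAWSUnregister: "true"   # assumed to be needed on kubelet too
```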

@olemarkus
Member

InTreePluginAWSUnregister will be introduced in k8s 1.21? So until then we still need to go with CSIMigrationAWSComplete if we want to unregister the in-tree plugin.

@Jiawei0227
Author

InTreePluginAWSUnregister will be introduced in 1.21.

@johngmyers
Member

Per office hours, this will block 1.21.

@johngmyers
Member

Per kOps office hours, the remaining work does not block 1.21.

@olemarkus
Member

@Jiawei0227 I was looking for an update on CSI migration, but it looks like many of the linked issues have gone stale. Is this still targeting 1.22?

@Jiawei0227
Author

AWS EBS, GCE PD, and Azuredisk are still targeting being turned on by default in 1.22.

@johngmyers johngmyers modified the milestones: v1.21, v1.22 May 27, 2021
@olemarkus
Member

@Jiawei0227

After migrating to the AWS EBS CSI driver by default, we see this test consistently flaking:
[sig-storage] Volume limits should verify that all nodes have volume limits

https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits hints that volume limits are now in a different location (CSINode instead of Node), but I would assume the tests are aware of this.

It is unclear if this is a bug with kOps, the test, or the CSI driver. Would you know anything about this?

@Jiawei0227
Author

To support volume limits, the AWS CSI driver needs to advertise them in its NodeGetInfoResponse.

https://kubernetes-csi.github.io/docs/volume-limits.html
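For reference, when the driver reports a max-volumes limit in NodeGetInfoResponse, it should end up on the node's CSINode object rather than on the Node, roughly like this (node name, instance ID, and the count value below are purely illustrative):

```yaml
# Illustrative CSINode object: the per-node attach limit advertised by the
# EBS CSI driver appears under spec.drivers[].allocatable.count.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-1-23.ec2.internal      # example node name
spec:
  drivers:
    - name: ebs.csi.aws.com
      nodeID: i-0123456789abcdef0      # example instance ID
      allocatable:
        count: 25                      # driver-reported volume limit
      topologyKeys:
        - topology.ebs.csi.aws.com/zone
```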

I am not very familiar with the EBS driver. @wongma7, do you have any insights from running the migration tests?
/cc @msau42

@johngmyers
Member

I don't think GCP and Azure block 1.22.

Should we split them out into separate tickets and close this?

@olemarkus
Member

I suggest we just remove this from the milestone and use this one as an umbrella issue.

@olemarkus
Member

kubernetes/kubernetes#104670: Azure migration is now being enabled by default.

@olemarkus olemarkus modified the milestones: v1.22, v1.23 Oct 1, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 30, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 29, 2022
@olemarkus
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 29, 2022
@cmotta2016

Hey guys, we need some help here.
We're experiencing a bug after installing CSI: #13197
Can you help us?

@olemarkus olemarkus removed this from the v1.23 milestone Mar 6, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
