
Out of tree EBS CSI driver needed for EKS v1.23 #106

Closed · 2 tasks done
richardcase opened this issue Feb 14, 2023 · 17 comments
Labels: kind/bug (Something isn't working), QA/M (QA LOE ~24Hrs/3 days)

@richardcase
Contributor

richardcase commented Feb 14, 2023

EKS v1.23 and higher requires the use of the out-of-tree EBS CSI driver. If it isn't available, PVCs will not be bound and pods using the volumes will stay in Pending.

This is a problem both for creating new clusters and for upgrading clusters from 1.22.

Docs: https://docs.amazonaws.cn/en_us/eks/latest/userguide/ebs-csi.html
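
For reference, a minimal sketch of the manual remediation those docs describe, using the AWS SDK for Go v2 (the cluster name and role ARN are placeholders, and the IAM role for the driver's service account is assumed to already exist):

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/eks"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatalf("loading AWS config: %v", err)
	}
	client := eks.NewFromConfig(cfg)

	// Enable the EBS CSI driver as a managed EKS add-on. The service account
	// role must have AmazonEBSCSIDriverPolicy attached and a trust policy for
	// the cluster's OIDC provider (IRSA).
	_, err = client.CreateAddon(ctx, &eks.CreateAddonInput{
		ClusterName:           aws.String("my-cluster"),                                     // placeholder
		AddonName:             aws.String("aws-ebs-csi-driver"),
		ServiceAccountRoleArn: aws.String("arn:aws:iam::123456789012:role/my-ebs-csi-role"), // placeholder
	})
	if err != nil {
		log.Fatalf("creating aws-ebs-csi-driver add-on: %v", err)
	}
}
```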

PRs

@richardcase self-assigned this Feb 14, 2023
@sanjay920

The AWS docs are quite confusing. I added a wiki page in the opni repo that our engineers use to get around this problem, in case it comes in handy: https://github.com/rancher/opni/wiki/Install-AWS-EBS-CSI-driver

@richardcase
Contributor Author

Thanks @sanjay920

@richardcase
Contributor Author

A support utility has been created to handle this until we can include the functionality in eks-operator.

When including this in the operator, should we automatically enable the EBS CSI add-on? Doing so will require updating the IAM permissions that Rancher uses (including the docs). Or should it be opt-in, e.g. by annotating the namespace?

@richardcase
Contributor Author

Waiting on feedback from rancherlabs/support-tools#217 before implementing this in eks-operator.

@kkaempf modified the milestones: 2023-Q2-v2.6.x, 2023-Q2-v2.7x Mar 21, 2023
@kkaempf

kkaempf commented Mar 21, 2023

Wontfix for 2023-Q2-2.6.x, workaround (script) is available.

Planned for 2023-Q2-2.7.x

@sowmyav27 added the QA/M (QA LOE ~24Hrs/3 days) label Mar 24, 2023
@richardcase removed their assignment May 2, 2023
@salasberryfin
Contributor

/assign

@richardcase
Contributor Author

From discussion with @salasberryfin & @mjura, we think this would be the best approach:

  • Installation of the EBS CSI driver will be optional (as is the case when using the AWS console).
  • We will add a new field to EKSClusterConfigSpec to indicate that the driver should be installed (a rough sketch of the field is shown after this list).
  • If the field is true, we will perform all the steps needed to enable the driver.
  • Update the EKS create-cluster UI to add an option for this new field. Optionally, the UI can also warn the user that additional permissions are required.
  • Update the Rancher docs around the minimum EKS permissions: https://ranchermanager.docs.rancher.com/v2.5/reference-guides/amazon-eks-permissions/minimum-eks-permissions
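
To make the new field concrete, a rough sketch of what it could look like on the operator side (the field name matches the ebsCSIDriver key that later appears in the cluster config, but the comment and exact shape are illustrative, not the final API):

```go
// EKSClusterConfigSpec (excerpt) - illustrative sketch only, not the final API.
type EKSClusterConfigSpec struct {
	// ... existing fields ...

	// EBSCSIDriver, when true, tells the operator to enable the
	// aws-ebs-csi-driver managed add-on: create the cluster's IAM OIDC
	// provider, create an IAM role for the driver's service account, and
	// install the add-on. A pointer is used so that "unset" can be
	// distinguished from an explicit false.
	EBSCSIDriver *bool `json:"ebsCSIDriver,omitempty"`
}
```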

@salasberryfin
Contributor

salasberryfin commented May 10, 2023

On the eks-operator side, the issue can be divided into the following tasks:

  • Add a new boolean field to EKSClusterConfigSpec
  • Implement logic to create the OIDC provider and IAM role and install the EKS add-on (see the sketch after this list)
  • Tests
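
A condensed sketch of what the second task could look like with the AWS SDK for Go v2 (error handling, idempotency checks and thumbprint retrieval are omitted; role naming is a placeholder, and this is not necessarily how eks-operator implements it):

```go
package ebscsi

import (
	"context"
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/eks"
	"github.com/aws/aws-sdk-go-v2/service/iam"
)

// enableEBSCSIDriver registers the cluster's OIDC issuer as an IAM OIDC
// provider, creates an IAM role the driver's service account can assume
// (IRSA), and installs the aws-ebs-csi-driver managed add-on.
func enableEBSCSIDriver(ctx context.Context, eksClient *eks.Client, iamClient *iam.Client, clusterName string) error {
	cluster, err := eksClient.DescribeCluster(ctx, &eks.DescribeClusterInput{Name: aws.String(clusterName)})
	if err != nil {
		return err
	}
	issuer := aws.ToString(cluster.Cluster.Identity.Oidc.Issuer) // https://oidc.eks.<region>.amazonaws.com/id/<id>

	// 1. IAM OIDC provider for the cluster.
	provider, err := iamClient.CreateOpenIDConnectProvider(ctx, &iam.CreateOpenIDConnectProviderInput{
		Url:            aws.String(issuer),
		ClientIDList:   []string{"sts.amazonaws.com"},
		ThumbprintList: []string{"<issuer-ca-thumbprint>"}, // placeholder, fetched from the issuer's TLS cert in practice
	})
	if err != nil {
		return err
	}

	// 2. IAM role trusted by the kube-system/ebs-csi-controller-sa service account.
	trustPolicy := fmt.Sprintf(`{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": %q},
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringEquals": {%q: "system:serviceaccount:kube-system:ebs-csi-controller-sa"}}
  }]
}`, aws.ToString(provider.OpenIDConnectProviderArn), strings.TrimPrefix(issuer, "https://")+":sub")

	role, err := iamClient.CreateRole(ctx, &iam.CreateRoleInput{
		RoleName:                 aws.String(clusterName + "-ebs-csi-driver"), // placeholder naming
		AssumeRolePolicyDocument: aws.String(trustPolicy),
	})
	if err != nil {
		return err
	}
	if _, err := iamClient.AttachRolePolicy(ctx, &iam.AttachRolePolicyInput{
		RoleName:  role.Role.RoleName,
		PolicyArn: aws.String("arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"),
	}); err != nil {
		return err
	}

	// 3. Install the managed add-on using the role from step 2.
	_, err = eksClient.CreateAddon(ctx, &eks.CreateAddonInput{
		ClusterName:           aws.String(clusterName),
		AddonName:             aws.String("aws-ebs-csi-driver"),
		ServiceAccountRoleArn: role.Role.Arn,
	})
	return err
}
```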

@salasberryfin
Contributor

The PR was merged and this is now waiting for rancher/dashboard#9043 to be implemented.

@richardcase
Contributor Author

@salasberryfin - we'll also need to update the Rancher docs so that the new permissions are included.

@salasberryfin
Contributor

Added rancher/rancher-docs#704 to update Rancher docs with the permissions required to enable the add-on.

@kkaempf modified the milestones: 2023-Q3-v2.7x, 2023-Q4-v2.8x Jul 11, 2023
@kkaempf removed this from the v2.8.0 milestone Oct 17, 2023
@mjura
Contributor

mjura commented Jun 25, 2024

UI and docs were merged, back to testing

@cpinjani self-assigned this Jun 25, 2024
@cpinjani
Contributor

cpinjani commented Jul 1, 2024

Adjusting the milestone, as the UI/Doc issues are slated for v2.9
cc: @kkaempf @mjura

@cpinjani modified the milestones: v2.8-Next1, v2.9.0 Jul 1, 2024
@cpinjani
Contributor

cpinjani commented Jul 2, 2024

Validated on build:

v2.9-a57f7025abdd80a2912f7fe53247ef7e1d4993de-head
eks-operator:v1.9.0-rc.9

Enabled the out-of-tree EBS CSI driver while provisioning a cluster; it is set to ebsCSIDriver: true in the config, and the cluster in EKS has the add-on enabled:

time="2024-07-02T10:02:15Z" level=info msg="enabling [ebs csi driver add-on] for cluster [cpinjani-eks29]"
time="2024-07-02T10:02:22Z" level=info msg="cluster [c-t7tm4] finished updating"

However, after the cluster becomes Active it is set to ebsCSIDriver: null in the spec, hence the UI reports it as not set.
Expected value: ebsCSIDriver: true

@mjura @salasberryfin Can you please have a look?

@salasberryfin
Contributor

Thanks for reporting @cpinjani. Could you confirm if this is null in upstreamSpec, eksConfig or both?

@cpinjani
Contributor

It is null in both upstreamSpec and eksConfig.
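
For illustration, one possible shape for keeping the field in sync, assuming the null comes from the upstream sync rebuilding the spec without consulting the installed add-ons (the merged fix may differ): derive the value from what EKS reports when building upstreamSpec.

```go
package ebscsi

import (
	"context"
	"errors"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/eks"
	"github.com/aws/aws-sdk-go-v2/service/eks/types"
)

// ebsCSIDriverEnabled reports whether the aws-ebs-csi-driver add-on is
// installed on the cluster, so the upstream spec can carry true/false
// instead of null.
func ebsCSIDriverEnabled(ctx context.Context, client *eks.Client, clusterName string) (*bool, error) {
	_, err := client.DescribeAddon(ctx, &eks.DescribeAddonInput{
		ClusterName: aws.String(clusterName),
		AddonName:   aws.String("aws-ebs-csi-driver"),
	})
	if err != nil {
		var notFound *types.ResourceNotFoundException
		if errors.As(err, &notFound) {
			return aws.Bool(false), nil // add-on not installed
		}
		return nil, err
	}
	return aws.Bool(true), nil // add-on installed -> ebsCSIDriver: true
}
```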

@kkaempf modified the milestones: v2.9.0, v2.9-Next1 Jul 23, 2024
@mjura self-assigned this Jul 29, 2024
mjura pushed a commit to mjura/eks-operator that referenced this issue Aug 1, 2024
mjura pushed a commit to mjura/eks-operator that referenced this issue Aug 1, 2024
Issue: rancher#106
(cherry picked from commit c58527c)
@kkaempf added the kind/bug (Something isn't working) label Aug 6, 2024
@cpinjani
Contributor

Validation passed on the build below; able to set the out-of-tree EBS CSI driver during creation/edit of a cluster.
ebsCSIDriver: true is set in the cluster config.

v2.9-92f15949d71efb11baf63c878cc2f64bcd25b1e8-head
eks-operator:v1.9.1-rc.6
