
Failed to create access point: AccessPointLimitExceeded: You have reached the maximum number of access points #517

Closed
chandrab-on opened this issue Jul 17, 2021 · 17 comments

@chandrab-on

/kind bug

What happened?
aws-efs-csi-driver throws an error saying "Failed to create access point: AccessPointLimitExceeded: You have reached the maximum number of access points" after migrating aws-efs-csi-driver to version v1.3.2.

Access Points are not being deleted when all PVCs in the namespace are deleted.

What you expected to happen?
Access Points need to be deleted when the corresponding PVC is deleted.

How to reproduce it (as minimally and precisely as possible)?
Install the driver using helm:
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm upgrade --install aws-efs-csi-driver --namespace kube-system aws-efs-csi-driver/aws-efs-csi-driver
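
For reference, the error described above comes from dynamic provisioning, where the driver creates one EFS access point per PVC. A StorageClass along these lines is what drives that behaviour (a sketch with a placeholder fileSystemId, not the reporter's exact configuration):

kubectl apply -f - <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap           # one EFS access point is created per dynamically provisioned PVC
  fileSystemId: fs-0123456789abcdef0 # placeholder file system ID
  directoryPerms: "700"
EOF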

Anything else we need to know?:

Environment
AWS EKS

  • Kubernetes version (use kubectl version):
    v1.20.4
  • Driver version:
    v1.3.2
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 17, 2021
@chandrab-on

{
RespMetadata: {
StatusCode: 403,
RequestID: "3a4f7d4e-4132-4a77-b41c-993152b3c0f5"
},
ErrorCode: "AccessPointLimitExceeded",
Message_: "You have reached the maximum number of access points (120) for your file system fs-4acdf681. Delete an access point and add a new one."
}
Warning ProvisioningFailed 14m efs.csi.aws.com_ip-10-128-93-166.eu-west-1.compute.internal_b2ab1b88-ab90-49ee-a354-adaba28e8170 failed to provision volume with StorageClass "aws-efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-4acdf681 : Failed to create access point: AccessPointLimitExceeded: You have reached the maximum number of access points (120) for your file system fs-4acdf681. Delete an access point and add a new one.
{
RespMetadata: {
StatusCode: 403,
RequestID: "b8ff1a28-bfe5-46d5-9b36-bee6e25528ad"
},
ErrorCode: "AccessPointLimitExceeded",
Message_: "You have reached the maximum number of access points (120) for your file system fs-4acdf681. Delete an access point and add a new one."
}
Warning ProvisioningFailed 12m efs.csi.aws.com_ip-10-128-93-166.eu-west-1.compute.internal_b2ab1b88-ab90-49ee-a354-adaba28e8170 failed to provision volume with StorageClass "aws-efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-4acdf681 : Failed to create access point: AccessPointLimitExceeded: You have reached the maximum number of access points (120) for your file system fs-4acdf681. Delete an access point and add a new one.
{
RespMetadata: {
StatusCode: 403,
RequestID: "2ed76d15-ae0f-4fd9-b6e3-8000a4b15e92"
},
ErrorCode: "AccessPointLimitExceeded",
Message_: "You have reached the maximum number of access points (120) for your file system fs-4acdf681. Delete an access point and add a new one."
}
Warning ProvisioningFailed 7m58s efs.csi.aws.com_ip-10-128-93-166.eu-west-1.compute.internal_b2ab1b88-ab90-49ee-a354-adaba28e8170 (combined from similar events): failed to provision volume with StorageClass "aws-efs": rpc error: code = Internal desc = Failed to create Access point in File System fs-4acdf681 : Failed to create access point: AccessPointLimitExceeded: You have reached the maximum number of access points (120) for your file system fs-4acdf681. Delete an access point and add a new one.
{
RespMetadata: {
StatusCode: 403,
RequestID: "776fbdd1-216c-44e6-ba9d-ce6b07c8443d"
},
ErrorCode: "AccessPointLimitExceeded",
Message_: "You have reached the maximum number of access points (120) for your file system fs-4acdf681. Delete an access point and add a new one."
}
Normal ExternalProvisioning 3m42s (x204 over 53m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator

@kbasv

kbasv commented Jul 20, 2021

@chandrab-on EFS has a hard limit of 120 access points per file system, according to the resource quotas in the official documentation. This includes access points that were not created by the driver.
You will either need to delete some of your access points or switch your PVs to a new file system by creating a new storage class.
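
For anyone hitting this, a rough sketch of that manual cleanup with the AWS CLI (the file system ID is taken from the logs above; the fsap- ID to delete is a placeholder):

# List the access points on the file system, including any not created by the driver
aws efs describe-access-points --file-system-id fs-4acdf681 \
  --query 'AccessPoints[].{Id:AccessPointId,Path:RootDirectory.Path}' --output table

# Delete an access point that is no longer needed
aws efs delete-access-point --access-point-id fsap-0123456789abcdef0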

@MarkSpencerTan

@kbasv is there a way the EFS CSI Driver could offer a different implementation that doesn't use Access Points? The hard limit of 120 access points seems extremely limiting for Kubernetes, since it essentially means you're limited to 120 PVCs, right? Then you'd need to create another EFS file system with another storage class to scale further... I think that defeats the purpose of the scalability aspect of EFS and requires manual intervention or a completely different provisioner to take advantage of EFS. This is the limitation we've run into for my project's use case with the EFS CSI Driver.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 16, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 16, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@artificial-aidan

/reopen

@k8s-ci-robot

@artificial-aidan: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jgoeres

jgoeres commented Jul 1, 2022

I think this issue should be reopened. Even if we might not get an implementation that does not use an AP for each PV(C), at least the APs for released PV(C)s should be deleted, IMO.
We are currently working on a project where we deploy a lot of applications to a cluster each day and delete the ones from the day before, each having multiple EFS-backed PVCs. This leads to a lot of newly "Released" PVs each day, eating up our EFS file system's access points. We now need to implement a workaround using a cron job, scheduled Lambda or similar mechanism to identify the no-longer-used APs and delete them. I think such workarounds should not be needed.
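
A minimal sketch of that kind of cleanup job, assuming dynamically provisioned volumeHandles of the form fs-xxxx::fsap-yyyy and a placeholder file system ID (this is a workaround script, not something the driver does itself):

#!/usr/bin/env bash
set -eu
FS_ID="fs-0123456789abcdef0"   # placeholder

# Access point IDs still referenced by PVs in the cluster
kubectl get pv -o jsonpath='{range .items[*]}{.spec.csi.volumeHandle}{"\n"}{end}' \
  | grep -o 'fsap-[0-9a-f]*' | sort -u > /tmp/in-use.txt

# All access points that currently exist on the file system
aws efs describe-access-points --file-system-id "$FS_ID" \
  --query 'AccessPoints[].AccessPointId' --output text | tr '\t' '\n' | sort -u > /tmp/existing.txt

# Delete access points that exist in EFS but are no longer referenced by any PV
comm -13 /tmp/in-use.txt /tmp/existing.txt | while read -r ap; do
  echo "deleting orphaned access point $ap"
  aws efs delete-access-point --access-point-id "$ap"
done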

@thesuperzapper

@jonathanrainer is proposing a directory-based approach (not using access points) in PR #732, you may want to check it out!

But in the meantime, the official NFS CSI Driver already uses a directory-based approach (so it has no 120-access-point limit), and it works with arbitrary NFS servers (including EFS).

@thesuperzapper

@Ashley-wenyizha @wongma7 @jsafrane this issue is critical (and still present), can one of you please reopen it?

@jgoeres

jgoeres commented Aug 22, 2022

I second reopening this issue. @Ashley-wenyizha @wongma7
However, there seems to be a workaround for people who do not actually need more than 120 PVs at once backed by the same EFS file system, but who are running into the issue we had (deletion of PVs not deleting the APs):
When using a storage class with ReclaimPolicy set to Delete, deleting the PVC (and with it the PV) will also delete the AP.
The question remains: why does using an SC with the "Retain" reclaim policy and then manually deleting the PVs result in different behavior when it comes to AP deletion? IMO, in both cases the EFS CSI driver should delete all AWS resources related to the PV.
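
A sketch of that workaround (same idea as the dynamic-provisioning StorageClass earlier in the thread, again with a placeholder fileSystemId), with the reclaim policy set explicitly:

kubectl apply -f - <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs-delete
provisioner: efs.csi.aws.com
reclaimPolicy: Delete                  # deleting the PVC deletes the PV and its access point
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0   # placeholder
  directoryPerms: "700"
EOF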

@FleetAdmiralButter

I'm in support of reopening this issue as well @Ashley-wenyizha @wongma7. We just ran headfirst into the 120 AP limit and have little recourse other than migrating all our workloads to a different EFS/NFS driver.

If this project does not want to support alternate provisioning methods, the documentation should at least make it very clear that only 120 PVs per backing EFS file system are supported, so that anyone who requires a large number of PVs knows to look elsewhere.

@LucaSoato

Why is this closed? This can be critical.

@FoodyFood

Hey,

https://aws.amazon.com/about-aws/whats-new/2023/01/amazon-efs-1000-access-points-file-system/

They upped it to 1000 last night.

I asked my rep because I did not see it in my service quotas, and I got this further info:

I’ve communicated with the EFS PM on this. All existing and newly created file systems now automatically support up to 1,000 EFS Access Points, no limit increase request is needed from your end.

He pointed out that what you're seeing is a visual lag in the console, and that it will be updated as soon as possible.


Hope this helps everyone
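
Since the console may lag behind, one way to check how many access points a file system actually has (using the file system ID from the logs above; the AWS CLI paginates the results automatically):

aws efs describe-access-points --file-system-id fs-4acdf681 \
  --query 'length(AccessPoints)'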

@artificial-aidan

This is still a problem. 1000 volumes isn't that many
