Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: remove volume IOPS limit #90014

Merged
merged 1 commit into from Sep 2, 2020
Merged

AWS: remove volume IOPS limit #90014

merged 1 commit into from Sep 2, 2020

Conversation

jacobmarble
Copy link

@jacobmarble jacobmarble commented Apr 9, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

Max IOPS for SSD (io1) volumes was increased from 20,000 to 32,000:
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-ebs-provisioned-iops-ssd--io1--volumes-now-support-32-000-iops-and-500-mbs-per-volume/

and later to 64,000:
https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-elastic-block-store-announces-double-the-performance-of-provisioned-iops-volumes/

Rather than chasing this limit, allow the AWS SDK to surface errors for IOPS value out of range.

Which issue(s) this PR fixes:

(none found)

Special notes for your reviewer:

I'm proposing the same for CSI:
kubernetes-sigs/aws-ebs-csi-driver#483

Does this PR introduce a user-facing change?:

Increased maximum IOPS of AWS EBS io1 volumes to 64,000 (current AWS maximum).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 9, 2020
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 9, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @jacobmarble!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 9, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @jacobmarble. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 9, 2020
@jacobmarble
Copy link
Author

I have signed the CLA. Let's see if this message triggers a recheck.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 16, 2020
@jacobmarble
Copy link
Author

I have manually tested this change. A PV/PVC based on io1 with <64,000 IOPS created successfully. A PV/PVC based on io1 with >64,000 IOPS did not. The errors exposed in logs and kubectl described are short, clear, useful.

Resource config:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-io1-1000g-50000iops
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1000G
  storageClassName: io1-50
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-io1-1500g-75000iops
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1500G
  storageClassName: io1-50
---
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io1-50
parameters:
  iopsPerGB: "50"
  type: io1
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: Immediate

Results of kubectl describe on failing PVC:

Name:          data-io1-1500g-75000iops
Namespace:     default
StorageClass:  io1-50
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason              Age   From                         Message
  ----     ------              ----  ----                         -------
  Warning  ProvisioningFailed  89s   persistentvolume-controller  Failed to provision volume with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: 20c19f61-7ceb-4275-baf3-a51dd9dd067d
  Warning  ProvisioningFailed  86s  persistentvolume-controller  Failed to provision volume with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: 2abbb3e3-ffa3-4815-bfb1-cd4c410455c2
  Warning  ProvisioningFailed  71s  persistentvolume-controller  Failed to provision volume with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: 666f9642-e454-4c3d-b9a9-f347223ce033
  Warning  ProvisioningFailed  56s  persistentvolume-controller  Failed to provision volume with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: c7bcee49-d5c4-4015-b32a-c5fd409fcfba
  Warning  ProvisioningFailed  41s  persistentvolume-controller  Failed to provision volume with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: e0e5d299-014c-4c60-ab50-524283b24679
  Warning  ProvisioningFailed  26s  persistentvolume-controller  Failed to provision volume with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: b1bf92e1-76b1-4a7e-a589-418d470f1b97

Kubernetes log:

I0430 05:58:31.041343   15843 aws_util.go:110] Error creating EBS Disk volume: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
	status code: 400, request id: 6ccfc493-5141-454d-a578-04e659708067
E0430 05:58:31.041377   15843 aws_ebs.go:494] Provision failed: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
	status code: 400, request id: 6ccfc493-5141-454d-a578-04e659708067
I0430 05:58:31.041415   15843 pv_controller.go:1499] failed to provision volume for claim "default/data-io1-1500g-75000iops" with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
	status code: 400, request id: 6ccfc493-5141-454d-a578-04e659708067
E0430 05:58:31.041467   15843 goroutinemap.go:150] Operation for "provision-default/data-io1-1500g-75000iops[5bab79c7-7517-4a15-a6af-485d2ceb55e7]" failed. No retries permitted until 2020-04-30 05:59:35.041448037 +0000 UTC m=+3039.192762601 (durationBeforeRetry 1m4s). Error: "InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.\n\tstatus code: 400, request id: 6ccfc493-5141-454d-a578-04e659708067"
I0430 05:58:31.041528   15843 event.go:278] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-io1-1500g-75000iops", UID:"5bab79c7-7517-4a15-a6af-485d2ceb55e7", APIVersion:"v1", ResourceVersion:"934", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "io1-50": InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
	status code: 400, request id: 6ccfc493-5141-454d-a578-04e659708067
I0430 05:58:31.232585   15843 pvc_protection_controller.go:263] PVC default/data-io1-1500g-75000iops is unused
I0430 05:58:31.235708   15843 pvc_protection_controller.go:209] Removed protection finalizer from PVC default/data-io1-1500g-75000iops

@gnufied @justinsb please share your thoughts.

@leakingtapan
Copy link

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 1, 2020
@leakingtapan
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 1, 2020
@jacobmarble
Copy link
Author

Matching PR for AWS/EBS CSI driver has also been tested.

Please advise next steps.

@jacobmarble
Copy link
Author

Friendly note here. I believe that this and kubernetes-sigs/aws-ebs-csi-driver#483 are ready for consideration to be merged.

@leakingtapan
Copy link

/lgtm
/approve

@gnufied mind approve as well?

@nikhita
Copy link
Member

nikhita commented Jul 9, 2020

/assign @jsafrane
for approval

@philjb
Copy link
Contributor

philjb commented Jul 21, 2020

/retest

@leakingtapan
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 21, 2020
@jsafrane
Copy link
Member

Thanks a lot!
/approve

@jsafrane
Copy link
Member

/retest

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobmarble, jsafrane, leakingtapan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2020
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@philjb
Copy link
Contributor

philjb commented Jul 22, 2020

@jsafrane - could you add the milestone?

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 23, 2020
@jsafrane
Copy link
Member

No. @saad-ali, can you please add 1.19 milestone? This one is harmless.

@philjb
Copy link
Contributor

philjb commented Jul 27, 2020

@saad-ali - just following up here on adding the milestone. thanks for your help.

@philjb
Copy link
Contributor

philjb commented Aug 3, 2020

@saad-ali - friendly weekly reminder here for the milestone.

@philjb
Copy link
Contributor

philjb commented Aug 12, 2020

@jsafrane - could you help find @saad-ali ?

@saad-ali
Copy link
Member

/assign

@saad-ali
Copy link
Member

/milestone v1.19

@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Aug 26, 2020
@kubernetes kubernetes deleted a comment from k8s-ci-robot Aug 26, 2020
@saad-ali saad-ali added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 26, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 26, 2020
@justaugustus justaugustus modified the milestones: v1.19, v1.20-phase-bug, v1.20 Aug 27, 2020
@justaugustus
Copy link
Member

/test pull-kubernetes-e2e-kind-ipv6

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit d2a0b6f into kubernetes:master Sep 2, 2020
@philjb philjb deleted the aws-max-total-iops branch September 21, 2020 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cloudprovider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants