Using Fast Snapshot Restores, the second volume's fast snapshot could not be created #1608
Comments
Some references:
Hi @Phaow, thanks for reporting this, I was able to reproduce it. This seems to be a bug with the external-snapshotter sidecar that watches for VolumeSnapshotContent objects and retries failed CreateSnapshot calls. More specifically, the retry interval should double with each failure.
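For reference, here is a minimal sketch of that expected doubling using client-go's exponential failure rate limiter; this is illustrative, not the snapshotter's actual retry code:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Delay starts at 1s and doubles on each failure of the same item,
	// capped at 5 minutes. This is the backoff the retries should show.
	limiter := workqueue.NewItemExponentialFailureRateLimiter(1*time.Second, 5*time.Minute)

	const item = "snapshot-of-pvc2" // hypothetical work item
	for i := 0; i < 5; i++ {
		// Each When() call simulates another failed CreateSnapshot attempt.
		fmt.Printf("attempt %d: retry after %v\n", i+1, limiter.When(item))
	}
	// Prints 1s, 2s, 4s, 8s, 16s — the interval doubles each time.
}
```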
The log output demonstrates that the retry mechanism is not working as expected.
The second snapshot is failing because enabling fast snapshot restore on it exceeds the account's applied Fast Snapshot Restore quota. This would also happen in a single namespace by adding more AZs to your VolumeSnapshotClass, since each snapshot/AZ pair consumes quota. You may view your applied quota value and request an increase in the Service Quotas dashboard. The retry interval not doubling is still a bug and is causing the observed SnapshotCreationPerVolumeRateExceeded errors.
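For anyone else hitting this, the applied quota can also be checked programmatically. A hedged sketch with the AWS SDK for Go v2 follows; note that "L-XXXXXXXX" is a placeholder, not the real quota code — look up the code for "Fast snapshot restores" under the ec2 service in the Service Quotas console or via ListServiceQuotas:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/servicequotas"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := servicequotas.NewFromConfig(cfg)

	// Placeholder quota code — substitute the actual code for
	// "Fast snapshot restores" in your account/region.
	out, err := client.GetServiceQuota(context.TODO(), &servicequotas.GetServiceQuotaInput{
		ServiceCode: aws.String("ec2"),
		QuotaCode:   aws.String("L-XXXXXXXX"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: applied value = %v\n",
		aws.ToString(out.Quota.QuotaName), aws.ToFloat64(out.Quota.Value))
}
```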
Hi @torredil, got it, thanks a lot for the detailed debugging and clarifying! Sorry I missed that.
See kubernetes-sigs#1608 See kubernetes-csi/external-snapshotter#778 This does not seek to be a comprehensive rate-limiting solution, but rather to add a temporary workaround for the bug in the snapshotter sidecar by refusing to call the CreateSnapshot for a specific volume unless it has been 30 seconds since the last attempt. Signed-off-by: Connor Catlett <conncatl@amazon.com>
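The gist of that workaround, as a minimal sketch — the names here are illustrative, not the driver's actual code:

```go
package driver

import (
	"sync"
	"time"
)

// snapshotRateLimiter refuses a CreateSnapshot attempt for a volume unless
// enough time has passed since the last attempt for that same volume.
type snapshotRateLimiter struct {
	mu          sync.Mutex
	lastAttempt map[string]time.Time // volume ID -> time of last attempt
	minInterval time.Duration        // e.g. 30 * time.Second
}

func newSnapshotRateLimiter(min time.Duration) *snapshotRateLimiter {
	return &snapshotRateLimiter{lastAttempt: map[string]time.Time{}, minInterval: min}
}

// allow reports whether a CreateSnapshot call for volumeID may proceed now,
// and records the attempt time if it may.
func (l *snapshotRateLimiter) allow(volumeID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	if last, ok := l.lastAttempt[volumeID]; ok && now.Sub(last) < l.minInterval {
		return false // too soon; return an error so the sidecar requeues
	}
	l.lastAttempt[volumeID] = now
	return true
}
```

Refusing the call in the driver forces a gap between AWS API attempts even though the sidecar itself requeues immediately, which is why it works as a stopgap until the sidecar's backoff is fixed.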
Hi @Phaow, to loop you in on our current plan: we are introducing a change in the next release to temporarily address this behavior by rate limiting CreateSnapshot calls in the driver.
@torredil Good to know, many thanks again!
See kubernetes-sigs#1608 See kubernetes-csi/external-snapshotter#778 Adds a 15 second rate limit to CreateSnapshot when the failure originates from CreateSnapshot in cloud (i.e. the error likely originates from the AWS API). This prevents the driver from getting stuck in an infinite loop if snapshot creation fails, where it will indefinitely retry creating a snapshot and continue to receive an error because it is going too fast. Signed-off-by: Connor Catlett <conncatl@amazon.com>
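One way to tell whether a failure came back from the AWS API, rather than from local validation, is to check for an API error code. A sketch using the aws-sdk-go-v2 / smithy error types — illustrative, not necessarily how the driver classifies errors:

```go
package driver

import (
	"errors"

	"github.com/aws/smithy-go"
)

// isAWSAPIError reports whether err (or anything it wraps) is an error
// returned by the AWS API, and if so, its error code, e.g.
// "SnapshotCreationPerVolumeRateExceeded".
func isAWSAPIError(err error) (code string, ok bool) {
	var apiErr smithy.APIError
	if errors.As(err, &apiErr) {
		return apiErr.ErrorCode(), true
	}
	return "", false
}
```

The 15-second rate limit would then be applied only to failures classified this way, so purely local errors keep retrying at the sidecar's normal cadence.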
Closing this issue because it should be fixed by #871 (external-snapshotter constantly retrying CreateSnapshot calls on error w/o backoff), many thanks @Phaow! /close
@AndrewSirenko: Closing this issue. In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@AndrewSirenko Thanks for the info and update!
/kind bug
What happened?
2. Create two namespaces `fsr-test1` and `fsr-test2`.
3. Create `pvc1`, `pod1` in namespace `fsr-test1` and wait for the pod to be running, then create a snapshot for `pvc1`; the volumesnapshot/volumesnapshotcontent becomes `readyToUse:true`.
4. Create `pvc2`, `pod2` in namespace `fsr-test2` and wait for the pod to be running, then create a snapshot for `pvc2`; the volumesnapshot/volumesnapshotcontent is stuck at `readyToUse:false`.
What you expected to happen?
In step 4 the volumesnapshot/volumesnapshotcontent could become `readyToUse:true`.
How to reproduce it (as minimally and precisely as possible)?
Always
Anything else we need to know?:
Check the controller csi-snapshotter container logs, lots of `Could not create snapshot SnapshotCreationPerVolumeRateExceeded: The maximum per volume CreateSnapshot request rate has been exceeded. Use an increasing or variable sleep interval between requests.` errors.
Environment
Kubernetes version (use `kubectl version`): v1.18.0