Retry interval of failed volume snapshot creation or deletion does not double after each failure: v6.0.1 #778
Comments
@zhucan Can you take a look?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
Workaround solution:
After some debugging we noticed some details that might be affecting the workflow:
When creating a snapshot, the controller reacts to object events through ResourceEventHandlerFuncs, an implementation of ResourceEventHandler whose methods are defined in external-snapshotter/vendor/k8s.io/client-go/tools/cache/controller.go, lines 203 to 208 in e746d07.
Considering the object is modified more than once during the workflow (for example when there is an error), it might be possible that the event handler keeps re-queueing the object, so each retry fires again before the doubled interval has elapsed; see the sketch below.
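A minimal sketch of the pattern described above, assuming the controller wires ResourceEventHandlerFuncs to a rate-limited workqueue the way typical client-go controllers do (the queue setup and key handling are illustrative, not the sidecar's exact code):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	// A rate-limited queue: AddRateLimited honors exponential backoff,
	// while a plain Add enqueues the key immediately.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	// ResourceEventHandlerFuncs adapts plain functions to the
	// ResourceEventHandler interface mentioned in the comment above.
	handler := cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		// UpdateFunc fires on every modification of the object, including
		// status updates the controller itself writes while handling an
		// error. A plain Add here re-enqueues the key immediately,
		// short-circuiting any backoff scheduled via AddRateLimited.
		UpdateFunc: func(oldObj, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
	}
	fmt.Printf("handler ready to register with an informer: %T\n", handler)
}
```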
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@torredil: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@xing-yang: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
See kubernetes-sigs#1608 See kubernetes-csi/external-snapshotter#778 This does not seek to be a comprehensive rate-limiting solution, but rather to add a temporary workaround for the bug in the snapshotter sidecar by refusing to call the CreateSnapshot for a specific volume unless it has been 30 seconds since the last attempt. Signed-off-by: Connor Catlett <conncatl@amazon.com>
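Read literally, the workaround in the commit message above amounts to a per-volume time gate in front of CreateSnapshot. A rough sketch under that reading (snapshotGate and its methods are hypothetical names, not the driver's actual code):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// snapshotGate is a hypothetical helper for the workaround described above:
// refuse a CreateSnapshot attempt for a volume unless enough time has
// passed since the previous attempt for that same volume.
type snapshotGate struct {
	mu       sync.Mutex
	last     map[string]time.Time
	interval time.Duration
}

func newSnapshotGate(interval time.Duration) *snapshotGate {
	return &snapshotGate{last: map[string]time.Time{}, interval: interval}
}

// allow returns true and records the attempt if the previous attempt for
// volumeID was more than interval ago; otherwise the caller should skip
// the CreateSnapshot call and retry later.
func (g *snapshotGate) allow(volumeID string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if t, ok := g.last[volumeID]; ok && time.Since(t) < g.interval {
		return false
	}
	g.last[volumeID] = time.Now()
	return true
}

func main() {
	gate := newSnapshotGate(30 * time.Second)
	fmt.Println(gate.allow("vol-123")) // true: first attempt proceeds
	fmt.Println(gate.allow("vol-123")) // false: retried too soon, skipped
}
```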
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@ambiknai: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@zhucan @xing-yang @gfariasalves-ionos we have tried reverting to the original fix, but the issue still exists. Can we get help from the community to fix this ASAP? This is a big issue, effectively DDoSing the storage backend in scenarios where the source volume was deleted or the volume is not attached for the respective VolumeSnapshot and VolumeSnapshotContent objects. This can be of great concern for any consumer of this sidecar.
See kubernetes-sigs#1608 See kubernetes-csi/external-snapshotter#778 Adds a 15 second rate limit to CreateSnapshot when the failure originates from CreateSnapshot in cloud (i.e. the error likely originates from the AWS API). This prevents the driver from getting stuck in an infinite loop if snapshot creation fails, where it will indefinitely retry creating a snapshot and continue to receive an error because it is going too fast. Signed-off-by: Connor Catlett <conncatl@amazon.com>
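This later variant throttles only when the failure is classified as originating from the cloud API. A hedged sketch of that idea (cloudError and retryDelay are illustrative stand-ins, not the driver's real error types):

```go
package main

import (
	"fmt"
	"time"
)

// cloudError is a hypothetical marker for failures returned by the cloud
// API (e.g. AWS), as opposed to local validation errors; the real driver
// would inspect its own error types instead.
type cloudError struct{ msg string }

func (e *cloudError) Error() string { return e.msg }

// retryDelay mirrors the idea in the commit message above: only failures
// that come from the cloud CreateSnapshot call are throttled, here with
// the 15-second interval the message mentions.
func retryDelay(err error) time.Duration {
	if _, ok := err.(*cloudError); ok {
		return 15 * time.Second
	}
	return 0 // other failures keep the normal retry path
}

func main() {
	fmt.Println(retryDelay(&cloudError{"RequestLimitExceeded"})) // 15s
	fmt.Println(retryDelay(fmt.Errorf("volume not found")))      // 0s
}
```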
/remove-lifecycle rotten
What happened:
Created a VolumeSnapshot, but the request failed due to an authorisation issue in the storage provider. In the logs, I could see frequent calls to the CreateSnapshot method. As per the docs, the retry interval should double after each failure, but the logs do not show that behaviour.
What you expected to happen:
The retry interval of failed volume snapshot creation or deletion should double after each failure; see the sketch below.
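For reference, the doubling described above is what client-go's ItemExponentialFailureRateLimiter provides out of the box; a minimal sketch of the expected progression (the 1s/5m values are assumptions, roughly matching the sidecar's --retry-interval-start and --retry-interval-max defaults):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Each call to When for the same key records another failure and
	// doubles the returned delay, capped at the max: 1s, 2s, 4s, 8s, 16s.
	rl := workqueue.NewItemExponentialFailureRateLimiter(time.Second, 5*time.Minute)

	key := "default/my-snapshot"
	for i := 0; i < 5; i++ {
		fmt.Printf("retry %d scheduled after %v\n", i+1, rl.When(key))
	}

	// After a successful sync the controller would call Forget to reset
	// the failure count for this key.
	rl.Forget(key)
}
```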
How to reproduce it:
Create a negative test scenario where VolumeSnapshot creation fails and observe the csi-snapshotter sidecar logs.
Anything else we need to know?:
csi-snapshotter has errors logged.
External-Snapshotter version: v6.0.1
Related PR for reference: #651
Environment:
Kubernetes version (use kubectl version): 1.25
Kernel (e.g. uname -a):