-
Notifications
You must be signed in to change notification settings - Fork 771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 30 second rate limit to CreateSnapshot #1611
Add 30 second rate limit to CreateSnapshot #1611
Conversation
5e48e93
to
4354040
Compare
pkg/cloud/cloud.go
Outdated
|
||
// ErrSnapshotRateLimit is returned when cloud.go refuses to send a CreateSnapshot AWS API request | ||
// Used to work around https://github.com/kubernetes-csi/external-snapshotter/issues/778 | ||
ErrSnapshotRateLimit = fmt.Errorf("Refusing to send CreateSnapshot to AWS (see https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1608#issuecomment-1554748900)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error messages should not contain external links; they should be descriptive enough on their own. Suggest changing this to something like
fmt.Errorf("snapshot creation rate-limited")
pkg/cloud/cloud.go
Outdated
@@ -776,7 +781,17 @@ func (c *cloud) IsExistInstance(ctx context.Context, nodeID string) bool { | |||
return true | |||
} | |||
|
|||
var lastCreateSnapshot sync.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use sync.Map
instead of a normal map
with a mutex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not part of cloud
struct?
pkg/cloud/cloud.go
Outdated
func (c *cloud) CreateSnapshot(ctx context.Context, volumeID string, snapshotOptions *SnapshotOptions) (snapshot *Snapshot, err error) { | ||
if lastTimeUncast, ok := lastCreateSnapshot.Load(volumeID); ok { | ||
if lastTime, ok := lastTimeUncast.(time.Time); ok { | ||
if time.Since(lastTime) < time.Second*30 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/cloud/cloud.go
Outdated
@@ -796,6 +811,7 @@ func (c *cloud) CreateSnapshot(ctx context.Context, volumeID string, snapshotOpt | |||
Description: aws.String(descriptions), | |||
} | |||
|
|||
lastCreateSnapshot.Store(volumeID, time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May also need to delete entry at some point, otherwise memory consumed by driver would balloon.
See kubernetes-sigs#1608 See kubernetes-csi/external-snapshotter#778 Adds a 15 second rate limit to CreateSnapshot when the failure originates from CreateSnapshot in cloud (i.e. the error likely originates from the AWS API). This prevents the driver from getting stuck in an infinite loop if snapshot creation fails, where it will indefinately retry creating a snapshot and continue to receive an error because it is going too fast. Signed-off-by: Connor Catlett <conncatl@amazon.com>
4354040
to
56c9407
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/close superseded by new methodology |
Is this a bug fix or adding new feature?
Bug fix
What is this PR about? / Why do we need it?
See #1608
See kubernetes-csi/external-snapshotter#778
This does not seek to be a comprehensive rate-limiting solution, but rather to add a temporary workaround for the bug in the snapshotter sidecar by refusing to call the CreateSnapshot for a specific volume unless it has been 30 seconds since the last attempt.
What testing is done?
Manual/New unit test/CI