Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 30 second rate limit to CreateSnapshot #1611

Closed

Conversation

ConnorJC3
Copy link
Contributor

@ConnorJC3 ConnorJC3 commented May 22, 2023

Is this a bug fix or adding new feature?

Bug fix

What is this PR about? / Why do we need it?

See #1608
See kubernetes-csi/external-snapshotter#778

This does not seek to be a comprehensive rate-limiting solution, but rather to add a temporary workaround for the bug in the snapshotter sidecar by refusing to call the CreateSnapshot for a specific volume unless it has been 30 seconds since the last attempt.

What testing is done?

Manual/New unit test/CI

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 22, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 22, 2023
@ConnorJC3 ConnorJC3 force-pushed the createsnapshot-ratelimit branch 3 times, most recently from 5e48e93 to 4354040 Compare May 22, 2023 18:56

// ErrSnapshotRateLimit is returned when cloud.go refuses to send a CreateSnapshot AWS API request
// Used to work around https://github.com/kubernetes-csi/external-snapshotter/issues/778
ErrSnapshotRateLimit = fmt.Errorf("Refusing to send CreateSnapshot to AWS (see https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1608#issuecomment-1554748900)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should not contain external links; they should be descriptive enough on their own. Suggest changing this to something like

fmt.Errorf("snapshot creation rate-limited")

@@ -776,7 +781,17 @@ func (c *cloud) IsExistInstance(ctx context.Context, nodeID string) bool {
return true
}

var lastCreateSnapshot sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use sync.Map instead of a normal map with a mutex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not part of cloud struct?

func (c *cloud) CreateSnapshot(ctx context.Context, volumeID string, snapshotOptions *SnapshotOptions) (snapshot *Snapshot, err error) {
if lastTimeUncast, ok := lastCreateSnapshot.Load(volumeID); ok {
if lastTime, ok := lastTimeUncast.(time.Time); ok {
if time.Since(lastTime) < time.Second*30 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -796,6 +811,7 @@ func (c *cloud) CreateSnapshot(ctx context.Context, volumeID string, snapshotOpt
Description: aws.String(descriptions),
}

lastCreateSnapshot.Store(volumeID, time.Now())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May also need to delete entry at some point, otherwise memory consumed by driver would balloon.

See kubernetes-sigs#1608
See kubernetes-csi/external-snapshotter#778

Adds a 15 second rate limit to CreateSnapshot when the failure
originates from CreateSnapshot in cloud (i.e. the error likely
originates from the AWS API).

This prevents the driver from getting stuck in an infinite loop if
snapshot creation fails, where it will indefinately retry creating a
snapshot and continue to receive an error because it is going too fast.

Signed-off-by: Connor Catlett <conncatl@amazon.com>
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from connorjc3. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 26, 2023
@ConnorJC3
Copy link
Contributor Author

/close superseded by new methodology

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants