Investigate AWS SDK for Go v2 default retry configuration #36024

Closed
ewbankkit opened this issue Feb 29, 2024 · 4 comments · Fixed by #36467
Labels
aws-sdk-go-migration Issues that are related to the providers migration to AWS SDK for Go v2. client-connections Pertains to the AWS Client and service connections. technical-debt Addresses areas of the codebase that need refactoring or redesign.

Comments

@ewbankkit
Contributor

ewbankkit commented Feb 29, 2024

Background

AWS services maintain API rate limits to protect the performance and availability of the service. If a service’s API rate limit is exceeded, the caller receives a throttling error (usually RequestLimitExceeded or ThrottlingException) and subsequent API calls are throttled. AWS SDKs implement techniques to limit the impact of throttling on applications, including:

  • Retry - SDKs implement configurable, automatic retry logic.
  • Backoff - SDKs implement an exponential backoff algorithm, using increasingly longer waits between retries when consecutive throttling errors are encountered.
  • Jitter - A random delay is added to each retry waiting period to help prevent large numbers of clients retrying at the same time (a variant of the thundering herd problem).

AWS SDK for Go v1

The Terraform AWS Provider’s max_retries configuration attribute populates the SDK’s Config.MaxRetries field. The default value is 25.
The DefaultRetryer is used, with only NumMaxRetries modified from its default value.
Retry logic is the same for throttling and non-throttling errors apart from the minimum and maximum delays; most notably, both share the same maximum number of retries. Retries are counted per request, not shared across requests.

AWS SDK for Go v2

In addition to max_retries, the Terraform AWS Provider’s retry_mode configuration attribute determines whether the standard or adaptive retry strategy is used (the default is standard).
The AWS SDK for Go v2 retryer uses a token bucket to gate retries, including retries of throttling errors. This token bucket is shared across ALL requests made by an API client.
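The effect of a client-wide bucket can be seen with a simplified model. This is not the SDK's implementation: retryTokenBucket and quotaExceededError are hypothetical names, the capacity of 500 and the cost of 5 tokens per retry are taken from the default capacity and the "5 requested" error message quoted in this issue, and the SDK's rules for returning tokens are reduced to a single refund method.

```go
package main

import (
	"fmt"
	"sync"
)

// retryTokenBucket models a retry quota shared by every request on one client.
type retryTokenBucket struct {
	mu        sync.Mutex
	available uint
	capacity  uint
}

type quotaExceededError struct{ available, requested uint }

func (e quotaExceededError) Error() string {
	return fmt.Sprintf("retry quota exceeded, %d available, %d requested", e.available, e.requested)
}

func newRetryTokenBucket(capacity uint) *retryTokenBucket {
	return &retryTokenBucket{available: capacity, capacity: capacity}
}

// getToken withdraws cost tokens before a retry; if too few remain, the
// retry is refused outright (no sleep, no backoff).
func (b *retryTokenBucket) getToken(cost uint) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.available < cost {
		return quotaExceededError{b.available, cost}
	}
	b.available -= cost
	return nil
}

// refund returns tokens (e.g. after a successful retry), capped at capacity.
func (b *retryTokenBucket) refund(n uint) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.available += n
	if b.available > b.capacity {
		b.available = b.capacity
	}
}

func main() {
	bucket := newRetryTokenBucket(500) // one bucket per API client
	const retryCost = 5
	// 200 throttled requests on the same client, each wanting one retry.
	failed := 0
	for i := 0; i < 200; i++ {
		if err := bucket.getToken(retryCost); err != nil {
			failed++
			if failed == 1 {
				// prints: failed to get rate limit token, retry quota exceeded, 0 available, 5 requested
				fmt.Println("failed to get rate limit token,", err)
			}
		}
	}
	fmt.Println("requests that could not retry:", failed) // prints: requests that could not retry: 100
}
```

Because every request draws from the same bucket, a burst of throttled requests can drain it during a single run, after which further retries fail immediately instead of backing off.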

The Problem

After migrating services from AWS SDK for Go v1 to v2, we have received multiple reports (see linked GitHub Issues) of frequent operation failures with the error failed to get rate limit token, retry quota exceeded, 0 available, 5 requested, which is a ratelimit.QuotaExceededError.
Configuring retry_mode = "adaptive" is only a partial solution, as retry tokens (as opposed to request tokens) don’t cause a sleep before retry.
Increasing token_bucket_rate_limiter_capacity to a large value (its default is 500) seems to resolve the issue, but as we migrate more (and eventually all) services to AWS SDK for Go v2, we need a useful default configuration for this functionality.
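Assuming each retry costs 5 tokens (as the "5 requested" in the error message suggests) and ignoring tokens returned on success, the default capacity allows only this many retries in total across all requests on one API client:

```latex
N_{\text{retries}} = \left\lfloor \frac{C_{\text{bucket}}}{c_{\text{retry}}} \right\rfloor
                   = \left\lfloor \frac{500}{5} \right\rfloor = 100
```

For comparison, with the v1 behavior and max_retries = 25, each individual request could retry up to 25 times on its own, so four fully throttled requests alone would use up the whole v2 bucket.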

Relations

Relates #34669.
Relates #32976.
Relates #35926.
Relates hashicorp/aws-sdk-go-base#933.
Relates hashicorp/aws-sdk-go-base#932.
Relates hashicorp/aws-sdk-go-base#918.
Relates hashicorp/aws-sdk-go-base#915.
Relates #36094.
Relates #25552.

ewbankkit added technical-debt Addresses areas of the codebase that need refactoring or redesign. client-connections Pertains to the AWS Client and service connections. labels Feb 29, 2024
ewbankkit added the aws-sdk-go-migration Issues that are related to the providers migration to AWS SDK for Go v2. label Mar 7, 2024
@ewbankkit
Contributor Author

ewbankkit commented Mar 19, 2024

After discussion with the AWS Go SDK team, we have decided to follow the newly published guidance and, by default, configure a RateLimiter of ratelimit.None unless token_bucket_rate_limiter_capacity has been configured.
With ratelimit.None in effect, the max_retries value is still respected, giving functionality equivalent to AWS SDK for Go v1.
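A stdlib-only simulation of why ratelimit.None restores v1-like behavior. The retryLimiter interface and the noLimit and bucketLimit types are hypothetical and much simpler than the SDK's own rate limiter; in the SDK itself, the setting is made through the RateLimiter field of retry.StandardOptions. The scenario assumes 50 requests on one client, each throttled on every attempt, with max_retries = 25 and 5 tokens per retry.

```go
package main

import "fmt"

// retryLimiter is consulted before each retry (hypothetical, simplified).
type retryLimiter interface {
	acquire(cost uint) bool
}

// noLimit always allows the retry, analogous in spirit to ratelimit.None.
type noLimit struct{}

func (noLimit) acquire(uint) bool { return true }

// bucketLimit is a shared token bucket with no refunds.
type bucketLimit struct{ available uint }

func (b *bucketLimit) acquire(cost uint) bool {
	if b.available < cost {
		return false
	}
	b.available -= cost
	return true
}

// doRequest models a request whose every attempt is throttled: it retries
// until maxRetries is reached or the limiter refuses, and returns the
// number of retries performed.
func doRequest(lim retryLimiter, maxRetries int, retryCost uint) int {
	retries := 0
	for retries < maxRetries {
		if !lim.acquire(retryCost) {
			break // quota exhausted: fail without retrying
		}
		retries++
	}
	return retries
}

func main() {
	const requests, maxRetries, retryCost = 50, 25, 5
	shared := &bucketLimit{available: 500} // one bucket for the whole client
	var withBucket, withNone int
	for i := 0; i < requests; i++ {
		withBucket += doRequest(shared, maxRetries, retryCost)
		withNone += doRequest(noLimit{}, maxRetries, retryCost)
	}
	fmt.Println("total retries, shared 500-token bucket:", withBucket) // prints 100
	fmt.Println("total retries, no rate limiter:", withNone)          // prints 1250
}
```

With the shared bucket, the first four requests consume all 500 tokens and the remaining 46 fail without any retry; with no rate limiter, every request gets its full 25 retries and only max_retries bounds the work.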


This functionality has been released in v5.42.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2024