Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement isResourceTimeoutError() Conditionals After All resource.Retry() Usage #7873

Closed
bflad opened this issue Mar 8, 2019 · 2 comments
Closed
Assignees
Labels
provider Pertains to the provider itself, rather than any interaction with AWS. technical-debt Addresses areas of the codebase that need refactoring or redesign.

Comments

@bflad
Copy link
Member

bflad commented Mar 8, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

When implementing resource.Retry() to allow Terraform AWS Provider resources to retry AWS API requests either for eventual consistency errors or for known concurrency errors, the code immediately following should include a conditional check for isResourceTimeoutError() to retry the request outside the timeout.

If the AWS Go SDK encounters an error such as 5XX AWS API responses, networking related errors, or certain throttling errors, it will automatically retry the AWS API requests itself without returning to the calling code. If these automatic AWS Go SDK retries last longer than the resource.Retry() timeout and we do not retry the request outside the resource.Retry() block before returning the error, operators will receive an "empty" timeout error without any context about the underlying AWS Go SDK error:

timeout while waiting for state to become 'success' (timeout: 1m0s)

Debugging these with Terraform is only possible by enabling debug logging, e.g. TF_LOG=debug terraform plan/apply.

Example incorrect implementation in pseudocode:

err := resource.Retry(1 * time.Minute, func() *resource.RetryError {
  _, err := conn.SomeRequest(input)
  // ... other logic including resource.RetryableError() ...
})

if err != nil {
  return err
}

Fixed implementation:

err := resource.Retry(1 * time.Minute, func() *resource.RetryError {
  _, err := conn.SomeRequest(input)
  // ... other logic including resource.RetryableError() ...
})

if isResourceTimeoutError(err) {
  // Using = here intentionally to overwrite the err
  _, err = conn.SomeRequest(input)
}

if err != nil {
  return err
}

References

There are numerous bugs and pull requests relating to this behavior, but most recently: #7871

@bflad bflad added technical-debt Addresses areas of the codebase that need refactoring or redesign. provider Pertains to the provider itself, rather than any interaction with AWS. labels Mar 8, 2019
bflad added a commit that referenced this issue Mar 13, 2019
… after creation due to eventual consistency

References:
* #7891
* #6560
* #7873
* hashicorp/terraform#17220

The KMS service has eventual consistency considerations and the `aws_kms_alias` resource immediately tries to read the KMS alias after creation, which may not find the KMS alias. When not able to find the KMS alias, the resource logic returns an empty API object instead of an error. Since a `nil` check was already performed on the error, the error will always be `nil`. Invoking `return resource.RetryableError(nil)`  is equivalent to `return nil`. The resource during its Read performs an error check first which will skip because its `nil`, then assumes the resource has been deleted outside Terraform and triggers recreation.

Here when we cannot find a KMS alias after allowing some time for eventual consistency, we return a resource not found error and ensure we handle any timeouts due to automatic AWS Go SDK retries.

Output from acceptance testing:

```
--- PASS: TestAccAWSKmsAlias_no_name (37.63s)
--- PASS: TestAccAWSKmsAlias_name_prefix (37.80s)
--- PASS: TestAccAWSKmsAlias_multiple (38.38s)
--- PASS: TestAccAWSKmsAlias_importBasic (40.13s)
--- PASS: TestAccAWSKmsAlias_ArnDiffSuppress (43.61s)
--- PASS: TestAccAWSKmsAlias_basic (46.76s)
```
nywilken pushed a commit that referenced this issue Mar 14, 2019
… after creation due to eventual consistency (#7907)

References:
* #7891
* #6560
* #7873
* hashicorp/terraform#17220

The KMS service has eventual consistency considerations and the `aws_kms_alias` resource immediately tries to read the KMS alias after creation, which may not find the KMS alias. When not able to find the KMS alias, the resource logic returns an empty API object instead of an error. Since a `nil` check was already performed on the error, the error will always be `nil`. Invoking `return resource.RetryableError(nil)`  is equivalent to `return nil`. The resource during its Read performs an error check first which will skip because its `nil`, then assumes the resource has been deleted outside Terraform and triggers recreation.

Here when we cannot find a KMS alias after allowing some time for eventual consistency, we return a resource not found error and ensure we handle any timeouts due to automatic AWS Go SDK retries.

Output from acceptance testing:

```
--- PASS: TestAccAWSKmsAlias_no_name (37.63s)
--- PASS: TestAccAWSKmsAlias_name_prefix (37.80s)
--- PASS: TestAccAWSKmsAlias_multiple (38.38s)
--- PASS: TestAccAWSKmsAlias_importBasic (40.13s)
--- PASS: TestAccAWSKmsAlias_ArnDiffSuppress (43.61s)
--- PASS: TestAccAWSKmsAlias_basic (46.76s)
```
@ryndaniels ryndaniels self-assigned this May 13, 2019
@ryndaniels
Copy link
Contributor

HURRAY WE DID IT

@ghost
Copy link

ghost commented Nov 1, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
provider Pertains to the provider itself, rather than any interaction with AWS. technical-debt Addresses areas of the codebase that need refactoring or redesign.
Projects
None yet
Development

No branches or pull requests

2 participants