
Inconsistent Timeout on CloudFront distribution creation (150+ distros) #6197

Closed
Djiit opened this issue Oct 18, 2018 · 13 comments · Fixed by #7809

Djiit commented Oct 18, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.11.7

  • provider.aws v1.40.0

Affected Resource(s)

  • aws_cloudfront_distribution

Terraform Configuration Files

100 CF distributions with 50 rules each.

Debug Output

* aws_cloudfront_distribution.xxx.1: error updating CloudFront Distribution (xxx): timeout while waiting for state to become 'success' (timeout: 1m0s)
(...)

(Repeated 20 or so times; mileage varies.)

Expected Behavior

Terraform should have applied our changes to AWS (here, creating roughly 100 CloudFront (CF) distributions). We also didn't expect a one-minute timeout on this resource; according to https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_cloudfront_distribution.go, the timeout for CF distributions is 70 minutes.

Actual Behavior

Terraform errored and left multiple CF distros behind on AWS, forcing us to clean up the mess by hand.

Steps to Reproduce

  1. Create a .tf file with approx. 100 CF distributions
  2. terraform apply
@bflad bflad added the service/cloudfront Issues and PRs that pertain to the cloudfront service. label Oct 18, 2018

bflad commented Oct 18, 2018

Hi @Djiit 👋 Very sorry for the trouble! Would it be possible to share what change(s) you were attempting? The terraform plan output would be very helpful in troubleshooting. If you also have the debug logging from Terraform, that might help determine whether the timeout was occurring because of the AWS Go SDK automatically retrying. It's likely that when so many changes occur at once, the CloudFront API throttles the requests in some fashion, so we may need to tweak the logic around this.

That error message (with its 1-minute timeout for retries) occurs here in the code, which is called during updates or during deletion (to disable the distribution first).
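
In rough outline, that code path looks like the sketch below (illustrative only, not the provider's exact source; `conn`, `input`, and the error classification are placeholders): the AWS Go SDK's internal retries all run inside the one-minute timebox.

```go
package example

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/cloudfront"
	"github.com/hashicorp/terraform/helper/resource"
)

// updateDistribution sketches the timeboxed retry pattern: the inner
// function has one minute total, and any automatic AWS Go SDK retries
// (throttling, 5XX) count against that same budget.
func updateDistribution(conn *cloudfront.CloudFront, input *cloudfront.UpdateDistributionInput) error {
	err := resource.Retry(1*time.Minute, func() *resource.RetryError {
		_, err := conn.UpdateDistribution(input)
		if err != nil {
			// Retry only errors the resource considers recoverable.
			if awsErr, ok := err.(awserr.Error); ok && awsErr.Code() == cloudfront.ErrCodePreconditionFailed {
				return resource.RetryableError(err)
			}
			return resource.NonRetryableError(err)
		}
		return nil
	})
	if err != nil {
		// If SDK-internal throttling backoff consumed the whole minute, err is
		// the generic "timeout while waiting for state to become 'success'".
		return fmt.Errorf("error updating CloudFront Distribution: %s", err)
	}
	return nil
}
```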

Djiit commented Oct 18, 2018

I'm currently destroying these CF distros (might have to wait a couple of hours), but I'll send you that as soon as I can. The plan output is huge (100×50 changes to apply).

Thanks for your quick answer, I appreciate this.

bflad commented Oct 18, 2018

If it's giving you lots of trouble, you can also try reducing your concurrency with the -parallelism flag on terraform plan/apply, e.g. terraform apply -parallelism=2 -- it defaults to 10. This will obviously slow down the process, but it might work around the apply failing outright or leaving behind resources.

Djiit commented Oct 18, 2018

Hmmm the plan output encoding is weird. I'll try again tomorrow.

Djiit commented Oct 19, 2018

So, applying again on a fresh, empty environment with parallelism set to 2, I now get this error:

* aws_cloudfront_distribution.front-app.157: error creating CloudFront Distribution: CNAMEAlreadyExists: One or more of the CNAMEs you provided are already associated with a different resource.

on 20 or so resources.

But AFAIK, there isn't any resource associated with this specific CNAME. EDIT: more on this: all the affected distributions are there, activated, and seemingly healthy! But the linked Route 53 records were not created (as they need to be created after the distributions).

Do you have an email address where I can send you my plan and debug output (as it might contain some sensitive information)?

bflad commented Oct 19, 2018

Feel free to drop a Gist which can be encrypted with the HashiCorp GPG Key.

Djiit commented Oct 19, 2018

Thanks, I'll do that. Some new information here:

  • When I try to re-apply, it tells me it needs to create 21 of the 161 resources (CF distributions), as if it doesn't know they were created; these are the same resources that errored with "CNAMEAlreadyExists". The apply then fails with the exact same 21 errors. I can run this multiple times with the same effect.

  • When I manually deactivate and then destroy the 21 distributions, there is no more error.

Djiit commented Oct 19, 2018

Here is the encrypted output: https://gist.github.com/Djiit/cd40c6ad858b3ffa797ae466a3adf734. I hope I'm doing this right, ahah.

Djiit commented Nov 12, 2018

Hi there, any update on this?

FWIW, when we force the parallelism to 1, it's OK. But hell, it takes a long time.

@bflad bflad added the bug Addresses a defect in current functionality. label Mar 4, 2019
bflad added a commit that referenced this issue Mar 4, 2019
… timeout retry for AWS Go SDK retries

Reference:
* #6197

When using `resource.Retry()` for handling eventual consistency, it timeboxes the inner function to the configured timeout, which we generally set to a minute or two. When the AWS Go SDK encounters recoverable conditions such as 5XX errors or throttling errors, it automatically retries within itself up to the configured session `MaxRetries` (Terraform AWS Provider `max_retries` configuration) before returning to the calling code. For heavily utilized AWS accounts, the throttling errors will trip the outer timeout, which does not give the resource the opportunity to keep retrying outside the timebox.

Here we implement this final retry by checking for the timeout error from `resource.Retry()` outside the timebox, so the AWS Go SDK can return the proper error messaging in these situations or (hopefully) finally succeed in the case of throttling. Since triggering this error-handling condition would require an extraneous amount of resources, we do not generally implement covering acceptance testing for this code, but it may be a good candidate for special handling within a future planned Terraform Provider linting tool.
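
In sketch form, the change amounts to the following pattern (a minimal illustration under the assumptions above, not the merged diff; the error classification inside the retry is elided):

```go
package example

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/service/cloudfront"
	"github.com/hashicorp/terraform/helper/resource"
)

// updateWithFinalRetry adds one last attempt outside the timebox, so the
// AWS Go SDK can retry internally up to MaxRetries and either succeed or
// surface its underlying error instead of the generic timeout message.
func updateWithFinalRetry(conn *cloudfront.CloudFront, input *cloudfront.UpdateDistributionInput) error {
	err := resource.Retry(1*time.Minute, func() *resource.RetryError {
		_, err := conn.UpdateDistribution(input)
		if err != nil {
			// (Real code classifies errors as retryable or non-retryable here.)
			return resource.RetryableError(err)
		}
		return nil
	})
	// The final retry: if the timebox expired, call the API once more outside it.
	if _, ok := err.(*resource.TimeoutError); ok {
		_, err = conn.UpdateDistribution(input)
	}
	if err != nil {
		return fmt.Errorf("error updating CloudFront Distribution: %s", err)
	}
	return nil
}
```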

Output from acceptance testing:

```
--- PASS: TestAccAWSCloudFrontDistribution_Origin_EmptyOriginID (2.08s)
--- PASS: TestAccAWSCloudFrontDistribution_Origin_EmptyDomainName (2.08s)
--- PASS: TestAccAWSCloudFrontDistribution_ViewerCertificate_AcmCertificateArn (1821.71s)
--- PASS: TestAccAWSCloudFrontDistribution_ViewerCertificate_AcmCertificateArn_ConflictsWithCloudFrontDefaultCertificate (1821.72s)
--- PASS: TestAccAWSCloudFrontDistribution_noCustomErrorResponseConfig (2086.99s)
--- PASS: TestAccAWSCloudFrontDistribution_orderedCacheBehavior (2090.63s)
--- PASS: TestAccAWSCloudFrontDistribution_HTTP11Config (2092.43s)
--- PASS: TestAccAWSCloudFrontDistribution_noOptionalItemsConfig (2092.72s)
--- PASS: TestAccAWSCloudFrontDistribution_IsIPV6EnabledConfig (2097.43s)
--- PASS: TestAccAWSCloudFrontDistribution_S3Origin (2277.83s)
--- PASS: TestAccAWSCloudFrontDistribution_multiOrigin (2280.49s)
--- PASS: TestAccAWSCloudFrontDistribution_customOrigin (2282.05s)
--- PASS: TestAccAWSCloudFrontDistribution_S3OriginWithTags (3345.90s)
```

bflad commented Mar 4, 2019

Pull request submitted: #7809

@bflad bflad added this to the v2.1.0 milestone Mar 5, 2019

bflad commented Mar 5, 2019

The fix for this has been merged and will be released with version 2.1.0 of the Terraform AWS Provider, likely in the next day or two.

bflad commented Mar 8, 2019

This has been released in version 2.1.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

ghost commented Mar 31, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 31, 2020