Build in tolerance for retryable errors with Azure Provisioner #13934

Closed
rorychatterton opened this issue Apr 25, 2017 · 3 comments

rorychatterton commented Apr 25, 2017

Hi there,

Terraform is quick. So quick, in fact, that sometimes the Azure CLI isn't ready to accept the next create/destroy command in the graph sequence.

As a result, we can see errors like:

Error Creating/Updating LoadBalancer network.LoadBalancersClient#CreateOrUpdate: Failure responding to request: StatusCode=429 -- Original Error: autorest/azure: Service returned an error. Status=429 Code="RetryableError" Message="A retryable error occured." Details=[{"code":"ReferencedResourceNotProvisioned","message":"XXX-as used by resource bluelb-private is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is InternalOperation."}]

If you run Terraform again, the resource is correctly updated and the build concludes.

It would be awesome if we could build retry tolerance into the Azure resources (or resources in general) for errors that are flagged as "RetryableError", to reduce manual intervention. I can hack around this by adding sleeps into my code, but a native implementation would be amazing.

Terraform Version

0.9.3

Similar errors are seen in issue #7986. In this case, I'm not convinced it's a graph/ordering error, as the referenced lb had been completed and deployed before the error occurred. It feels like there is a lag in the Azure CLI before newly created resources can be referenced (especially as it only happens sporadically).

tombuildsstuff (Contributor) commented

Hi @rorychatt

Thanks for raising this issue :)

We use the Azure SDK for Go to provide this functionality. The SDK supports automatic retries depending on the status code; however, retries aren't currently supported for an HTTP status code of 429.

In general, Azure uses the HTTP status code 429 to mean "too many requests"; however, based on this issue, the Networking APIs appear to be using it to indicate that a request can be retried. There's an issue tracking this in the Azure SDK for Go repository. Once this has been fixed, we can pull the updated SDK into Terraform :)

Thanks!

rorychatterton (Author) commented

Thanks for following up. I'll keep an eye on the related issue in the Azure SDK for Go.

We have some Microsoft staff on site for our Azure tenant. If we don't get any traction, I'll raise a ticket internally.


ghost commented Apr 9, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

ghost locked and limited conversation to collaborators Apr 9, 2020