Feature request: retry Resource Options #7932
Comments
This is a problem I have with certain Azure resources as well, where the only workaround at the moment is to retry the pulumi up operation.
My provider has eventual consistency in their API. Sometimes a resource I created takes a few seconds to become available. Retrying "pulumi up" results in a second 40-minute provisioning wait. Retrying at the provider level would reduce that to an additional 5 seconds or so. The most basic version of this could implement "attempts" and "delay" and would probably satisfy the most urgent use cases.
I could definitely do with this. Creating an app service with a custom domain on Azure requires a TXT record to be created. I guess DNS sometimes takes a while to propagate, so having the retry would be super useful.
This reminds me of the AWS Lambda bug, where querying to see if the Lambda caches "no" for 5+ seconds, but waiting a short amount of time before querying returns "yes" much sooner.
Our app is very large. This would save us ~30-40 mins.
+1 I would LOVE to see something like this :)
We are facing this very same issue with rather large states in Terraform, and I had hoped that this is one of the things Pulumi would help us with. Being forced to rerun the whole terraform apply / pulumi up because of a transient network error is quite frustrating. I do get that it would probably need to be implemented in the providers and is therefore not an easy thing to achieve, but it would be an awesome competitive advantage.
So, as I understand it, regular error handling inside the Pulumi program won't cut it? The Pulumi program will exit anyhow, even if we have something like
I have a case where this would help right now. I'm creating an Aurora Serverless v2 instance, and while the resource is created successfully, it sometimes takes a bit for the hostname to be advertised on DNS. Because of this, the next step (creating a database) usually fails, and I have to re-run the job for it to finish creating the rest of the resources. Having a retry mechanism on the database resource would resolve this issue and minimize the need for manual intervention.
Similar to customTimeouts, add customRetries options to resources.
In an ideal world, resource providers would handle retrying common errors (similar to what Terraform does), but this probably won't happen for autogenerated providers.
So this is a feature request to allow providing custom retry logic to resources. Basic usage can look like this:
Upon errors on creation, update, etc., Pulumi would compare the error code with the provided config and, if there is a match, retry with the given settings. If no retriableErrors is provided, it would just retry on any error.
Workarounds
In some cases it is possible to add a custom retry checker in output.apply that waits for the condition to be resolved (see the workaround in a comment on pulumi/pulumi-aws#673). But that is not always possible, so a more general solution with retriable errors would be nice.
Related issues
#3715 requested the same, but was closed by the author.
pulumi/pulumi-azure#1084, pulumi/pulumi-aws#673, pulumi/pulumi-azure-native#903, https://stackoverflow.com/questions/69085796/how-to-wait-for-group-permission-to-have-been-applied - issues where resource creation fails because of eventual consistency in infrastructure, where retrying on some specific error would help.
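To make the requested semantics concrete, here is a minimal TypeScript sketch of the retry behaviour described in this request: retry when the error code matches `retriableErrors`, and retry on any error when that list is omitted. It is not Pulumi API. The `CustomRetries` shape, the `attempts` and `delayMs` field names (borrowed from the "attempts" and "delay" suggestion in the comments), the `code` property on errors, and the `withRetries` helper are all assumptions made for illustration.

```typescript
// Hypothetical option shape; nothing here exists in Pulumi today.
interface CustomRetries {
  attempts: number;            // total tries, including the first one
  delayMs: number;             // pause between tries, in milliseconds
  retriableErrors?: string[];  // error codes to retry on; omitted = any error
}

// Assumption: provider errors expose a string `code`.
interface CodedError extends Error {
  code?: string;
}

function isRetriable(err: unknown, opts: CustomRetries): boolean {
  // No list configured: every error is considered retriable.
  if (opts.retriableErrors === undefined) return true;
  const code = (err as CodedError | null | undefined)?.code;
  return code !== undefined && opts.retriableErrors.includes(code);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetries<T>(
  op: () => Promise<T>,
  opts: CustomRetries,
): Promise<T> {
  if (opts.attempts < 1) throw new Error("attempts must be at least 1");
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on the last attempt or on an error that does not match.
      if (attempt >= opts.attempts || !isRetriable(err, opts)) throw err;
      await sleep(opts.delayMs);
    }
  }
}
```

A helper like this could also serve as the retry checker in the output.apply workaround, for example by wrapping a DNS lookup or an API call that fails until the resource becomes visible. It would not help with errors raised by the provider during create or update, which is the gap this request is about.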