Feature request: retry Resource Options #7932

Open
gordonbondon opened this issue Sep 9, 2021 · 9 comments
Labels
area/resource-options kind/enhancement Improvements or new features

Comments

@gordonbondon
Contributor

gordonbondon commented Sep 9, 2021

Similar to customTimeouts, add a customRetries option to resources.

In an ideal world, resource providers would handle retrying common errors themselves (similar to what Terraform does), but this probably won't happen for autogenerated providers.

So this is a feature request to allow providing custom retry logic to resources. Basic usage can look like this:

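```typescript
import * as azure from "@pulumi/azure-native";

// Proposed API: customRetries does not exist in Pulumi today; it is the option this issue requests.
const key = new azure.dbforpostgresql.ServerKey("key", {/*...*/},
  {
    customRetries: {
      create: {
        maxAttempts: 5, // give up after 5 attempts
        delay: 10,      // wait between attempts (units not specified in the proposal)
        retriableErrors: [
          "AzureKeyVaultMissingPermissions", // only retry when this error code is returned
        ],
      },
    },
  },
);
```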

On errors during create, update, etc., Pulumi would compare the error code with the provided config and, if there is a match, retry with the given settings. If no retriableErrors are provided, it would simply retry on any error.
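The retry semantics described above (a fixed number of attempts, a delay between them, and an optional list of error codes where an empty list means "retry on anything") can be sketched as a small user-level TypeScript helper. This is a minimal sketch under stated assumptions: RetrySettings, isRetriable and withRetries are hypothetical names, not Pulumi APIs; the delay is taken to be in milliseconds because the proposal does not give units; and error codes are matched by substring against the error message, since the issue does not say how the engine would expose provider error codes. A real customRetries option would presumably wrap the provider's create/update calls inside the engine rather than user code.

```typescript
// Hypothetical shape of the proposed per-operation retry settings.
// These names mirror the issue's example; they are NOT an existing Pulumi API.
interface RetrySettings {
  maxAttempts: number;        // total attempts, including the first one
  delayMs: number;            // fixed wait between attempts (milliseconds is an assumption)
  retriableErrors?: string[]; // error codes to retry on; omitted or empty => retry on any error
}

// Assumption: an error "matches" if its message contains one of the configured codes.
function isRetriable(err: unknown, settings: RetrySettings): boolean {
  if (!settings.retriableErrors || settings.retriableErrors.length === 0) {
    return true; // no list given: retry on any error, as the proposal describes
  }
  const message = err instanceof Error ? err.message : String(err);
  return settings.retriableErrors.some((code) => message.includes(code));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying matching errors until it succeeds or maxAttempts is reached.
async function withRetries<T>(operation: () => Promise<T>, settings: RetrySettings): Promise<T> {
  if (settings.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Rethrow immediately for non-matching errors, or once the last attempt has failed.
      if (!isRetriable(err, settings) || attempt >= settings.maxAttempts) {
        throw err;
      }
      await sleep(settings.delayMs);
    }
  }
}
```

With retriableErrors set to ["AzureKeyVaultMissingPermissions"], an operation that fails twice with that code and then succeeds is attempted three times, while an operation failing with any other error is attempted only once.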

Workarounds

In some cases it is possible to add a custom retry check in output.apply that waits for the condition to be resolved (see the workaround in pulumi/pulumi-aws#673 (comment)). But that isn't always possible, so a more general solution based on retriable errors would be nice.
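The output.apply workaround amounts to polling a readiness check before handing a value on to dependent resources. A minimal sketch, assuming you can write an async readiness check for your resource; waitFor, role, isPermissionVisible and Dependent are hypothetical names. Because apply accepts a callback that returns a Promise, and the resulting Output only resolves once that Promise settles, a resource that takes the returned Output as an input should not be created until the check passes.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls `check` until it reports true, waiting `delayMs`
// between checks, and rejects if the condition is still false after `attempts` checks.
async function waitFor(
  check: () => Promise<boolean>,
  attempts: number,
  delayMs: number,
): Promise<void> {
  for (let i = 1; i <= attempts; i++) {
    if (await check()) {
      return;
    }
    if (i < attempts) {
      await sleep(delayMs);
    }
  }
  throw new Error(`condition not met after ${attempts} attempts`);
}

// Usage inside a Pulumi program (sketch; role, isPermissionVisible and
// Dependent are hypothetical placeholders):
//
//   const readyRoleArn = role.arn.apply(async (arn) => {
//     await waitFor(() => isPermissionVisible(arn), 10, 5000);
//     return arn;
//   });
//   new Dependent("dep", { roleArn: readyRoleArn });
```

The limitation raised in the issue still applies: this only helps when the readiness condition can be queried directly, which is why error-based retries would be more general.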

Related issues

#3715 requested the same but was closed by its author.

pulumi/pulumi-azure#1084, pulumi/pulumi-aws#673, pulumi/pulumi-azure-native#903, https://stackoverflow.com/questions/69085796/how-to-wait-for-group-permission-to-have-been-applied - issues where resource creation fails because of eventual consistency in the infrastructure, and where retrying on a specific error would help.

@gordonbondon gordonbondon added the kind/enhancement Improvements or new features label Sep 9, 2021
@emiliza emiliza added kind/enhancement Improvements or new features and removed kind/enhancement Improvements or new features labels Sep 10, 2021
@lkt82

lkt82 commented Dec 17, 2021

This is a problem I have with certain Azure resources as well, where the only workaround at the moment is to rerun the pulumi up operation.

@richard-fairthorne

richard-fairthorne commented Dec 25, 2022

My provider has eventual consistency in their API: sometimes a resource I created takes a few seconds to become available. Rerunning "pulumi up" results in a second 40-minute provisioning wait, whereas retrying at the provider level would reduce that to an additional 5 seconds or so.

The most basic version of this could implement "attempts" and "delay", which would probably satisfy the most urgent use cases.

@jameswoodley

I could definitely use this. Creating an app service with a custom domain on Azure requires a TXT record to be created, and I guess DNS sometimes takes a while to propagate, so having retries would be super useful.

@RobbieMcKinstry
Contributor

This reminds me of the AWS Lambda bug, where querying too early to see whether the Lambda is there caches a "no" for 5 seconds or more, but waiting a short amount of time before querying returns "yes" much sooner.

@andrewdibiasio6

andrewdibiasio6 commented May 11, 2023

Our app is very large. This would save us ~30-40 minutes.

@alextricity25

+1 I would LOVE to see something like this :)

@JiriKovar

JiriKovar commented Nov 6, 2023

We are facing this very same issue with rather large Terraform states, and I did hope this would be one of the things Pulumi would help us with: being forced to rerun the whole terraform apply / pulumi up because of a transient network error is quite frustrating. I do get that it would probably need to be implemented in the providers and therefore it's not an easy thing to achieve, but it would be an awesome competitive advantage.

@hellt

hellt commented Mar 4, 2024

So I take it that regular error handling inside the Pulumi program won't cut it? The Pulumi program will exit anyway, even if we have something like a try/except block?

@criskurtin

I have a case where this would help right now. I'm creating an Aurora Serverless v2 instance, and while the resource is created successfully, it sometimes takes a while for the hostname to be published in DNS. Because of this, the next step (creating a database) usually fails, and I have to rerun the job for it to finish creating the rest of the resources. A retry mechanism on the database resource would resolve this issue and minimize the need for manual intervention.
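For DNS-propagation cases like the Aurora one, a stop-gap in the same spirit as the apply workaround is to block on resolving the endpoint before the database resource is created. A minimal sketch, assuming the database step connects from the machine running pulumi up (so local resolution is the condition that matters) and that the endpoint is available as an Output; waitForDns and cluster are hypothetical names. It relies on Node's built-in dns.promises.lookup, which goes through the operating system's resolver.

```typescript
import { promises as dns } from "dns";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retries an OS-level lookup of `hostname` until it succeeds
// or `attempts` lookups have failed, and returns the first resolved address.
async function waitForDns(hostname: string, attempts: number, delayMs: number): Promise<string> {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let i = 1; i <= attempts; i++) {
    try {
      const { address } = await dns.lookup(hostname);
      return address;
    } catch (err) {
      lastError = err; // e.g. ENOTFOUND while the record has not been published yet
      if (i < attempts) {
        await sleep(delayMs);
      }
    }
  }
  throw lastError;
}

// Usage sketch (cluster is a hypothetical Aurora cluster resource):
//
//   const resolvableHost = cluster.endpoint.apply(async (host) => {
//     await waitForDns(host, 30, 10_000);
//     return host;
//   });
//   // Pass resolvableHost (not cluster.endpoint) to whatever creates the database.
```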

Projects
Status: 💡 Opportunity
Development

No branches or pull requests