Error deleting VPC after deleting members/dependencies #105

Closed · snipebin opened this issue May 29, 2020 · 5 comments

@snipebin

Hi Pulumi team,

I'm getting the following error when running pulumi destroy

0977G3QN:infra vinnie$ pulumi destroy
Previewing destroy (nonprod):
     Type                                     Name                     Plan       
 -   pulumi:pulumi:Stack                      infra-nonprod            delete     
 -   ├─ digitalocean:index:DatabaseCluster    postgres-cluster-shared  delete     
 -   ├─ digitalocean:index:KubernetesCluster  kube-cluster             delete     
 -   └─ digitalocean:index:Vpc                vpc                      delete     
 
Resources:
    - 4 to delete

Do you want to perform this destroy? yes
Destroying (nonprod):
     Type                                     Name                     Status                  Info
     pulumi:pulumi:Stack                      infra-nonprod            **failed**              1 error
 -   ├─ digitalocean:index:DatabaseCluster    postgres-cluster-shared  deleted                 
 -   ├─ digitalocean:index:KubernetesCluster  kube-cluster             deleted                 
 -   └─ digitalocean:index:Vpc                vpc                      **deleting failed**     1 error
 
Diagnostics:
  pulumi:pulumi:Stack (infra-nonprod):
    error: update failed
 
  digitalocean:index:Vpc (vpc):
    error: deleting urn:pulumi:nonprod::infra::digitalocean:index/vpc:Vpc::vpc: DELETE https://api.digitalocean.com/v2/vpcs/2c1c33f9-c398-44d8-a7b8-85e1c679b139: 403 (request "df3113c3-889c-4bd4-8b51-bb568b320cd8") Can not delete VPC with members
 
Resources:
    - 2 deleted

Duration: 1m3s

As you can see, DO returns an error while trying to delete the VPC even though its members have already been deleted. I can confirm through the DO dashboard that the members were in fact deleted.

Seems to be a problem with D.O.

Running it a second time deletes the VPC without error - note that the already-deleted dependencies no longer show up in the diff.

0977G3QN:infra vinnie$ pulumi destroy
Previewing destroy (nonprod):
     Type                       Name           Plan       
 -   pulumi:pulumi:Stack        infra-nonprod  delete     
 -   └─ digitalocean:index:Vpc  vpc            delete     
 
Resources:
    - 2 to delete

Do you want to perform this destroy? yes
Destroying (nonprod):
     Type                       Name           Status      
 -   pulumi:pulumi:Stack        infra-nonprod  deleted     
 -   └─ digitalocean:index:Vpc  vpc            deleted     
 
Resources:
    - 2 deleted

Duration: 2s

Here's the pulumi program:

package main

// Package clause and imports added for completeness; the paths assume the v2 Go SDKs.
import (
	do "github.com/pulumi/pulumi-digitalocean/sdk/v2/go/digitalocean"
	"github.com/pulumi/pulumi/sdk/v2/go/pulumi"
	"github.com/pulumi/pulumi/sdk/v2/go/pulumi/config"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		c := config.New(ctx, "")
		stackNameStr := pulumi.String(ctx.Stack())
		doRegionStr := pulumi.String("nyc3")

		// Create VPC where Kube and the Database will be deployed into
		vpc, err := do.NewVpc(ctx, "vpc", &do.VpcArgs{
			Name:   stackNameStr,
			Region: doRegionStr,
		})
		if err != nil {
			return err
		}

		// Create kube cluster
		_, err = do.NewKubernetesCluster(ctx, "kube-cluster", &do.KubernetesClusterArgs{
			Name:    stackNameStr,
			Region:  doRegionStr,
			Version: pulumi.String("1.17.5-do.0"),
			NodePool: &do.KubernetesClusterNodePoolArgs{
				Name:      pulumi.String("std"),
				Size:      pulumi.String("s-1vcpu-2gb"),
				NodeCount: pulumi.Int(c.RequireInt("kube-cluster-std-node-count")),
			},
			// NOTE: D.O. DOES NOT SUPPORT MIGRATING CLUSTER BETWEEN VPCs
			// MUST BE DESTROYED AND RE-PROVISIONED FROM SCRATCH
			VpcUuid: vpc.ID(),
		}, pulumi.DependsOn([]pulumi.Resource{vpc}))
		if err != nil {
			return err
		}

		// Create shared Postgres cluster
		_, err = do.NewDatabaseCluster(ctx, "postgres-cluster-shared", &do.DatabaseClusterArgs{
			Region:             doRegionStr,
			Name:               stackNameStr,
			Engine:             pulumi.String("pg"),
			Size:               pulumi.String("db-s-1vcpu-2gb"),
			NodeCount:          pulumi.Int(c.RequireInt("postgres-cluster-shared-node-count")),
			PrivateNetworkUuid: vpc.ID(),
			// Postgres Version
			Version: pulumi.String("11"),
		}, pulumi.DependsOn([]pulumi.Resource{vpc}))
		if err != nil {
			return err
		}

		return nil
	})
}
@leezen

leezen commented May 29, 2020

Based on the above, this looks like a DO issue as you stated? It would appear they have some kind of eventual consistency issue where their control plane still thinks that the members exist despite being deleted. I'm not sure there's anything we can easily do here. Please let me know if you feel otherwise.

leezen closed this as completed May 29, 2020
@snipebin
Author

snipebin commented Jun 6, 2020

What do you think about an automatic retry with a delay? Waiting 30 seconds usually does it. Another option is to treat the VPC delete operation as a long-polling task with a timeout - since Pulumi tracks dependencies, there's a deterministic expectation that the VPC should be deletable once its dependents have been deleted.

I'm seeing a 100% failure rate running pulumi destroy on the first try today - I believe either approach would be a significant improvement.
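
For illustration, a minimal sketch of the retry-with-delay idea in Go (this is not the provider's actual code; deleteVpc and the error check are hypothetical stand-ins for whatever issues the DELETE /v2/vpcs/{id} call):

package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// isVpcHasMembersErr detects the transient 403 seen above. A real
// implementation would inspect the DO API response rather than the
// error string.
func isVpcHasMembersErr(err error) bool {
	return err != nil && strings.Contains(err.Error(), "Can not delete VPC with members")
}

// deleteVpcWithRetry retries the delete with a fixed delay, but only
// while DO still reports lingering members; any other error fails fast.
func deleteVpcWithRetry(deleteVpc func() error, attempts int, delay time.Duration) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = deleteVpc(); err == nil {
			return nil
		}
		if !isVpcHasMembersErr(err) {
			return err
		}
		fmt.Printf("VPC still has members (attempt %d/%d), retrying in %s...\n", i, attempts, delay)
		time.Sleep(delay)
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	// Hypothetical usage: wrap whatever actually issues the VPC delete.
	err := deleteVpcWithRetry(func() error {
		return errors.New("403: Can not delete VPC with members")
	}, 5, 30*time.Second)
	fmt.Println(err)
}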

@leezen

leezen commented Jun 7, 2020

That makes sense -- though I think handling this generally via something like pulumi/pulumi#1691, where you could add the wait yourself, would be preferable to doing it specifically for DO, so let's use that issue to track this?

@snipebin
Author

That can work. However, I do think every user will run into the same problem, so it may make sense for this library to implement that.

@georges-journeyvpn

Stumbled across this bug myself, but I'm pretty sure it's not strictly a DO problem, as I've seen it with Vultr as well. The real problem seems to be a race condition: Pulumi isn't waiting for the resources to be released from the VPC before trying to delete it, so they're still in flight when the VPC destroy starts and it errors out. That's why it works when you look at the console (or run it again) - by then the other resources have finished being removed. Retrying is an ugly hack; we need more deterministic sequencing so that the VPC delete isn't even attempted until every item in it is confirmed as destroyed.
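
To illustrate that idea, here's a rough sketch (not anything the provider does today) that polls DO's VPC members endpoint until the VPC reports as empty before the delete is attempted; the JSON shape, token handling, and polling interval are simplifying assumptions:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// membersResponse models only the part of the response we need; the
// exact shape of GET /v2/vpcs/{id}/members is an assumption here.
type membersResponse struct {
	Members []struct {
		Name string `json:"name"`
	} `json:"members"`
}

// waitForEmptyVpc polls the members list until the VPC reports no
// members (or the deadline passes), giving a deterministic signal
// that the subsequent delete should succeed.
func waitForEmptyVpc(vpcID, token string, timeout time.Duration) error {
	url := fmt.Sprintf("https://api.digitalocean.com/v2/vpcs/%s/members", vpcID)
	deadline := time.Now().Add(timeout)

	for time.Now().Before(deadline) {
		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			return err
		}
		req.Header.Set("Authorization", "Bearer "+token)

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return err
		}
		var body membersResponse
		err = json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		if err != nil {
			return err
		}
		if len(body.Members) == 0 {
			return nil // VPC is empty; safe to issue the delete
		}
		fmt.Printf("VPC still has %d member(s), waiting...\n", len(body.Members))
		time.Sleep(10 * time.Second)
	}
	return fmt.Errorf("timed out waiting for VPC %s to become empty", vpcID)
}

func main() {
	// Hypothetical usage before DELETE /v2/vpcs/{id}; the VPC ID is the
	// one from the error output above.
	if err := waitForEmptyVpc("2c1c33f9-c398-44d8-a7b8-85e1c679b139", os.Getenv("DIGITALOCEAN_TOKEN"), 5*time.Minute); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}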
