Stuck uninstall when deleting a ClusterDeployment before provisioning is complete #114

Closed
elad661 opened this issue Nov 28, 2018 · 8 comments

@elad661

elad661 commented Nov 28, 2018

I deleted a ClusterDeployment before provisioning was complete. Now the uninstall pod is stuck trying to delete network interfaces:

time="2018-11-28T02:14:23Z" level=debug msg="Exiting deleting EIPs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:47Z" level=debug msg="Deleting internet gateways (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:47Z" level=debug msg="deleting internet gateway: igw-0597f04b2b21e9150"
time="2018-11-28T05:32:47Z" level=debug msg="detaching Internet GW igw-0597f04b2b21e9150 from VPC vpc-02dbc31a784cf1a70"
time="2018-11-28T05:32:47Z" level=debug msg="error detaching igw: error detaching internet gateway: DependencyViolation: Network vpc-02dbc31a784cf1a70 has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.\n\tstatus code: 400, request id: 3f1c9453-42d5-4680-b708-1538dc27b1d8"
time="2018-11-28T05:32:47Z" level=debug msg="Exiting deleting internet gateways (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:57Z" level=debug msg="Deleting subnets (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:57Z" level=debug msg="error deleting subnet: DependencyViolation: The subnet 'subnet-0aab27e3305a5f066' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: ad047fa6-2244-41c5-99c4-067780dc3c8b"
time="2018-11-28T05:32:57Z" level=debug msg="error deleting subnet: DependencyViolation: The subnet 'subnet-0524453910b3f2445' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: c8acf976-0942-4b66-abf0-1636aea03791"
time="2018-11-28T05:32:57Z" level=debug msg="Exiting deleting subnets (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:57Z" level=debug msg="Deleting VPCs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:32:58Z" level=debug msg="Deleting load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="from 16 total load balancers, 0 scheduled for deletion"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="Deleting V2 load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="from 4 total V2 load balancers, 0 scheduled for deletion"
time="2018-11-28T05:32:58Z" level=debug msg="Deleting target groups (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="from 7 total target groups, 0 scheduled for deletion"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting target groups (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting V2 load balancers (vpc-02dbc31a784cf1a70)"
time="2018-11-28T05:32:58Z" level=debug msg="deleting VPC: vpc-02dbc31a784cf1a70"
time="2018-11-28T05:32:58Z" level=debug msg="error deleting VPC vpc-02dbc31a784cf1a70: DependencyViolation: The vpc 'vpc-02dbc31a784cf1a70' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: 623dcf6e-7f39-491d-8379-2b3b0aa84a8c"
time="2018-11-28T05:32:58Z" level=debug msg="Exiting deleting VPCs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:33:09Z" level=debug msg="Deleting EIPs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"
time="2018-11-28T05:33:09Z" level=debug msg="deleting EIP: eni-0dbf268139e5a6263"
time="2018-11-28T05:33:09Z" level=debug msg="deleting network interface: eni-0dbf268139e5a6263"
time="2018-11-28T05:33:09Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-0dbf268139e5a6263' is currently in use.\n\tstatus code: 400, request id: 5714fd9f-840a-4cc7-92dc-fe8b808297a4"
time="2018-11-28T05:33:09Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-0dbf268139e5a6263' is currently in use.\n\tstatus code: 400, request id: 5714fd9f-840a-4cc7-92dc-fe8b808297a4"
time="2018-11-28T05:33:09Z" level=debug msg="deleting EIP: eni-091fed9de78f91bd6"
time="2018-11-28T05:33:09Z" level=debug msg="deleting network interface: eni-091fed9de78f91bd6"
time="2018-11-28T05:33:10Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-091fed9de78f91bd6' is currently in use.\n\tstatus code: 400, request id: d3554626-ac73-4168-a020-a49d37299205"
time="2018-11-28T05:33:10Z" level=debug msg="error deleting network iface: InvalidParameterValue: Network interface 'eni-091fed9de78f91bd6' is currently in use.\n\tstatus code: 400, request id: d3554626-ac73-4168-a020-a49d37299205"
time="2018-11-28T05:33:10Z" level=debug msg="Exiting deleting EIPs (map[tectonicClusterID:db802f7a-0af7-4c5a-983e-dc5eea2d78d2])"

These lines have been repeating for ~18 hours now.
ealfassa-test-7-8xqhk-uninstall-bvzrb 1/1 Running 0 18h
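The log shows the same deletions failing and being retried every few seconds. A small model of a tag-based deprovision loop illustrates how that happens. This is a hypothetical Python sketch, not Hive's code. The resource names and the dependency map (`EXAMPLE_DEPS`) are assumptions, read loosely off the DependencyViolation and "currently in use" errors above. The key point is that a resource the loop cannot find by tag still blocks everything that depends on it, so the loop never drains.

```python
# Hypothetical model of a tag-based deprovision loop; not Hive's actual code.

def deprovision(tagged, deps, untagged, max_passes=10):
    """Retry deletions pass after pass; return (still_remaining, passes_used).

    tagged:   resource IDs the loop can discover via the cluster tag
    deps:     resource ID -> IDs that must be gone before it can be deleted
    untagged: resources that exist but are invisible to the tag search
    """
    remaining = set(tagged)
    existing = set(tagged) | set(untagged)
    for pass_no in range(1, max_passes + 1):
        for res in sorted(remaining):
            # Mirrors an AWS DependencyViolation: skip while a dependency exists.
            if deps.get(res, set()) & existing:
                continue
            existing.discard(res)
        remaining &= existing
        if not remaining:
            return remaining, pass_no
    # Without a pass limit this would spin forever, like the uninstall pod above.
    return remaining, max_passes


# Dependencies read loosely off the log above (an assumption, simplified).
EXAMPLE_DEPS = {
    "eip": {"nat_gateway"},
    "internet_gateway": {"eip"},
    "subnet": {"nat_gateway"},
    "vpc": {"internet_gateway", "subnet"},
}

stuck, _ = deprovision({"eip", "internet_gateway", "subnet", "vpc"},
                       EXAMPLE_DEPS, untagged={"nat_gateway"})
print(sorted(stuck))  # every tagged resource is still there
```

If the NAT Gateway is moved from `untagged` into `tagged`, the same loop finishes in two passes. That is the difference between this issue and a normal uninstall.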

@dgoodwin
Contributor

@elad661 can you take a look in the AWS console and see if you can figure out what's blocking deletion of these objects?

@joelddiaz can you take a look? This keeps popping up; I think I've seen it a few times.

@elad661
Author

elad661 commented Nov 28, 2018

I don't have access to the AWS console. Maybe @cben can help?

@elad661
Author

elad661 commented Nov 28, 2018

@dgoodwin it looks like this EIP is used by the VPC's NAT Gateway. I don't know why the NAT Gateway is not being deleted.

@joelddiaz
Contributor

@elad661 is the EIP tagged with the tectonicClusterID tag?

@joelddiaz
Contributor

@elad661 and while we're at it, is the NAT Gateway tagged with tectonicClusterID?

@elad661
Author

elad661 commented Dec 3, 2018

@joelddiaz the EIP is tagged, the network interface it's attached to is not, and the NAT gateway is not tagged either.

This might've been a race condition in which those resources were created but not tagged yet, and then deprovisioning was triggered.

How do you suggest we avoid such cases on our side? Should we disallow deprovisioning before the cluster is fully provisioned, or will Hive be changed to become more robust?

@joelddiaz
Contributor

It's okay for the network interface to be untagged, since we delete network interfaces based on their being attached to a particular EIP ( https://github.com/openshift/hive/blob/master/contrib/pkg/awstagdeprovision/awstagdeprovision.go#L588-L597 ).
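Here is a minimal sketch of that kind of lookup. It assumes plain dicts shaped like entries of the EC2 DescribeAddresses response (`NetworkInterfaceId`, and `Tags` as a list of `Key`/`Value` pairs) instead of real API calls. The function names are hypothetical, and so are the sample IDs other than the tag and ENI taken from the log. Finding the interface this way does not make it deletable. As the log shows, an interface still in use (here, by the NAT Gateway) keeps failing until its owner is removed.

```python
# Hypothetical sketch: plain dicts shaped like entries of the EC2
# DescribeAddresses "Addresses" list; no AWS calls are made.

def has_tag(resource, key, value):
    """True if an EC2-style "Tags" list contains key=value."""
    return any(t.get("Key") == key and t.get("Value") == value
               for t in resource.get("Tags", []))


def enis_for_tagged_eips(addresses, key, value):
    """ENI IDs attached to tagged EIPs; the ENIs themselves need no tag."""
    return [a["NetworkInterfaceId"] for a in addresses
            if has_tag(a, key, value) and a.get("NetworkInterfaceId")]


CLUSTER_TAG = ("tectonicClusterID", "db802f7a-0af7-4c5a-983e-dc5eea2d78d2")
addresses = [
    {"AllocationId": "eipalloc-example1",            # hypothetical ID
     "NetworkInterfaceId": "eni-0dbf268139e5a6263",  # from the log
     "Tags": [{"Key": "tectonicClusterID", "Value": CLUSTER_TAG[1]}]},
    {"AllocationId": "eipalloc-example2",            # hypothetical, untagged
     "NetworkInterfaceId": "eni-unrelated",
     "Tags": []},
]
print(enis_for_tagged_eips(addresses, *CLUSTER_TAG))
```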

Untagged NAT Gateways would be a problem, since they are found via their tags. So far we've been handling untagged objects as dependencies in the VPC deletion path ( https://github.com/openshift/hive/blob/master/contrib/pkg/awstagdeprovision/awstagdeprovision.go#L450-L476 ).
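One way to pick up untagged NAT Gateways is to select them by the VPC they live in rather than by tag. The sketch below is hypothetical. It filters dicts shaped like entries of the EC2 DescribeNatGateways response (`NatGatewayId`, `VpcId`, `State`) and skips gateways already in the `deleting` or `deleted` state. It makes no AWS calls. Apart from the VPC ID taken from the log, the IDs are made up.

```python
# Hypothetical sketch over dicts shaped like entries of the EC2
# DescribeNatGateways "NatGateways" list; no AWS calls are made.

GONE_STATES = {"deleting", "deleted"}


def nat_gateways_in_vpcs(nat_gateways, vpc_ids):
    """IDs of NAT gateways inside the given VPCs, whether tagged or not."""
    return [n["NatGatewayId"] for n in nat_gateways
            if n.get("VpcId") in vpc_ids and n.get("State") not in GONE_STATES]


nat_gateways = [
    {"NatGatewayId": "nat-example1", "VpcId": "vpc-02dbc31a784cf1a70",
     "State": "available", "Tags": []},   # untagged, but in the cluster VPC
    {"NatGatewayId": "nat-example2", "VpcId": "vpc-02dbc31a784cf1a70",
     "State": "deleted", "Tags": []},     # already gone, nothing to do
    {"NatGatewayId": "nat-example3", "VpcId": "vpc-someoneelse",
     "State": "available", "Tags": []},   # different VPC: leave alone
]
print(nat_gateways_in_vpcs(nat_gateways, {"vpc-02dbc31a784cf1a70"}))
```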

I do worry a little, though, about having to manage the ordering of the deletions that keep piling up in the VPC code path.
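The ordering concern can be made explicit. One approach is to write the "must be deleted before" relations down and let a topological sort produce an order. This sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The relations in `DELETE_AFTER` are an assumption read off the errors in this issue. They are not a complete or authoritative list of AWS dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical "delete after" table: each key can only be deleted once every
# resource in its value set is gone (assumed from the errors in this issue).
DELETE_AFTER = {
    "eip": {"nat_gateway"},                 # EIP is in use by the NAT Gateway
    "internet_gateway": {"eip"},            # detach needs public addresses unmapped
    "subnet": {"nat_gateway"},              # NAT Gateway lives in the subnet
    "vpc": {"internet_gateway", "subnet"},  # VPC must be empty
}


def deletion_order(delete_after):
    """Return resources ordered so that dependencies are deleted first."""
    return list(TopologicalSorter(delete_after).static_order())


print(deletion_order(DELETE_AFTER))
```

With all the relations in one table, a new dependency is a one-line change rather than another special case in the VPC path. `TopologicalSorter` also raises `graphlib.CycleError` if the table ever becomes contradictory.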

@dgoodwin
Contributor

This should be resolved now.
