Stuck uninstall when deleting a ClusterDeployment before provisioning is complete #114
@elad661 can you take a look in the AWS console and see if you can figure out what's blocking deletion of these objects? @joelddiaz can you take a look too? This keeps popping up; I think I've seen it a few times.
I don't have access to the AWS console. Maybe @cben can help?
@dgoodwin it looks like this EIP ID is used by the VPC's NAT Gateway. I don't know why the NAT Gateway is not being deleted.
@elad661 is the EIP tagged with the tectonicClusterID tag?
@elad661 and while we're at it, is the NAT Gateway tagged with tectonicClusterID?
@joelddiaz the EIP is tagged, but neither the network interface it's attached to nor the NAT Gateway is tagged. This might have been a race condition in which those resources were created but not yet tagged when deprovisioning was triggered. How do you suggest we avoid such cases on our side? Should we disallow deprovisioning before the cluster is fully provisioned? Or will Hive be changed to be more robust?
It's okay for the network interface to be untagged, because we delete network interfaces based on their being attached to a particular EIP ( https://github.com/openshift/hive/blob/master/contrib/pkg/awstagdeprovision/awstagdeprovision.go#L588-L597 ). The untagged NAT Gateways are a problem, since they are found via their tags. We've been handling untagged objects as dependencies in the VPC deletion path ( https://github.com/openshift/hive/blob/master/contrib/pkg/awstagdeprovision/awstagdeprovision.go#L450-L476 ). I do worry a little, though, about having to manage the ordering of the deletions that are piling up in the VPC code path...
This should be resolved now.
I deleted a ClusterDeployment before provisioning was complete. Now the uninstall pod is stuck trying to delete network interfaces:
These lines have been repeating for ~18 hours now.
ealfassa-test-7-8xqhk-uninstall-bvzrb 1/1 Running 0 18h