Seed cluster cannot recover after failed reconciliation when trying to delete network resources #461
Comments
Hello @namsral. This indicates that some resources on the infrastructure fail to be deleted. They keep the subnet "busy", so OpenStack refuses to delete it. In my experience, the usual suspects in such cases are load balancers or ports. It would help if you could check which resources have not been deleted, so that we can find the root cause more easily.
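To check which resources still hold the subnet, you can filter the project's ports by subnet ID. The following minimal Python sketch assumes the ports were already fetched from the Neutron ports API into dicts with its `id`, `device_owner`, and `fixed_ips` fields. The function name `ports_on_subnet` and the sample data are hypothetical.

```python
from typing import Any


def ports_on_subnet(ports: list[dict[str, Any]], subnet_id: str) -> list[dict[str, Any]]:
    """Return the ports that hold an IP address on the given subnet.

    Any such port keeps the subnet "busy", so Neutron refuses to delete it.
    """
    return [
        port
        for port in ports
        if any(ip.get("subnet_id") == subnet_id for ip in port.get("fixed_ips", []))
    ]


# Hypothetical sample data shaped like Neutron port objects.
ports = [
    {"id": "p1", "device_owner": "network:router_interface",
     "fixed_ips": [{"subnet_id": "shoot-subnet", "ip_address": "10.250.0.1"}]},
    {"id": "p2", "device_owner": "compute:nova",
     "fixed_ips": [{"subnet_id": "shoot-subnet", "ip_address": "10.250.0.17"}]},
    {"id": "p3", "device_owner": "compute:nova",
     "fixed_ips": [{"subnet_id": "other-subnet", "ip_address": "10.0.0.5"}]},
]

# Print the leftover ports on the shoot's subnet together with their owners.
for port in ports_on_subnet(ports, "shoot-subnet"):
    print(port["id"], port["device_owner"])
```

The `device_owner` of each leftover port is the quickest hint about which component created it.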
Thanks @kon-angelo, removing the port connecting the shoot's subnet and router resolved the issue. Since the port, subnet, and router are all managed by Gardener, I consider this a bug, but I'm not sure in which component. Although I haven't tested it, it might have been sufficient to clear the port's `device_owner` containing
@namsral It's good that you managed to resolve it on your own. If you see this happening consistently, please let us know which orphaned resources you find, and we can discuss which component is responsible. As a point of reference: if the issue is with load balancers, it is most likely a problem with OpenStack's cloud-controller-manager. If, however, the ports are used by the nodes, then it is a problem with our MCM (machine-controller-manager).
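The rule of thumb in the previous comment can be expressed as a small lookup on a leftover port's `device_owner`. This is a Python sketch with a hypothetical function name. It assumes Neutron's usual conventions (`compute:<zone>` for ports of Nova servers, `network:router_interface` for router interfaces) and that Octavia load balancer VIP ports carry the owner `Octavia`, which may differ per deployment. The `compute:` prefix matches any Nova server on the subnet, so treat the result only as a hint.

```python
def responsible_component(device_owner: str) -> str:
    """Map a Neutron port's device_owner to the component that most likely leaked it.

    Rule of thumb from the issue discussion; the mapping is an assumption,
    not an exhaustive list of device_owner values.
    """
    if device_owner == "Octavia":
        # Load balancer VIP port (assumed Octavia convention) -> cloud-controller-manager.
        return "cloud-controller-manager"
    if device_owner.startswith("compute:"):
        # Port attached to a Nova server, e.g. a worker node -> MCM.
        return "machine-controller-manager"
    if device_owner == "network:router_interface":
        # Router interface created together with the shoot's network infrastructure.
        return "Gardener infrastructure (router interface)"
    return "unknown"


for owner in ["compute:nova", "Octavia", "network:router_interface", "network:dhcp"]:
    print(owner, "->", responsible_component(owner))
```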
New information reveals that ports of newly spawned machines prevent removal of the subnet. The seed failed with a similar error:
Steps to recover the failed seed:
This looks like a race condition between the removal of the subnet and the spawning of machines in the subnet.
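One way to narrow the race described above is to delete the subnet only after its ports have drained, retrying for a bounded time. In the Python sketch below, `delete_subnet_when_free` and its two callables are hypothetical stand-ins for whatever lists the blocking ports and deletes the subnet. The sketch does not close the race completely, because a machine can still be created between the final check and the deletion. Stopping machine creation first is the more robust ordering.

```python
import time
from typing import Callable


def delete_subnet_when_free(
    list_blocking_ports: Callable[[], list[str]],
    delete_subnet: Callable[[], None],
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> bool:
    """Delete the subnet once no ports are attached, polling a bounded number of times.

    Returns True if the subnet was deleted, or False if ports were still
    attached after the last attempt (the caller should then report them).
    """
    for attempt in range(attempts):
        if not list_blocking_ports():
            # Nothing holds the subnet any more, so the deletion should be accepted.
            delete_subnet()
            return True
        if attempt < attempts - 1:
            # Give the component that owns the ports time to release them.
            time.sleep(delay_seconds)
    return False


# Usage with stub callables: the ports drain after the second check.
remaining = [["machine-port"], []]
deleted = delete_subnet_when_free(
    lambda: remaining.pop(0),
    lambda: print("subnet deleted"),
    delay_seconds=0,
)
print(deleted)
```

Returning `False` instead of raising leaves the caller free to report the leftover ports, which is the information the maintainers asked for.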
For future reference: the issue was caused by a syntax error in the shoot manifest and was resolved by correcting the syntax in the shoot manifest and the infrastructure config. Although the shoot's subnet had been created correctly and was functional, the difference in notation caused Terraform to recreate the subnet during a reconciliation:
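The issue does not state which field or notation was involved. Assuming, purely for illustration, that it was a CIDR written in two equivalent forms, the Python sketch below shows why a verbatim comparison sees a change even though the network itself is unchanged. `forces_replacement` mimics a plain string diff on an attribute that requires replacement when changed. Both function names and the CIDR values are hypothetical.

```python
import ipaddress


def same_network(a: str, b: str) -> bool:
    """True if two CIDR strings describe the same network after normalisation."""
    # strict=False clears host bits, e.g. 10.250.0.1/16 -> 10.250.0.0/16.
    return ipaddress.ip_network(a, strict=False) == ipaddress.ip_network(b, strict=False)


def forces_replacement(current: str, desired: str) -> bool:
    """Mimic a plain string diff on a replace-on-change attribute.

    The values are compared verbatim, so two notations of the same network
    still count as a change and would trigger recreation.
    """
    return current != desired


current = "10.250.0.0/16"  # hypothetical value recorded in the Terraform state
desired = "10.250.0.1/16"  # hypothetical value from the manifest, host bits set

print(same_network(current, desired))        # True
print(forces_replacement(current, desired))  # True
```

Normalising such values (for example with `ipaddress.ip_network(..., strict=False)`) before they reach the configuration would avoid this kind of spurious diff.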
How to categorize this issue?
/area control-plane
/kind bug
/platform openstack
What happened:
The managed seed cluster fails reconciliation because network resources cannot be deleted, which prevents me from updating the cluster and its extensions.
Reconciliation is attempting to delete the following resources:
Status:
Error
Error in gardenlet:
What you expected to happen:
I expect the subnet not to be deleted during reconciliation, since a dozen managed ports are attached to it.
How to reproduce it (as minimally and precisely as possible):
Deploy a shoot cluster, convert it to a managed seed, then force a reconciliation.
Anything else we need to know?:
Environment:
Kubernetes version (`kubectl version`): v1.23.6