RoleBindings deleted by namespace finalizer before other objects using those permissions finish stopping/finalizing #115070
Comments
@karlkfi: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig auth
@karlkfi since you are already using finalizers, what prevents you from setting those on the RBAC objects themselves? My general recommendation would be to avoid having the cleanup code live in the same namespace, or to use things like owner refs to handle cleanup. Avoiding finalizers and having some kind of external GC is also an option. RBAC makes no attempt at protecting your access (for example, you can delete the bindings that give you access). RBAC is not required or even enabled by default in Kube, so I certainly do not see core bits like namespace deletion treating it as "special." With that, I am going to close this issue (since I do not think it is actionable).
/close
@enj: Closing this issue.
@enj other auth systems besides RBAC don't have this problem, since they don't reside in the namespace. I think it is reasonable to view this as a deficiency in RBAC. People who happen to use RBAC and run the rules and the workloads out of the same namespace (which is reasonable and/or encouraged by RBAC) seem likely to have a bad time. It is not obvious how to solve this by adding finalizers and/or owner refs to the RBAC bindings. Can you expand on that comment? If it's trivial, then maybe that's an OK solution; if it's not trivial, at the very least users deserve documentation. I don't think the namespace deletion code is very well thought out, and it wouldn't offend me very much to make it delete RBAC rules last; I don't think it'd be that hard.
There are three strategies I can think of: client-side dependency apply/delete; server-side dependency deletion enforcement; and a simpler standalone option.
/reopen
I think this deserves another triage pass :)
@lavalamp: Reopened this issue.
Sure. I got busy and was not able to respond; I will try again next week.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the project's lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/assign @enj
If a particular resource is required to complete a finalizer task, it is logical to place the same finalizer on that required resource as well. In a world with CRDs, cleanup (and other dependencies) can logically exist between many different resources. Additionally, such a solution would at least make a client aware if it attempted to delete a RoleBinding, regardless of namespace cleanup status. It's also worth noting that liens are a good design for improving the handling of dependent resources, if you're looking for something even better than finalizers, though the SIG isn't actively pursuing it at this time.
/close
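The suggestion to put the same finalizer on the RoleBinding can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the real namespace controller or RBAC authorizer: it assumes a RoleBinding that is terminating (deletion requested, finalizer still present) keeps granting access until it is actually removed, and the finalizer string, object names, and `controller` function are hypothetical. The controller releases the workloads first and removes its own finalizer from the RoleBinding last.

```python
# Toy model of "put the same finalizer on the RoleBinding"; NOT the real
# Kubernetes controllers. Assumes a terminating RoleBinding still grants
# access until it is removed. All names below are hypothetical.
from dataclasses import dataclass, field

FINALIZER = "example.com/cleanup"  # hypothetical finalizer name


@dataclass
class Obj:
    kind: str
    name: str
    finalizers: set = field(default_factory=set)
    deleting: bool = False  # stands in for metadata.deletionTimestamp


def delete_namespace_contents(objs):
    """Simplified namespace deletion: mark every object in one pass.
    Objects without finalizers are removed immediately."""
    for o in list(objs):
        o.deleting = True
        if not o.finalizers:
            objs.remove(o)


def remove_finalizer(objs, o):
    o.finalizers.discard(FINALIZER)
    if o.deleting and not o.finalizers:
        objs.remove(o)


def controller(objs):
    """Hypothetical finalizer controller; needs a RoleBinding to act."""
    binding = next((o for o in objs if o.kind == "RoleBinding"), None)
    if binding is None:
        return "forbidden"
    workloads = [o for o in objs
                 if o.kind != "RoleBinding" and o.deleting and FINALIZER in o.finalizers]
    for w in workloads:
        remove_finalizer(objs, w)  # cleanup done: release the workload
    if workloads:
        return "finalized workloads"
    if binding.deleting and FINALIZER in binding.finalizers:
        remove_finalizer(objs, binding)  # give up our own access last
        return "released binding"
    return "idle"


objs = [Obj("RepoSync", "repo-sync", {FINALIZER}),
        Obj("RoleBinding", "reconciler", {FINALIZER})]
delete_namespace_contents(objs)
first, second = controller(objs), controller(objs)
print(first, second, objs)  # finalized workloads released binding []
```

The ordering lives entirely in the controller: the namespace finalizer still deletes everything at once, but the RoleBinding cannot disappear until the controller that depends on it says it is done.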
@deads2k: Closing this issue.
What happened?
When you delete a Namespace, Kubernetes automatically deletes all the objects in that namespace.
However, suppose you have a workload in that namespace (e.g. a custom resource), the workload has a finalizer, and the controller that manages that finalizer is granted permission by a RoleBinding in the same namespace. When the Namespace is deleted, the workload object and the RoleBinding are deleted at roughly the same time. As a result, the finalizer controller can (and often does) deadlock: it cannot use the permissions that have been revoked, so it cannot remove the finalizer from the workload object. This in turn blocks the namespace finalizer and namespace deletion.
The only way to recover from this situation is to manually remove the finalizer from the workload object. But removing finalizers manually is often unsafe and can cause memory leaks or worse. Moreover, namespace tenants won't be able to remove the finalizer either, because their RoleBindings will have been deleted too; a cluster admin would have to do it.
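The deadlock described above can be reproduced in a small Python toy model. It is a simplification, not the real namespace controller: all objects are marked for deletion in a single pass, and a hypothetical controller may act only while a RoleBinding still exists. The finalizer string and object names are illustrative.

```python
# Toy model of the deadlock; NOT the real namespace controller.
# Kinds, names, and the finalizer string are hypothetical.
from dataclasses import dataclass, field

FINALIZER = "example.com/cleanup"  # hypothetical finalizer name


@dataclass
class Obj:
    kind: str
    name: str
    finalizers: set = field(default_factory=set)
    deleting: bool = False  # stands in for metadata.deletionTimestamp


def delete_namespace_contents(objs):
    """Simplified: every object is marked for deletion in the same pass.
    Objects without finalizers are removed immediately."""
    for o in list(objs):
        o.deleting = True
        if not o.finalizers:
            objs.remove(o)


def controller(objs):
    """Hypothetical finalizer controller. It may only act while a
    RoleBinding granting it access still exists."""
    if not any(o.kind == "RoleBinding" for o in objs):
        return "forbidden"  # cannot clean up, cannot remove the finalizer
    for o in list(objs):
        if o.deleting and FINALIZER in o.finalizers:
            o.finalizers.discard(FINALIZER)
            if not o.finalizers:
                objs.remove(o)
    return "finalized"


objs = [Obj("RepoSync", "repo-sync", {FINALIZER}), Obj("RoleBinding", "reconciler")]
delete_namespace_contents(objs)
print(controller(objs))        # forbidden
print([o.name for o in objs])  # ['repo-sync']: stuck, so the namespace is stuck too
```

The RoleBinding has no finalizer, so it vanishes in the same pass that marks the RepoSync for deletion, and every later controller attempt is forbidden.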
What did you expect to happen?
This behavior is expected given the current implementation, but it could be improved, and the problem solved, by delaying the deletion of RoleBindings (and probably also Secrets, ServiceAccounts, and Roles) until after all other objects in the namespace are fully deleted (not found).
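The proposed ordering can be sketched as a two-phase reconcile pass in a toy Python model. This is an illustration only, not a patch to the real namespace controller: the deferred kinds mirror the list in the text, while the finalizer string, object names, and functions are hypothetical. Phase 1 deletes everything except the deferred kinds; phase 2 starts only once no other object remains.

```python
# Toy model of the proposed ordering; NOT the real namespace controller.
# DEFERRED_KINDS mirrors the issue text; other names are hypothetical.
from dataclasses import dataclass, field

DEFERRED_KINDS = {"RoleBinding", "Role", "ServiceAccount", "Secret"}
FINALIZER = "example.com/cleanup"  # hypothetical finalizer name


@dataclass
class Obj:
    kind: str
    name: str
    finalizers: set = field(default_factory=set)
    deleting: bool = False  # stands in for metadata.deletionTimestamp


def delete(objs, o):
    """Mark o for deletion; remove it now if it has no finalizers."""
    o.deleting = True
    if not o.finalizers:
        objs.remove(o)


def namespace_deletion_step(objs):
    """One reconcile pass of the (hypothetical) ordered namespace finalizer."""
    others = [o for o in objs if o.kind not in DEFERRED_KINDS]
    if others:  # phase 1: workloads still exist, so leave RBAC alone
        for o in others:
            if not o.deleting:
                delete(objs, o)
        return "waiting for workloads"
    for o in list(objs):  # phase 2: every other object is "not found"
        if not o.deleting:
            delete(objs, o)
    return "done" if not objs else "waiting for deferred kinds"


def controller(objs):
    """Hypothetical finalizer controller; needs a RoleBinding to act."""
    if not any(o.kind == "RoleBinding" for o in objs):
        return "forbidden"
    for o in list(objs):
        if o.deleting and FINALIZER in o.finalizers:
            o.finalizers.discard(FINALIZER)
            if not o.finalizers:
                objs.remove(o)
    return "finalized"


objs = [Obj("RepoSync", "repo-sync", {FINALIZER}), Obj("RoleBinding", "reconciler")]
steps = [namespace_deletion_step(objs), controller(objs), namespace_deletion_step(objs)]
print(steps)  # ['waiting for workloads', 'finalized', 'done']
```

Because the RoleBinding survives phase 1, the controller still has permission to finish its cleanup and remove the finalizer, after which the namespace deletion completes.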
How can we reproduce it (as minimally and precisely as possible)?
This problem was observed while testing a new feature in Config Sync called deletion propagation, which, when a RootSync or RepoSync is deleted, deletes the managed objects it previously applied. Reproducing it exactly would require deploying Config Sync from HEAD of main.
But conceptually, you would follow these steps:
Add the "configsync.gke.io/deletion-propagation-policy: Foreground" annotation to the RepoSync.
The namespace finalizer will delete all the objects in the namespace, and the RepoSync will block, because the "reconciler" won't have permission to delete the objects in namespace X, check that they exist, or even update the RepoSync to remove the finalizer.
Anything else we need to know?
An alternative solution would be to enhance ClusterRoleBinding to allow scoping to a namespace without being IN that namespace (like a RoleBinding). That way the RBAC would exist outside the namespace and not be subject to deletion when the namespace is deleted.
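The alternative can be illustrated with a toy Python authorizer that contrasts a RoleBinding with the proposed namespace-scoped ClusterRoleBinding. This is a sketch under loud assumptions: `scope_namespace` is a hypothetical field that does not exist in the real rbac.authorization.k8s.io API, and roles are flattened into a set of verbs for brevity. The point it shows is that namespace deletion removes only objects stored in that namespace, so a cluster-scoped binding survives it.

```python
# Toy authorizer; NOT real Kubernetes RBAC. `scope_namespace` is a
# hypothetical field, and roles are flattened into verb sets.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    subject: str
    verbs: frozenset
    namespace: Optional[str]        # where the binding object lives (None = cluster-scoped)
    scope_namespace: Optional[str]  # hypothetical: namespace the grant applies to


def allowed(bindings, subject, verb, namespace):
    for b in bindings:
        if b.subject != subject or verb not in b.verbs:
            continue
        if b.namespace is None and b.scope_namespace is None:
            return True  # ordinary ClusterRoleBinding: applies everywhere
        if (b.namespace or b.scope_namespace) == namespace:
            return True
    return False


def delete_namespace(bindings, namespace):
    """Namespace deletion removes only objects stored in that namespace."""
    return [b for b in bindings if b.namespace != namespace]


role_binding = Binding("reconciler", frozenset({"delete"}), "x", None)
scoped_crb = Binding("reconciler", frozenset({"delete"}), None, "x")

print(allowed(delete_namespace([role_binding], "x"), "reconciler", "delete", "x"))  # False
print(allowed(delete_namespace([scoped_crb], "x"), "reconciler", "delete", "x"))    # True
```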
It might also be possible to reproduce this problem more simply with a Deployment that blocks its own deletion with a finalizer until it can do some cleanup (though that's not a very realistic use case).
Kubernetes version
Any
Cloud provider
Any
OS version
Any
Install tools
No response
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response