csi: add more csi pods to the list to force delete pod #12681
Conversation
Add holder pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in terminating state. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add nfs pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in terminating state. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@BlaineEXE @Madhu-1 I may have merged this too soon, would be good to discuss... If we force delete the holder pod, this could leave mounts on that node that are never unmounted, and could require the admin to reboot the node if any rbd/cephfs mounts remain. If we don't force delete the holder pod, it would be allowed to keep running as long as possible. Should we consider not adding the holder pods to this list?
I think this force deletes pods already in terminating state, right? It could still be useful to keep the network namespace "alive" until all other pods on the node that are using CSI volumes are terminated, but I'm not sure that is necessary. And definitely, once a pod is terminating, it needs to be terminated at some point. I think it's probably fine to delete holder pods in terminating state; that should only happen in an edge case like node failure or eviction of the entire node. And I'm not sure the added complication of figuring out all the what-ifs adds very much right now. The risk is that there are unknown corner cases where the holders might be deleted too soon and cause ... something. The other alternative is to not include the holder pods in the termination list; the onus would then be on the user to make sure those are deleted. That could help avoid some unmount corner cases, but it means Rook wouldn't be fully automated when handling node failure/eviction cases.
If the node were responsive, I guess we would expect the pods to be deleted anyway. This force delete is only to clean up pods when the node is not responding, so I don't expect it will make a difference to the PV cleanup on the node whether we force delete these pods or not. Sounds good to keep them in the list.
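For illustration, here is a minimal client-go sketch of the behavior discussed above, not Rook's actual code: pods on the failed node are force deleted (grace period 0) only if the kubelet has already been asked to terminate them, i.e. they are stuck in a terminating state. The package and function names (`nodefailure`, `forceDeleteStuckPods`) are hypothetical.

```go
// Minimal sketch, not Rook's implementation: force delete only pods that are
// already terminating (DeletionTimestamp set) on an unresponsive node.
package nodefailure

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceDeleteStuckPods (hypothetical helper) lists pods scheduled on nodeName
// and force deletes the ones the kubelet has already been asked to terminate.
func forceDeleteStuckPods(ctx context.Context, client kubernetes.Interface, namespace, nodeName string) error {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return fmt.Errorf("failed to list pods on node %s: %w", nodeName, err)
	}

	gracePeriod := int64(0)
	for _, pod := range pods.Items {
		// Skip pods that are not terminating; only pods stuck in a
		// terminating state on the failed node are force deleted.
		if pod.DeletionTimestamp == nil {
			continue
		}
		if err := client.CoreV1().Pods(namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{
			GracePeriodSeconds: &gracePeriod,
		}); err != nil {
			return fmt.Errorf("failed to force delete pod %s: %w", pod.Name, err)
		}
	}
	return nil
}
```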
@travisn I hope we have settled the discussion. These are node-critical pods, so the kubelet tries to evict them last. If one is stuck in termination, the sandbox pod that holds the network might already be deleted, and as mentioned above, this pod will only reach termination after all the other application pods on the node are deleted. I think we are good here.
@Madhu-1 Agreed, no more concerns from my side.
csi: add more csi pods to the list to force delete pod (backport #12681)
csi: add more csi pods to the list to force delete pod (backport #12681)
Description of your changes:
Add holder pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in the terminating state.
Add nfs pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in the terminating state.
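To make the change concrete, here is a hedged sketch of how such a label list and node-scoped pod query could look with client-go. The selector strings and names (`csiPodLabelSelectors`, `listCSIPodsOnNode`) are illustrative assumptions, not Rook's actual identifiers; the point is that the holder and NFS plugin labels are part of the list so those pods are also found on the failed node.

```go
// Illustrative sketch only; the exact label values Rook uses may differ.
package nodefailure

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// csiPodLabelSelectors is an assumed list of selectors; this change amounts to
// adding the holder and NFS plugin entries alongside the existing plugin pods.
var csiPodLabelSelectors = []string{
	"app=csi-rbdplugin",           // RBD plugin pods (already in the list)
	"app=csi-cephfsplugin",        // CephFS plugin pods (already in the list)
	"app=csi-rbdplugin-holder",    // assumed label for RBD holder pods
	"app=csi-cephfsplugin-holder", // assumed label for CephFS holder pods
	"app=csi-nfsplugin",           // assumed label for NFS plugin pods
}

// listCSIPodsOnNode gathers every CSI-related pod scheduled on nodeName so the
// caller can force delete the ones stuck in a terminating state.
func listCSIPodsOnNode(ctx context.Context, client kubernetes.Interface, namespace, nodeName string) ([]corev1.Pod, error) {
	var result []corev1.Pod
	for _, selector := range csiPodLabelSelectors {
		pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
			LabelSelector: selector,
			FieldSelector: "spec.nodeName=" + nodeName,
		})
		if err != nil {
			return nil, fmt.Errorf("failed to list pods with selector %q: %w", selector, err)
		}
		result = append(result, pods.Items...)
	}
	return result, nil
}
```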
Which issue is resolved by this Pull Request:
Updates #12645
Checklist:
Skip Tests for Docs: add the label `skip-ci` on the PR if this is only a documentation change.