csi: add more csi pods to the list to force delete pod #12681
Conversation
Add holder pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in terminating state. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add nfs pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in terminating state. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@BlaineEXE @Madhu-1 I may have merged this too soon, would be good to discuss... If we force delete the holder pod, this could leave mounts on that node that are never unmounted, and could require the admin to reboot the node if any rbd/cephfs mounts remain. If we don't force delete the holder pod, it would be allowed to keep running as long as possible. Should we consider not adding the holder pods to this list?
I think this force deletes pods already in terminating state, right? It could still be useful to keep the network namespace "alive" until all other pods on the node that are using CSI volumes are terminated, but I'm not sure that is necessary. And definitely, once a pod is terminating, it needs to be terminated at some point. I think it's probably fine to delete holder pods in terminating state; that should only happen in an edge case like node failure or eviction of the entire node. And I'm not sure the added complication of figuring out all the what-ifs adds very much right now. The risk is that there are unknown corner cases where the holders might be deleted too soon and cause ... something. The other alternative is to not include the holder pods in the termination list; the onus would then be on the user to make sure those are deleted. That could help avoid some unmount corner cases, but it means Rook wouldn't be fully automated when handling node failure/eviction cases.
If the node were responsive, I guess we would expect the pods to be deleted anyway. This force delete is only to clean up pods when the node is not responding, so I don't expect it will make a difference to the PV cleanup on the node whether we force delete these pods or not. Sounds good to keep them in the list.
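For illustration, here is a minimal client-go sketch of the behavior discussed above, not Rook's actual code: pods on the failed node are force deleted (grace period 0) only if the kubelet has already been asked to terminate them, i.e. they are stuck in a terminating state. The package and function names (`nodefailure`, `forceDeleteStuckPods`) are hypothetical.

```go
// Minimal sketch, not Rook's implementation: force delete only pods that are
// already terminating (DeletionTimestamp set) on an unresponsive node.
package nodefailure

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceDeleteStuckPods (hypothetical helper) lists pods scheduled on nodeName
// and force deletes the ones the kubelet has already been asked to terminate.
func forceDeleteStuckPods(ctx context.Context, client kubernetes.Interface, namespace, nodeName string) error {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return fmt.Errorf("failed to list pods on node %s: %w", nodeName, err)
	}

	gracePeriod := int64(0)
	for _, pod := range pods.Items {
		// Skip pods that are not terminating; only pods stuck in a
		// terminating state on the failed node are force deleted.
		if pod.DeletionTimestamp == nil {
			continue
		}
		if err := client.CoreV1().Pods(namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{
			GracePeriodSeconds: &gracePeriod,
		}); err != nil {
			return fmt.Errorf("failed to force delete pod %s: %w", pod.Name, err)
		}
	}
	return nil
}
```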
@travisn I hope we have settled the discussion. These are node-critical pods, so the kubelet tries to evict them last. If one is stuck in termination, the sandbox pod that holds the network might already be deleted, and as mentioned above, this pod will only reach termination after all the other application pods on the node are deleted. I think we are good here.
@Madhu-1 Agreed, no more concerns from my side.
csi: add more csi pods to the list to force delete pod (backport #12681)
csi: add more csi pods to the list to force delete pod (backport #12681)
Description of your changes:
Add holder pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in the terminating state.
Add nfs pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in the terminating state.
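To make the change concrete, here is a hedged sketch of how such a label list and node-scoped pod query could look with client-go. The selector strings and names (`csiPodLabelSelectors`, `listCSIPodsOnNode`) are illustrative assumptions, not Rook's actual identifiers; the point is that the holder and NFS plugin labels are part of the list so those pods are also found on the failed node.

```go
// Illustrative sketch only; the exact label values Rook uses may differ.
package nodefailure

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// csiPodLabelSelectors is an assumed list of selectors; this change amounts to
// adding the holder and NFS plugin entries alongside the existing plugin pods.
var csiPodLabelSelectors = []string{
	"app=csi-rbdplugin",           // RBD plugin pods (already in the list)
	"app=csi-cephfsplugin",        // CephFS plugin pods (already in the list)
	"app=csi-rbdplugin-holder",    // assumed label for RBD holder pods
	"app=csi-cephfsplugin-holder", // assumed label for CephFS holder pods
	"app=csi-nfsplugin",           // assumed label for NFS plugin pods
}

// listCSIPodsOnNode gathers every CSI-related pod scheduled on nodeName so the
// caller can force delete the ones stuck in a terminating state.
func listCSIPodsOnNode(ctx context.Context, client kubernetes.Interface, namespace, nodeName string) ([]corev1.Pod, error) {
	var result []corev1.Pod
	for _, selector := range csiPodLabelSelectors {
		pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
			LabelSelector: selector,
			FieldSelector: "spec.nodeName=" + nodeName,
		})
		if err != nil {
			return nil, fmt.Errorf("failed to list pods with selector %q: %w", selector, err)
		}
		result = append(result, pods.Items...)
	}
	return result, nil
}
```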
Which issue is resolved by this Pull Request:
Updates #12645
Checklist:
Skip Tests for Docs: add the label `skip-ci` on the PR if this is only a documentation change.