
csi: add more csi pods to the list to force delete pod #12681

Merged: 2 commits, Aug 8, 2023

Conversation

Madhu-1 (Member) commented Aug 8, 2023

Description of your changes:

Add the holder pods label to the list so that Rook can get all the pods on the failed node and force delete any stuck in the terminating state.

Add the NFS pods label to the list for the same reason: so that Rook can also find and force delete NFS pods stuck in the terminating state on a failed node.
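For illustration only, here is a minimal client-go sketch of the flow this change extends: list pods on a given node by label, then force delete any that are stuck terminating. The label selectors, namespace, and helper name are assumptions, not Rook's actual values or implementation.

```go
// Sketch only: selectors, namespace, and function name are
// illustrative assumptions, not Rook's real implementation.
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func forceDeleteStuckPods(ctx context.Context, cs kubernetes.Interface, node string) error {
	// Hypothetical labels for the pod types discussed in this PR.
	selectors := []string{"app=csi-rbdplugin-holder", "app=rook-ceph-nfs"}
	zero := int64(0)
	for _, sel := range selectors {
		pods, err := cs.CoreV1().Pods("rook-ceph").List(ctx, metav1.ListOptions{
			LabelSelector: sel,
			FieldSelector: "spec.nodeName=" + node, // only pods on the failed node
		})
		if err != nil {
			return err
		}
		for _, p := range pods.Items {
			// A pod is "stuck terminating" when it carries a deletion
			// timestamp but the kubelet on the failed node never
			// finished the delete.
			if p.DeletionTimestamp == nil {
				continue
			}
			fmt.Printf("force deleting %s/%s\n", p.Namespace, p.Name)
			if err := cs.CoreV1().Pods(p.Namespace).Delete(ctx, p.Name,
				metav1.DeleteOptions{GracePeriodSeconds: &zero}); err != nil {
				return err
			}
		}
	}
	return nil
}
```

Setting `GracePeriodSeconds` to zero is what makes the delete a force delete: the API server removes the pod object immediately instead of waiting for the (unreachable) kubelet to confirm termination.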

Which issue is resolved by this Pull Request:
Updates #12645

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Commit 1:
Add holder pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in terminating state.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>

Commit 2:
Add nfs pods label to the list so that Rook can get all the pods on the failed node and force delete the pods stuck in terminating state.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
travisn (Member) commented Aug 8, 2023

@BlaineEXE @Madhu-1 I may have merged this too soon; it would be good to discuss. If we force delete the holder pod, mounts on that node may never be unmounted, and the admin could have to reboot the node if any rbd/cephfs mounts remain on it. If we instead allow the holder pod to keep running rather than force deleting it, it can stay up as long as possible. Should we consider not adding the holder pods to this list?

BlaineEXE (Member) commented:

I think this force deletes pods already in terminating state right? It could still be useful to keep the network namespace "alive" until all other pods on the node that are using CSI volumes are terminated, but I'm not sure that is necessary. And definitely, once a pod is terminating, it needs to be terminated at some point.

I think it's probably fine to delete holder pods in the terminating state; that should only happen in an edge case like node failure or eviction of an entire node. And I'm not sure the added complication of figuring out all the what-ifs adds very much right now. The risk is that there are unknown corner cases where the holders might be deleted too soon and cause ... something.

The other alternative is to not include the holder pods in the termination list. The onus then is on the user to make sure those are deleted. That could help avoid some unmount corner cases that could exist, but it means that Rook wouldn't be fully automated when handling node failure/eviction cases.

travisn (Member) commented Aug 8, 2023

If the node were responsive, we would expect the pods to be deleted anyway; the force delete is only there to clean up pods when the node is not responding. So I don't expect it to make a difference to the PV cleanup on the node whether we force delete these pods or not. Sounds good to keep them in the list.
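As a side note on "the node is not responding": this condition is typically read from the node's Ready condition, which goes to Unknown when the kubelet stops reporting. A hedged sketch, with an assumed helper name:

```go
// Sketch only: nodeIsDown is an assumed helper name, not Rook's API.
package sketch

import corev1 "k8s.io/api/core/v1"

// A node whose kubelet has stopped reporting shows a Ready condition
// of Unknown; a failing kubelet reports False.
func nodeIsDown(node *corev1.Node) bool {
	for _, c := range node.Status.Conditions {
		if c.Type == corev1.NodeReady {
			return c.Status != corev1.ConditionTrue
		}
	}
	return true // no Ready condition reported at all
}
```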

Madhu-1 (Member, Author) commented Aug 9, 2023

@travisn I hope we are settled on the discussion. The holder pods are node-critical pods, so the kubelet tries to evict them last, only after all the other application pods are deleted. By the time one of them is stuck in termination, the sandbox pod that holds the network namespace has likely already been deleted. I think we are good here.

travisn (Member) commented Aug 9, 2023

@Madhu-1 Agreed, no more concern from my side.

travisn added a commit that referenced this pull request Aug 9, 2023:
csi: add more csi pods to the list to force delete pod (backport #12681)
travisn added a commit that referenced this pull request Aug 9, 2023:
csi: add more csi pods to the list to force delete pod (backport #12681)