csi: set ceph cluster as ControllerRef for holder ds #12724
Conversation
// DO NOT set owner reference on ceph plugin holder daemonset, this DS must never restart unless
// the entire node is rebooted
Having the ceph cluster as the owner will not restart the DS.
@BlaineEXE This comment was likely referring to not setting the ownerRef to the operator deployment, right? Setting the cluster CR as the ownerRef makes sense to me though.
I don't think so. If there are multiple CephClusters, what happens when one of them is deleted? By my logic, the CSI drivers in general shouldn't have any owner set to a CephCluster because they are operator-/cluster-scoped. We don't want to accidentally delete the CSI driver when one of many CephClusters is removed.
Oh of course, the holder daemonset has the same scope as the CSI driver, which means it may be serving multiple clusters. So the question is: what resource (if any) can the holder set as its owner reference so that it is cleaned up during uninstall, but only after all CephCluster resources?
@BlaineEXE @travisn These are the special holder daemonsets, as you know, and they get created in the ceph cluster namespace, not in the operator namespace like the cephcsi plugin daemonset. They are unique per ceph cluster, since multiple ceph clusters can be created with different multus networks.
Actually, there is still an issue with this... Even though a separate daemonset is created for each CephCluster, they are still created in the operator namespace. The holder daemonsets are named csi-rbdplugin-holder-<cluster>, so they can all be created in the same namespace as the operator. But in this case, the owner reference cannot be set to a CephCluster in a different namespace, or else it results in this error:
2023-08-21 16:39:38.454427 E | ceph-csi: failed to reconcile failed to configure ceph csi: failed to start ceph csi drivers:
failed to configure holder: failed to configure holder "rbd" for "my-cluster"/"other-ceph":
failed to set owner reference to plugin holder "rook-ceph.rbd.csi.ceph.com":
cross-namespace owner references are disallowed, owner's namespace other-ceph, obj's namespace rook-ceph
It may be a breaking change at this point, but can the holder daemonsets be created in the same namespace as the cephcluster? That would be necessary for setting the ownerReference like this, except that the CSI driver likely requires the holder pod to be in the same namespace as itself.
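For reference, this is roughly what "setting the CephCluster as ControllerRef" on the holder daemonset amounts to. This is a minimal Go sketch, not Rook's actual code: OwnerReference below is a simplified stand-in whose fields mirror metav1.OwnerReference, and controllerRefForCluster is a hypothetical helper. Note that the reference carries no namespace field at all, which is why Kubernetes garbage collection resolves the owner in the object's own namespace and rejects cross-namespace owners.

```go
package main

import "fmt"

// OwnerReference is a simplified stand-in for metav1.OwnerReference
// (illustration only; not the real Kubernetes type). Note there is no
// Namespace field: the owner is always looked up in the object's namespace.
type OwnerReference struct {
	APIVersion         string
	Kind               string
	Name               string
	Controller         bool
	BlockOwnerDeletion bool
}

// controllerRefForCluster (hypothetical helper) builds the controller
// reference the PR sets on the holder daemonset: Controller=true means the
// CephCluster is the managing controller, so garbage collection deletes the
// daemonset when the cluster CR is deleted.
func controllerRefForCluster(clusterName string) OwnerReference {
	return OwnerReference{
		APIVersion:         "ceph.rook.io/v1",
		Kind:               "CephCluster",
		Name:               clusterName,
		Controller:         true,
		BlockOwnerDeletion: true,
	}
}

func main() {
	ref := controllerRefForCluster("my-cluster")
	fmt.Printf("%s %s/%s controller=%v\n", ref.APIVersion, ref.Kind, ref.Name, ref.Controller)
}
```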
For a scoped fix, we could add the ownerReference to the holder daemonset only if it is in the same namespace as the operator. This should cover the common scenario anyway.
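The scoped fix described above could be sketched as a simple namespace guard. This is a hedged illustration, not Rook's code: ObjectMeta is a plain-struct stand-in for metav1.ObjectMeta, and canSetOwnerRef is a hypothetical helper reflecting the rule that Kubernetes disallows cross-namespace ownerReferences.

```go
package main

import "fmt"

// ObjectMeta is a simplified stand-in for metav1.ObjectMeta
// (hypothetical, for illustration only).
type ObjectMeta struct {
	Name      string
	Namespace string
}

// canSetOwnerRef implements the guard discussed above: Kubernetes rejects
// cross-namespace ownerReferences, so only set the CephCluster as the owner
// of the holder daemonset when both live in the same namespace.
func canSetOwnerRef(owner, obj ObjectMeta) bool {
	return owner.Namespace == obj.Namespace
}

func main() {
	cluster := ObjectMeta{Name: "my-cluster", Namespace: "rook-ceph"}
	holderSameNS := ObjectMeta{Name: "csi-rbdplugin-holder-my-cluster", Namespace: "rook-ceph"}
	holderOtherNS := ObjectMeta{Name: "csi-rbdplugin-holder-other-ceph", Namespace: "other-ceph"}

	fmt.Println(canSetOwnerRef(cluster, holderSameNS))  // true: safe to set the ownerRef
	fmt.Println(canSetOwnerRef(cluster, holderOtherNS)) // false: skip, or the API rejects it
}
```

With this guard, clusters deployed in the operator namespace get automatic cleanup via garbage collection, while external-namespace clusters simply skip the ownerReference rather than failing the reconcile.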
@travisn Yes, you are correct, thanks for catching it. I don't remember why we chose to create the holder daemonset in the operator namespace rather than the cluster namespace. For the scoped fix I am adding a check for it, and I think we need to revisit why it's not getting deployed in the cluster namespace.
@leseb any idea why we chose to create the holder daemonset in the operator namespace and not in the cluster namespace?
@Madhu-1 Honestly, I don't remember :( Maybe we intended to exercise caution and allow sufficient time for this feature to stabilize before proceeding with a manual deletion? 🤷
Force-pushed from 7131d22 to 163730c
Setting the ceph cluster as the ControllerRef for the holder daemonset so that when a ceph cluster is deleted the holder daemonset also gets deleted.

fixes: rook#12645

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
csi: set ceph cluster as ControllerRef for holder ds (backport #12724)
Description of your changes:
Setting the ceph cluster as the ControllerRef for the holder daemonset so that when a ceph cluster is deleted the holder daemonset will also get deleted.
Which issue is resolved by this Pull Request:
Resolves #12645
Checklist:
skip-ci on the PR.