
csi: set ceph cluster as ControllerRef for holder ds #12724

Merged: 1 commit into rook:master on Aug 22, 2023

Conversation

@Madhu-1 (Member) commented Aug 15, 2023

Description of your changes:

Setting the ceph cluster as the ControllerRef for the holder daemonset so that when a ceph cluster is deleted, the holder daemonset will also get deleted.
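
A minimal sketch of the general pattern (not necessarily the exact code in this PR), using controller-runtime's SetControllerReference to make the CephCluster the controller owner of the holder daemonset; the helper name is illustrative:

package example

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
)

// setClusterAsControllerRef is a hypothetical helper: it marks the CephCluster as the
// controller owner of the holder DaemonSet so that Kubernetes garbage collection
// removes the DaemonSet when the CephCluster is deleted.
func setClusterAsControllerRef(cluster *cephv1.CephCluster, holder *appsv1.DaemonSet, scheme *runtime.Scheme) error {
	return controllerutil.SetControllerReference(cluster, holder, scheme)
}

With the controller reference in place, deleting the CephCluster lets the garbage collector cascade-delete the holder daemonset, which is the behavior described above.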

Which issue is resolved by this Pull Request:
Resolves #12645

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@Madhu-1 requested a review from travisn on August 15, 2023 10:46
Comment on lines -805 to -806
// DO NOT set owner reference on ceph plugin holder daemonset, this DS must never restart unless
// the entire node is rebooted
Member Author (Madhu-1):

Having the ceph cluster as the owner will not cause the DS to restart.

Member:

@BlaineEXE This comment was likely referring to not setting the ownerRef to the operator deployment, right? Setting the cluster CR as the ownerRef makes sense to me though.

Member:

I don't think so. If there are multiple CephClusters, what happens when one of them is deleted? To my mind, the CSI drivers in general shouldn't have any owner set to a CephCluster because they are operator-/cluster-scoped. We don't want to accidentally delete the CSI driver when one of many CephClusters is removed.

Member:

Oh of course, the holder daemonset has the same scope as the csi driver, which means it may be serving multiple clusters. So the question is what (if any) resource can the holder set as its owner reference so that it will be cleaned up during uninstall, but only after all cephcluster resources.

Member Author (Madhu-1):

@BlaineEXE @travisn These are the special daemonset holder as you know and they get created in the ceph cluster namespace not on the operator namespace like cephcsi plugin daemonset. they are unique per ceph cluster as multiple ceph clusters can be created with different multus networks.

Member (@travisn), Aug 21, 2023:

Actually, there is still an issue with this... Even though a separate daemonset is created for each cephcluster, they are still created in the operator namespace. The holder daemonsets are named csi-rbdplugin-holder-<cluster> so they can all be created in the same namespace as the operator. But in this case, the owner reference cannot point to a cephcluster in a different namespace, or else it results in this error:

2023-08-21 16:39:38.454427 E | ceph-csi: failed to reconcile failed to configure ceph csi: failed to start ceph csi drivers: 
failed to configure holder: failed to configure holder "rbd" for "my-cluster"/"other-ceph": 
failed to set owner reference to plugin holder "rook-ceph.rbd.csi.ceph.com": 
cross-namespace owner references are disallowed, owner's namespace other-ceph, obj's namespace rook-ceph

It may be a breaking change at this point, but can the holder daemonsets be created in the same namespace as the cephcluster? That would be necessary for setting the ownerReference like this, except that the CSI driver likely requires the holder pod to be in the same namespace as itself.
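
For reference, the restriction comes from the shape of the owner reference itself: metav1.OwnerReference has no Namespace field, so a namespaced owner is always resolved in the dependent object's own namespace. A rough, illustrative sketch of building such a reference by hand (the helper name is hypothetical):

package example

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// addClusterOwnerRef is an illustrative helper. Note there is no namespace in the
// reference: the API server looks the owner up in the DaemonSet's own namespace,
// which is why a CephCluster in another namespace cannot own this DaemonSet.
func addClusterOwnerRef(holder *appsv1.DaemonSet, clusterName string, clusterUID types.UID) {
	isController := true
	holder.OwnerReferences = append(holder.OwnerReferences, metav1.OwnerReference{
		APIVersion: "ceph.rook.io/v1",
		Kind:       "CephCluster",
		Name:       clusterName,
		UID:        clusterUID,
		Controller: &isController,
	})
}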

Member:

For a scoped fix, we could add the ownerReference to the holder daemonset only if the cephcluster is in the same namespace as the operator. This should cover the common scenario anyway.
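
For illustration, such a guard might look like this (names are illustrative, not necessarily the PR's exact code):

package example

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
)

// setOwnerIfSameNamespace only sets the owner reference when the CephCluster lives in
// the operator namespace, where the holder DaemonSet is created; otherwise the API
// server would reject the cross-namespace owner reference.
func setOwnerIfSameNamespace(cluster *cephv1.CephCluster, holder *appsv1.DaemonSet, operatorNamespace string, scheme *runtime.Scheme) error {
	if cluster.Namespace != operatorNamespace {
		// Skip: cross-namespace owner references are disallowed.
		return nil
	}
	return controllerutil.SetControllerReference(cluster, holder, scheme)
}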

Member Author (Madhu-1):

@travisn Yes, you are correct, thanks for catching it. I don't remember why we chose to create the holder daemonset in the operator namespace rather than the cluster namespace. For the scoped fix I am adding a check for it, and I think we need to revisit why it's not getting deployed in the cluster namespace.

Member Author (Madhu-1):

@leseb any idea why we chose to create the holder daemonset in the operator namespace and not in the cluster namespace?

Member (@leseb), Aug 28, 2023:

@Madhu-1 Honestly, I don't remember :( Maybe we intended to exercise caution and allow sufficient time for this feature to stabilize before proceeding with a manual deletion? 🤷🏻‍♂️

@Madhu-1 force-pushed the close-12645 branch 3 times, most recently from 7131d22 to 163730c, on August 15, 2023 12:02
@Madhu-1 requested a review from BlaineEXE on August 21, 2023 10:43

Setting ceph cluster as the ControllerRef
for the holder daemonset so that when
a ceph cluster is deleted the holder daemonset
will also get deleted.

fixes: rook#12645

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@travisn merged commit 9803bd5 into rook:master on Aug 22, 2023
46 of 49 checks passed
travisn added a commit that referenced this pull request Aug 28, 2023
csi: set ceph cluster as ControllerRef for holder ds (backport #12724)
Development

Successfully merging this pull request may close these issues.

CSI holder pods are not deleted when all ceph clusters are deleted.
5 participants