csi: allow force disabling holder pods #13890
Conversation
// is holder enabled for this cluster?
thisHolderEnabled := (!csiHostNetworkEnabled || cluster.Spec.Network.IsMultus()) && !csiDisableHolders

// Do we have a multus cluster or csi host network disabled?
// If so deploy the plugin holder with the fsid attached
if thisHolderEnabled {
	logger.Debugf("cluster %q: deploying the ceph-csi plugin holder", cluster.Name)
	r.clustersWithHolder = append(r.clustersWithHolder, ClusterDetail{cluster: &cephClusters.Items[i], clusterInfo: clusterInfo})

	// holder pods are enabled globally if any cluster needs a holder pod
	holderEnabled = true
Logic for manipulating the global `holderEnabled` config has moved here, up from spec.go.
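To make the moved logic easier to follow, here is a minimal, self-contained Go sketch of the decision. `clusterNetwork` and `holderNeeded` are hypothetical stand-ins for the real Rook types; only the boolean expression itself comes from the diff above.

```go
package main

import "fmt"

// clusterNetwork is a hypothetical stand-in for cluster.Spec.Network.
type clusterNetwork struct {
	provider string
}

// IsMultus reports whether the cluster uses the multus network provider.
func (n clusterNetwork) IsMultus() bool { return n.provider == "multus" }

// holderNeeded mirrors the per-cluster check: a holder pod is wanted when CSI
// host networking is off or the cluster uses multus, unless the operator has
// force-disabled holders via CSI_DISABLE_HOLDER_PODS.
func holderNeeded(csiHostNetworkEnabled, csiDisableHolders bool, net clusterNetwork) bool {
	return (!csiHostNetworkEnabled || net.IsMultus()) && !csiDisableHolders
}

func main() {
	clusters := []clusterNetwork{{provider: "multus"}, {provider: ""}}

	// holderEnabled is the global flag: true if any cluster needs a holder pod.
	holderEnabled := false
	for _, net := range clusters {
		if holderNeeded(true, false, net) { // host networking on, holders not disabled
			holderEnabled = true
			break
		}
	}
	fmt.Println("holder pods enabled globally:", holderEnabled) // true: a multus cluster is present
}
```

With `csiDisableHolders` set to true, the same loop leaves `holderEnabled` false regardless of the cluster networks, which is the new force-disable behavior.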
holderEnabled = !CSIParam.EnableCSIHostNetwork

for i := range r.clustersWithHolder {
	if r.clustersWithHolder[i].cluster.Spec.Network.IsMultus() {
		holderEnabled = true
		break
	}
}
This logic has moved up into the parent controller.go logic here: https://github.com/rook/rook/pull/13890/files#r1515280100
Force-pushed from 25f0485 to 0fe41fa
I think I still have a few unit test failures that I need to resolve, and there are still some doc updates I need to make. However, I believe the code is good and the docs are mostly complete. I have been unable to test cephfs due to the issue reported here: #13739. But RBD-related testing I have done has gone well. Note that testing on single-node Minikube is much more forgiving than testing on multi-node Minikube.
I think the fix to make sure these CI tests pass is to have them set
Force-pushed from 716c50c to a387e59
changes LGTM, left some comments
pkg/operator/ceph/csi/controller.go (outdated diff)
// if holder pods were disabled, the controller needs to update the configmap for each
// cephcluster to remove the net namespace file path
err = SaveCSIDriverOptions(r.context.Clientset, cluster.Namespace, clusterInfo)
SaveClusterConfig will not remove the netnamespace from the configmap (from my code reading; I could be wrong, but we need to check on this).
We have multiple functions acting on the same configmap in different ways, which can easily introduce bugs. We need to refactor this (not as part of this PR, but later for sure).
I added updateNetNamespaces() to that function so I could use it here.
@@ -341,6 +343,7 @@ func updateCSIDriverOptions(curr, clusterKey string,
		}
	}

	updateNetNamespaceFilePath(clusterKey, cc)
Added to SaveClusterConfig here.
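For readers following along, below is a rough, hypothetical sketch of what clearing the net namespace file path on save could look like. The entry type, field names, and path format are assumptions for illustration, not the actual Rook implementation.

```go
package main

import "fmt"

// csiClusterConfigEntry is a simplified, hypothetical stand-in for an entry in
// the CSI cluster config configmap.
type csiClusterConfigEntry struct {
	ClusterID              string
	RBDNetNamespacePath    string
	CephFSNetNamespacePath string
}

// updateNetNamespaceFilePath sketches the idea discussed above: when holder
// pods are disabled for a cluster, its net namespace file paths are cleared so
// CSI stops routing through the holder's network namespace; otherwise they are
// (re)computed. The path format below is illustrative only.
func updateNetNamespaceFilePath(clusterKey string, holderEnabled bool, entries []csiClusterConfigEntry) {
	for i := range entries {
		if entries[i].ClusterID != clusterKey {
			continue
		}
		if !holderEnabled {
			entries[i].RBDNetNamespacePath = ""
			entries[i].CephFSNetNamespacePath = ""
			continue
		}
		entries[i].RBDNetNamespacePath = fmt.Sprintf("/var/lib/kubelet/plugins/%s.rbd.csi.ceph.com/%s.net.ns", clusterKey, clusterKey)
		entries[i].CephFSNetNamespacePath = fmt.Sprintf("/var/lib/kubelet/plugins/%s.cephfs.csi.ceph.com/%s.net.ns", clusterKey, clusterKey)
	}
}

func main() {
	cfg := []csiClusterConfigEntry{{ClusterID: "rook-ceph", RBDNetNamespacePath: "/some/stale/path"}}
	updateNetNamespaceFilePath("rook-ceph", false, cfg)
	fmt.Printf("%+v\n", cfg[0]) // both paths cleared because holders are disabled
}
```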
Force-pushed from a387e59 to 7dc0957
This pull request has merge conflicts that must be resolved before it can be merged. @BlaineEXE please rebase it. https://rook.io/docs/rook/latest/Contributing/development-flow/#updating-your-fork
- name: allow holder pod deployment
  run: sed -i "s|CSI_DISABLE_HOLDER_PODS|# CSI_DISABLE_HOLDER_PODS|g" "deploy/examples/operator.yaml"
For now, let's allow holder pods to get the CI working. This will make sure legacy behavior is preserved. Let's follow up in a new PR to ensure the new behavior works greenfield beyond the manual testing I've done.
Force-pushed from 7dc0957 to 56e7e85
If the scenario does not apply, skip ahead to the
[Disabling Holder Pods](#disabling-holder-pods) section below.

**Step 1**
Added simple Step # headers in response to a note from Subham about having a hard time following.
Force-pushed from 2f22d2b to 30e301a
LGTM, my questions are answered. Thanks
Just a minor suggestion on the upgrade guide
CSI "holder" pods are frequently reported objects of confusion and struggle in Rook. Because of | ||
this, they are being deprecated and will be removed in Rook v1.16. | ||
|
||
If there are any CephClusters that use `network.provider: "multus"`, or if the operator config |
The default is to use host networking, so we don't expect many users to be affected, right? Should we add a note that they would only be affected if they changed those default network settings?
Suggested change:
- If there are any CephClusters that use `network.provider: "multus"`, or if the operator config
+ If there are any CephClusters that use the non-default network setting `network.provider: "multus"`, or if the operator config
Seems like a good note. Added 👍
LGTM
# Deprecation note: Rook uses "holder" pods to allow CSI to connect to the multus public network
# without needing hosts to be attached to the network. Holder pods are being deprecated. See issue for details:
# https://github.com/rook/rook/issues/13055. New Rook deployments should set this to "true".
CSI_DISABLE_HOLDER_PODS: "true"
Should we call it HOLDER_PODS, or HOLDER_DAEMONSET, or just HOLDER?
I thought about this and settled on "holder pods" (versus "daemonset" or just "holders") since most users seem to refer to the pods themselves and take on a pod-centric view.
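For context on how such a key would behave, here is a minimal sketch, assuming the operator surfaces its settings as a string map (a hypothetical stand-in for its real config API). The defaulting mirrors the PR description: unset or invalid values keep holders enabled.

```go
package main

import (
	"fmt"
	"strconv"
)

// csiDisableHolders parses the CSI_DISABLE_HOLDER_PODS setting. The defaulting
// follows the PR description: the option defaults to "false" so that upgraded
// clusters keep their existing holder pods. opConfig is a hypothetical
// stand-in for however the operator exposes its configuration.
func csiDisableHolders(opConfig map[string]string) bool {
	raw, ok := opConfig["CSI_DISABLE_HOLDER_PODS"]
	if !ok {
		return false // unset: legacy behavior, holders stay enabled where needed
	}
	disabled, err := strconv.ParseBool(raw)
	if err != nil {
		return false // unparseable values fall back to the safe default
	}
	return disabled
}

func main() {
	fmt.Println(csiDisableHolders(map[string]string{"CSI_DISABLE_HOLDER_PODS": "true"})) // true
	fmt.Println(csiDisableHolders(map[string]string{}))                                  // false (default)
}
```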
Force-pushed from 30e301a to 4f555db
When deploying new StorageClusters, ocs-operator should apply the new Rook operator config `CSI_REMOVE_HOLDER_PODS: "true"`. This reflects the new config and default value that Rook specifies in example manifests here: rook/rook#13890 Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
Add new CSI_DISABLE_HOLDER_PODS option for rook-ceph-operator. This option will disable holder pods when set to "true".
In the long term, Rook plans to deprecate the holder pods entirely. This new option will allow users to choose to migrate their clusters to non-holder clusters when they are ready and able, giving them time to gracefully migrate before the holders are permanently removed.
This option is set to "false" by default so that upgrading users don't have their CSI pods modified unexpectedly.
Example manifests are modified to set this value to true so that new clusters will not deploy holder pods.
Migrating users are provided with documentation to instruct them about the new requirements they need to satisfy to successfully remove holder pods, a procedure for migrating pods from holder to non-holder mounts, and a way to delete holder pods once they are no longer in use.
When users set CSI_DISABLE_HOLDER_PODS="true", the CSI controller will no longer deploy or update the holder pod Daemonsets, but it does not delete any existing Daemonsets. This allows already-attached PVCs to continue operating normally with their network connection continuing to exist in the current holder pod. This is critical to avoid causing a cluster-wide storage outage.
More info: #13055
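A compact sketch of the reconcile decision this description implies is below. The function name and shape are hypothetical, but the behavior (skip deploy/update, never delete an existing holder DaemonSet) comes from the text above.

```go
package main

import "fmt"

// holderDaemonSetActions is a hypothetical decision helper reflecting the PR
// description: when holders are disabled, the controller neither deploys nor
// updates the holder DaemonSet, and it never deletes an existing one, so
// already-attached PVCs keep their network path through the current holder.
func holderDaemonSetActions(csiDisableHolders bool) (deployOrUpdate bool, deleteExisting bool) {
	if csiDisableHolders {
		return false, false // skip deploy/update; leave any existing DaemonSet alone
	}
	return true, false // legacy behavior: deploy or update the holder DaemonSet
}

func main() {
	d, del := holderDaemonSetActions(true)
	fmt.Printf("holders disabled: deployOrUpdate=%v deleteExisting=%v\n", d, del)
	d, del = holderDaemonSetActions(false)
	fmt.Printf("holders enabled:  deployOrUpdate=%v deleteExisting=%v\n", d, del)
}
```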
Checklist: