csi: only create CSI config configmap in CSI reconciler #14089
Conversation
Force-pushed 8010a51 to 3f1eeda, then 3f1eeda to 03ddb7e.
```go
err = CreateCsiConfigMap(r.opManagerContext, r.opConfig.OperatorNamespace, r.context.Clientset, ownerInfo)
if err != nil {
	return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed creating csi config map")
}
```
This is now the only usage of `CreateCsiConfigMap()`. It is called with owner info being the operator deployment, as it is supposed to be. Because other instances were called with owner info being some CephCluster, they were all removed. This code was moved up from its location below so that it runs before `reconcileSaveCSIDriverOptions()` is called for multus clusters.
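The ownership rule described here — the configmap carries exactly one owner reference, pointing at the operator deployment rather than any CephCluster — can be sketched with plain structs. This is a minimal illustration, not Rook's actual types; `OwnerRef`, `ConfigMap`, and `newCsiConfigMap` are hypothetical stand-ins for the client-go equivalents:

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for a Kubernetes OwnerReference.
type OwnerRef struct {
	Kind string
	Name string
	UID  string
}

// ConfigMap is a minimal stand-in for corev1.ConfigMap.
type ConfigMap struct {
	Name      string
	OwnerRefs []OwnerRef
	Data      map[string]string
}

// newCsiConfigMap builds the CSI config map owned by the operator
// deployment, so its lifetime is tied to the operator rather than to
// whichever CephCluster happened to be reconciled first.
func newCsiConfigMap(operator OwnerRef) ConfigMap {
	return ConfigMap{
		Name:      "rook-ceph-csi-config",
		OwnerRefs: []OwnerRef{operator}, // exactly one owner: the operator deployment
		Data:      map[string]string{"csi-cluster-config-json": "[]"},
	}
}

func main() {
	op := OwnerRef{Kind: "Deployment", Name: "rook-ceph-operator", UID: "op-uid"}
	cm := newCsiConfigMap(op)
	fmt.Printf("%s owned by %s/%s\n", cm.Name, cm.OwnerRefs[0].Kind, cm.OwnerRefs[0].Name)
}
```

With a CephCluster as owner, Kubernetes garbage collection would delete the configmap along with that cluster; anchoring it to the operator deployment avoids that.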
```go
	if err != nil {
		return errors.Wrap(err, "failed creating csi config map")
	}
	return errors.Wrap(err, "waiting for CSI config map to be created")
```
Any use of `SaveClusterConfig()` will now wait until the main CSI reconcile routine creates the configmap here: https://github.com/rook/rook/pull/14089/files#r1571122479
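The wait-and-retry behavior can be sketched as follows. This is an illustrative stdlib-only model, not the real `SaveClusterConfig()` from `pkg/operator/ceph/csi`; the in-memory map stands in for the Kubernetes API, and the error wording mirrors the diff above:

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New(`configmap "rook-ceph-csi-config" not found`)

// getCsiConfigMap is a stand-in for a Kubernetes GET; a nil store
// models the configmap not existing yet.
func getCsiConfigMap(store map[string]string) (map[string]string, error) {
	if store == nil {
		return nil, errNotFound
	}
	return store, nil
}

// saveClusterConfig updates the CSI config map but never creates it.
// Until the CSI reconciler has created the configmap, callers get an
// error, fail their reconcile, and retry later.
func saveClusterConfig(store map[string]string, clusterID, monJSON string) error {
	cm, err := getCsiConfigMap(store)
	if err != nil {
		return fmt.Errorf("waiting for CSI config map to be created: %w", err)
	}
	cm[clusterID] = monJSON
	return nil
}

func main() {
	// Before the CSI reconciler runs, the update fails and the caller retries.
	fmt.Println(saveClusterConfig(nil, "cluster-a", `{"monitors":["10.0.0.1:6789"]}`))

	// Once the configmap exists, the update succeeds.
	store := map[string]string{}
	fmt.Println(saveClusterConfig(store, "cluster-a", `{"monitors":["10.0.0.1:6789"]}`))
}
```

The key design point is that only one controller ever creates the object, so there is a single, unambiguous owner; everyone else treats "not found" as a transient condition.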
```diff
@@ -108,12 +108,6 @@ func (c *ClusterController) configureExternalCephCluster(cluster *cluster) error
 		}
 	}

-	// Create CSI config map
-	err = csi.CreateCsiConfigMap(c.OpManagerCtx, c.namespacedName.Namespace, c.context.Clientset, cluster.ownerInfo)
```
Will the external cluster reconcile be failed at some point if the configmap doesn't exist?
I looked into this, and I don't believe so. `pkg/operator/ceph/csi/controller.go` line 192 "gates" creation of the CSI config map. The only condition that affects creation is whether or not any CephClusters exist; it doesn't matter if the clusters are internal or external. By my logic, if an external cluster exists, the CSI controller will create the configmap (unless there is a create error), so it is safe for the external reconcile to wait until that happens.

It might be good for you to double-check my logic just to be sure.
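The gate described here reduces to a simple predicate. This is a hypothetical sketch of the logic, not the actual code at `controller.go` line 192; `CephCluster` and `shouldCreateCsiConfigMap` are illustrative names:

```go
package main

import "fmt"

// CephCluster is a minimal stand-in; External marks external-mode clusters.
type CephCluster struct {
	Name     string
	External bool
}

// shouldCreateCsiConfigMap mirrors the gate: the configmap is created
// whenever at least one CephCluster exists, regardless of whether the
// clusters are internal or external.
func shouldCreateCsiConfigMap(clusters []CephCluster) bool {
	return len(clusters) > 0
}

func main() {
	fmt.Println(shouldCreateCsiConfigMap(nil)) // false: no clusters, nothing to do
	fmt.Println(shouldCreateCsiConfigMap([]CephCluster{
		{Name: "external-cluster", External: true},
	})) // true: an external cluster alone is enough
}
```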
I see now that `SaveClusterConfig()` will be called below on line 133, but given your changes in that method, it will fail the reconcile and retry. Makes sense to me.
There have been some issues with non-CSI Rook controllers that are creating the CSI config configmap (`rook-ceph-csi-config`). This causes problems with the K8s OwnerReference. The primary CSI reconciler (controller) creates the configmap with the correct owner reference, which is supposed to be the operator deployment. Other instances were creating the configmap with owner references set to the CephCluster. This is a minor bug, but it can result in this configmap being deleted along with the first CephCluster that initially created it.

To fix this issue, remove all instances of `CreateCsiConfigMap()` except the single usage which the CSI reconcile uses to initially create the configmap. Other controllers that might have attempted to create this configmap previously will return an error indicating that they are waiting for the configmap to be created.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
In the primary CSI reconcile, ensure the CSI config map (`rook-ceph-csi-config`) has the correct owner info. This corrects any pre-existing config maps that might have incorrect owner info, which has been observed to include references to CephClusters. The config map should only have a single reference, and it should refer to the operator deployment.

If any existing Rook clusters have a CSI config map which has a CephCluster as an owner, this change will ensure that the config map is not deleted when the CephCluster is deleted. This is especially important for any environments with multiple CephClusters installed.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
Force-pushed 03ddb7e to 605b963.
```go
	}

	logger.Infof("successfully created csi config map %q", configMap.Name)
	return nil
}

// check the owner references on the csi config map, and fix incorrect references if needed
func updateCsiConfigMapOwnerRefs(ctx context.Context, namespace string, clientset kubernetes.Interface, expectedOwnerInfo *k8sutil.OwnerInfo) error {
```
`SaveClusterConfig()` already has a code path that updates the configmap. Would that code path naturally update the ownerref at the same time, also ensuring the configmap contents are updated? Or perhaps there is a small update to that code path? Then we don't need this new update method.
That's a fair question. `SaveClusterConfig()` doesn't do anything with the owner ref during update -- either before or after these changes. It could be modified to update the owner ref fairly easily, but we should still add a couple of unit tests to be certain that it updates things correctly.

Using `SaveClusterConfig()` would require less overall Rook code, but I think it would result in more k8s API calls overall. Neither the pro nor the con is a dramatic change compared to the alternative.

After looking into it, it seems slightly preferable to keep this PR as-is, just to avoid the developer time needed to rework it.
Agreed, let's keep it as is.
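The repair that `updateCsiConfigMapOwnerRefs` performs can be sketched as a pure function: compare the current owner references against the single expected one and report whether an update call is needed. This is an illustrative stdlib-only model, not Rook's implementation; the real code works against `metav1.OwnerReference` via the clientset:

```go
package main

import (
	"fmt"
	"reflect"
)

// OwnerRef is a simplified stand-in for a Kubernetes OwnerReference.
type OwnerRef struct {
	Kind string
	Name string
	UID  string
}

// fixOwnerRefs returns the corrected owner list and whether an update
// is needed: the configmap should carry exactly one reference, pointing
// at the operator deployment, with any CephCluster owners dropped.
func fixOwnerRefs(current []OwnerRef, expected OwnerRef) ([]OwnerRef, bool) {
	want := []OwnerRef{expected}
	if reflect.DeepEqual(current, want) {
		return current, false // already correct, skip the API update
	}
	return want, true
}

func main() {
	op := OwnerRef{Kind: "Deployment", Name: "rook-ceph-operator", UID: "op"}
	bad := []OwnerRef{{Kind: "CephCluster", Name: "my-cluster", UID: "cc"}}
	fixed, changed := fixOwnerRefs(bad, op)
	fmt.Println(changed, fixed[0].Kind) // true Deployment
}
```

Keeping this as a dedicated check (rather than folding it into `SaveClusterConfig()`) matches the decision in the thread above: it only issues an update when the refs are actually wrong.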
csi: only create CSI config configmap in CSI reconciler (backport #14089)
@Madhu-1 as per @parth-gr, this PR might have introduced a regression in provider mode, where CSI is deployed by client-op, which probably doesn't deploy this CM that Rook expects to exist. Could you please check and suggest next steps? Since the intention of this PR seems to be that CSI should create the CM, should ocs-client do it then?
Referenced: `rook/pkg/operator/ceph/csi/controller.go` lines 203 to 218 in 4b9ada6, and lines 160 to 166 in 4b9ada6.
The workaround currently is to create the csi-configmap with the mon-endpoint JSON data (see `rook/pkg/operator/ceph/cluster/mon/mon.go` line 1115 in 4b9ada6).
To propose a fix, we can think of making the two config maps independent of each other. And for the client operator to work well when CSI is not deployed, as a quick fix we can make sure Rook always creates an empty config map with the desired structure. Let's discuss this in today's huddle.
@Madhu-1 can we store the values in the mon endpoint configmap instead of the csi configmap first (`rook/pkg/operator/ceph/cluster/mon/mon.go` line 1115 in 4b9ada6)? Later, the csi configmap can fetch these details if needed. This will make the csi configmap more independent, and it will only store values for the specific cluster tenants, while the mon endpoint configmap will be the source of truth. PS: This can also be a proposal for updating the csi-configmap only through the csi-controller.
Created #14123 to continue the discussion...