csi: only create CSI config configmap in CSI reconciler #14089

Merged: 2 commits, Apr 18, 2024

6 changes: 0 additions & 6 deletions pkg/operator/ceph/cluster/cluster_external.go
@@ -108,12 +108,6 @@ func (c *ClusterController) configureExternalCephCluster(cluster *cluster) error
}
}

// Create CSI config map
err = csi.CreateCsiConfigMap(c.OpManagerCtx, c.namespacedName.Namespace, c.context.Clientset, cluster.ownerInfo)
Member:

Will the external cluster reconcile fail at some point if the configmap doesn't exist?

Member Author:

I looked into this, and I don't believe so.

pkg/operator/ceph/csi/controller.go line 192 "gates" creation of the CSI config map. The only condition that affects creation is whether any CephClusters exist; it doesn't matter whether the clusters are internal or external. By my logic, if an external cluster exists, the CSI controller will create the configmap (unless there is a create error), so it is safe for the external reconcile to wait until that happens.

It might be good for you to double check my logic just to be sure.
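
For illustration, here is a minimal sketch of the kind of gate described above, assuming the CSI reconcile simply checks whether any CephClusters were listed. The helper name is hypothetical and is not the exact Rook code at controller.go line 192.

package csi

import (
	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
)

// csiConfigMapNeeded is a hypothetical helper sketching the gate described
// above: the CSI reconcile proceeds to create the CSI config map whenever at
// least one CephCluster exists, regardless of whether it is internal or
// external. If no clusters exist, the reconcile returns early and no config
// map is created.
func csiConfigMapNeeded(clusters cephv1.CephClusterList) bool {
	return len(clusters.Items) > 0
}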

Member:

I see now that SaveClusterConfig() will be called below on line 133, but given your changes in that method, it will fail the reconcile and retry. Makes sense to me.

if err != nil {
return errors.Wrap(err, "failed to create csi config map")
}

// update the msgr2 flag
for _, m := range cluster.ClusterInfo.Monitors {
// m.Endpoint=10.1.115.104:3300
43 changes: 39 additions & 4 deletions pkg/operator/ceph/csi/cluster_config.go
@@ -254,12 +254,50 @@ func CreateCsiConfigMap(ctx context.Context, namespace string, clientset kuberne
if !k8serrors.IsAlreadyExists(err) {
return errors.Wrapf(err, "failed to create initial csi config map %q (in %q)", configMap.Name, namespace)
}
// CM already exists; update owner refs to it if needed
// this corrects issues where the csi config map was sometimes created with CephCluster
// owner ref, which would result in the cm being deleted if that cluster was deleted
if err := updateCsiConfigMapOwnerRefs(ctx, namespace, clientset, ownerInfo); err != nil {
return errors.Wrapf(err, "failed to ensure csi config map %q (in %q) owner references", configMap.Name, namespace)
}
}

logger.Infof("successfully created csi config map %q", configMap.Name)
return nil
}

// check the owner references on the csi config map, and fix incorrect references if needed
func updateCsiConfigMapOwnerRefs(ctx context.Context, namespace string, clientset kubernetes.Interface, expectedOwnerInfo *k8sutil.OwnerInfo) error {
Member:

SaveClusterConfig() already has a code path that updates the configmap. Would that code path naturally update the ownerref at the same time also ensuring the configmap contents are updated? Or perhaps there is a small update to that code path? Then we don't need this new update method.

Member Author:

That's a fair question. SaveClusterConfig() doesn't do anything with the owner ref during update, either before or after these changes. It could be modified to update the owner ref fairly easily, but we would still need to add a couple of unit tests to be certain it updates things correctly.

Using SaveClusterConfig() would require less Rook code overall, but I think it would result in more k8s API calls. Neither pro nor con is a dramatic change compared to the alternative.

After looking into it, it seems slightly preferable to keep this PR as-is, just to avoid the developer time needed to rework it.

Member:

Agreed, let's keep it as is.
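
For reference, here is a minimal sketch of the alternative discussed above — folding the owner-ref correction into SaveClusterConfig()'s existing update path — using a hypothetical helper name; the PR keeps the dedicated updateCsiConfigMapOwnerRefs() instead.

package csi

import (
	corev1 "k8s.io/api/core/v1"

	"github.com/rook/rook/pkg/operator/k8sutil"
)

// ensureCsiConfigMapOwnerRef is a hypothetical sketch of the rejected
// alternative: before SaveClusterConfig() issues its configmap update, also
// reset the owner reference if it is missing or incorrect, so a single Update
// call fixes both the contents and the ownership. It reports whether the
// owner ref was changed.
func ensureCsiConfigMapOwnerRef(cm *corev1.ConfigMap, expected *k8sutil.OwnerInfo) (bool, error) {
	refs := cm.GetOwnerReferences()
	if len(refs) == 1 && refs[0].UID == expected.GetUID() {
		return false, nil // exactly one owner and it is already correct
	}
	cm.OwnerReferences = nil
	if err := expected.SetControllerReference(cm); err != nil {
		return false, err
	}
	return true, nil
}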

cm, err := clientset.CoreV1().ConfigMaps(namespace).Get(ctx, ConfigName, metav1.GetOptions{})
if err != nil {
return errors.Wrapf(err, "failed to fetch csi config map %q (in %q) which already exists", ConfigName, namespace)
}

existingOwners := cm.GetOwnerReferences()
var currentOwner *metav1.OwnerReference = nil
if len(existingOwners) == 1 {
currentOwner = &existingOwners[0] // currentOwner is nil unless there is exactly one owner on the cm
}
// if there is exactly one owner, and it is correct --> no fix needed
if currentOwner != nil && (currentOwner.UID == expectedOwnerInfo.GetUID()) {
logger.Debugf("csi config map %q (in %q) has the expected owner; owner id: %q", ConfigName, namespace, currentOwner.UID)
return nil
}

// must fix owner refs
logger.Infof("updating csi configmap %q (in %q) owner info", ConfigName, namespace)
cm.OwnerReferences = []metav1.OwnerReference{}
if err := expectedOwnerInfo.SetControllerReference(cm); err != nil {
return errors.Wrapf(err, "failed to set updated owner reference on csi config map %q (in %q)", ConfigName, namespace)
}
_, err = clientset.CoreV1().ConfigMaps(namespace).Update(ctx, cm, metav1.UpdateOptions{})
if err != nil {
return errors.Wrapf(err, "failed to update csi config map %q (in %q) to update its owner reference", ConfigName, namespace)
}

return nil
}

// SaveClusterConfig updates the config map used to provide ceph-csi with
// basic cluster configuration. The clusterNamespace and clusterInfo are
// used to determine what "cluster" in the config map will be updated and
@@ -292,10 +330,7 @@ func SaveClusterConfig(clientset kubernetes.Interface, clusterNamespace string,
configMap, err := clientset.CoreV1().ConfigMaps(csiNamespace).Get(clusterInfo.Context, ConfigName, metav1.GetOptions{})
if err != nil {
if k8serrors.IsNotFound(err) {
err = CreateCsiConfigMap(clusterInfo.Context, csiNamespace, clientset, clusterInfo.OwnerInfo)
if err != nil {
return errors.Wrap(err, "failed creating csi config map")
}
return errors.Wrap(err, "waiting for CSI config map to be created")
Member Author:

Any use of SaveClusterConfig() will now wait until the main CSI reconcile routine creates the configmap here: https://github.com/rook/rook/pull/14089/files#r1571122479
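
A hedged sketch of the resulting caller-side behavior follows; the wrapper name is illustrative and stands in for the real cluster reconcilers. The "waiting" error simply fails the current reconcile so the controller requeues until the CSI reconciler has created the config map.

package csi

import (
	"github.com/pkg/errors"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	opcontroller "github.com/rook/rook/pkg/operator/ceph/controller"
)

// saveClusterConfigOrRequeue is an illustrative wrapper, not Rook code: if
// SaveClusterConfig (represented here by the saveConfig argument) returns the
// "waiting for CSI config map to be created" error, the caller fails its
// reconcile and relies on the requeue to retry once the CSI reconciler has
// created the map.
func saveClusterConfigOrRequeue(saveConfig func() error) (reconcile.Result, error) {
	if err := saveConfig(); err != nil {
		return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed to save csi cluster config")
	}
	return reconcile.Result{}, nil
}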

}
return errors.Wrap(err, "failed to fetch current csi config map")
}
131 changes: 129 additions & 2 deletions pkg/operator/ceph/csi/cluster_config_test.go
@@ -17,17 +17,22 @@ limitations under the License.
package csi

import (
"context"
"encoding/json"
"reflect"
"strings"
"testing"

cephcsi "github.com/ceph/ceph-csi/api/deploy/kubernetes"
cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
cephclient "github.com/rook/rook/pkg/daemon/ceph/client"
"github.com/rook/rook/pkg/operator/ceph/cluster/osd/topology"
"github.com/rook/rook/pkg/operator/k8sutil"
"github.com/rook/rook/pkg/operator/test"
"github.com/stretchr/testify/assert"

cephcsi "github.com/ceph/ceph-csi/api/deploy/kubernetes"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes"
)

func unmarshal(s string) ([]CSIClusterConfigEntry, error) {
@@ -892,3 +897,125 @@ func TestUpdateNetNamespaceFilePath(t *testing.T) {
}
})
}

func Test_updateCsiConfigMapOwnerRefs(t *testing.T) {
ctx := context.TODO()
ns := "test-ns"
ownerController := true
blockOwnerDel := true
opDeployRef := metav1.OwnerReference{
APIVersion: "v1",
Kind: "Deployment",
Name: "rook-ceph-operator",
UID: "e55604f2-710c-4353-9a3e-9d23ea2d6eb9", // random uuid
Controller: &ownerController,
BlockOwnerDeletion: &blockOwnerDel,
}
expectedOwnerInfo := k8sutil.NewOwnerInfoWithOwnerRef(&opDeployRef, ns)

minimalCsiConfigMap := func() *corev1.ConfigMap {
return &corev1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: ConfigName,
Namespace: ns,
OwnerReferences: []metav1.OwnerReference{},
},
}
}

t.Run("no configmap", func(t *testing.T) {
clientset := test.New(t, 1)
err := updateCsiConfigMapOwnerRefs(ctx, ns, clientset, expectedOwnerInfo)
assert.ErrorContains(t, err, "failed to fetch csi config map")
assert.ErrorContains(t, err, "which already exists")
})

assertOwner := func(t *testing.T, clientset kubernetes.Interface) {
t.Helper()

cm, err := clientset.CoreV1().ConfigMaps(ns).Get(ctx, ConfigName, metav1.GetOptions{})
assert.NoError(t, err)
assert.Len(t, cm.GetOwnerReferences(), 1)
cmOwner := cm.GetOwnerReferences()[0]
assert.Equal(t, "v1", cmOwner.APIVersion)
assert.Equal(t, "Deployment", cmOwner.Kind)
assert.Equal(t, "rook-ceph-operator", cmOwner.Name)
assert.Equal(t, "e55604f2-710c-4353-9a3e-9d23ea2d6eb9", string(cmOwner.UID))
assert.True(t, *cmOwner.Controller)
assert.True(t, *cmOwner.BlockOwnerDeletion)
}

t.Run("no existing owner ref", func(t *testing.T) {
clientset := test.New(t, 1)
cm := minimalCsiConfigMap()
_, err := clientset.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
assert.NoError(t, err)

err = updateCsiConfigMapOwnerRefs(ctx, ns, clientset, expectedOwnerInfo)
assert.NoError(t, err)
assertOwner(t, clientset)
})

t.Run("correct existing owner ref", func(t *testing.T) {
clientset := test.New(t, 1)
cm := minimalCsiConfigMap()
cm.OwnerReferences = []metav1.OwnerReference{
*opDeployRef.DeepCopy(), // correct ref
}
_, err := clientset.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
assert.NoError(t, err)

err = updateCsiConfigMapOwnerRefs(ctx, ns, clientset, expectedOwnerInfo)
assert.NoError(t, err)
assertOwner(t, clientset)
})

t.Run("single incorrect existing owner ref", func(t *testing.T) {
clientset := test.New(t, 1)
cm := minimalCsiConfigMap()

ownerController := true
blockOwnerDel := true
cm.OwnerReferences = []metav1.OwnerReference{
{
APIVersion: "ceph.rook.io/v1",
Kind: "CephCluster",
Name: "my-cluster",
UID: "a77777a7-777a-7777-7a7a-7a77aa7a7aa7", // random uuid
Controller: &ownerController,
BlockOwnerDeletion: &blockOwnerDel,
},
}
_, err := clientset.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
assert.NoError(t, err)

err = updateCsiConfigMapOwnerRefs(ctx, ns, clientset, expectedOwnerInfo)
assert.NoError(t, err)
assertOwner(t, clientset)
})

t.Run("multiple existing owner refs, one correct", func(t *testing.T) {
clientset := test.New(t, 1)
cm := minimalCsiConfigMap()

ownerController := true
blockOwnerDel := true
cm.OwnerReferences = []metav1.OwnerReference{
*opDeployRef.DeepCopy(), // correct ref
{
APIVersion: "ceph.rook.io/v1",
Kind: "CephCluster",
Name: "my-cluster",
UID: "a77777a7-777a-7777-7a7a-7a77aa7a7aa7", // random uuid
Controller: &ownerController,
BlockOwnerDeletion: &blockOwnerDel,
},
}
_, err := clientset.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
assert.NoError(t, err)

err = updateCsiConfigMapOwnerRefs(ctx, ns, clientset, expectedOwnerInfo)
assert.NoError(t, err)
assertOwner(t, clientset)
})
}
56 changes: 28 additions & 28 deletions pkg/operator/ceph/csi/controller.go
@@ -200,6 +200,34 @@ func (r *ReconcileCSI) reconcile(request reconcile.Request) (reconcile.Result, e
return reconcile.Result{}, nil
}

ownerRef, err := k8sutil.GetDeploymentOwnerReference(r.opManagerContext, r.context.Clientset, os.Getenv(k8sutil.PodNameEnvVar), r.opConfig.OperatorNamespace)
if err != nil {
logger.Warningf("could not find deployment owner reference to assign to csi drivers. %v", err)
}
if ownerRef != nil {
blockOwnerDeletion := false
ownerRef.BlockOwnerDeletion = &blockOwnerDeletion
}

ownerInfo := k8sutil.NewOwnerInfoWithOwnerRef(ownerRef, r.opConfig.OperatorNamespace)
// create an empty config map. config map will be filled with data
// later when clusters have mons
err = CreateCsiConfigMap(r.opManagerContext, r.opConfig.OperatorNamespace, r.context.Clientset, ownerInfo)
if err != nil {
return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed creating csi config map")
}
Comment on lines +215 to +218
Member Author (@BlaineEXE, Apr 18, 2024):

This is now the only usage of CreateCsiConfigMap(). It is called with owner info being the operator deployment, as it is supposed to be.

Because other instances were called with owner info being some CephCluster, they were all removed.

This code was moved up from its location below so that it would run before reconcileSaveCSIDriverOptions() is called for multus clusters.


err = peermap.CreateOrUpdateConfig(r.opManagerContext, r.context, &peermap.PeerIDMappings{})
if err != nil {
return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed to create pool ID mapping config map")
}

exists, err := checkCsiCephConfigMapExists(r.opManagerContext, r.context.Clientset, r.opConfig.OperatorNamespace)
if err != nil {
return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed to get csi ceph.conf configmap")
}
CustomCSICephConfigExists = exists

csiHostNetworkEnabled, err := strconv.ParseBool(k8sutil.GetValue(r.opConfig.Parameters, "CSI_ENABLE_HOST_NETWORK", "true"))
if err != nil {
return reconcile.Result{}, errors.Wrap(err, "failed to parse value for 'CSI_ENABLE_HOST_NETWORK'")
@@ -264,34 +292,6 @@ func (r *ReconcileCSI) reconcile(request reconcile.Request) (reconcile.Result, e
}
}

ownerRef, err := k8sutil.GetDeploymentOwnerReference(r.opManagerContext, r.context.Clientset, os.Getenv(k8sutil.PodNameEnvVar), r.opConfig.OperatorNamespace)
if err != nil {
logger.Warningf("could not find deployment owner reference to assign to csi drivers. %v", err)
}
if ownerRef != nil {
blockOwnerDeletion := false
ownerRef.BlockOwnerDeletion = &blockOwnerDeletion
}

ownerInfo := k8sutil.NewOwnerInfoWithOwnerRef(ownerRef, r.opConfig.OperatorNamespace)
// create an empty config map. config map will be filled with data
// later when clusters have mons
err = CreateCsiConfigMap(r.opManagerContext, r.opConfig.OperatorNamespace, r.context.Clientset, ownerInfo)
if err != nil {
return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed creating csi config map")
}

err = peermap.CreateOrUpdateConfig(r.opManagerContext, r.context, &peermap.PeerIDMappings{})
if err != nil {
return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed to create pool ID mapping config map")
}

exists, err := checkCsiCephConfigMapExists(r.opManagerContext, r.context.Clientset, r.opConfig.OperatorNamespace)
if err != nil {
return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed to get csi ceph.conf configmap")
}
CustomCSICephConfigExists = exists

err = r.validateAndConfigureDrivers(serverVersion, ownerInfo)
if err != nil {
return opcontroller.ImmediateRetryResult, errors.Wrap(err, "failed to configure ceph csi")
8 changes: 7 additions & 1 deletion pkg/operator/k8sutil/resources.go
@@ -144,7 +144,13 @@ func (info *OwnerInfo) SetControllerReference(object metav1.Object) error {

// GetUID gets the UID of the owner
func (info *OwnerInfo) GetUID() types.UID {
return info.owner.GetUID()
if info.owner != nil {
return info.owner.GetUID()
}
if info.ownerRef != nil {
return info.ownerRef.UID
}
return types.UID("")
}

func MergeResourceRequirements(first, second v1.ResourceRequirements) v1.ResourceRequirements {