
csi: update network fence CR name #13615

Merged

merged 1 commit into rook:master from the node-name branch on Feb 5, 2024

Conversation

Contributor

@riya-singhal31 riya-singhal31 commented Jan 24, 2024

This commit updates the name of the network fence CR to node name + driver name, which allows the creation of multiple network fences on the same node, one per driver.
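
For context, a minimal sketch of the naming change (not the exact merged code; the `driver` value and the surrounding fields come from the calling code shown in the diffs below):

```go
// Sketch only: the NetworkFence name is now built from the node name plus the
// CSI driver ("rbd" or "cephfs"), so fences for two drivers on the same node
// no longer collide on the old name, which was just node.Name.
networkFence := &addonsv1alpha1.NetworkFence{
	ObjectMeta: metav1.ObjectMeta{
		Name: node.Name + "-" + driver, // e.g. "worker-1-rbd", "worker-1-cephfs"
	},
}
```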

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@riya-singhal31 riya-singhal31 changed the title csi: update network fence CR name [WIP]csi: update network fence CR name Jan 24, 2024
Contributor

@subhamkrai subhamkrai left a comment

Please update the unit test to have both PVs on a single node and verify that both NetworkFences are created.
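
A hedged illustration of what that verification could look like in watcher_test.go, using the fenceResourceName helper suggested later in this thread; the test plumbing (fake client, node, cluster, t) is assumed and not shown:

```go
// Illustrative assertion only: after fencing a node that has both an RBD RWO PV
// and a CephFS RWO PV attached, one NetworkFence per driver should exist.
for _, driver := range []string{"rbd", "cephfs"} {
	nf := &addonsv1alpha1.NetworkFence{}
	name := types.NamespacedName{
		Name:      fenceResourceName(node.Name, driver), // e.g. "node0-rbd", "node0-cephfs"
		Namespace: cluster.Namespace,
	}
	if err := c.client.Get(ctx, name, nf); err != nil {
		t.Errorf("expected NetworkFence %q to exist: %v", name.Name, err)
	}
}
```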

pkg/operator/ceph/cluster/watcher.go (outdated, resolved)
@@ -516,7 +521,7 @@ func (c *clientCluster) createNetworkFence(ctx context.Context, pv corev1.Persis

networkFence := &addonsv1alpha1.NetworkFence{
ObjectMeta: metav1.ObjectMeta{
-Name: node.Name,
+Name: node.Name + driver,
Member

How about also adding a - to the name? It would be more standard K8s convention and also more readable.

Suggested change
-Name: node.Name + driver,
+Name: node.Name + "-" + driver,

pkg/operator/ceph/cluster/watcher.go (outdated, resolved)
}

err = wait.PollUntilContextTimeout(ctx, 2*time.Second, 60*time.Second, true, func(ctx context.Context) (bool, error) {
err = c.client.Get(ctx, types.NamespacedName{Name: node.Name + "rbd", Namespace: cluster.Namespace}, networkFence)
Member

Suggested change
-err = c.client.Get(ctx, types.NamespacedName{Name: node.Name + "rbd", Namespace: cluster.Namespace}, networkFence)
+err = c.client.Get(ctx, types.NamespacedName{Name: node.Name + "-rbd", Namespace: cluster.Namespace}, networkFence)

@@ -565,7 +615,7 @@ func (c *clientCluster) unfenceAndDeleteNetworkFence(ctx context.Context, node c
}

err = wait.PollUntilContextTimeout(ctx, 2*time.Second, 60*time.Second, true, func(ctx context.Context) (bool, error) {
-err = c.client.Get(ctx, types.NamespacedName{Name: node.Name, Namespace: cluster.Namespace}, networkFence)
+err = c.client.Get(ctx, types.NamespacedName{Name: node.Name + "cephFS", Namespace: cluster.Namespace}, networkFence)
Member

Let's just keep it lowercase; IMO the K8s resource naming is more consistent that way.

Suggested change
-err = c.client.Get(ctx, types.NamespacedName{Name: node.Name + "cephFS", Namespace: cluster.Namespace}, networkFence)
+err = c.client.Get(ctx, types.NamespacedName{Name: node.Name + "-cephfs", Namespace: cluster.Namespace}, networkFence)

return nil
}

func (c *clientCluster) unfenceAndDeleteCephFSNetworkFence(ctx context.Context, node corev1.Node, cluster *cephv1.CephCluster) error {
Member

Can we have a common helper method for unfencing? If the only difference between unfenceAndDeleteCephFSNetworkFence() and unfenceAndDeleteRbdNetworkFence() is the name of the CR and the log messages, then we can make the name a parameter and keep a single method.
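
A rough sketch of the shape such a shared helper could take, using the driver parameter and fenceResourceName helper that appear later in this PR; everything beyond the lines visible in the diffs (the unfence/update step, the delete call) is an assumption, not the merged implementation:

```go
// Sketch only: one unfence-and-delete method parameterized by driver, replacing
// the separate RBD and CephFS variants.
func (c *clientCluster) unfenceAndDeleteNetworkFence(ctx context.Context, node corev1.Node,
	cluster *cephv1.CephCluster, driver string) error {
	networkFence := &addonsv1alpha1.NetworkFence{}
	name := types.NamespacedName{
		Name:      fenceResourceName(node.Name, driver),
		Namespace: cluster.Namespace,
	}

	// Poll until the CR reports it is unfenced, as the existing per-driver code does.
	err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 60*time.Second, true, func(ctx context.Context) (bool, error) {
		if err := c.client.Get(ctx, name, networkFence); err != nil {
			return false, err
		}
		// ... mark the fence as unfenced and wait for the result (omitted here) ...
		logger.Infof("successfully unfenced %q network fence cr %q, proceeding with deletion", driver, networkFence.Name)
		return true, nil
	})
	if err != nil {
		return err
	}

	// Delete the CR once it has been unfenced.
	return c.client.Delete(ctx, networkFence)
}
```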

Contributor Author

Yes, made the changes.

@riya-singhal31 riya-singhal31 force-pushed the node-name branch 10 times, most recently from 803d301 to 40d8262 on January 24, 2024 19:19
@riya-singhal31
Contributor Author

Thanks for the review, Travis and Subham.
I've updated the PR accordingly, PTAL.
cc: @travisn, @subhamkrai

@riya-singhal31 riya-singhal31 changed the title [WIP]csi: update network fence CR name csi: update network fence CR name Jan 24, 2024
@riya-singhal31
Contributor Author

> Please update the unit test to have both PVs on a single node and verify that both NetworkFences are created.

Thanks, updated, and working as expected.

@@ -575,7 +580,7 @@ func (c *clientCluster) unfenceAndDeleteNetworkFence(ctx context.Context, node c
return false, err
}

-logger.Infof("successfully unfenced network fence cr %q, proceeding with deletion", networkFence.Name)
+logger.Infof("successfully unfenced %q network fence cr %q, proceeding with deletion", driver, networkFence.Name)
Member

Suggested change
logger.Infof("successfully unfenced %q network fence cr %q, proceeding with deletion", driver, networkFence.Name)
logger.Infof("successfully unfenced %q network fence cr %q, proceeding with deletion", driver, networkFence.Name)

@@ -516,7 +521,7 @@ func (c *clientCluster) createNetworkFence(ctx context.Context, pv corev1.Persis

networkFence := &addonsv1alpha1.NetworkFence{
ObjectMeta: metav1.ObjectMeta{
-Name: node.Name,
+Name: node.Name + "-" + driver,
Member

Since this name generation is used in several places, how about a helper method to implement it? Something like:

Suggested change
-Name: node.Name + "-" + driver,
+Name: fenceResourceName(node.Name, driver),

Contributor Author

Thanks, Updated!

Contributor

@subhamkrai subhamkrai left a comment

A few suggestions:

pkg/operator/ceph/cluster/watcher.go (resolved)
pkg/operator/ceph/cluster/watcher.go (resolved)
pkg/operator/ceph/cluster/watcher_test.go (resolved)
Member

@Madhu-1 Madhu-1 left a comment

Secret:
    Name:       rook-csi-rbd-provisioner
    Namespace:  openshift-storage
Status:
  Message:  fencing operation successful
  Result:   Succeeded
Events:
  Type     Reason                    Age   From                          Message
  ----     ------                    ----  ----                          -------
  Warning  OwnerRefInvalidNamespace  89s   garbage-collector-controller  ownerRef [ceph.rook.io/v1/CephCluster, namespace: , name: ocs-storagecluster-cephcluster, uid: db2eb584-6ebb-4865-8902-a17241e49ff2] does not exist in namespace ""
(venv) [jopinto@jopinto ceph-csi]$

What about the warning mentioned in the BZ? Can we check that as well, or else we might get into a garbage collection problem.

Use SetControllerReference from the k8sutil/resources.go file, which checks for a valid reference.
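
For readers unfamiliar with the warning: the NetworkFence object is cluster-scoped (its namespace is empty in the ownerRef warning above), so an owner reference to the namespaced CephCluster is invalid and the garbage collector flags it. As a hedged illustration only, using controller-runtime's generic controllerutil helper rather than Rook's own k8sutil/resources.go helper (whose exact signature isn't quoted in this thread), a reference-setting call that validates the owner before applying it looks like this; cephCluster, networkFence, and scheme are assumed variables:

```go
// Illustrative only: SetControllerReference rejects a namespace-scoped owner
// (the CephCluster) on a cluster-scoped object (the NetworkFence), surfacing
// the problem at creation time instead of as a garbage-collector warning.
if err := controllerutil.SetControllerReference(cephCluster, networkFence, scheme); err != nil {
	return fmt.Errorf("failed to set owner reference on NetworkFence %q: %w", networkFence.Name, err)
}
```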

@@ -343,7 +348,7 @@ func listRWOCephFSPV(listPVs *corev1.PersistentVolumeList, cluster *cephv1.CephC
continue
}

-if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" || pv.Spec.CSI.VolumeAttributes["pool"] == "" {
+if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" || pv.Spec.CSI.VolumeAttributes["fsName"] == "" {
Member

We don't need to check fsName for a static volume if the staticVolume check is enough; I don't see when fsName can be set to empty.

Member

So we can simplify to this:

Suggested change
-if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" || pv.Spec.CSI.VolumeAttributes["fsName"] == "" {
+if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" {

@@ -500,7 +505,11 @@ func concatenateWatcherIp(address string) string {
return watcherIP
}

func (c *clientCluster) createNetworkFence(ctx context.Context, pv corev1.PersistentVolume, node *corev1.Node, cluster *cephv1.CephCluster, cidr []string) error {
func fenceResourceName(nodeName string, driver string) string {
Member

Suggested change
-func fenceResourceName(nodeName string, driver string) string {
+func fenceResourceName(nodeName, driver string) string {

@@ -185,9 +185,14 @@ func (c *clientCluster) handleNodeFailure(ctx context.Context, cluster *cephv1.C
return nil
}

-err = c.unfenceAndDeleteNetworkFence(ctx, *node, cluster)
+err = c.unfenceAndDeleteNetworkFence(ctx, *node, cluster, "rbd")
Member

Why hardcode this? Can we define a variable instead?

Contributor Author

Sure, updated, Thanks.
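
For illustration, one common way to address this feedback; the actual constant names used in the merged code are not visible in this thread, so these identifiers are assumptions:

```go
// Assumed names: package-level constants so the driver string is not repeated
// at every fence/unfence call site.
const (
	rbdDriver    = "rbd"
	cephfsDriver = "cephfs"
)

// e.g. at the failure-handling call site:
err = c.unfenceAndDeleteNetworkFence(ctx, *node, cluster, rbdDriver)
```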

@@ -500,7 +505,11 @@ func concatenateWatcherIp(address string) string {
return watcherIP
}

func (c *clientCluster) createNetworkFence(ctx context.Context, pv corev1.PersistentVolume, node *corev1.Node, cluster *cephv1.CephCluster, cidr []string) error {
func fenceResourceName(nodeName string, driver string) string {
return nodeName + "-" + driver
Member

Use fmt.Sprintf.
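
Applying that suggestion, the helper would look roughly like this (a sketch; the merged code may differ in formatting):

```go
// fenceResourceName builds the per-node, per-driver NetworkFence CR name,
// e.g. "worker-1-rbd" or "worker-1-cephfs".
func fenceResourceName(nodeName, driver string) string {
	return fmt.Sprintf("%s-%s", nodeName, driver)
}
```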

@@ -343,7 +348,7 @@ func listRWOCephFSPV(listPVs *corev1.PersistentVolumeList, cluster *cephv1.CephC
continue
}

-if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" || pv.Spec.CSI.VolumeAttributes["pool"] == "" {
+if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" || pv.Spec.CSI.VolumeAttributes["fsName"] == "" {
Member

So we can simplify to this:

Suggested change
-if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" || pv.Spec.CSI.VolumeAttributes["fsName"] == "" {
+if pv.Spec.CSI.VolumeAttributes["staticVolume"] == "true" {

pkg/operator/ceph/cluster/watcher.go (resolved)
@riya-singhal31
Contributor Author

> What about the warning mentioned in the BZ? Can we check that as well, or else we might get into a garbage collection problem.
>
> Use SetControllerReference from the k8sutil/resources.go file, which checks for a valid reference.

Thanks @Madhu-1 for letting us know about this.
Created a tracker -> #13626

this commit updates the name of the network fence CR to
node name + driver name, which allows the creation of
multiple network fences per driver on the same node, and
modifies the unit test and deletion procedure for the
same.

Signed-off-by: Riya Singhal <rsinghal@redhat.com>
Member

@travisn travisn left a comment

The PR looks good, could you also confirm in local testing? The unit tests don't catch everything.

@subhamkrai
Contributor

@riya-singhal31 please do the local testing, thanks

@riya-singhal31
Contributor Author

riya-singhal31 commented Feb 5, 2024

Hi! Working as expected.

These are the volumes & pods, one for each driver (CephFS and RBD):

riyasinghal@rsinghal-mac rook % kubectl get pvc,pods
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/cephfs-pvc   Bound    pvc-7f8aa823-7ec7-4950-917e-80136e00c0b2   1Gi        RWO            rook-cephfs       36m
persistentvolumeclaim/rbd-pvc      Bound    pvc-e840b0a9-f1d7-4338-8ac0-c69dd6e471b1   1Gi        RWO            rook-ceph-block   60m

NAME                     READY   STATUS    RESTARTS   AGE
pod/csicephfs-demo-pod   1/1     Running   0          61s
pod/csirbd-demo-pod      1/1     Running   0          92s

Both pods are scheduled on the same node, to test the functionality of this PR:

riyasinghal@rsinghal-mac rook % kubectl describe pod/csirbd-demo-pod 
Name:             csirbd-demo-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-10-0-113-254.us-west-2.compute.internal/10.0.113.254
Start Time:       Mon, 05 Feb 2024 18:38:27 +0530

riyasinghal@rsinghal-mac rook % kubectl describe pod csicephfs-demo-pod
Name:             csicephfs-demo-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-10-0-113-254.us-west-2.compute.internal/10.0.113.254
Start Time:       Mon, 05 Feb 2024 18:38:58 +0530

Since there are 2 drivers, we expect 2 different NetworkFences to be created (one per driver) after applying the taint to the node:

riyasinghal@rsinghal-mac rook % kubectl get networkfences.csiaddons.openshift.io
NAME                                                DRIVER                          CIDRS               FENCESTATE   AGE    RESULT
ip-10-0-113-254.us-west-2.compute.internal-cephfs   rook-ceph.cephfs.csi.ceph.com   ["100.64.0.5/32"]   Fenced       6m6s   Succeeded
ip-10-0-113-254.us-west-2.compute.internal-rbd      rook-ceph.rbd.csi.ceph.com      ["100.64.0.5/32"]   Fenced       6m7s   Succeeded

The above result was the main requirement of the bug report and this PR.

Thanks,
@travisn , @subhamkrai

Contributor

@subhamkrai subhamkrai left a comment

I guess ready to merge once @Madhu-1 approves.

@travisn travisn dismissed parth-gr’s stale review February 5, 2024 14:10

feedback addressed

@travisn travisn merged commit 73427c9 into rook:master Feb 5, 2024
49 of 50 checks passed
mergify bot added a commit that referenced this pull request Feb 6, 2024
csi: update network fence CR name (backport #13615)