Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generates and labels Volume*SnapshotClass #168

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

raaizik
Copy link
Contributor

@raaizik raaizik commented Jun 3, 2024

Changes

  • Have storage claim controller own VolumeGroupSnapshotClass with the required labels for RDR added to both VGSC and VSC.
  • Adds Available CRDs functionality
  • Upgrades k8s-csi
    • Requires go >= 1.22.0 which uses client-go v0.30 which works with CR v0.18 and CG v0.14

RHSTOR-5795

@raaizik raaizik force-pushed the RHSTOR-5795 branch 4 times, most recently from d68c5d2 to 7e993b3 Compare June 3, 2024 13:59
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jun 3, 2024
@pull-request-size pull-request-size bot added size/M and removed size/L labels Jun 3, 2024
@raaizik raaizik force-pushed the RHSTOR-5795 branch 2 times, most recently from 38c58d7 to b148266 Compare June 4, 2024 10:21
@raaizik raaizik changed the title [WIP] Generates VolumeGroupSnapshotClass [WIP] Generates and labels Volume*SnapshotClass Jun 4, 2024
@raaizik raaizik changed the title [WIP] Generates and labels Volume*SnapshotClass Generates and labels Volume*SnapshotClass Jun 4, 2024
@raaizik
Copy link
Contributor Author

raaizik commented Jun 4, 2024

/cc @rewantsoni

@openshift-ci openshift-ci bot requested a review from rewantsoni June 4, 2024 13:50
@raaizik raaizik changed the title Generates and labels Volume*SnapshotClass [WIP] Generates and labels Volume*SnapshotClass Jun 5, 2024
@raaizik raaizik changed the title [WIP] Generates and labels Volume*SnapshotClass Generates and labels Volume*SnapshotClass Jun 6, 2024
@raaizik raaizik force-pushed the RHSTOR-5795 branch 2 times, most recently from 63d1586 to 0af9ddc Compare June 6, 2024 13:04
Comment on lines 407 to 410
labels["ramendr.openshift.io/replicationID"] = r.storageClaimHash
labels["ramendr.openshift.io/storageID"] = storageID
if resource.Name == "cephfs" {
volumeSnapshotClass = r.getCephFSVolumeSnapshotClass(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The labels are not being applied to the cephfs or rbd VolumeSnapshotClass

Suggested change
labels["ramendr.openshift.io/replicationID"] = r.storageClaimHash
labels["ramendr.openshift.io/storageID"] = storageID
if resource.Name == "cephfs" {
volumeSnapshotClass = r.getCephFSVolumeSnapshotClass(data)
labels["ramendr.openshift.io/replicationID"] = r.storageClaimHash
labels["ramendr.openshift.io/storageID"] = storageID
if resource.Name == "cephfs" {
volumeSnapshotClass = r.getCephFSVolumeSnapshotClass(data,labels)

Comment on lines 720 to 712
if reflect.DeepEqual(existing.Parameters, volumeGroupSnapshotClass.Parameters) {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check for labels?

@raaizik raaizik force-pushed the RHSTOR-5795 branch 7 times, most recently from d3c6028 to bc8957d Compare June 17, 2024 07:58
@raaizik raaizik requested a review from Madhu-1 June 17, 2024 08:01
Makefile Outdated
Copy link
Contributor Author

@raaizik raaizik Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yq isn't recognized anymore. (bash: yq: command not found)

err = json.Unmarshal(resource.Data, &data)
if err != nil {
return reconcile.Result{}, fmt.Errorf("failed to unmarshal StorageClaim configuration response: %v", err)
}

// SID for RamenDR
storageID := getMD5Hash(string(r.storageClaim.UID)) + r.storageClaimHash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storage ID does not need to be distinct across clusters, only the Replication ID needs to be.
But even if the requirement was true for Storage ID, a UUID is assumed to be globally unique (by definition).

Suggested change
storageID := getMD5Hash(string(r.storageClaim.UID)) + r.storageClaimHash
storageID := r.storageClaim.UID

And please move the definition of the storage ID var outside of the for, there is no need to redefine it on every iteration as it is constant for the entire reconciliation.

Comment on lines 407 to 411
labels["ramendr.openshift.io/storageID"] = storageID
volumeSnapshotClass = r.getCephDriverVolumeSnapshotClass(resource.Name, data)
utils.AddLabels(volumeSnapshotClass, labels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are only concerned with a single-label why do we need the labels map var in the first place

Suggested change
labels["ramendr.openshift.io/storageID"] = storageID
volumeSnapshotClass = r.getCephDriverVolumeSnapshotClass(resource.Name, data)
utils.AddLabels(volumeSnapshotClass, labels)
volumeSnapshotClass = r.getCephDriverVolumeSnapshotClass(resource.Name, data)
utils.AddLabel(volumeSnapshotClass, "ramendr.openshift.io/storageID", storageID)

Comment on lines 421 to 425
labels["ramendr.openshift.io/storageID"] = storageID
volumeGroupSnapshotClass = r.getCephDriverVolumeGroupSnapshotClass(resource.Name, data)
utils.AddLabels(volumeGroupSnapshotClass, labels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as the one above regarding AddLabels

// generate a new clusterID for cephfs subvolumegroup, as
// storageclaim is clusterscoped resources using its
// hash as the clusterID
data["clusterID"] = r.storageClaimHash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raaizik I am confused.
If we can decide on a cluster ID locally why do we need to send it from the provider (as implemented in this PR: red-hat-storage/ocs-operator#2661)?

Copy link
Contributor Author

@raaizik raaizik Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As are we ➡️ comment
Edit: removed from provider

Comment on lines 617 to 618
labelsEqual := reflect.DeepEqual(existing.Labels, volumeSnapshotClass.Labels)
if reflect.DeepEqual(existing.Parameters, volumeSnapshotClass.Parameters) && labelsEqual {
Copy link
Contributor

@nb-ohad nb-ohad Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the code more readable by aligning both the params and label equality checks
Also, we are setting annotations so we should also consider checking equality there as well

Suggested change
labelsEqual := reflect.DeepEqual(existing.Labels, volumeSnapshotClass.Labels)
if reflect.DeepEqual(existing.Parameters, volumeSnapshotClass.Parameters) && labelsEqual {
areLabelsEqual := eflect.DeepEqual(existing.Labels, volumeSnapshotClass.Labels)
areAnnotationsEqual := eflect.DeepEqual(existing.Annotations, volumeSnapshotClass.Annotations)
areParamsEqual := reflect.DeepEqual(existing.Parameters, volumeSnapshotClass.Parameters)
if areLabelsEqual && areAnnotationsEqual && areParamsEqual {

Edit (leela): Fixed the suggestion rendering

if existing.UID != "" {
// Since we have to update the existing VolumeSnapshotClass, so we will delete the existing VolumeSnapshotClass and create a new one.
r.log.Info("VolumeSnapshotClass needs to be updated, deleting it.", "Name", existing.Name)
if labelsEqual {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check does not make sense.
At this point in the code, it is already established that an update is needed (because of label changes or params changes)

The only concern left is if we need to delete the existing one or not!

Your current approach has a couple of bugs:

  1. If the VolumeSnapshotClass didn't exist then your labelsEqual will be true and you will jump to the lease which will try to update instead of create
  2. if both labels and params changed you will jump to the else and try to update instead of deleting and creating

With that in mind, I would prefer to keep the code simple and always delete and recreate, even in cases where we don't have to delete. Trying to do it differently will just lead to many edge cases that we can easily miss and the overhead in deleting and recreating is negligible as we don't assume many label changes

Copy link
Contributor Author

@raaizik raaizik Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like your opinions are conflicting ➡️ comment @Madhu-1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nb-ohad i dont agree to delete the resource when its not required, we do have update for that case, these resources have special meaning we should use what is required when instead of having a common case to delete and recreate always if there is a change but it doesn't require to be deleted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madhu-1 This is going to be an evergrowing pit of bugs, even the current implementation is missing very important edge cases. I am not predicting too many label, and annotation changes over the lifetime of these resources and want to avoid very common eng mistakes that result in non-so-simple bugs when an Eng touches this code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's pretty straightforward because we already know what will change and what will not change. However, I haven't reviewed this part of the code thoroughly, so I may be missing some edge cases. My suggestion is to update the metadata if it changes, and to use delete/create for anything else. If you believe there are potential issues, we can resort to using delete/create, but I still prefer updating in certain cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed back to no update.

@@ -651,6 +673,70 @@ func (r *StorageClaimReconciler) hasVolumeSnapshotContents() (bool, error) {
return false, nil
}

func (r *StorageClaimReconciler) getCephDriverVolumeGroupSnapshotClass(driverType string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please move this function near the getCephDriverVolumeSnapshotClass function? both of them have similar concerns and I would prefer to have both of them at a similar location in the file

Comment on lines 699 to 701
if err := r.own(volumeGroupSnapshotClass); err != nil {
return fmt.Errorf("failed to own volumegroupsnapshotclass: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't volumeGroupSnapshotClass (and volumeSnapshotClass) a cluster-wide resource? In that case, you cannot own it by a namespace scoped resource (StorageClaim resource). We are already working around that using the storage claim annotation. So let's remove this own call.

Comment on lines 695 to 738
func (r *StorageClaimReconciler) createOrReplaceVolumeGroupSnapshotClass(volumeGroupSnapshotClass *groupsnapapi.VolumeGroupSnapshotClass) error {
existing := &groupsnapapi.VolumeGroupSnapshotClass{}
existing.Name = r.storageClaim.Name

if err := r.own(volumeGroupSnapshotClass); err != nil {
return fmt.Errorf("failed to own volumegroupsnapshotclass: %v", err)
}

if err := r.get(existing); err != nil && !errors.IsNotFound(err) {
return fmt.Errorf("failed to get VolumeGroupSnapshotClass: %v", err)
}

// If present then compare the existing VolumeGroupSnapshotClass parameters and labels with
// the received VolumeGroupSnapshotClass parameters and labels, and only proceed if they differ.
labelsEqual := reflect.DeepEqual(existing.Labels, volumeGroupSnapshotClass.Labels)
if reflect.DeepEqual(existing.Parameters, volumeGroupSnapshotClass.Parameters) && labelsEqual {
return nil
}

if labelsEqual {
// VolumeGroupSnapshotClass already exists, but parameters have changed. Delete the existing VolumeGroupSnapshotClass and create a new one.
if existing.UID != "" {
// Since we have to update the existing VolumeGroupSnapshotClass, so we will delete the existing VolumeGroupSnapshotClass and create a new one.
r.log.Info("VolumeGroupSnapshotClass needs to be updated, deleting it.", "Name", existing.Name)

// Delete the VolumeGroupSnapshotClass.
if err := r.delete(existing); err != nil {
r.log.Error(err, "Failed to delete VolumeGroupSnapshotClass.", "Name", existing.Name)
return fmt.Errorf("failed to delete VolumeGroupSnapshotClass: %v", err)
}
}
r.log.Info("Creating VolumeGroupSnapshotClass.", "Name", existing.Name)
if err := r.Client.Create(r.ctx, volumeGroupSnapshotClass); err != nil {
return fmt.Errorf("failed to create VolumeGroupSnapshotClass: %v", err)
}
} else {
r.log.Info("Updating VolumeGroupSnapshotClass", "Name", existing.Name)
if err := r.Client.Update(r.ctx, volumeGroupSnapshotClass); err != nil {
return fmt.Errorf("failed to update VolumeGroupSnapshotClass: %v", err)
}
}

return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is almost an identical duplication of the createOrReplaceVolumeSnapshotClass.
Instead of creating a duplicate where Bugz might be repeating, can we please generalize to a
func (r *StorageClaimReconciler) createOrReplace(client object.Client) error

controllers/storageclaim_controller.go Outdated Show resolved Hide resolved
controllers/storageclaim_controller.go Outdated Show resolved Hide resolved
controllers/storageclaim_controller.go Outdated Show resolved Hide resolved
controllers/storageclaim_controller.go Show resolved Hide resolved
@raaizik raaizik force-pushed the RHSTOR-5795 branch 2 times, most recently from c042c9c to fc5325c Compare July 1, 2024 11:29
@raaizik raaizik requested a review from Madhu-1 July 1, 2024 11:30
Copy link
Member

@Madhu-1 Madhu-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits, LGTM

@@ -207,6 +219,20 @@ func (r *StorageClaimReconciler) Reconcile(ctx context.Context, req ctrl.Request
return result, nil
}

func (r *StorageClaimReconciler) mapCRDAvailability(crdNames ...string) map[string]bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make crdNames as input? instead of that can we have it as var inside this method only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? We need to be able to send to this function all the CRD names that need to go through this flow path.

var driverName string
if driverType == "cephfs" {
driverName = csi.GetCephFSDriverName()
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add the case of rbd here?

if err != nil {
return fmt.Errorf("failed to create StorageClass: %v", err)
r.log.Info("Creating "+classKind+".", "Name", types.NamespacedName{
Namespace: class.GetNamespace(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace is not required as its clusterScoped resouces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how about for StoageClass? Seems like it did have its namespace mentioned in the logs:

r.log.Info("StorageClass needs to be updated, deleting it.", "StorageClass", klog.KRef(storageClass.Namespace, existing.Name))

func (r *StorageClaimReconciler) createOrReplaceStorageClass(storageClass *storagev1.StorageClass) error {
existing := &storagev1.StorageClass{}
existing.Name = r.storageClaim.Name
func (r *StorageClaimReconciler) createOrReplaceClass(class client.Object) error {
Copy link
Member

@rewantsoni rewantsoni Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we get the existing before the switch case? existingParameters will always be empty in this case.
We should,

  1. set the name to existing, get it based on the object kind
  2. compare the existing parameter/labels/annotations to the desired
  3. if equal break
  4. own the desired
  5. if UID is not nil delete it
  6. create the desired

Could we also rename class client.Object to a better name? maybe to desired?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like class since they're all classes

@@ -207,6 +218,20 @@ func (r *StorageClaimReconciler) Reconcile(ctx context.Context, req ctrl.Request
return result, nil
}

func (r *StorageClaimReconciler) mapCRDAvailability(crdNames ...string) map[string]bool {
Copy link
Member

@rewantsoni rewantsoni Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move this to a utility package?

Signed-off-by: raaizik <132667934+raaizik@users.noreply.github.com>
Signed-off-by: raaizik <132667934+raaizik@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants