Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup RADOS namespace with forced deletion annotation #14052

Merged
merged 1 commit into from Apr 12, 2024

Conversation

sp98
Copy link
Contributor

@sp98 sp98 commented Apr 10, 2024

If RadosNamespace CR is deleted with rook.io/force-deletion annotation then start a clean up job that will delete the images and snapshots on the radosnamespace.
Steps followed for cleanup:

  • List images in pool
  • List snapshots for each image.
  • Delete snapshots for each image.
  • Move Image to trash
  • Start a task to delete the trash.

Operator logs after deleting radosNamespace CR with annotation
operator-logs.txt

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@sp98 sp98 force-pushed the cleanup-radosnamespace branch 3 times, most recently from d6f684b to 92c96c1 Compare April 10, 2024 13:38
Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small suggestions

logger.Error(retErr)
}

// TODO: rather than deleting add tasks to trash the image?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean to add tasks to trash it? Since force deleting, seems like we should directly delete all the images as you have here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the TODO for now.
There was one suggestion by Madhu to use following command instead of deleting the images

rbd trash mv <RBD image image>
ceph rbd task trash remove <RBD image>

@@ -41,8 +41,11 @@ type CephBlockImage struct {
InfoName string `json:"name"`
}

func ListImages(context *clusterd.Context, clusterInfo *ClusterInfo, poolName string) ([]CephBlockImage, error) {
func ListImages(context *clusterd.Context, clusterInfo *ClusterInfo, poolName, namespace string) ([]CephBlockImage, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For code readability, how about creating two methods? The latter will have exactly your implementation, and the first one just calls the second one.

Suggested change
func ListImages(context *clusterd.Context, clusterInfo *ClusterInfo, poolName, namespace string) ([]CephBlockImage, error) {
func ListImagesInPool(context *clusterd.Context, clusterInfo *ClusterInfo, poolName string) ([]CephBlockImage, error) {
return ListImagesInRadosNamespace(context, clusterInfo, poolName, "")
}
func ListImagesInRadosNamespace(context *clusterd.Context, clusterInfo *ClusterInfo, poolName, namespace string) ([]CephBlockImage, error) {

@@ -116,10 +133,13 @@ func CreateImage(context *clusterd.Context, clusterInfo *ClusterInfo, name, pool
return &CephBlockImage{Name: name, Size: newSizeBytes}, nil
}

func DeleteImage(context *clusterd.Context, clusterInfo *ClusterInfo, name, poolName string) error {
func DeleteImage(context *clusterd.Context, clusterInfo *ClusterInfo, name, poolName, namespace string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I'd suggest DeleteImageInPool() and DeleteImageInRadosNamespace()

@sp98 sp98 requested a review from travisn April 10, 2024 14:53
@sp98 sp98 force-pushed the cleanup-radosnamespace branch 2 times, most recently from 806567a to 6581c50 Compare April 10, 2024 15:05
@sp98 sp98 requested a review from Madhu-1 April 10, 2024 15:05
@sp98 sp98 force-pushed the cleanup-radosnamespace branch 2 times, most recently from 2a33156 to 499bee1 Compare April 10, 2024 16:10
| Custom Resource | Ceph Resources to be cleaned up |
| -------- | ------- |
| CephFilesystemSubVolumeGroup | CSI stored RADOS OMAP details for pvc/volumesnapshots, subvolume snapshots, subvolume clones, subvolumes |
| CephBlockPoolRadosNamespace | Images and snapshots in the rados namespace|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| CephBlockPoolRadosNamespace | Images and snapshots in the rados namespace|
| CephBlockPoolRadosNamespace | Images and snapshots in the RADOS namespace|

@travisn travisn changed the title Cleanup radosnamespace Cleanup RADOS namespace with forced deletion annotation Apr 10, 2024
)

func RadosNamespaceCleanup(context *clusterd.Context, clusterInfo *client.ClusterInfo, poolName, radosNamespace string) error {
logger.Infof("starting clean up of CephBlockPoolRadosNamespace %q resources in blockpool %q", radosNamespace, poolName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update blockpool to cephblockpool ro match the name in all other places?

return ListImagesInRadosNamespace(context, clusterInfo, poolName, "")
}

func ListImagesInRadosNamespace(context *clusterd.Context, clusterInfo *ClusterInfo, poolName, namespace string) ([]CephBlockImage, error) {
args := []string{"ls", "-l", poolName}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR we could have used rbd ls -p=replicapool instead of -l

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Just used the the existing method we had for list images.

Comment on lines 42 to 61
err = cephclient.DeleteImageInRadosNamespace(context, clusterInfo, image.Name, poolName, radosNamespace)
if err != nil {
retErr = errors.Wrapf(err, "failed to delete image %q in pool %q.", image.Name, poolName)
logger.Error(retErr)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK rhis doesn't work when we have clones created from the snapshot, when we try to delete the clones on the parent image the parent will fail saying there are clones created from the parent image

sh-4.4$ rbd clone --rbd-default-clone-format 2 replicapool/test@test replicapool/test2
sh-4.4$ 
sh-4.4$ rbd ls replicapool
test
test2
sh-4.4$ rbd snap purge test
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool name.
sh-4.4$ rbd snap purge replicapool/test
Removing all snapshots: 100% complete...done.
sh-4.4$ rbd rm replicapool/test
2024-04-10T18:04:15.724+0000 7f406b33b580 -1 librbd::api::Image: remove: image has snapshots - not removing
Removing image: 0% complete...failed.
rbd: image has snapshots with linked clones - these must be deleted or flattened before the image can be removed.
sh-4.4$ 
sh-4.4$ ceph rbd task add trash remove replicapool/911d3a445d20
{"sequence": 1, "id": "2fc6afa4-1239-45f9-98e6-2a4298d8839c", "message": "Removing image replicapool/911d3a445d20 from trash", "refs": {"action": "trash remove", "pool_name": "replicapool", "pool_namespace": "", "image_id": "911d3a445d20"}}
sh-4.4$ rbd trash mv replicapool/test2
sh-4.4$ ceph rbd task add trash remove replicapool/ee91d79d6186
{"sequence": 2, "id": "fed70961-a430-4b6d-8594-f8e18f824905", "message": "Removing image replicapool/ee91d79d6186 from trash", "refs": {"action": "trash remove", "pool_name": "replicapool", "pool_namespace": "", "image_id": "ee91d79d6186"}}
sh-4.4$ ceph rbd task add trash remove replicapool/ee91d79d6186
{"sequence": 2, "id": "fed70961-a430-4b6d-8594-f8e18f824905", "message": "Removing image replicapool/ee91d79d6186 from trash", "refs": {"action": "trash remove", "pool_name": "replicapool", "pool_namespace": "", "image_id": "ee91d79d6186"}}

please test it out and see all the images are deleted from pool/namespace and also from trash.

@sp98 sp98 force-pushed the cleanup-radosnamespace branch 2 times, most recently from 2f8ede6 to 57b46b2 Compare April 12, 2024 14:23
@sp98 sp98 requested a review from Madhu-1 April 12, 2024 14:24
Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the latest changes

Copy link
Member

@Madhu-1 Madhu-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits, LGTM

Comment on lines 54 to 61
err = cephclient.MoveImageToTrashInRadosNamespace(context, clusterInfo, poolName, image.Name, radosNamespace)
if err != nil {
retErr = errors.Wrapf(err, "failed to delete image %q in cephblockpool %q.", image.Name, poolName)
logger.Error(retErr)
}
err = cephclient.DeleteImageFromTrashInRadosNamespace(context, clusterInfo, poolName, image.ID, radosNamespace)
if err != nil {
retErr = errors.Wrapf(err, "failed to delete image %q in cephblockpool %q.", image.Name, poolName)
logger.Error(retErr)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above we already have a for loop for image and we can move this check to above for loop.

for _, image := range images {
err = cephclient.MoveImageToTrashInRadosNamespace(context, clusterInfo, poolName, image.Name, radosNamespace)
if err != nil {
retErr = errors.Wrapf(err, "failed to delete image %q in cephblockpool %q.", image.Name, poolName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed to move image to trash add radosnamespace name to the error message (please take care of it in all the places

}
err = cephclient.DeleteImageFromTrashInRadosNamespace(context, clusterInfo, poolName, image.ID, radosNamespace)
if err != nil {
retErr = errors.Wrapf(err, "failed to delete image %q in cephblockpool %q.", image.Name, poolName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed to add task to remove image from trash add radosnamespace name to the error message

Comment on lines 49 to 55
func ListImagesInPool(context *clusterd.Context, clusterInfo *ClusterInfo, poolName string) ([]CephBlockImage, error) {
return ListImagesInRadosNamespace(context, clusterInfo, poolName, "")
}

func ListImagesInRadosNamespace(context *clusterd.Context, clusterInfo *ClusterInfo, poolName, namespace string) ([]CephBlockImage, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments for these exported functions?

cmd.JsonOutput = true
buf, err := cmd.Run()
if err != nil {
return snapshots, errors.Wrapf(err, "failed to list snapshots of image %q in cephblockpool %q", imageName, poolName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add radosnamespace name to the error message?

cmd := NewRBDCommand(context, clusterInfo, args)
_, err := cmd.Run()
if err != nil {
return errors.Wrapf(err, "failed to delete snapshot %q of image %q in cephblockpool %q", snapshot, imageName, poolName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add radosnamespace name to the error message?

}

// DeleteSnapshotInRadosNamespace deletes a image snapshot created in block pool in a given rados namespace
func DeleteSnapshotInRadosNamespace(context *clusterd.Context, clusterInfo *ClusterInfo, poolName, imageName, snapshot, namespace string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove below if check as we are deleting snapshots in a rados namespace, same can be done for all other functions as well.

if namespace != "" {
		args = append(args, "--namespace", namespace)
	}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applicable for all other functions

@@ -196,3 +273,10 @@ func UnMapImage(context *clusterd.Context, clusterInfo *ClusterInfo, imageName,
func getImageSpec(name, poolName string) string {
return fmt.Sprintf("%s/%s", poolName, name)
}

func getImageSpecInRadosNamespace(name, poolName, namespace string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name, poolName, namespace string to poolName, namespace,image string lets keep it in order

func getImageSpecInRadosNamespace(name, poolName, namespace string) string {
return fmt.Sprintf("%s/%s/%s", poolName, namespace, name)
}
func getImageSnapshotSpec(name, poolName, snapshot string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as well

Clean up pool images and snapshots in the
radosnamespace

Signed-off-by: sp98 <sapillai@redhat.com>
@sp98 sp98 requested a review from Madhu-1 April 12, 2024 16:36
@travisn travisn dismissed Madhu-1’s stale review April 12, 2024 17:11

Feedback addressed, or else let's follow up in a separate PR

@travisn travisn merged commit 499a09e into rook:master Apr 12, 2024
53 checks passed
travisn added a commit that referenced this pull request Apr 12, 2024
Cleanup RADOS namespace with forced deletion annotation (backport #14052)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants