osd: replace existing OSDs to use new store #12770

sp98 · 2023-08-22T14:32:54Z

This is a follow-up to #12507.

Fixes replacing of encrypted OSDs.
Updates the OSD status at the end of the main reconcile, rather than every 60 seconds via the OSDhealthChecker.

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Tests:

Encrypted OSDs
Encrypted OSDs with KMS

OSD prepare pod logs for cleaning an encrypted OSD and preparing it again:
osd-0-prepare.txt

Checklist:

Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
Reviewed the developer guide on Submitting a Pull Request
Pending release notes updated with breaking and/or notable changes for the next minor release.
Documentation has been updated, if necessary.
Unit tests have been added, if necessary.
Integration tests have been added, if necessary.

travisn

I know it's still in draft, just a couple quick suggestions at a glance...

travisn · 2023-08-22T23:03:16Z

cmd/rook/ceph/osd.go

-	} else if osdInfo.CVMode == "lvm" {
-		s.SanitizeLVMDisk([]oposd.OSDInfo{osdInfo})
+	// zap the device
+	output, err := context.Executor.ExecuteCommandWithCombinedOutput("stdbuf", "-oL", "ceph-volume", "lvm", "zap", osd.Path, "--destroy")


To zap and wipe the disk, can we call helpers in pkg/daemon/ceph/cleanup/? It's nice to keep the cmd packages only focusing on command line processing.

pkg/daemon/ceph/cleanup/ is used by the cleanup jobs that run at the end, so it only logs errors and never returns them. Here we need error handling while zapping a single disk.

Also, all we need is ceph-volume lvm zap. The ceph-volume team recommended it instead of shred.

travisn · 2023-08-22T23:04:13Z

cmd/rook/ceph/osd.go

+	logger.Infof("successfully zapped osd.%d path %q", osd.ID, osd.Path)
+
+	// shred the device?
+	shredArgs := []string{"--random-source=/dev/zero", "--force", "--verbose", "--iterations=1", osd.Path}


Can we limit the shred to just a small amount (e.g. 1MB) so we don't have to wait for the full disk?

The ceph-volume team recommended not using shred and said that ceph-volume lvm zap should be enough. It works in both raw and lvm mode. I've removed the shred call for now. I tested it, and ceph-volume lvm zap looks sufficient.

travisn · 2023-08-23T20:30:01Z

cmd/rook/ceph/osd.go

-func destroyOSD(context *clusterd.Context, clusterInfo *client.ClusterInfo, osdID int) (*oposd.OSDInfo, error) {
-	osdInfo, err := osddaemon.GetOSDInfoById(context, clusterInfo, osdID)
+// destroyOSD fetches the OSD to be replaced based on the ID and then destroys that OSD and zaps the backing device
+func destroyOSD(context *clusterd.Context, clusterInfo *client.ClusterInfo, id int) (*osd.OSDReplaceInfo, error) {


How about moving this method into the package pkg/daemon/ceph/osd?

Could it be part of the DiskSanitizer?

This could go in the existing pkg/daemon/ceph/osd/remove.go or a new pkg/daemon/ceph/osd/destroy.go

The problem with DiskSanitizer is that it does not handle any errors, so modifications would be needed in order to use its existing methods.

How about as a new method in the disk sanitizer? Then it can be clear it has the purpose to sanitize the disk, and it can be the version that handles errors.

DiskSanitizer is mostly about cleaning the disks and controlling how the disks can be cleaned using the SanitizeDisksSpec.

We need to destroy the OSD and run ceph-volume lvm zap to clean the disk. We don't need any fine-grained control over how the disk is cleaned via the SanitizeDisksSpec.

So I feel DiskSanitizer might not be the best place to have this method.

Ok, then I'd vote for pkg/daemon/ceph/osd/remove.go.

Moved the DestroyOSD method to pkg/daemon/ceph/osd/remove.go.

pkg/daemon/ceph/osd/volume.go

travisn

just a typo

pkg/daemon/ceph/osd/remove.go

travisn · 2023-08-28T16:47:26Z

Is testing with KMS completed?

sp98 · 2023-09-01T15:36:41Z

Is testing with KMS completed?

Yes. Completed today. Testing with KMS looks good now. @travisn

travisn requested changes Aug 22, 2023

sp98 force-pushed the osd-migration-part2 branch 2 times, most recently from 5eb298e to 6048cf0 Compare August 23, 2023 03:37

sp98 marked this pull request as ready for review August 23, 2023 03:40

sp98 requested a review from travisn August 23, 2023 03:42

sp98 force-pushed the osd-migration-part2 branch 3 times, most recently from 7aa25d3 to 7a8f726 Compare August 23, 2023 06:00

travisn reviewed Aug 23, 2023

sp98 requested a review from travisn August 24, 2023 12:14

sp98 force-pushed the osd-migration-part2 branch from 7a8f726 to 497fc70 Compare August 27, 2023 13:58

travisn approved these changes Aug 28, 2023

travisn added the backport-release-1.12 label Aug 28, 2023

osd: replace existing OSDs to use new store

11b8d10

This follow up the rook#12507
- fixes replacing of encrypted OSDs.
- Updates OSD status at the end of reconcile

Signed-off-by: sp98 <sapillai@redhat.com>

sp98 force-pushed the osd-migration-part2 branch from 497fc70 to 11b8d10 Compare September 1, 2023 15:37

travisn merged commit 0ae3800 into rook:master Sep 5, 2023
49 of 50 checks passed

mergify bot mentioned this pull request Sep 5, 2023

osd: replace existing OSDs to use new store (backport #12770) #12851

Merged

mergify bot added a commit that referenced this pull request Sep 5, 2023

Merge pull request #12851 from rook/mergify/bp/release-1.12/pr-12770

bf59382

osd: replace existing OSDs to use new store (backport #12770)
