osd: replace existing OSDs to use new store #12770

Merged
travisn merged 1 commit into rook:master on Sep 5, 2023

Contributor

@sp98 sp98 commented Aug 22, 2023

This is a follow-up to #12507.

  • Fixes replacing of encrypted OSDs.
  • Updates the OSD status at the end of the main reconcile, rather than every 60 seconds via the OSDhealthChecker.

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Tests:

  • Encrypted OSDs
  • Encrypted OSDs with KMS

OSD prepare pod logs for cleaning an encrypted OSD and preparing it again:
osd-0-prepare.txt

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Member

@travisn travisn left a comment

I know it's still in draft, just a couple quick suggestions at a glance...

} else if osdInfo.CVMode == "lvm" {
s.SanitizeLVMDisk([]oposd.OSDInfo{osdInfo})
// zap the device
output, err := context.Executor.ExecuteCommandWithCombinedOutput("stdbuf", "-oL", "ceph-volume", "lvm", "zap", osd.Path, "--destroy")
Member

To zap and wipe the disk, can we call helpers in pkg/daemon/ceph/cleanup/? It's nice to keep the cmd packages only focusing on command line processing.

Contributor Author

@sp98 sp98 Aug 23, 2023

pkg/daemon/ceph/cleanup/ is used by the cleanup jobs that run at the end, so it only logs errors and does not return them. We need error handling when zapping a single disk.

Also, all we need is ceph-volume lvm zap. The ceph-volume team recommended this instead of shred.
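
To illustrate the distinction drawn here (a helper that returns the zap error to its caller rather than only logging it), here is a minimal Go sketch. It is not the code from this PR: the `Executor` interface, `zapDevice`, `fakeExecutor`, and `errZapFailed` are hypothetical names introduced for illustration, and only the command line itself is taken from the snippet under review.

```go
package main

import (
	"errors"
	"fmt"
)

// Executor is a hypothetical stand-in for the command executor seen in the
// reviewed snippet; only the one method quoted there is modeled.
type Executor interface {
	ExecuteCommandWithCombinedOutput(command string, args ...string) (string, error)
}

// zapDevice runs "ceph-volume lvm zap <path> --destroy" and returns any
// failure to the caller so that replacing a single OSD can stop on error.
func zapDevice(e Executor, path string) error {
	output, err := e.ExecuteCommandWithCombinedOutput(
		"stdbuf", "-oL", "ceph-volume", "lvm", "zap", path, "--destroy")
	if err != nil {
		return fmt.Errorf("failed to zap device %q: %w (output: %s)", path, err, output)
	}
	return nil
}

// errZapFailed is a canned error used only by the fake below.
var errZapFailed = errors.New("exit status 1")

// fakeExecutor records the command it was given and returns a canned result,
// so the sketch runs without ceph-volume installed.
type fakeExecutor struct {
	err  error
	last []string
}

func (f *fakeExecutor) ExecuteCommandWithCombinedOutput(command string, args ...string) (string, error) {
	f.last = append([]string{command}, args...)
	if f.err != nil {
		return "zap failed", f.err
	}
	return "zap ok", nil
}

func main() {
	good := &fakeExecutor{}
	fmt.Println("zap result:", zapDevice(good, "/dev/sdb"))

	bad := &fakeExecutor{err: errZapFailed}
	fmt.Println("zap result:", zapDevice(bad, "/dev/sdb"))
}
```

Wrapping the underlying error with `%w` keeps it available to `errors.Is` in the caller, which a log-only helper cannot offer.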

logger.Infof("successfully zaped osd.%d path %q", osd.ID, osd.Path)

// shred the device?
shredArgs := []string{"--random-source=/dev/zero", "--force", "--verbose", "--iterations=1", osd.Path}
Member

Can we limit the shred to just a small amount (e.g. 1MB) so we don't have to wait for the full disk?

Contributor Author

The ceph-volume team recommended against using shred and said that ceph-volume lvm zap should be enough. It works for both raw and lvm mode. I've removed the shred call for now. I tested it, and ceph-volume zap looks sufficient.

@sp98 sp98 force-pushed the osd-migration-part2 branch 2 times, most recently from 5eb298e to 6048cf0 Compare August 23, 2023 03:37
@sp98 sp98 marked this pull request as ready for review August 23, 2023 03:40
@sp98 sp98 requested a review from travisn August 23, 2023 03:42
@sp98 sp98 force-pushed the osd-migration-part2 branch 3 times, most recently from 7aa25d3 to 7a8f726 Compare August 23, 2023 06:00
func destroyOSD(context *clusterd.Context, clusterInfo *client.ClusterInfo, osdID int) (*oposd.OSDInfo, error) {
osdInfo, err := osddaemon.GetOSDInfoById(context, clusterInfo, osdID)
// destroyOSD fetches the OSD to be replaced based on the ID and then destroys that OSD and zaps the backing device
func destroyOSD(context *clusterd.Context, clusterInfo *client.ClusterInfo, id int) (*osd.OSDReplaceInfo, error) {
Member

How about moving this method into the package pkg/daemon/ceph/osd?

Member

Could it be part of the DiskSanitizer?

Contributor Author

This could go in the existing pkg/daemon/ceph/osd/remove.go or a new pkg/daemon/ceph/osd/destroy.go

The problem with DiskSanitizer is that it does not handle any errors, so modifications would be needed in order to use the existing methods.

Member

How about as a new method in the disk sanitizer? Then it can be clear it has the purpose to sanitize the disk, and it can be the version that handles errors.

Contributor Author

DiskSanitizer is mostly about cleaning the disks and controlling how they are cleaned via the SanitizeDisksSpec.

We need to destroy the OSD and run ceph-volume zap to clean the disk. We don't need any fine-grained control over how the disk is cleaned via the SanitizeDisksSpec.

So I feel DiskSanitizer might not be the best place for this method.

Member

Ok, then I'd vote for pkg/daemon/ceph/osd/remove.go.

Contributor Author

Moved the DestroyOSD method to pkg/daemon/ceph/osd/remove.go.
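
The flow described in the doc comment on destroyOSD (fetch the OSD by ID, destroy it, then zap the backing device) can be sketched roughly as follows. This is an illustration under stated assumptions, not the method merged in this PR: the `cluster` interface, its method names, `replaceOSD`, and the `osdInfo` fields are all hypothetical, and the real implementation carries more state than shown here.

```go
package main

import "fmt"

// osdInfo holds only the fields this sketch needs (hypothetical type).
type osdInfo struct {
	ID   int
	Path string
}

// cluster is a hypothetical interface for the three operations named in the
// doc comment. In a real cluster the destroy and zap steps would shell out
// to the Ceph tooling; that detail is assumed and not shown.
type cluster interface {
	LookupOSD(id int) (*osdInfo, error)
	DestroyOSD(id int) error
	ZapDevice(path string) error
}

// replaceOSD runs the three steps in order and stops at the first failure,
// returning the error so the caller can decide how to proceed.
func replaceOSD(c cluster, id int) (*osdInfo, error) {
	info, err := c.LookupOSD(id)
	if err != nil {
		return nil, fmt.Errorf("failed to get info for osd.%d: %w", id, err)
	}
	if err := c.DestroyOSD(id); err != nil {
		return nil, fmt.Errorf("failed to destroy osd.%d: %w", id, err)
	}
	if err := c.ZapDevice(info.Path); err != nil {
		return nil, fmt.Errorf("failed to zap device %q of osd.%d: %w", info.Path, id, err)
	}
	return info, nil
}

// fakeCluster records which steps ran and can be told to fail at one of them.
type fakeCluster struct {
	failAt string
	steps  []string
}

func (f *fakeCluster) step(name string) error {
	f.steps = append(f.steps, name)
	if f.failAt == name {
		return fmt.Errorf("%s failed", name)
	}
	return nil
}

func (f *fakeCluster) LookupOSD(id int) (*osdInfo, error) {
	if err := f.step("lookup"); err != nil {
		return nil, err
	}
	return &osdInfo{ID: id, Path: "/dev/sdb"}, nil
}

func (f *fakeCluster) DestroyOSD(id int) error     { return f.step("destroy") }
func (f *fakeCluster) ZapDevice(path string) error { return f.step("zap") }

func main() {
	good := &fakeCluster{}
	info, err := replaceOSD(good, 0)
	fmt.Println(info, err, good.steps)

	bad := &fakeCluster{failAt: "destroy"}
	info, err = replaceOSD(bad, 0)
	fmt.Println(info, err, bad.steps)
}
```

Because each step returns its error, a failed destroy never reaches the zap step, which is the behavior a log-only sanitizer helper would not give.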

pkg/daemon/ceph/osd/volume.go (resolved)
Member

@travisn travisn left a comment

just a typo

pkg/daemon/ceph/osd/remove.go (outdated, resolved)
Member

travisn commented Aug 28, 2023

Is testing with KMS completed?

Contributor Author

sp98 commented Sep 1, 2023

> Is testing with KMS completed?

Yes. Completed today. Testing with KMS looks good now. @travisn

This follows up on rook#12507:
- Fixes replacing of encrypted OSDs.
- Updates OSD status at the end of reconcile.

Signed-off-by: sp98 <sapillai@redhat.com>
@travisn travisn merged commit 0ae3800 into rook:master Sep 5, 2023
49 of 50 checks passed
mergify bot added a commit that referenced this pull request Sep 5, 2023
osd: replace existing OSDs to use new store (backport #12770)