osd: replace existing OSDs to use new store #12770
Conversation
I know it's still in draft, just a couple quick suggestions at a glance...
cmd/rook/ceph/osd.go
Outdated
```go
} else if osdInfo.CVMode == "lvm" {
	s.SanitizeLVMDisk([]oposd.OSDInfo{osdInfo})
```

```go
// zap the device
output, err := context.Executor.ExecuteCommandWithCombinedOutput("stdbuf", "-oL", "ceph-volume", "lvm", "zap", osd.Path, "--destroy")
```
To zap and wipe the disk, can we call helpers in pkg/daemon/ceph/cleanup/? It's nice to keep the cmd packages focused only on command-line processing.
pkg/daemon/ceph/cleanup/ is used by the cleanup jobs that run at the end, so it just logs errors and does not return any. We need error handling while zapping a single disk.
Also, all we need is ceph-volume lvm zap. The ceph-volume team recommended this instead of shred.
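To make the zap call reusable outside the cmd package, one option is to factor the command line into a small helper. This is a minimal sketch, not Rook's actual API; the helper name and package layout are assumptions, and only the argument list mirrors the call in the diff above.

```go
package main

import "fmt"

// buildZapArgs assembles the ceph-volume invocation used to zap a device,
// matching the stdbuf-wrapped call shown in the diff. A caller would pass
// these args to the executor and propagate any returned error.
func buildZapArgs(devicePath string) []string {
	return []string{"stdbuf", "-oL", "ceph-volume", "lvm", "zap", devicePath, "--destroy"}
}

func main() {
	// Example: build the zap command line for a hypothetical device.
	fmt.Println(buildZapArgs("/dev/sdb"))
}
```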
cmd/rook/ceph/osd.go
Outdated
```go
logger.Infof("successfully zapped osd.%d path %q", osd.ID, osd.Path)

// shred the device?
shredArgs := []string{"--random-source=/dev/zero", "--force", "--verbose", "--iterations=1", osd.Path}
```
Can we limit the shred to just a small amount (e.g. 1MB) so we don't have to wait for the full disk?
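A size-limited shred along these lines would use GNU shred's `--size` flag to overwrite only the first 1 MiB instead of the whole device. This is a hypothetical sketch of the suggestion (the helper name is invented; shred was ultimately dropped in favor of ceph-volume lvm zap, see below):

```go
package main

import "fmt"

// buildLimitedShredArgs builds a shred invocation capped at the first 1 MiB
// of the device via GNU shred's --size flag, so the wipe does not have to
// traverse the full disk. Hypothetical helper for illustration only.
func buildLimitedShredArgs(devicePath string) []string {
	return []string{"--random-source=/dev/zero", "--force", "--verbose", "--iterations=1", "--size=1M", devicePath}
}

func main() {
	fmt.Println(buildLimitedShredArgs("/dev/sdb"))
}
```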
The ceph-volume team recommended not to use shred, and said that ceph-volume lvm zap should be enough. It works for both raw and lvm mode. I've removed the shred call for now. Tested it, and ceph-volume lvm zap looks to be sufficient.
cmd/rook/ceph/osd.go
Outdated
```go
func destroyOSD(context *clusterd.Context, clusterInfo *client.ClusterInfo, osdID int) (*oposd.OSDInfo, error) {
	osdInfo, err := osddaemon.GetOSDInfoById(context, clusterInfo, osdID)

// destroyOSD fetches the OSD to be replaced based on the ID, then destroys that OSD and zaps the backing device
func destroyOSD(context *clusterd.Context, clusterInfo *client.ClusterInfo, id int) (*osd.OSDReplaceInfo, error) {
```
How about moving this method into the package pkg/daemon/ceph/osd?
Could it be part of the DiskSanitizer?
This could go in the existing pkg/daemon/ceph/osd/remove.go or a new pkg/daemon/ceph/osd/destroy.go.
The problem with DiskSanitizer is that it does not handle any errors, so modifications would be needed in order to use the existing methods.
How about as a new method in the disk sanitizer? Then it can be clear it has the purpose to sanitize the disk, and it can be the version that handles errors.
DiskSanitizer is mostly about cleaning the disks and controlling how they are cleaned via the SanitizeDisksSpec.
Here we need to destroy the OSD and run ceph-volume lvm zap to clean the disk; we don't need any fine-grained control over how the disk is cleaned through the SanitizeDisksSpec.
So I feel DiskSanitizer might not be the best place for this method.
Ok, then I'd vote for pkg/daemon/ceph/osd/remove.go.
Moved the DestroyOSD method to pkg/daemon/ceph/osd/remove.go.
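The error-propagating flow discussed above (destroy the OSD by ID, then zap its backing device) can be sketched as follows. This is not Rook's actual implementation: the executor type stands in for command execution via clusterd.Context, and the function name and argument lists are assumptions for illustration.

```go
package main

import "fmt"

// executor is a stand-in for Rook's command execution; it exists only so
// the flow below is self-contained and testable with a fake.
type executor func(args ...string) (string, error)

// destroyAndZap sketches the DestroyOSD flow: destroy the OSD by ID, then
// zap its backing device, returning (not merely logging) any error so the
// caller can react, unlike the cleanup-job helpers.
func destroyAndZap(run executor, osdID int, devicePath string) error {
	if _, err := run("ceph", "osd", "destroy", fmt.Sprintf("osd.%d", osdID), "--yes-i-really-mean-it"); err != nil {
		return fmt.Errorf("failed to destroy osd.%d: %w", osdID, err)
	}
	if _, err := run("stdbuf", "-oL", "ceph-volume", "lvm", "zap", devicePath, "--destroy"); err != nil {
		return fmt.Errorf("failed to zap device %q: %w", devicePath, err)
	}
	return nil
}

func main() {
	// Fake executor that records calls instead of running real commands.
	var calls [][]string
	fake := func(args ...string) (string, error) {
		calls = append(calls, args)
		return "", nil
	}
	if err := destroyAndZap(fake, 3, "/dev/sdb"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(len(calls), "commands would be executed")
}
```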
just a typo
Is testing with KMS completed?
Yes, completed today. Testing with KMS looks good now. @travisn
This follows up rook#12507:
- fixes replacing of encrypted OSDs
- updates OSD status at the end of reconcile
Signed-off-by: sp98 <sapillai@redhat.com>
osd: replace existing OSDs to use new store (backport #12770)
This is a follow-up to #12507
Description of your changes:
Which issue is resolved by this Pull Request:
Resolves #
Tests:
osd prepare pod logs for cleaning encrypted OSDs and preparing it again:
osd-0-prepare.txt
Checklist:
skip-ci on the PR.