Replace OSD to use new backend store #12507
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @sp98 please rebase it. https://rook.io/docs/rook/latest/Contributing/development-flow/#updating-your-fork
Force-pushed: 94b404d → 2b5255d; 980dc8a → 5ce3b9a; 6d45eb4 → 00ca167; f314fc0 → 57e9922; 1fa42fa → ef2a467
cmd/rook/ceph/osd.go
Outdated
// destroy the OSD using the OSD ID
var replaceOSD *osd.OSDReplaceInfo
if replaceOSDID != -1 {
	osdInfo, err := osddaemon.GetOSDInfoById(context, &clusterInfo, replaceOSDID)
Can this be moved inside the `destroyOSD()` method? To keep everything about the destroy in a single place.
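A minimal sketch of the suggested refactor, with an assumed signature (not the actual PR code):

```go
// Sketch: move the OSD lookup into destroyOSD so everything about the
// destroy lives in one place. The signature here is an assumption.
func destroyOSD(context *clusterd.Context, clusterInfo *client.ClusterInfo, osdID int) error {
	osdInfo, err := osddaemon.GetOSDInfoById(context, clusterInfo, osdID)
	if err != nil {
		return errors.Wrapf(err, "failed to get info for OSD %d", osdID)
	}
	// ... use osdInfo to destroy the OSD and clean up its disk ...
	_ = osdInfo
	return nil
}
```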
@@ -860,6 +880,7 @@ func GetCephVolumeLVMOSDs(context *clusterd.Context, clusterInfo *client.Cluster
		lvPath = lv
	}

// TODO: Don't read osd store type from env variable |
OK, then we can update the comment here with the tracker. Thanks!
osdStore, err := m.getOSDStoreStatus()
if err != nil {
	logger.Errorf("failed to get osd store type count. %v", osdStore)
If this failed, will `osdStore` still be valid?
logger.Errorf("failed to get osd store type count. %v", osdStore) | |
logger.Errorf("failed to get osd store status. %v", err) |
Updated the log message.
It fails only when listing the OSD deployments fails. In that case we just return and don't update the OSD store status in the spec, keeping the status prior to the failure intact.
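Combining the suggestion with that behavior, the handling would look roughly like this (a sketch, not the final PR code):

```go
osdStore, err := m.getOSDStoreStatus()
if err != nil {
	// osdStore is nil on failure, so log the error itself and return
	// without touching the OSD store status in the spec.
	logger.Errorf("failed to get osd store status. %v", err)
	return
}
```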
func (m *OSDHealthMonitor) getOSDStoreStatus() (*cephv1.OSDStatus, error) {
	label := fmt.Sprintf("%s=%s", k8sutil.AppAttr, AppName)
	osdDeployments, err := k8sutil.GetDeployments(m.clusterInfo.Context, m.context.Clientset, m.clusterInfo.Namespace, label)
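A hypothetical completion of this helper, assuming each OSD deployment exposes its backend store via an "osd-store" label (the label key and error wrapping are guesses, not the PR's actual code):

```go
func (m *OSDHealthMonitor) getOSDStoreStatus() (*cephv1.OSDStatus, error) {
	label := fmt.Sprintf("%s=%s", k8sutil.AppAttr, AppName)
	osdDeployments, err := k8sutil.GetDeployments(m.clusterInfo.Context, m.context.Clientset, m.clusterInfo.Namespace, label)
	if err != nil {
		return nil, errors.Wrap(err, "failed to list osd deployments")
	}
	storeType := map[string]int{}
	for i := range osdDeployments.Items {
		// "osd-store" is an assumed label key carrying the store type.
		if t, ok := osdDeployments.Items[i].Labels["osd-store"]; ok {
			storeType[t]++
		}
	}
	return &cephv1.OSDStatus{StoreType: storeType}, nil
}
```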
The OSD deployments will rarely change, so it seems expensive to query these deployments again every minute for the OSD status. I wonder if we can skip this query most of the time. For example, perhaps only query if the desired storeType is different from any existing OSD store types.
Or perhaps we just query this and update the status at the end of a normal OSD reconcile. Updating during the OSD status check just seems too frequent.
> Or perhaps we just query this and update the status at the end of a normal OSD reconcile. Updating during the OSD status check just seems too frequent.

With this approach, we would be updating `osdStatus` at two different places (once for the deviceClass list and again for the OSD store). That might be more confusing to debug later on. IMO, updating the entire OSD status in one place might be a better approach.
> The OSD deployments will rarely change, so it seems expensive to query these deployments again every minute for the OSD status. I wonder if we can skip this query most of the time. For example, perhaps only query if the desired storeType is different from any existing OSD store types.

I prefer this approach. We just need to pass the desired store type from the spec to the OSD health checker. If the desired store type is `bluestore-rdr` but `status.storage.osd.storeType['bluestore'] != 0` (implying that migration is not complete), only then do we go ahead and get the OSD list.
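A sketch of that guard, assuming the `StoreType` status field is a count per store type as discussed above:

```go
// Sketch: skip the expensive deployment query unless a migration to the
// desired store type may still be in progress.
func shouldQueryOSDStore(desiredStore string, status *cephv1.OSDStatus) bool {
	if desiredStore != "bluestore-rdr" {
		return false
	}
	// Any OSDs still reporting the old store mean migration is incomplete.
	return status.StoreType["bluestore"] != 0
}
```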
Agreed, that should work perfectly to reduce the queries and keep the OSD status update in a single place.
The approach I suggested will not work.
I'll take this up in a follow-up PR next week.
pkg/operator/ceph/cluster/osd/osd.go
Outdated
if pgClean {
	return true, nil
}
logger.Warningf("pgs are not healthy. PG status: %q", pgHealthMsg)
Suggested change:
- logger.Warningf("pgs are not healthy. PG status: %q", pgHealthMsg)
+ logger.Infof("waiting for PGs to be healthy after replacing an OSD, status: %q", pgHealthMsg)
Can we also just print this every minute or two instead of every 10s?
I'll take this up in a follow-up PR next week.
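One way that follow-up could look: keep polling every 10 seconds but log at most once per minute. A sketch, assuming a hypothetical `checkPGsClean` helper:

```go
// Sketch: poll PG health every 10s, but log only once a minute.
func waitForHealthyPGs(checkPGsClean func() (bool, string, error)) error {
	var lastLog time.Time
	for {
		pgClean, pgHealthMsg, err := checkPGsClean()
		if err != nil {
			return err
		}
		if pgClean {
			return nil
		}
		if time.Since(lastLog) >= time.Minute {
			logger.Infof("waiting for PGs to be healthy after replacing an OSD, status: %q", pgHealthMsg)
			lastLog = time.Now()
		}
		time.Sleep(10 * time.Second)
	}
}
```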
}

func (o *OSDReplaceConfig) string() (string, error) {
	configInBytes, err := json.Marshal(o)
?
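For context, a plausible completion of this helper (the error wrapping is a guess):

```go
func (o *OSDReplaceConfig) string() (string, error) {
	configInBytes, err := json.Marshal(o)
	if err != nil {
		return "", errors.Wrap(err, "failed to marshal osd replace config")
	}
	return string(configInBytes), nil
}
```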
If the OSD store is updated in the Ceph cluster, then delete OSDs one by one, clean up the disks, and provision a new OSD on the same disk.
Signed-off-by: sp98 <sapillai@redhat.com>
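As a rough sketch of the flow in that commit message, with every helper name an assumption rather than the PR's actual functions:

```go
// Sketch: replace one OSD at a time so PGs can recover before the next
// OSD is taken down.
func migrateOSDsToNewStore(osdIDs []int) error {
	for _, id := range osdIDs {
		if err := waitForCleanPGs(); err != nil { // wait for a healthy cluster
			return err
		}
		if err := destroyOSD(id); err != nil { // purge the OSD from Ceph
			return err
		}
		if err := wipeDisk(id); err != nil { // clean the disk for reuse
			return err
		}
		if err := provisionOSD(id); err != nil { // new OSD on the same disk
			return err
		}
	}
	return nil
}
```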
Going ahead and merging with these follow-up items understood:
- Additional changes needed for encrypted OSDs to be cleaned up during replacement
- OSD status update to be more efficient so it doesn't query OSDs at every interval
- Reduce logging frequency while waiting for PGs
Also, a couple of follow-up items as feedback from my testing:
4. Let's not wipe the entire disk, just the first part. There's no need to wipe the entire disk since we're just going to backfill to the disk again. Let's just do a quick wipe (perhaps the first ~1MB) to get it clean enough to create a new OSD (see the sketch after this list).
5. In the OSD prepare job, add a clear log statement that we're replacing the OSD right before we wipe the disk. When looking at the log, it just wasn't obvious why the disk was being wiped.
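For item 4, a self-contained sketch of such a quick wipe (zeroing only the first 1 MiB; an illustration, not the PR's implementation — the only import needed is `os`):

```go
// quickWipe zeroes the first 1 MiB of the device, enough to clear the
// partition table and bluestore label so a new OSD can be created,
// without wiping data that will be backfilled anyway.
func quickWipe(device string) error {
	f, err := os.OpenFile(device, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	zeros := make([]byte, 1<<20) // 1 MiB of zeros
	if _, err := f.Write(zeros); err != nil {
		return err
	}
	return f.Sync()
}
```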
@travisn I propose we move the entire `updateCephStorageStatus` from the OSDHealthChecker to the main reconcile that creates/updates the OSD. We can add this method at the end of the OSD reconcile.
That method only updates the device classes and device types, right? Since those will only change infrequently and only after a reconcile, it makes sense to update that status at the end of a reconcile.
Replace OSD to use new backend store (backport #12507)
This follows up rook#12507:
- fixes replacing of encrypted OSDs
- updates OSD status at the end of reconcile
Signed-off-by: sp98 <sapillai@redhat.com>
Description of your changes:
This PR is to enable migration of OSDs to a new backend store. Currently the OSDs use `bluestore`. In the future, if a new OSD store is introduced, existing Rook OSDs will need to migrate to use the new store.
Design PR: #12381
Pending:
Note: This PR does not cover the following scenarios:
Which issue is resolved by this Pull Request:
Resolves #
Checklist:
`skip-ci` on the PR.