Improve Ceph upgrade pod recreation gates #2889

liewegas · 2019-03-25T15:28:29Z

When doing an upgrade, each pod is recreated with the updated image. We should gate each pod recreation on the Ceph status to make sure we give Ceph enough time to settle between restarts.

For monitors, restarts are fast, so this is usually not an issue. However, we have a new command pending, ceph mon ok-to-stop <id>, that checks whether stopping a given monitor will break quorum. Sometimes breaking quorum is unavoidable (e.g., if there is only 1 mon), but when there are 3 or more, we should be gentler and wait between mon restarts.

For the mgr, no check is necessary--just restart the mgr pods.

For OSDs, we need to be careful. There is an existing command, ceph osd ok-to-stop, that checks whether stopping the given OSD will make any data unavailable. We should (generally) wait for this to become true. Note, however, that if there are pools with 1 replica, or the cluster is degraded for other reasons, it may never become true, so we may want to have a 'force' mode, or have it proceed anyway if a timeout expires.
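A minimal sketch of the wait-with-timeout idea described above, not Rook's actual implementation. It assumes that ceph osd ok-to-stop exits with status 0 when stopping is safe and non-zero otherwise. The function names, the default timeout, and the polling interval are all hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def ceph_osd_ok_to_stop(osd_ids: Sequence[str]) -> bool:
    """Run `ceph osd ok-to-stop <ids...>`.

    Assumption: a zero exit status means the OSDs can be stopped safely.
    """
    result = subprocess.run(
        ["ceph", "osd", "ok-to-stop", *osd_ids],
        capture_output=True,
    )
    return result.returncode == 0


def wait_ok_to_stop(
    osd_ids: Sequence[str],
    check: Callable[[Sequence[str]], bool] = ceph_osd_ok_to_stop,
    timeout_s: float = 600.0,   # hypothetical default
    interval_s: float = 5.0,    # hypothetical default
    force: bool = False,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it reports safe, or until the timeout expires.

    Returns True if the check passed (or `force` is set), False on timeout.
    The caller decides whether a timeout means "abort" or "proceed anyway".
    """
    if force:
        # 'force' mode: skip the gate entirely.
        return True
    deadline = clock() + timeout_s
    while True:
        if check(osd_ids):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Returning False on timeout rather than raising leaves the policy question (abort the upgrade, or restart the OSD regardless) to the caller.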

For MDSs, a new command is pending: ceph mds ok-to-stop <id>.

radosgw is stateless--restarting one at a time should be sufficient.

Same for rbd-mirror.
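The per-daemon rules above could be captured in a small lookup table. This is only an illustrative sketch: the dictionary keys and the gate_command helper are hypothetical names, and the mon and mds commands were still pending at the time of writing.

```python
from typing import Dict, List, Optional

# Gate command per daemon type, as described in this issue.
# None means "no check needed, just restart (one at a time)".
GATES: Dict[str, Optional[List[str]]] = {
    "mon": ["ceph", "mon", "ok-to-stop"],   # pending command
    "mgr": None,
    "osd": ["ceph", "osd", "ok-to-stop"],   # existing command
    "mds": ["ceph", "mds", "ok-to-stop"],   # pending command
    "rgw": None,                            # radosgw is stateless
    "rbd-mirror": None,
}


def gate_command(daemon_type: str, daemon_id: str) -> Optional[List[str]]:
    """Return the argv to run before restarting a daemon, or None if ungated."""
    base = GATES[daemon_type]
    return None if base is None else [*base, daemon_id]
```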

leseb · 2019-03-25T15:33:34Z

I guess the only thing missing is a reasonable timeout for OSDs :). Is it really safe to proceed anyway if the number of PGs never settles?

rohantmp · 2019-03-26T10:37:00Z

I think it would be very useful to have an ok-to-stop for any level of the OSD tree, especially host. It would be perfect for node-level disruptions like a kernel upgrade or fencing.

liewegas · 2019-03-26T16:27:46Z

@rohan47 that's a good point. The current ok-to-stop command takes a list of OSDs, so you can currently do ceph osd ok-to-stop $(ceph osd crush ls $HOSTNAME). This could perhaps be collapsed into a single command. Note that this also only covers the OSDs (which, to be fair, are usually the primary concern). Perhaps a host-level check that covers all daemons and reports whether a host restart is safe would be nice.
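For illustration, the same two-step host check can be sketched in Python. This is a hedged sketch: it assumes ceph osd crush ls prints one item name per line (for example osd.0) and that ceph osd ok-to-stop exits with status 0 when stopping is safe. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, List


def run_ceph(args: List[str]) -> subprocess.CompletedProcess:
    """Run a ceph CLI command and capture its output as text."""
    return subprocess.run(["ceph", *args], capture_output=True, text=True)


def parse_crush_ls(output: str) -> List[str]:
    """Keep only the OSD entries from `ceph osd crush ls <host>` output."""
    names = (line.strip() for line in output.splitlines())
    return [name for name in names if name.startswith("osd.")]


def host_osds_ok_to_stop(
    hostname: str,
    run: Callable[[List[str]], subprocess.CompletedProcess] = run_ceph,
) -> bool:
    """Check whether all OSDs under a CRUSH host can be stopped together."""
    listing = run(["osd", "crush", "ls", hostname])
    if listing.returncode != 0:
        raise RuntimeError(f"crush ls failed for {hostname}: {listing.stderr}")
    osds = parse_crush_ls(listing.stdout)
    if not osds:
        # No OSDs on this host, so there is nothing to gate.
        return True
    # Assumption: exit status 0 means stopping these OSDs is safe.
    return run(["osd", "ok-to-stop", *osds]).returncode == 0
```

As noted in the comment above, this covers only the OSDs on the host, not mons, MDSs, or other daemons running there.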

rohantmp · 2019-03-27T08:42:49Z

@liewegas That command combination fulfils the exact function I was thinking of. Thanks!

P.S. I'm @rohantmp :P

When a cluster is updated with a different image version, this triggers a serialized restart of all the pods. Prior to this commit, no safety checks were performed and Rook was hoping for the best outcome. Now, before restarting a daemon, we check that it can be restarted. Once it has restarted, we also check that we can proceed with the rest of the platform. For instance, for monitors we check that they are in quorum, for OSDs we check that PGs are clean, and for MDSs we make sure they are active.
Fixes: rook#2889
Signed-off-by: Sébastien Han <seb@redhat.com>

stale · 2019-06-25T09:10:52Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

liewegas added the feature label Mar 25, 2019

leseb added the ceph label Mar 25, 2019

leseb self-assigned this Mar 25, 2019

leseb mentioned this issue Mar 27, 2019

ceph: improve upgrade procedure #2901

Merged

5 tasks

stale bot added the wontfix label Jun 25, 2019

leseb removed the wontfix label Jun 25, 2019

leseb added this to the 1.1 milestone Jun 28, 2019

leseb mentioned this issue Jul 16, 2019

Update the roadmap for the v1.1 release #3458

Merged

9 tasks

leseb closed this as completed in #2901 Jul 17, 2019
