mon/OSDMonitor: Added extra check before mon.go_recovery_stretch_mode()
Problem:
There are certain scenarios in a degraded
stretch cluster where the monitor will try to
go into the function
``Monitor::go_recovery_stretch_mode()``,
which leads to a `ceph_assert` failure.

Solution:
Make sure ``dead_mon_buckets.size() == 0``
in ``OSDMonitor::update_from_paxos()``
before going into ``Monitor::go_recovery_stretch_mode()``.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2104207

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit d95c41a)
kamoltat committed Nov 8, 2022
1 parent 025d3fa commit 94dc970
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion src/mon/OSDMonitor.cc
@@ -960,10 +960,12 @@ void OSDMonitor::update_from_paxos(bool *need_bootstrap)
     dout(20) << "mon_stretch_cluster_recovery_ratio: " << cct->_conf.get_val<double>("mon_stretch_cluster_recovery_ratio") << dendl;
     if (prev_num_up_osd < osdmap.num_up_osd &&
         (osdmap.num_up_osd / (double)osdmap.num_osd) >
-        cct->_conf.get_val<double>("mon_stretch_cluster_recovery_ratio")) {
+        cct->_conf.get_val<double>("mon_stretch_cluster_recovery_ratio") &&
+        mon.dead_mon_buckets.size() == 0) {
       // TODO: This works for 2-site clusters when the OSD maps are appropriately
       // trimmed and everything is "normal" but not if you have a lot of out OSDs
       // you're ignoring or in some really degenerate failure cases
+
       dout(10) << "Enabling recovery stretch mode in this map" << dendl;
       mon.go_recovery_stretch_mode();
     }
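
The patched condition can be read in isolation as a three-part predicate. The following is a minimal standalone sketch, not the Ceph code itself: the helper name `should_enter_recovery_stretch_mode`, its parameter list, and the `DeadMonBuckets` alias (including the assumed map-of-sets shape) are hypothetical stand-ins for the `osdmap`, config, and `mon.dead_mon_buckets` values used in the hunk above.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for mon.dead_mon_buckets. The shape used here
// (bucket name -> names of dead monitors in that bucket) is an assumption.
using DeadMonBuckets = std::map<std::string, std::set<std::string>>;

// Hypothetical helper mirroring the patched condition:
//   1. more OSDs are up than in the previous map,
//   2. the fraction of up OSDs exceeds the recovery ratio,
//   3. no monitor bucket is currently recorded as dead (the new check).
bool should_enter_recovery_stretch_mode(int prev_num_up_osd,
                                        int num_up_osd,
                                        int num_osd,
                                        double recovery_ratio,
                                        const DeadMonBuckets& dead_mon_buckets) {
  return prev_num_up_osd < num_up_osd &&
         (num_up_osd / static_cast<double>(num_osd)) > recovery_ratio &&
         dead_mon_buckets.empty();
}
```

With 3 of 4 OSDs up (previously 2) and a ratio of 0.6, this sketch returns true only while `dead_mon_buckets` is empty; a single dead bucket is enough to block the transition. The sketch uses `empty()` where the patch writes `size() == 0`; the two are equivalent.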