A CPG client can sometimes lock up if the local node is in the downlist
In a 10-node cluster where all nodes boot and start corosync at the same
time, corosync sometimes detects a node as leaving and then rejoining the
cluster during startup.

Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including one for itself), it
sets its cpd state to CPD_STATE_UNJOINED and clears cpd->group_name. From then
on it no longer sends CPG events to the CPG client, so the client locks up.

Fix this by only resetting the local cpd when the reason of the leave event
is CONFCHG_CPG_REASON_LEAVE.

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 08f07be)
Tim Beale authored and jfriesse committed Aug 18, 2011
1 parent 240af15 commit ac1d79e
1 changed file: services/cpg.c (2 additions, 1 deletion)
@@ -683,7 +683,8 @@ static int notify_lib_joinlist(
     }
     if (left_list_entries) {
         if (left_list[0].pid == cpd->pid &&
-            left_list[0].nodeid == api->totem_nodeid_get()) {
+            left_list[0].nodeid == api->totem_nodeid_get() &&
+            left_list[0].reason == CONFCHG_CPG_REASON_LEAVE) {

             cpd->pid = 0;
             memset (&cpd->group_name, 0, sizeof(cpd->group_name));
