Speed up elections during a netsplit by skipping delayed candidate_timer events

In a netsplit scenario, sending heartbeats to downed nodes can be very
expensive, because the sends have to wait for TCP timeouts. This causes
candidate_timer messages to stack up in the mailbox. This patch flushes
the queued messages after one has been handled to prevent that from
happening.
Vagabond committed Aug 18, 2011
1 parent 26171f0 commit 908b932
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions src/gen_leader.erl
@@ -644,6 +644,8 @@ safe_loop(#server{mod = Mod, state = State} = Server, Role,
timer:cancel(E#election.cand_timer),
E#election{cand_timer = undefined};
Down ->
%% get rid of any queued up candidate_timers, since we just handled one
flush_candidate_timers(),
%% Some of the potential master candidate nodes are down.
%% Try to wake them up
F = fun(N) ->
@@ -901,6 +903,9 @@ loop(#server{parent = Parent,
timer:cancel(E#election.cand_timer),
E#election{cand_timer=undefined};
true ->
%% get rid of any queued up candidate_timers,
%% since we just handled one
flush_candidate_timers(),
E
end,
%% This shouldn't happen in the leader - just ignore
@@ -1537,3 +1542,19 @@ mon_handle_down(Ref, Parent, Refs) ->

mon_reply(From, Reply) ->
From ! {mon_reply, Reply}.

%% The heartbeat messages sent to the downed nodes when the candidate_timer
%% message is received can take a very long time in the case of a partitioned
%% network (7 seconds in my testing). Since the candidate_timer is generated
%% by a send_interval, this means many candidate_timer messages can accumulate
%% in the mailbox. This function is used to clear them out after handling one
%% of the candidate_timers, so gen_leader doesn't spend all its time sending
%% heartbeats.
flush_candidate_timers() ->
    receive
        {candidate_timer} ->
            flush_candidate_timers()
    after
        0 ->
            ok
    end.
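
The flush pattern above can be tried in isolation. The following is a minimal sketch and not part of the patch: the module name `flush_demo`, the `run/0` function, and the counting variant of the flush are all hypothetical, added only to make the effect observable. It queues several `{candidate_timer}` messages plus one unrelated message in the calling process's own mailbox, then shows that a selective receive with `after 0` drops only the timer messages and returns immediately once none are left.

```erlang
%% Illustrative sketch only; flush_demo and run/0 are hypothetical names.
-module(flush_demo).
-export([run/0]).

%% Same shape as flush_candidate_timers/0 in the patch, but it counts the
%% dropped messages (an assumption made here for demonstration purposes).
%% 'after 0' makes the receive non-blocking: it returns as soon as no
%% {candidate_timer} message is left in the mailbox.
flush_candidate_timers(Count) ->
    receive
        {candidate_timer} ->
            flush_candidate_timers(Count + 1)
    after 0 ->
        Count
    end.

%% Simulate a backlog of timer ticks alongside an unrelated message.
run() ->
    Self = self(),
    [Self ! {candidate_timer} || _ <- lists:seq(1, 5)],
    Self ! {other, hello},
    Flushed = flush_candidate_timers(0),
    %% The unrelated message is untouched by the selective receive.
    Remaining = receive {other, Msg} -> Msg after 0 -> none end,
    {Flushed, Remaining}.
```

Calling `flush_demo:run()` from a process with no other `{candidate_timer}` messages pending returns `{5, hello}`: the five timer messages are discarded and the unrelated message stays in the queue for normal handling.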
