Prevent non-voting (new and not-yet-caught-up) replicas from becoming cluster leader in certain scenarios #435
The OCI build failures are due to the fact that this is an external contribution, so Docker Hub secrets are not injected by Actions. |
I like the idea and your understanding is correct, non-voters are not meant to transition directly to leaders.
However, the log messages used in this PR are a bit confusing.
@illotum - would you mind casting an eye at this? |
@sile would you be able to provide reproduction steps that lead to self-election? I'd prefer to try to fix it on the election side; right now self-vote is acceptable if it doesn't lead to promotion. More context:
Overall, during the initial implementation, I tried and abandoned an approach that relies on followers' knowledge of their own state, for this and similar reasons.
@illotum okay, I will share reproduction steps. I think that using If I can use
|
BTW, I'm a bit confused about the phrase "right now self-vote is acceptable" because the following code in the ra main branch seems to prohibit that:
handle_follower(election_timeout,
                #{cfg := #cfg{log_id = LogId},
                  membership := Membership} = State) when Membership =/= voter ->
    ?DEBUG("~ts: follower ignored election_timeout, non-voter: ~p0",
           [LogId, Membership]),
    {follower, State, []};
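To make the guard pattern above concrete, here is a minimal, self-contained sketch of how the same `Membership =/= voter` check could cover both `election_timeout` and `try_become_leader`. The module name, the reduced state map, and the return tuples are assumptions made for illustration; this is not ra's actual code and not the diff in this PR.

```erlang
-module(non_voter_guard_sketch).
-export([handle_follower/2, demo/0]).

%% Hypothetical, simplified state: a map holding only `membership`.
%% ra's real server state carries much more (cfg, log, cluster, ...).

%% A member that is not a voter ignores both events that would
%% otherwise move a follower towards an election.
handle_follower(Event, #{membership := Membership} = State)
  when Membership =/= voter,
       (Event =:= election_timeout orelse Event =:= try_become_leader) ->
    {follower, State, []};
%% A voter proceeds to the pre_vote state for the same events.
handle_follower(Event, State)
  when Event =:= election_timeout; Event =:= try_become_leader ->
    {pre_vote, State, []};
%% Any other event leaves the follower where it is.
handle_follower(_Event, State) ->
    {follower, State, []}.

%% Exercises the three clauses; crashes with badmatch on a mismatch.
demo() ->
    {follower, _, []} = handle_follower(try_become_leader, #{membership => non_voter}),
    {follower, _, []} = handle_follower(election_timeout, #{membership => non_voter}),
    {pre_vote, _, []} = handle_follower(try_become_leader, #{membership => voter}),
    ok.
```

After compiling the module, calling `non_voter_guard_sketch:demo().` in a shell exercises each clause. Note that the check relies on the follower's own view of its membership, which is exactly the limitation raised earlier in this thread.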
My bad! We don't want non-voters to call for an election when they are up to date on state, you're correct. What I meant by "acceptable" is that it still happens when the member is not up to date. As I mentioned above, a voter may be demoted later in the log, but won't know it if the cluster gets into an election mid-sync. We rely on the general Raft election mechanism to not accept those candidates.
Regarding your reproduction. In what situation do you expect For example Or in other words, do you have a repro where the system gets into an undesirable state on its own?
I will have a look at #427 :)
@illotum I see. Thank you for your explanation.
I think that the function will always be called with human intent. So, if it is intended behaviour (by ra developers) that If so, however, as I wrote in this PR's description, I think the self-vote by non-voters should be fixed instead.
%% Run two Erlang nodes in separate terminals (foo@localhost is voter, and bar@localhost is non_voter)
$ rebar3 shell --sname foo@localhost
$ rebar3 shell --sname bar@localhost
%% Start ra
foo@localhost> ra:start().
bar@localhost> ra:start().
%% Start a single-member cluster.
foo@localhost> ClusterName = dyn_members.
foo@localhost> Machine = {simple, fun erlang:'+'/2, 0}.
foo@localhost> {ok, _, _} = ra:start_cluster(default, ClusterName, Machine, [{dyn_members, 'foo@localhost'}]).
%% Add a non_voter member
foo@localhost> NonVoterServer = #{id => {dyn_members, 'bar@localhost'}, membership => non_voter}.
foo@localhost> ra:add_member({dyn_members, 'foo@localhost'}, NonVoterServer).
foo@localhost> ra:start_server(default, ClusterName, NonVoterServer, Machine, [{dyn_members,'foo@localhost'}]).
%% Stop foo@localhost node (=> only a non_voter member is running in the cluster)
foo@localhost> q().
%% At this point, 'bar@localhost' still considers 'foo@localhost' to be the leader.
bar@localhost> maps:get(leader_id, element(2,ra:member_overview(dyn_members))).
{dyn_members,foo@localhost}
%% Call `gen_statem:cast({dyn_members,bar@localhost},try_become_leader).`
%%
%% [NOTE]
%% This is a bit tricky.
%% But the same scenario could happen in the real world
%% if `ra:transfer_leadership({dyn_members,bar@localhost}, {dyn_members,bar@localhost})` is called
%% when 'foo@localhost' node is alive,
%% and the node aborted just after
%% the leader ra server process called `gen_statem:cast({dyn_members,bar@localhost},try_become_leader)`.
bar@localhost> gen_statem:cast({dyn_members,bar@localhost},try_become_leader).
%% Now, 'bar@localhost' is elected as the leader even though there are no live voters.
bar@localhost> maps:get(leader_id, element(2,ra:member_overview(dyn_members))).
{dyn_members,bar@localhost} |
It was never the intent to make This is definitely not a scenario that this was introduced for in RabbitMQ :) |
I can see the argument that a library should prohibit actions leading to bad states. In that case I'd recommend adding a "become_voter" API. My original idea was to complement
Regarding the repro, this seems to stem from the way we count required quorum. When you run one voter and one non_voter, you effectively run a Raft cluster of one. Raft cannot guarantee anything when you lose that one (voter) member. If there was no witness, you'd already have no data left. This is a degraded state and by no means how Raft should be operated. As soon as you add another voter to your repro, the non_voter loses its ability to self-elect.
Edit. Just to reiterate. The only reason I can rely on a follower's knowledge of cluster state in
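The quorum arithmetic behind that remark can be sketched as follows. This is only an illustration of majority counting over voters; the module, the `{Id, Membership}` member shape, and the function names are hypothetical and are not how ra represents or computes its cluster state.

```erlang
-module(quorum_sketch).
-export([required_quorum/1, single_vote_wins/1, demo/0]).

%% Members is a list of {Id, Membership} tuples (assumed shape).
%% Only members marked `voter` count towards quorum.
voters(Members) ->
    [Id || {Id, voter} <- Members].

%% Majority of the voting members.
required_quorum(Members) ->
    length(voters(Members)) div 2 + 1.

%% True when a single vote (a candidate's self-vote) already
%% satisfies the required quorum.
single_vote_wins(Members) ->
    required_quorum(Members) =< 1.

demo() ->
    %% The repro: one voter plus one non_voter is a cluster of one.
    OneVoter = [{foo, voter}, {bar, non_voter}],
    1 = required_quorum(OneVoter),
    true = single_vote_wins(OneVoter),
    %% With a second voter, one self-vote no longer reaches quorum.
    TwoVoters = [{foo, voter}, {baz, voter}, {bar, non_voter}],
    2 = required_quorum(TwoVoters),
    false = single_vote_wins(TwoVoters),
    ok.
```

With one voter the required quorum is 1, so any single self-vote wins; with two voters it is 2, which a lone self-vote cannot reach.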
Thank you for your comments. I feel that there is some controversy among reviewers about the best way to address this issue.
Some notes about comments for the repro:
I think it is normal that the single voter member will restart without data loss.
I might be misunderstanding something, but even if there is another voter member in the repro (unlike
@sile leave the change that is commented on as a legitimate improvement, back out the rest. Then I suspect this PR would have a chance of being accepted. Our team's focus is on chaos testing of Ra, so contributions with less-than-obvious value proposition do not receive much attention. |
@michaelklishin Thank you for your suggestion (fixed at 01afd9b).
Sounds interesting 👍 |
The RabbitMQ OCI failures are not related to this change: for external contributions, Actions does not inject the necessary secrets. |
Merging per discussion with @kjnilsson. |
@sile thank you for your ongoing contributions to Ra! |
Thank you! |
Proposed Changes
In my understanding, ra doesn't assume that non-voters become the cluster leader. (If the understanding is not correct, self vote by non-voters should be fixed instead to prevent potential quorum violation.)
However, the current implementation sometimes allows non-voters to transition into pre_vote state. For instance, a non-voter can become the leader by calling ra:transfer_leadership/2.
This PR addresses this problem by adding checks that prevent non-voters from transitioning into pre_vote state.
Types of Changes
What types of changes does your code introduce to this project?
Put an x in the boxes that apply
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md document
Further Comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc.