Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent non-voting (new and not-yet-caught-up) replicas from becoming cluster leader in certain scenarios #435

Merged
merged 4 commits into from
May 9, 2024

Conversation

sile
Copy link
Contributor

@sile sile commented Apr 25, 2024

Proposed Changes

In my understanding, ra doesn't assume that non-voters become the cluster leader.
(If the understanding is not correct, self vote by non-voters should be fixed instead to prevent potential quorum violation.)

However, the current implementation sometimes allows non-voters to transition into pre_vote state.
For instance, a non-voter can become the leader by calling ra:transfer_leadership/2.
This PR addresses this problem by adding checks that prevent non-voters from transitioning into pre_vote state.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation (correction or otherwise)
  • Cosmetics (whitespace, appearance)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

If this is a relatively large or complex change, kick off the discussion by
explaining why you chose the solution you did and what alternatives you
considered, etc.

@michaelklishin
Copy link
Member

The OCI build failures are due to the fact that this is an external contribution, so Docker Hub secrets are not injected by Actions.

Copy link
Member

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea and your understanding is correct, non-voters are not meant to transition directly to leaders.

However, the log messages used in this PR are a bit confusing.

src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
src/ra_server.erl Outdated Show resolved Hide resolved
@kjnilsson
Copy link
Contributor

@illotum - would you mind casting an eye at this?

@illotum
Copy link
Collaborator

illotum commented Apr 25, 2024

@sile would you be able to provide reproduction steps that lead to self-election? I'd prefer to try fix it on the election side, right now self-vote is acceptable if it doesn't lead to promotion.

More context:

force_membership_change at al are not part of the log, so they may arrive before follower reaches its current state (during log replay). Additionally these out-of-band actions can be used by ops. For example we may want to promote witness to leader via manual force_shrink.

Overall, during the initial implementation, I tried and abandoned approach that relies on followers' knowledge of their state, for this and similar reasons.

@sile
Copy link
Contributor Author

sile commented Apr 25, 2024

@illotum okay, I will share reproduction steps.
But, would it be possible for you to review #427 beforehand?

I think that using non_voter is much easier than promotable to reproduce the self-voting problem. However, I cannot use non_voter for reproduction as it seems having another issue subtly relating to this problem.

If I can use non_voter as expected in #427, the reproduction steps would be as follows:

  1. Creates a single member cluster
  2. Adds a non_voter member (named A) to the cluster
  3. Calls ra:tansfer_leadership(A, A)
  4. A becomes the leader

@sile
Copy link
Contributor Author

sile commented Apr 25, 2024

BTW, I'm bit confused about the phrase "right now self-vote is acceptable" because the following code in the ra main branch seems to prohibit that:
https://github.com/rabbitmq/ra/blob/main/src/ra_server.erl#L1304-L1309

handle_follower(election_timeout,
                #{cfg := #cfg{log_id = LogId},
                  membership := Membership} = State) when Membership =/= voter ->
    ?DEBUG("~ts: follower ignored election_timeout, non-voter: ~p0",
           [LogId, Membership]),
    {follower, State, []};

@illotum
Copy link
Collaborator

illotum commented Apr 25, 2024

My bad! We don't want non voters to call for election when they are up to date on state, you're correct.

What I meant by acceptable, is that it still happens when the member is not up to date. As I mentioned above, a voter may be demoted later in the log, but won't know it if cluster gets into election mid-sync. We rely on general Raft election mechanism to not accept those candidates.


Regarding your reproduction. In what situation you expect ra:tansfer_leadership(A, A) to happen without human intent? Membership status was deliberately left malleable to operators and developers, to allow its use in various scenarios.

For example non_voter is designed for permanent "witnesses" and those may be used as a passive backup. Backups need to be promoted back occasionally.

Or in other words, do you have a repro where system gets into undesirable state on its own?


I will have a look at #427 :)

@sile
Copy link
Contributor Author

sile commented Apr 26, 2024

@illotum I see. Thank you for your explanation.

In what situation you expect ra:tansfer_leadership(A, A) to happen without human intent?
...
Or in other words, do you have a repro where system gets into undesirable state on its own?

I think that the function will always be called with human intent. So, if it is intended behaviour (by ra developpers) that non_voter members could be the cluster leader, it's okay to me (the diff for try_become_leader and force_member_change should be removed).

If so, however, as I wrote in this PR's description, the self-vote by non-voters should be fixed instead I think.
By this self-vote, non-voters always get +1 vote (by themselves) while they are excluded in the quorum. Thus, they can become the leader with fewer votes than what Raft algorithm requires.
The following reproduction steps show this scenario (note that we need to use #427).

%% Run two Erlang nodes in separate terminals (foo@localhost is voter, and bar@localhost is non_voter)
$ rebar3 shell --sname foo@localhost
$ rebar3 shell --sname bar@localhost

%% Start ra
foo@localhost> ra:start().
bar@localhost> ra:start().

%% Start a single-member cluster.
foo@localhost> ClusterName = dyn_members.
foo@localhost> Machine = {simple, fun erlang:'+'/2, 0}.
foo@localhost> {ok, _, _} =  ra:start_cluster(default, ClusterName, Machine, [{dyn_members, 'foo@localhost'}]).

%% Add a non_voter member
foo@localhost> NonVoterServer = #{id => {dyn_members, 'bar@localhost'}, membership => non_voter}.
foo@localhost> ra:add_member({dyn_members, 'foo@localhost'}, NonVoterServer).
foo@localhost> ra:start_server(default, ClusterName, NonVoterServer, Machine, [{dyn_members,'foo@localhost'}]).

%% Stop foo@localhost node (=> only a non_voter member is running in the cluster)
foo@localhost> q().

%% At this time being, 'bar@localhost' considers 'foo@localhost' is the leader. 
bar@localhost> maps:get(leader_id, element(2,ra:member_overview(dyn_members))).
{dyn_members,foo@localhost}

%% Call `gen_statem:cast({dyn_members,bar@localhost},try_become_leader).`
%% 
%% [NOTE]
%% This is a bit tricky. 
%% But the same scenario could happen in the real world
%% if `ra:transfer_leadership({dyn_members,bar@localhost}, {dyn_members,bar@localhost})` is called 
%% when 'foo@localhost' node is alive, 
%% and the node aborted just after 
%% the leader ra server process called `gen_statem:cast({dyn_members,bar@localhost},try_become_leader)`.
bar@localhost> gen_statem:cast({dyn_members,bar@localhost},try_become_leader).

%% Now, 'bar@localhost' is elected as the leader even if there are no alive voters.
bar@localhost> maps:get(leader_id, element(2,ra:member_overview(dyn_members))).
{dyn_members,bar@localhost}

@michaelklishin
Copy link
Member

It was never the intent to make non_voter leaders. If a replica is not allowed to participate in election, how much sense does it make to allow it to become a leader?

This is definitely not a scenario that this was introduced for in RabbitMQ :)

@illotum
Copy link
Collaborator

illotum commented Apr 26, 2024

I can see the argument that a library should prohibit actions leading to bad states. In that case I'd recommend adding a "become_voter" API. My original idea was to complement non_voter with the ability to both promote and demote voters, to allow runtime control over membership state.


Regarding the repro, this seems to stem from the way we count required quorum. When you run one voter and one non_voter, you effectively run a Raft cluster of one. Raft cannot guarantee anything when you lose that one (voter) member. If there was no witness, you'd already have no data left.

This is a degraded state and by any means is not how Raft should be operated. As soon as you add another voter to your repro, non_voter loses its ability to self elect.

Edit. Just to reiterate. The only reason I can rely on follower's knowledge of cluster state in election_timeout is that election will not succeed unless follower is up to date on the log. In all other cases, you must also consider situations when a follower is behind and doesn't know its current state.

@sile
Copy link
Contributor Author

sile commented May 7, 2024

@michaelklishin, @illotum

Thank you for your comments.
However, I am a bit confused about what I should do next to proceed with this PR.

I feel that there is some controversy among reviewers about the best way to address this issue.
So it may be better that ra team decides the approach to proceed this PR.
(it is okay to close this PR in order to hand it over to you if it is desirable.)

Some notes about comments for the repro

Raft cannot guarantee anything when you lose that one (voter) member. If there was no witness, you'd already have no data left.

I think it is normal that the single voter member will restart without data loss.

As soon as you add another voter to your repro, non_voter loses its ability to self elect.

I might misunderstand something, but even if there is another voter member in the repro (unlike foo@localhost, this member does not have to be stopped), the non-voter member will still be able to become the leader.
(In this case, bar@localhost will gain 2 votes (one from self-voting and another from the additional voter), and ra_server:required_quorum() will return 2.)

@michaelklishin
Copy link
Member

@sile leave the change that is commented on as a legitimate improvement, back out the rest. Then I suspect this PR would have a chance of being accepted. Our team's focus is on chaos testing of Ra, so contributions with less-than-obvious value proposition do not receive much attention.

@sile
Copy link
Contributor Author

sile commented May 7, 2024

@michaelklishin Thank you for your suggestion (fixed at 01afd9b).

Our team's focus is on chaos testing of Ra

Sounds interesting 👍

@michaelklishin
Copy link
Member

The RabbitMQ OCI failures are not related to this change: for external contributions, Actions does not inject the necessary secrets.

@michaelklishin
Copy link
Member

Merging per discussion with @kjnilsson.

@michaelklishin michaelklishin merged commit 29306c0 into rabbitmq:main May 9, 2024
8 of 10 checks passed
@michaelklishin
Copy link
Member

@sile thank you for your ongoing contributions to Ra!

@michaelklishin michaelklishin changed the title Fix a problem that non-voters could become leader Prevent non-voters from becoming leader in certain scenarios May 9, 2024
@michaelklishin michaelklishin changed the title Prevent non-voters from becoming leader in certain scenarios Prevent non-voters from becoming cluster leader in certain scenarios May 9, 2024
@michaelklishin michaelklishin changed the title Prevent non-voters from becoming cluster leader in certain scenarios Prevent non-voting (new and not-yet-caught-up) replicas from becoming cluster leader in certain scenarios May 9, 2024
@sile sile deleted the prevent-non-voter-leader branch May 9, 2024 21:50
@sile
Copy link
Contributor Author

sile commented May 9, 2024

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants