Election problem #258

Closed
yfei-z opened this issue Mar 26, 2024 · 2 comments
@yfei-z

yfei-z commented Mar 26, 2024

I noticed that the election mechanism of jgroups-raft differs slightly from the standard Raft algorithm, so I ran some tests on it.
[image: raft-bug1]
The cluster has 5 nodes, and this is the log state in term 1. The steps were as follows:

  1. Apply two commands with all 5 nodes running
  2. Shut down C, then apply two commands
  3. Shut down E, then apply two commands
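
The three steps above can be replayed in a small model to see which log lengths they should produce. This is only a sketch, not jgroups-raft code: the class `LogStateSketch` and its methods are hypothetical, and it assumes each command appends exactly one entry to every running node and that nothing else (such as internal entries) is written to the log.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the scenario; not part of jgroups-raft.
public class LogStateSketch {

    // Replays the three steps and returns the last appended index per node.
    static Map<String, Integer> replay() {
        Map<String, Integer> lastIndex = new LinkedHashMap<>();
        Set<String> running = new HashSet<>();
        for (String node : new String[] {"A", "B", "C", "D", "E"}) {
            lastIndex.put(node, 0);
            running.add(node);
        }
        apply(lastIndex, running, 2);   // step 1: all 5 nodes running
        running.remove("C");
        apply(lastIndex, running, 2);   // step 2: C is down
        running.remove("E");
        apply(lastIndex, running, 2);   // step 3: C and E are down
        return lastIndex;
    }

    // Assumption: one command == one log entry on every running node.
    static void apply(Map<String, Integer> lastIndex, Set<String> running, int commands) {
        for (String node : running)
            lastIndex.merge(node, commands, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints {A=6, B=6, C=2, D=6, E=4}
    }
}
```

Under these assumptions D ends up with the longest log among C, D, and E, all in term 1.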

[image: raft-bug2]
Shut down all nodes, then restart C, D, and E of the 5-node cluster: C first, so it becomes the coordinator, then D, then E. Once a majority has been reached, the coordinator begins the election process. After all responses have been received, the coordinator determines the leader using the source code below.

    protected Address determineLeader() {
        Address leader=null;
        Map<Address,VoteResponse> results=votes.getResults();
        for(Address mbr: view.getMembersRaw()) {
            VoteResponse rsp=results.get(mbr);
            if(rsp == null)
                continue;
            if(leader == null)
                leader=mbr;
            if(isHigher(rsp.last_log_term, rsp.last_log_index))
                leader=mbr;
        }
        return leader;
    }

    protected boolean isHigher(long last_term, long last_index) {
        long my_last_index=raft.log().lastAppended();
        LogEntry entry=raft.log().get(my_last_index);
        long my_last_term=entry != null? entry.term() : 0;
        if(last_term > my_last_term)
            return true;
        if(last_term < my_last_term)
            return false;
        return last_index > my_last_index;
    }

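view.getMembersRaw() returns "C, D, E", which is the order in which they joined, and "self" here is C, the coordinator. isHigher() compares every response against C's own log rather than against the best candidate found so far, so the last member whose log is higher than C's wins. The result is that E becomes the leader of term 2.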
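
To make the difference concrete, here is a minimal standalone sketch that contrasts the comparison described above with one that tracks the best candidate seen so far. It is not the jgroups-raft implementation and not necessarily what commit b87f5f3 does: the `ElectionSketch` class, the `Vote` type, and both method names are hypothetical, and the log positions (C=2, D=6, E=4, all in term 1) are an assumption derived from the steps in the report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone model; names do not come from jgroups-raft.
public class ElectionSketch {

    static final class Vote {
        final String member;
        final long lastLogTerm;
        final long lastLogIndex;

        Vote(String member, long lastLogTerm, long lastLogIndex) {
            this.member = member;
            this.lastLogTerm = lastLogTerm;
            this.lastLogIndex = lastLogIndex;
        }
    }

    // True if a's log is more up to date than b's: higher term wins, then higher index.
    static boolean isHigher(Vote a, Vote b) {
        if (a.lastLogTerm != b.lastLogTerm)
            return a.lastLogTerm > b.lastLogTerm;
        return a.lastLogIndex > b.lastLogIndex;
    }

    // Mirrors the reported behavior: every vote is compared against the coordinator's own log.
    static String determineLeaderAgainstSelf(List<Vote> votes, Vote self) {
        String leader = null;
        for (Vote v : votes) {
            if (leader == null)
                leader = v.member;
            if (isHigher(v, self))
                leader = v.member;
        }
        return leader;
    }

    // Alternative: compare each vote against the best candidate seen so far.
    static String determineLeader(List<Vote> votes) {
        Vote best = null;
        for (Vote v : votes) {
            if (best == null || isHigher(v, best))
                best = v;
        }
        return best == null ? null : best.member;
    }

    public static void main(String[] args) {
        // Assumed log positions for the restarted nodes, in join order.
        List<Vote> votes = Arrays.asList(
            new Vote("C", 1, 2), new Vote("D", 1, 6), new Vote("E", 1, 4));
        System.out.println(determineLeaderAgainstSelf(votes, votes.get(0))); // prints E
        System.out.println(determineLeader(votes));                          // prints D
    }
}
```

With a running maximum, the result no longer depends on join order or on which node happens to be the coordinator.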

[image: raft-bug3]
I then tried to setAsync a command. After the entry was committed and applied, the logs look as shown above.

@jabolina
Member

Thanks for opening the issue, @yfei-z. I think b87f5f3 fixes this issue, a problem in calculating the longest log. I'll try to create a test with the scenario described to make sure and release a new version with the fix ASAP.

@jabolina
Member

I wrote a small test to try this scenario. Running against main, the correct node (D) is elected; without b87f5f3, the wrong node is elected. An equivalent of this scenario is:

public void testElectionWithLongLogMiddle(Class<?> ignore) {

I'll run a few more tests and try to release 1.0.13.Final this week. I'll close this one, thanks for reporting!
