Follower cannot be synced after a network timeout and keeps getting REPEATED_PUSH #331

@absolute8511

2025-11-14 01:14:04 WARN NettyClientPublicExecutor_3 - [Push-n1]Get error response code [code=302,name=NETWORK_ERROR,desc=] info[group=rocketmq-broker-23,term=2,code=302,local=n0,remote=n1,leader=n0]

...

2025-11-14 11:16:59 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 126097589609
2025-11-14 11:16:59 WARN NettyClientPublicExecutor_4 - [Push-n1]Get error response code [code=413,name=REPEATED_PUSH,desc=] info[group=rocketmq-broker-23,term=2,code=413,local=null,remote=null,leader=null]
2025-11-14 11:17:00 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 126097589609
2025-11-14 11:17:00 WARN NettyClientPublicExecutor_6 - [Push-n1]Get error response code [code=413,name=REPEATED_PUSH,desc=] info[group=rocketmq-broker-23,term=2,code=413,local=null,remote=null,leader=null]
2025-11-14 11:17:01 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 126097589609
2025-11-14 11:17:01 WARN NettyClientPublicExecutor_5 - [Push-n1]Get error response code [code=413,name=REPEATED_PUSH,desc=] info[group=rocketmq-broker-23,term=2,code=413,local=null,remote=null,leader=null]
2025-11-14 11:17:01 INFO QuorumAckChecker-n0 - [n0][LEADER] term=2 ledgerBegin=125158751135 ledgerEnd=126204270130 committed=126204270117 watermarks={2:{"n0":126204270130,"n1":126097589608,"n2":126204270121}}

follower

2025-11-14 01:14:00 INFO QuorumAckChecker-n1 - [n1][FOLLOWER] term=2 ledgerBegin=124835725784 ledgerEnd=126097588785 committed=126097588785 watermarks={2:{"n0":-1,"n1":-1,"n2":-1}}
2025-11-14 01:14:03 INFO QuorumAckChecker-n1 - [n1][FOLLOWER] term=2 ledgerBegin=124835725784 ledgerEnd=126097589536 committed=126097589536 watermarks={2:{"n0":-1,"n1":-1,"n2":-1}}
2025-11-14 01:14:04 INFO StateMaintainer - [n1][HeartBeatTimeOut] lastLeaderHeartBeatTime: 2025-11-14 01:13:58.079 heartBeatTimeIntervalMs: 2000 lastLeader=n0
2025-11-14 01:14:04 INFO StateMaintainer - [n1] [ChangeRoleToCandidate] from term: 2 and currTerm: 2
2025-11-14 01:14:04 INFO QuorumAckChecker-n1 - Initialize the pending append map in QuorumAckChecker for term=3
2025-11-14 01:14:04 INFO QuorumAckChecker-n1 - Initialize the watermark in QuorumAckChecker for term=3
2025-11-14 01:14:04 INFO QuorumAckChecker-n1 - [TermChange] Will clear the watermarks for term changed from 2 to 3
2025-11-14 01:14:04 INFO StateMaintainer - n1_[INCREASE_TERM] from 2 to 3
2025-11-14 01:14:04 INFO StateMaintainer - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n1","term":3,"voteResult":"ACCEPT"}
2025-11-14 01:14:04 INFO NettyClientPublicExecutor_5 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n2","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 01:14:04 INFO NettyClientPublicExecutor_6 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n0","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 01:14:04 INFO StateMaintainer - [n1] [PARSE_VOTE_RESULT] cost=466 term=3 memberNum=3 allNum=3 acceptedNum=1 notReadyTermNum=0 biggerLedgerNum=2 alreadyHasLeader=false maxTerm=3 result=WAIT_TO_REVOTE
2025-11-14 01:14:05 WARN NettyServerPublicExecutor_4 - [MONITOR]The index 126097589609 has already existed with info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0] and curr is info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0]
2025-11-14 01:14:06 WARN NettyServerPublicExecutor_1 - [MONITOR]The index 126097589609 has already existed with info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0] and curr is info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0]
2025-11-14 01:14:06 INFO QuorumAckChecker-n1 - [n1][CANDIDATE] term=3 ledgerBegin=124835725784 ledgerEnd=126097589608 committed=126097589608 watermarks={3:{"n0":-1,"n1":-1,"n2":-1}}
...

2025-11-14 11:15:22 INFO StateMaintainer - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n1","term":3,"voteResult":"ACCEPT"}
2025-11-14 11:15:22 INFO NettyClientPublicExecutor_3 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n0","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 11:15:22 INFO NettyClientPublicExecutor_4 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n2","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 11:15:22 INFO StateMaintainer - [n1] [PARSE_VOTE_RESULT] cost=1 term=3 memberNum=3 allNum=3 acceptedNum=1 notReadyTermNum=0 biggerLedgerNum=2 alreadyHasLeader=false maxTerm=3 result=WAIT_TO_REVOTE
2025-11-14 11:15:22 INFO QuorumAckChecker-n1 - [n1][CANDIDATE] term=3 ledgerBegin=125158751135 ledgerEnd=126097589608 committed=126097589608 watermarks={3:{"n0":-1,"n1":-1,"n2":-1}}
2025-11-14 11:15:23 WARN NettyServerPublicExecutor_3 - [MONITOR]The index 126097589609 has already existed with info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0] and curr is info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0]
2025-11-14 11:15:23 INFO StateMaintainer - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n1","term":3,"voteResult":"ACCEPT"}

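To reproduce: the network from the leader to one of the followers keeps timing out, so that follower starts trying to elect a leader. After the network recovers, the follower that lost contact cannot resume syncing.
I am using dledger-all-0.3.2, so this should not be the same problem as #251. Also, why does a candidate not try to become a follower after a majority has rejected its vote request? Judging from the code, a candidate should become a follower once it receives a valid heartbeat, right?
But it looks like the heartbeat cannot handle this situation: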

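  1. During the network partition: n1 becomes a candidate with term=3.
  2. After the network recovers: n0 is still at term=2 and sends heartbeats to n1.
  3. Term conflict: n1 receives a heartbeat with term=2 < currTerm=3 and immediately returns EXPIRED_TERM.
  4. Continuous election: n1 stays a candidate at term=3 and cannot become a follower.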
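
The four steps above can be modelled with a minimal sketch. It is not DLedger's code: the class `HeartbeatTermSketch`, the `Node` type and the `onHeartbeat` method are hypothetical names made up for illustration. The stale-term rejection is taken from step 3; the branch that accepts a heartbeat and switches to follower is only an assumption about what a valid heartbeat would do.

```java
// Hypothetical model of the four steps above; these are NOT DLedger's real classes.
final class HeartbeatTermSketch {
    enum Role { LEADER, FOLLOWER, CANDIDATE }
    enum Code { SUCCESS, EXPIRED_TERM }

    static final class Node {
        final String id;
        Role role;
        long term;

        Node(String id, Role role, long term) {
            this.id = id;
            this.role = role;
            this.term = term;
        }

        // Receiver side of a heartbeat. The stale-term rejection mirrors step 3;
        // the "accept and become follower" branch is an assumption about what a
        // valid heartbeat would do.
        Code onHeartbeat(long senderTerm) {
            if (senderTerm < term) {
                return Code.EXPIRED_TERM; // role and term stay unchanged
            }
            term = senderTerm;
            role = Role.FOLLOWER;
            return Code.SUCCESS;
        }
    }

    public static void main(String[] args) {
        Node n0 = new Node("n0", Role.LEADER, 2);
        Node n1 = new Node("n1", Role.CANDIDATE, 3);
        // Assumption: the leader keeps sending term=2 heartbeats and does not
        // react to EXPIRED_TERM, which is what the logs above suggest.
        for (int round = 1; round <= 3; round++) {
            Code code = n1.onHeartbeat(n0.term);
            System.out.println("round " + round + ": " + code
                    + ", n1 is " + n1.role + " at term " + n1.term);
        }
    }
}
```

In this model n1 can only leave the candidate state once the sender's term reaches 3 or higher. Whether the leader is supposed to step down or catch up after seeing EXPIRED_TERM is exactly the open question in this report.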

The problematic part is that the candidate's term was raised right at the start. What triggers n1_[INCREASE_TERM] from 2 to 3? At that point voting had not started yet.
