Open
Description
Leader
2025-11-14 01:14:04 WARN NettyClientPublicExecutor_3 - [Push-n1]Get error response code [code=302,name=NETWORK_ERROR,desc=] info[group=rocketmq-broker-23,term=2,code=302,local=n0,remote=n1,leader=n0]
...
2025-11-14 11:16:59 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 126097589609
2025-11-14 11:16:59 WARN NettyClientPublicExecutor_4 - [Push-n1]Get error response code [code=413,name=REPEATED_PUSH,desc=] info[group=rocketmq-broker-23,term=2,code=413,local=null,remote=null,leader=null]
2025-11-14 11:17:00 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 126097589609
2025-11-14 11:17:00 WARN NettyClientPublicExecutor_6 - [Push-n1]Get error response code [code=413,name=REPEATED_PUSH,desc=] info[group=rocketmq-broker-23,term=2,code=413,local=null,remote=null,leader=null]
2025-11-14 11:17:01 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 126097589609
2025-11-14 11:17:01 WARN NettyClientPublicExecutor_5 - [Push-n1]Get error response code [code=413,name=REPEATED_PUSH,desc=] info[group=rocketmq-broker-23,term=2,code=413,local=null,remote=null,leader=null]
2025-11-14 11:17:01 INFO QuorumAckChecker-n0 - [n0][LEADER] term=2 ledgerBegin=125158751135 ledgerEnd=126204270130 committed=126204270117 watermarks={2:{"n0":126204270130,"n1":126097589608,"n2":126204270121}}
follower
2025-11-14 01:14:00 INFO QuorumAckChecker-n1 - [n1][FOLLOWER] term=2 ledgerBegin=124835725784 ledgerEnd=126097588785 committed=126097588785 watermarks={2:{"n0":-1,"n1":-1,"n2":-1}}
2025-11-14 01:14:03 INFO QuorumAckChecker-n1 - [n1][FOLLOWER] term=2 ledgerBegin=124835725784 ledgerEnd=126097589536 committed=126097589536 watermarks={2:{"n0":-1,"n1":-1,"n2":-1}}
2025-11-14 01:14:04 INFO StateMaintainer - [n1][HeartBeatTimeOut] lastLeaderHeartBeatTime: 2025-11-14 01:13:58.079 heartBeatTimeIntervalMs: 2000 lastLeader=n0
2025-11-14 01:14:04 INFO StateMaintainer - [n1] [ChangeRoleToCandidate] from term: 2 and currTerm: 2
2025-11-14 01:14:04 INFO QuorumAckChecker-n1 - Initialize the pending append map in QuorumAckChecker for term=3
2025-11-14 01:14:04 INFO QuorumAckChecker-n1 - Initialize the watermark in QuorumAckChecker for term=3
2025-11-14 01:14:04 INFO QuorumAckChecker-n1 - [TermChange] Will clear the watermarks for term changed from 2 to 3
2025-11-14 01:14:04 INFO StateMaintainer - n1_[INCREASE_TERM] from 2 to 3
2025-11-14 01:14:04 INFO StateMaintainer - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n1","term":3,"voteResult":"ACCEPT"}
2025-11-14 01:14:04 INFO NettyClientPublicExecutor_5 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n2","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 01:14:04 INFO NettyClientPublicExecutor_6 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n0","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 01:14:04 INFO StateMaintainer - [n1] [PARSE_VOTE_RESULT] cost=466 term=3 memberNum=3 allNum=3 acceptedNum=1 notReadyTermNum=0 biggerLedgerNum=2 alreadyHasLeader=false maxTerm=3 result=WAIT_TO_REVOTE
2025-11-14 01:14:05 WARN NettyServerPublicExecutor_4 - [MONITOR]The index 126097589609 has already existed with info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0] and curr is info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0]
2025-11-14 01:14:06 WARN NettyServerPublicExecutor_1 - [MONITOR]The index 126097589609 has already existed with info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0] and curr is info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0]
2025-11-14 01:14:06 INFO QuorumAckChecker-n1 - [n1][CANDIDATE] term=3 ledgerBegin=124835725784 ledgerEnd=126097589608 committed=126097589608 watermarks={3:{"n0":-1,"n1":-1,"n2":-1}}
...
2025-11-14 11:15:22 INFO StateMaintainer - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n1","term":3,"voteResult":"ACCEPT"}
2025-11-14 11:15:22 INFO NettyClientPublicExecutor_3 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n0","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 11:15:22 INFO NettyClientPublicExecutor_4 - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n2","term":2,"voteResult":"REJECT_SMALL_LEDGER_END_INDEX"}
2025-11-14 11:15:22 INFO StateMaintainer - [n1] [PARSE_VOTE_RESULT] cost=1 term=3 memberNum=3 allNum=3 acceptedNum=1 notReadyTermNum=0 biggerLedgerNum=2 alreadyHasLeader=false maxTerm=3 result=WAIT_TO_REVOTE
2025-11-14 11:15:22 INFO QuorumAckChecker-n1 - [n1][CANDIDATE] term=3 ledgerBegin=125158751135 ledgerEnd=126097589608 committed=126097589608 watermarks={3:{"n0":-1,"n1":-1,"n2":-1}}
2025-11-14 11:15:23 WARN NettyServerPublicExecutor_3 - [MONITOR]The index 126097589609 has already existed with info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0] and curr is info[group=rocketmq-broker-23,term=2,code=200,local=n0,remote=n1,leader=n0]
2025-11-14 11:15:23 INFO StateMaintainer - [n1][GetVoteResponse] {"code":200,"group":"rocketmq-broker-23","leaderId":"n1","localId":"n1","remoteId":"n1","term":3,"voteResult":"ACCEPT"}
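To reproduce: the network from the leader to one of the followers keeps timing out, so that follower starts trying to elect a leader. After the network recovers, the follower that lost contact cannot resume syncing.
I am using dledger-all-0.3.2, so this should not be the same problem as #251. Also, why does a candidate not try to become a follower after a majority rejects its vote request? From the code, a candidate should become a follower when it receives a valid heartbeat, right?
But it looks like the heartbeat cannot handle this case: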
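- During the network partition: n1 becomes a candidate with term=3
- After the network recovers: n0 is still at term=2 and sends heartbeats to n1
- Term conflict: n1 receives a heartbeat with term=2 < currTerm=3 and immediately returns EXPIRED_TERM
- Continuous election: n1 stays a candidate in term=3 and cannot become a follower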
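The sequence above can be modeled in a few lines. This is only a minimal sketch under the assumption that the heartbeat and vote checks behave the way the logs suggest; it is not DLedger's actual code. `TermConflictSketch`, `Node`, `onHeartbeat`, and `grantsVoteTo` are hypothetical names, and the ledger end values are the ones reported by `QuorumAckChecker` in the logs.

```java
// Simplified model of the stuck-candidate scenario. NOT DLedger's implementation:
// every class, enum, and method name here is hypothetical.
final class TermConflictSketch {
    enum Role { LEADER, FOLLOWER, CANDIDATE }
    enum HeartbeatResult { SUCCESS, EXPIRED_TERM }

    static final class Node {
        final String id;
        Role role;
        long term;
        final long ledgerEnd;

        Node(String id, Role role, long term, long ledgerEnd) {
            this.id = id;
            this.role = role;
            this.term = term;
            this.ledgerEnd = ledgerEnd;
        }

        // Assumed rule: a heartbeat from an older term is rejected and changes nothing.
        HeartbeatResult onHeartbeat(long leaderTerm) {
            if (leaderTerm < term) {
                return HeartbeatResult.EXPIRED_TERM;
            }
            term = leaderTerm;
            role = Role.FOLLOWER;
            return HeartbeatResult.SUCCESS;
        }

        // Assumed rule: a voter refuses a candidate whose ledger ends before its own
        // (the REJECT_SMALL_LEDGER_END_INDEX case) and keeps its own term.
        boolean grantsVoteTo(Node candidate) {
            return candidate.ledgerEnd >= ledgerEnd;
        }
    }

    public static void main(String[] args) {
        Node n0 = new Node("n0", Role.LEADER, 2, 126204270130L);
        Node n1 = new Node("n1", Role.CANDIDATE, 3, 126097589608L);

        // Network recovers: n0 heartbeats with term=2, n1 is already at term=3.
        System.out.println("heartbeat -> " + n1.onHeartbeat(n0.term));
        // n1 asks for votes again, but its ledger is behind n0's.
        System.out.println("vote granted -> " + n0.grantsVoteTo(n1));
        System.out.println("n1 role=" + n1.role + " term=" + n1.term);
    }
}
```

In this model neither path moves n1 forward: the heartbeat is rejected because its term is lower, and the vote is rejected because n1's ledger is shorter, while n0 never learns about term 3.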
The problem is that the candidate's term went up at the very start. What triggers `n1_[INCREASE_TERM] from 2 to 3`? Voting had not started yet at that point.