The peer to be destroyed is inconsistent with the current peer. #666

Open
hslam opened this issue Sep 22, 2021 · 2 comments

@hslam
Contributor

hslam commented Sep 22, 2021

11:50.880 [applier.go:692] ["[region 144:15] 216 execute admin command. term 7, index 46, command cmd_type:ChangePeer change_peer:<change_type:RemoveNode peer:<id:216 store_id:13 > > "]
11:50.880 [applier.go:926] ["[region 144:15] 216 exec ConfChange, peer_id 216, type RemoveNode, epoch conf_ver:16 version:15 "]
11:50.880 [applier.go:973] ["[region 144:15] 216 remove peer successfully, peer id:216 store_id:13 , region id:144 region_epoch:<conf_ver:16 version:15 > peers:<id:147 store_id:7 > peers:<id:216 store_id:13 > peers:<id:270 store_id:1 > peers:<id:323 store_id:14 > "]
11:50.880 [applier.go:1325] ["[region 144:15] 216 remove applier"]
11:50.881 [fsm_peer.go:722] ["region 144:15 remove node [store 13 peer 216] from node [store 13 peer 216]"]
11:50.881 [fsm_peer.go:621] ["[region 144] 216 starts destroy [merged_by_target: false]"]
11:50.881 [peer.go:443] ["[region 144] 216 begin to destroy"]
11:50.881 [peer_storage.go:412] ["region 144:15 clear meta from peer storage"]
11:50.881 [peer.go:478] ["[region 144] 216 destroy itself, takes 92.361µs"]
11:50.881 [WARN] [peer_worker.go:288] ["region 144 peer state is nil"]
12:04.520 [fsm_peer.go:91] ["[region 144] replicates peer with ID 216"]
12:04.520 [raft.go:765] ["144 became follower at term 0"]
12:04.520 [raft.go:765] ["144 became follower at term 1"]
12:04.520 [router.go:60] ["register region 144:0, peer 216"]
12:06.522 [fsm_peer.go:477] ["[region 144] 216 is stale as received a larger peer id:370 store_id:13 role:Learner , destroying"]
12:06.522 [fsm_peer.go:621] ["[region 144] 216 starts destroy [merged_by_target: false]"]
12:06.522 [peer.go:443] ["[region 144] 216 begin to destroy"]
12:06.522 [peer_storage.go:412] ["region 144:0 clear meta from peer storage"]
12:06.522 [peer.go:478] ["[region 144] 216 destroy itself, takes 81.971µs"]
12:06.523 [fsm_peer.go:91] ["[region 144] replicates peer with ID 370"]
12:06.523 [raft.go:765] ["144 became follower at term 0"]
12:06.523 [raft.go:765] ["144 became follower at term 1"]
12:06.523 [router.go:60] ["register region 144:0, peer 370"]
12:06.523 [applier.go:1325] ["[region 144:0] 370 remove applier"]
12:06.524 [error.go:62] ["region 144:0 peer 216, destroy wrong peer 370
github.com/pingcap/badger/y.AssertTruef
	/Users/huangmeng/go/pkg/mod/github.com/pingcap/badger@v1.5.1-0.20210918122008-22b718b5a6ba/y/error.go:62
github.com/ngaut/unistore/tikv/raftstore.(*peerMsgHandler).onApplyResult
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_peer.go:339
github.com/ngaut/unistore/tikv/raftstore.(*peerMsgHandler).HandleMsgs
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_peer.go:179
github.com/ngaut/unistore/tikv/raftstore.(*raftWorker).handleMsgs
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:224
github.com/ngaut/unistore/tikv/raftstore.(*raftWorker).run
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:143
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1371"]
@hslam
Contributor Author

hslam commented Sep 22, 2021

When the number of stores is greater than the number of replicas, PD moves region replicas between stores to keep store sizes balanced. Frequent removal of scheduled peers can easily trigger this issue.

@hslam
Contributor Author

hslam commented Sep 23, 2021

When a peer is destroyed, the raft worker can still hold the previously cached peer inbox, and the next raft msg handler is created from that stale inbox. This issue occurs if that msg handler is used to handle messages. A possible fix: when a received message is appended to the corresponding peer's inbox, check whether the cached peer has been destroyed, and if it has, update the peer inbox in the cache.

06:38.870 [fsm_peer.go:91] ["[region 97] replicates peer with ID 249"]
06:38.871 [raft.go:765] ["97 became follower at term 0"]
06:38.871 [raft.go:765] ["97 became follower at term 1"]
06:38.871 [router.go:60] ["register region 97:0, peer 249"]
06:38.879 [fsm_peer.go:478] ["[region 97] 249 is stale as received a larger peer id:368 store_id:3 role:Learner , destroying"]
06:38.879 [fsm_peer.go:622] ["[region 97] 249 starts destroy [merged_by_target: false]"]
06:38.879 [peer.go:443] ["[region 97] 249 begin to destroy"]
06:38.879 [peer_storage.go:412] ["region 97:0 clear meta from peer storage"]
06:38.879 [peer.go:478] ["[region 97] 249 destroy itself, takes 153.707µs"]
06:38.879 [fsm_peer.go:91] ["[region 97] replicates peer with ID 368"]
06:38.879 [raft.go:765] ["97 became follower at term 0"]
06:38.879 [raft.go:765] ["97 became follower at term 1"]
06:38.879 [router.go:60] ["register region 97:0, peer 368"]
06:38.891 [applier.go:1325] ["[region 97:0] 368 remove applier"]
06:38.891 [peer_worker.go:220] ["region 97: peer is not match. request 368, current 249, stopped true"] 
[stack="github.com/ngaut/unistore/tikv/raftstore.(*raftWorker).getPeerInbox
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:220
github.com/ngaut/unistore/tikv/raftstore.(*raftWorker).receiveMsgs
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:194
github.com/ngaut/unistore/tikv/raftstore.(*raftWorker).run
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:140"]
06:38.891 [error.go:62] ["region 97:0 peer 249, stopped true, destroy wrong peer 368
[stack="github.com/pingcap/badger/y.AssertTruef
	/Users/huangmeng/go/pkg/mod/github.com/pingcap/badger@v1.5.1-0.20210918122008-22b718b5a6ba/y/error.go:62
github.com/ngaut/unistore/tikv/raftstore.(*peerMsgHandler).onApplyResult
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_peer.go:339
github.com/ngaut/unistore/tikv/raftstore.(*peerMsgHandler).HandleMsgs
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/fsm_peer.go:179
github.com/ngaut/unistore/tikv/raftstore.(*raftWorker).handleMsgs
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:231
github.com/ngaut/unistore/tikv/raftstore.(*raftWorker).run
	/Users/huangmeng/go/src/github.com/ngaut/unistore/tikv/raftstore/peer_worker.go:143"]

func (rw *raftWorker) handleMsgs() {
	begin := time.Now()
	rw.raftCtx.pendingCount = 0
	for _, inbox := range rw.inboxes {
		h := newRaftMsgHandler(inbox.peer, rw.raftCtx)
		h.HandleMsgs(inbox.msgs...)
	}
	rw.handleMsgDc.collect(time.Since(begin))
}

func (rw *raftWorker) getPeerInbox(regionID uint64) *peerInbox {
	inbox, ok := rw.inboxes[regionID]
	if !ok {
		peerState := rw.pr.get(regionID)
		inbox = &peerInbox{peer: peerState.peer}
		rw.inboxes[regionID] = inbox
	}
	return inbox
}
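
The check proposed above could look roughly like the following. This is only a standalone sketch, not the actual unistore code: `worker`, `router`, and the `stopped` field on `peer` are simplified stand-ins I made up for illustration (the name `stopped` is borrowed from the log output, its real location is an assumption), and messages are plain strings. The idea is that getPeerInbox no longer trusts a cached inbox whose peer has been destroyed and instead looks up the currently registered peer.

```go
package main

import "fmt"

// peer is a hypothetical, stripped-down stand-in for the raftstore peer.
type peer struct {
	id      uint64
	stopped bool // set once the peer has been destroyed (assumed field)
}

// peerInbox mirrors the cached inbox: a peer plus its pending messages.
type peerInbox struct {
	peer *peer
	msgs []string
}

// router is a hypothetical stand-in for rw.pr: region ID -> registered peer.
type router struct {
	peers map[uint64]*peer
}

func (r *router) get(regionID uint64) *peer { return r.peers[regionID] }

// worker is a hypothetical stand-in for raftWorker.
type worker struct {
	pr      *router
	inboxes map[uint64]*peerInbox
}

// getPeerInbox returns the inbox for regionID. A cached inbox is reused only
// while its peer is still alive; otherwise the cache entry is rebuilt from the
// peer currently registered in the router. It returns nil when no live peer is
// registered, so callers must handle that case.
func (w *worker) getPeerInbox(regionID uint64) *peerInbox {
	inbox, ok := w.inboxes[regionID]
	if ok && !inbox.peer.stopped {
		return inbox
	}
	current := w.pr.get(regionID)
	if current == nil || current.stopped {
		delete(w.inboxes, regionID)
		return nil
	}
	// Messages queued for the destroyed peer are dropped with the old inbox.
	inbox = &peerInbox{peer: current}
	w.inboxes[regionID] = inbox
	return inbox
}

func main() {
	w := &worker{
		pr:      &router{peers: map[uint64]*peer{}},
		inboxes: map[uint64]*peerInbox{},
	}
	old := &peer{id: 249}
	w.pr.peers[97] = old
	fmt.Println(w.getPeerInbox(97).peer.id) // inbox is cached for peer 249

	// Peer 249 is destroyed and peer 368 is registered for the same region.
	old.stopped = true
	w.pr.peers[97] = &peer{id: 368}
	fmt.Println(w.getPeerInbox(97).peer.id) // cache entry is refreshed
}
```

Note that in this sketch any messages still queued for the destroyed peer are discarded together with the stale inbox. Whether that is acceptable in the real raftWorker, or whether they should be re-routed, would need to be checked against the actual message flow.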
