check PD list validation #1186

Closed
siddontang opened this issue Oct 18, 2016 · 1 comment

@siddontang

pd1 is a standalone cluster (a); pd2 and pd3 form another cluster (b). We have many TiKVs. When a TiKV bootstraps, it randomly selects a PD to connect to, so we may end up with two TiKVs that have the same store ID (1) but belong to different clusters (a and b). We call them 1a and 1b to distinguish them.
When 1b is down and 1a restarts, 1a may connect to cluster b, because it picks a PD from the PD list at random. Worse, 1a can then join cluster b and communicate with the other TiKVs in cluster b, which leads to data corruption.
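
To make the collision concrete, here is a minimal sketch of why a store ID alone is ambiguous. The `StoreIdentity` type and `may_join` helper are hypothetical names used only for illustration, not TiKV or PD code; the assumption is that a store remembers which cluster it was bootstrapped in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreIdentity:
    """Hypothetical identity: a store ID is only unique within one cluster."""
    cluster_id: str  # "a" or "b" in the scenario above
    store_id: int


def may_join(store: StoreIdentity, target_cluster_id: str) -> bool:
    """Allow a store to join only the cluster it was bootstrapped in."""
    return store.cluster_id == target_cluster_id


store_1a = StoreIdentity(cluster_id="a", store_id=1)
store_1b = StoreIdentity(cluster_id="b", store_id=1)

# Same store ID, different clusters: comparing store_id alone cannot tell them apart.
same_store_id = store_1a.store_id == store_1b.store_id
same_identity = store_1a == store_1b
```

With a check like `may_join(store_1a, "b")`, the restart of 1a against cluster b would be rejected instead of silently succeeding.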

When we start TiKV with a PD list like pd1:2379,pd2:2379,pd3:2379, we must check that these PDs are all in the same cluster. E.g., we can use the members API to get all PD members. If the response contains only pd1 and pd2, we can treat pd3 as an invalid PD. But we should also consider the case where pd3 has already been removed from the cluster.
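
The check described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `fetch_members` is a hypothetical callable standing in for a call to the members API (it returns the member endpoints that one PD reports and raises `ConnectionError` if that PD is unreachable), and an unreachable PD is treated as "possibly removed" rather than invalid.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

MEMBER = "member"
FOREIGN = "foreign"                # reachable, but not in the reference cluster
UNREACHABLE = "unreachable"        # may be down or already removed


def check_pd_list(
    pd_list: List[str],
    fetch_members: Callable[[str], Iterable[str]],
) -> Dict[str, str]:
    """Classify each endpoint in pd_list against the first reachable PD's member list."""
    views: Dict[str, Optional[Set[str]]] = {}
    for endpoint in pd_list:
        try:
            views[endpoint] = set(fetch_members(endpoint))
        except ConnectionError:
            views[endpoint] = None

    reachable = [ep for ep in pd_list if views[ep] is not None]
    if not reachable:
        raise RuntimeError("no PD in the list is reachable")

    # The first reachable PD's view is used as the reference cluster.
    reference = views[reachable[0]]
    result: Dict[str, str] = {}
    for endpoint in pd_list:
        if endpoint in reference:
            result[endpoint] = MEMBER
        elif views[endpoint] is None:
            result[endpoint] = UNREACHABLE
        else:
            result[endpoint] = FOREIGN
    return result


# Scenario from this issue, with pd1 and pd2 reporting each other and pd3 on its own.
example_views = {
    "pd1:2379": ["pd1:2379", "pd2:2379"],
    "pd2:2379": ["pd1:2379", "pd2:2379"],
    "pd3:2379": ["pd3:2379"],
}
example_result = check_pd_list(list(example_views), example_views.__getitem__)
```

Any `FOREIGN` entry means the list mixes clusters, so TiKV should refuse to start. How to treat `UNREACHABLE` entries (warn, or refuse) is a policy choice this sketch leaves open.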

/cc @huachaohuang

@siddontang

Maybe PD can panic too if it finds that the initial list contains PDs from different clusters.
/cc @xiang90 does etcd support this directly? I only see some error logs, like:

2016/10/12 17:19:31 util.go:315: [info] I | rafthttp: [started streaming with peer 9d69de9935de6684 (writer)]
2016/10/12 17:19:31 util.go:315: [info] I | rafthttp: [started peer 9d69de9935de6684]
2016/10/12 17:19:31 util.go:315: [info] I | rafthttp: [added peer 9d69de9935de6684]
2016/10/12 17:19:31 util.go:315: [info] I | rafthttp: [started streaming with peer 9d69de9935de6684 (stream Message reader)]
2016/10/12 17:19:31 util.go:315: [info] I | rafthttp: [started streaming with peer 9d69de9935de6684 (stream MsgApp v2 reader)]
2016/10/12 17:19:31 util.go:315: [info] I | etcdserver: [starting server... [version: 3.0.0+git, cluster version: to_be_decided]]
2016/10/12 17:19:31 util.go:315: [info] I | v2http: [pprof is enabled under /debug/pprof]
2016/10/12 17:19:31 util.go:315: [info] I | membership: [added member 8b3cc3552607ca0b [http://10.9.95.73:2380] to cluster 358203f6402a9fb4]
2016/10/12 17:19:31 util.go:315: [info] I | membership: [added member 9d69de9935de6684 [http://10.9.94.185:2380] to cluster 358203f6402a9fb4]
2016/10/12 17:19:31 util.go:315: [info] I | membership: [added member d14d3e1502e18b01 [http://10.9.154.41:2380] to cluster 358203f6402a9fb4]
2016/10/12 17:19:31 util.go:309: [error] E | rafthttp: [request sent was ignored (cluster ID mismatch: peer[8b3cc3552607ca0b]=1c0abae39fd0e2bc, local=358203f6402a9fb4)]
2016/10/12 17:19:31 util.go:309: [error] E | rafthttp: [request sent was ignored (cluster ID mismatch: peer[8b3cc3552607ca0b]=1c0abae39fd0e2bc, local=358203f6402a9fb4)]
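
As a rough illustration of turning these log-level errors into a hard failure, the sketch below extracts the mismatching cluster IDs from lines in the format shown above. The function names are hypothetical, and this is not how etcd or PD handles the condition internally; it only shows that the log lines already carry enough information to fail fast.

```python
import re
from typing import Iterable, Set, Tuple

# Matches the "cluster ID mismatch" message format seen in the log above.
MISMATCH_RE = re.compile(
    r"cluster ID mismatch: peer\[([0-9a-f]+)\]=([0-9a-f]+), local=([0-9a-f]+)"
)


def find_cluster_mismatches(lines: Iterable[str]) -> Set[Tuple[str, str, str]]:
    """Return unique (peer_id, peer_cluster_id, local_cluster_id) triples."""
    mismatches = set()
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            mismatches.add(match.groups())
    return mismatches


def fail_on_mismatch(lines: Iterable[str]) -> None:
    """Raise instead of only logging when any peer reports a different cluster ID."""
    mismatches = find_cluster_mismatches(lines)
    if mismatches:
        raise RuntimeError(f"PD list spans multiple clusters: {sorted(mismatches)}")


sample_log = [
    "2016/10/12 17:19:31 util.go:315: [info] I | rafthttp: [started peer 9d69de9935de6684]",
    "2016/10/12 17:19:31 util.go:309: [error] E | rafthttp: [request sent was ignored "
    "(cluster ID mismatch: peer[8b3cc3552607ca0b]=1c0abae39fd0e2bc, local=358203f6402a9fb4)]",
]
sample_mismatches = find_cluster_mismatches(sample_log)
```

For the log above, the repeated error line collapses into a single triple: peer 8b3cc3552607ca0b belongs to cluster 1c0abae39fd0e2bc, while the local cluster is 358203f6402a9fb4.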

@siddontang siddontang modified the milestone: RC1 Nov 12, 2016
iosmanthus pushed a commit to iosmanthus/tikv that referenced this issue Jan 4, 2024
…ikv#1182)" (tikv#1186)

This reverts commit 5b0414e.

Signed-off-by: Ping Yu <yuping@pingcap.com>