Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv client failed to fetch some region data #321

Closed
amyangfei opened this issue Mar 8, 2020 · 5 comments
Closed

kv client failed to fetch some region data #321

amyangfei opened this issue Mar 8, 2020 · 5 comments
Labels
area/ticdc Issues or PRs related to TiCDC. component/kv-client TiKV kv log client component. component/tikv TiKV component. type/bug This is a bug.

Comments

@amyangfei
Copy link
Contributor

amyangfei commented Mar 8, 2020

At about 2020/03/08 05:27:06.141 -04:00 the replication does not forward any anymore, because one of table's(sbtest3) resolved ts is not forward

CDC uses #308 version, TiKV uses 5kbpers/tikv@1765a5b version

➜  curl -s http://172.16.5.113:10080/tables/cdc_bench/sbtest3/regions |grep region_id
   "region_id": 176,
   "region_id": 192,
   "region_id": 204,
   "region_id": 160,

cdc log: http://139.219.11.38:8000/KJTrQ/issue_321_cdc.log.tar.gz
tikv log: http://139.219.11.38:8000/NK0s0/tikv.log.tar.gz

some abnormal behavior:

  • region_id=160, last register at
[2020/03/08 05:26:56.099 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=160]

but last resolved ts in TiKV is

[2020/03/08 05:29:05.462 -04:00] [INFO] [delegate.rs:279] ["resolved ts updated"] [resolved_ts=415146900050149856] [region_id=160]
  • region_id=176, not register after deregister
[2020/03/08 05:26:56.099 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=176]
[2020/03/08 05:26:56.104 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 176, leader may None\" not_leader { region_id: 176 }))"] [conn_id=Some(ConnID(6))] [downstream_id=Some(DownstreamID(10))] [region_id=176]
@amyangfei amyangfei added type/bug This is a bug. component/kv-client TiKV kv log client component. component/tikv TiKV component. labels Mar 8, 2020
@amyangfei
Copy link
Contributor Author

amyangfei commented Mar 8, 2020

another failed case

some abnormal regions, such as region_id=434, region_id=438

➜  grep register issue_321_tikv_cdc211.5.16.172.log|grep region_id=434
[2020/03/08 08:36:18.671 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=434]
[2020/03/08 08:36:21.463 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 434, leader may None\" not_leader { region_id: 434 }))"] [conn_id=None] [downstream_id=None] [region_id=434]

➜  grep register issue_321_tikv_cdc211.5.16.172.log|grep region_id=438
[2020/03/08 08:36:18.672 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=438]
[2020/03/08 08:37:10.061 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 438, leader may None\" not_leader { region_id: 438 }))"] [conn_id=None] [downstream_id=None] [region_id=438]

resolved ts of sbtest1(table_id=47) is not update since 2020/03/08 08:36:17 -04:00 (approximately)
region ids for table sbtest1: http://139.219.11.38:8000/C0Zxb/region_ids.log
cdc log: http://139.219.11.38:8000/bKqMz/issue_321_cdc.log.tar.gz
tikv log: http://139.219.11.38:8000/DKisT/tikv.log.tar.gz

Screen Shot 2020-03-08 at 20 53 37

@amyangfei
Copy link
Contributor Author

amyangfei commented Mar 9, 2020

Found three abnormal regions: 15032, 15172, 588

such as the region=15172

:) tidb@cdc210.5.16.172 in [~/cdcbench1/deploy/log] [5:42:43]
➜  grep register issue_321_tikv_cdc210.5.16.172.log|grep region_id=15172
[2020/03/09 05:11:12.076 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=15172]
[2020/03/09 05:12:33.328 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 15172, leader may None\" not_leader { region_id: 15172 }))"] [conn_id=None] [downstream_id=None] [region_id=15172]

:) tidb@cdc211.5.16.172 in [~/cdcbench1/deploy/log] [5:45:40]
➜  grep register issue_321_tikv_cdc211.5.16.172.log|grep region_id=15172
[2020/03/09 05:12:33.335 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=15172]
[2020/03/09 05:12:33.336 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 15172, leader may Some(id: 15175 store_id: 5)\" not_leader { region_id: 15172 leader { id: 15175 store_id: 5 } }))"] [conn_id=Some(ConnID(2))] [downstream_id=Some(DownstreamID(3685))] [region_id=15172]

regions for table sbtest3(53): http://139.219.11.38:8000/xromH/region_ids.log
cdc.log: http://139.219.11.38:8000/wc6di/issue_321_cdc.log.tar.gz
TiKV log: http://139.219.11.38:8000/XIxF0/tikv.log.tar.gz

Screen Shot 2020-03-09 at 17 52 29

@amyangfei
Copy link
Contributor Author

another test

abnormal regions: http://139.219.11.38:8000/9Djy8/abnormal_regions.log

Screen Shot 2020-03-09 at 19 35 26

@amyangfei
Copy link
Contributor Author

amyangfei commented Mar 10, 2020

abnormal region, region_id=42680

:) tidb@cdc209.5.16.172 in [~/cdcbench1/deploy/log] [0:24:26]
➜  grep register tikv.log2020-03-10-00:12:03.415082273|grep "region_id=42680"

[2020/03/10 00:01:00.218 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=42680]
[2020/03/10 00:01:00.218 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 42680, leader may None\" not_leader { region_id: 42680 }))"] [conn_id=Some(ConnID(0))] [downstream_id=Some(DownstreamID(5324))] [region_id=42680]

:) tidb@cdc211.5.16.172 in [~/cdcbench1/deploy/log] [0:24:32]
➜  grep register tikv.log2020-03-10-00:17:15.781205190|grep "region_id=42680"
[2020/03/10 00:01:00.161 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=42680]
[2020/03/10 00:01:00.166 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 42680, leader may None\" not_leader { region_id: 42680 }))"] [conn_id=None] [downstream_id=None] [region_id=42680]
[2020/03/10 00:01:00.243 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=42680]
[2020/03/10 00:01:00.427 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 42680, leader may None\" not_leader { region_id: 42680 }))"] [conn_id=Some(ConnID(2))] [downstream_id=Some(DownstreamID(16334))] [region_id=42680]
[2020/03/10 00:01:00.427 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 42680, leader may None\" not_leader { region_id: 42680 }))"] [conn_id=Some(ConnID(2))] [downstream_id=Some(DownstreamID(16785))] [region_id=42680]
[2020/03/10 00:01:00.427 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 42680, leader may None\" not_leader { region_id: 42680 }))"] [conn_id=None] [downstream_id=None] [region_id=42680]

:) tidb@cdc210.5.16.172 in [~/cdcbench1/deploy/log] [0:24:26]
➜  grep register tikv.log2020-03-10-00:04:40.631465943|grep "region_id=42680"
[2020/03/10 00:01:00.219 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=42680]
[2020/03/10 00:01:00.219 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 42680, leader may None\" not_leader { region_id: 42680 }))"] [conn_id=Some(ConnID(0))] [downstream_id=Some(DownstreamID(12805))] [region_id=42680]

cdc log: http://139.219.11.38:8000/xshlo/cdc.log.tar.gz

Screen Shot 2020-03-10 at 12 28 14

@amyangfei
Copy link
Contributor Author

amyangfei commented Mar 10, 2020

abnormal region id = 11276

:) tidb@cdc211.5.16.172 in [~/cdcbench1/deploy/log] [9:30:07]
➜  grep "register" issue_321_tikv_cdc211.5.16.172.log|grep 11276
[2020/03/10 09:16:11.924 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=11276]
[2020/03/10 09:16:11.924 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 11276, leader may None\" not_leader { region_id: 11276 }))"] [conn_id=None] [downstream_id=None] [region_id=11276]
[2020/03/10 09:16:11.931 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=11276]
[2020/03/10 09:16:11.940 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 11276, leader may None\" not_leader { region_id: 11276 }))"] [conn_id=Some(ConnID(0))] [downstream_id=Some(DownstreamID(4232))] [region_id=11276]
[2020/03/10 09:16:11.940 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"peer is not leader for region 11276, leader may None\" not_leader { region_id: 11276 }))"] [conn_id=Some(ConnID(0))] [downstream_id=Some(DownstreamID(4240))] [region_id=11276]

:) tidb@cdc209.5.16.172 in [~/cdcbench1/deploy/log] [9:30:07]
➜  grep "register" issue_321_tikv_cdc209.5.16.172.log|grep 11276
[2020/03/10 09:16:11.930 -04:00] [INFO] [endpoint.rs:242] ["cdc register region"] [region_id=11276]
[2020/03/10 09:16:11.930 -04:00] [INFO] [endpoint.rs:169] ["cdc deregister region"] [error="Some(Request(message: \"region 11276 not found\" region_not_found { region_id: 11276 }))"] [conn_id=Some(ConnID(2))] [downstream_id=Some(DownstreamID(1535))] [region_id=11276]

cdc last 10 logs with region 11276

cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.926 -04:00] [DEBUG] [client.go:334] ["dispatching region"] [regionID=11276]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.926 -04:00] [INFO] [client.go:403] ["start new request"] [request="{\"header\":{\"cluster_id\":6802558252252738793},\"region_id\":11276,\"region_epoch\":{\"conf_ver\":5,\"version\":1981},\"checkpoint_ts\":415195770875806460,\"start_key\":\"dIAAAAAAAAD/L19ygAAAAAH/BYcFAAAAAAD6\",\"end_key\":\"dIAAAAAAAAD/L19ygAAAAAH/BZDJAAAAAAD6\"}"] [addr=172.16.5.209:20160]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.926 -04:00] [DEBUG] [client.go:440] ["request to region successfully sent"] [regionID=11276] [addr=172.16.5.209:20160]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.927 -04:00] [DEBUG] [client.go:701] ["recv ChangeDataEvent"] [event="[\"regionID:11276, error:region_not_found:<region_id:11276 > ;\"]"] [addr=172.16.5.209:20160]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.927 -04:00] [DEBUG] [client.go:752] ["spawning partialRegionFeed goroutine for region"] [regionID=11276]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.927 -04:00] [DEBUG] [client.go:836] ["singleEventFeed got event"] [event="region_id:11276 error:<region_not_found:<region_id:11276 > > "]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.928 -04:00] [INFO] [client.go:505] ["EventFeed disconnected"] [regionID=11276] [span="{\"Start\":\"dIAAAAAAAAD/L19ygAAAAAH/BYcFAAAAAAD6\",\"End\":\"dIAAAAAAAAD/L19ygAAAAAH/BZDJAAAAAAD6\"}"] [checkpoint=415195770875806460] [error="region_not_found:<region_id:11276 > "] [errorVerbose="region_not_found:<region_id:11276 > \ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.4/juju_adaptor.go:15\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).singleEventFeed\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:928\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).partialRegionFeed\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:494\ngithub.com/pingcap/ticdc/cdc/kv.(*CDCClient).receiveFromStream.func2\n\tgithub.com/pingcap/ticdc@/cdc/kv/client.go:757\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20190423024810-112230192c58/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357"]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.932 -04:00] [DEBUG] [client.go:334] ["dispatching region"] [regionID=11276]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.932 -04:00] [INFO] [client.go:403] ["start new request"] [request="{\"header\":{\"cluster_id\":6802558252252738793},\"region_id\":11276,\"region_epoch\":{\"conf_ver\":5,\"version\":1981},\"checkpoint_ts\":415195770875806460,\"start_key\":\"dIAAAAAAAAD/L19ygAAAAAH/BYcFAAAAAAD6\",\"end_key\":\"dIAAAAAAAAD/L19ygAAAAAH/BZDJAAAAAAD6\"}"] [addr=172.16.5.211:20160]
cdc-2020-03-10T09-16-58.914.log:[2020/03/10 09:16:11.932 -04:00] [DEBUG] [client.go:440] ["request to region successfully sent"] [regionID=11276] [addr=172.16.5.211:20160]

cdc log: http://139.219.11.38:8000/hVsFk/cdc.log.tar.gz
tikv log: http://139.219.11.38:8000/5LSql/tikv.log.tar.gz

@AkiraXie AkiraXie added the area/ticdc Issues or PRs related to TiCDC. label Mar 9, 2022
@github-actions github-actions bot added this to Need Triage in Question and Bug Reports Mar 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ticdc Issues or PRs related to TiCDC. component/kv-client TiKV kv log client component. component/tikv TiKV component. type/bug This is a bug.
Projects
Development

No branches or pull requests

2 participants