
data inconsistency after injection of cdc network loss #9344

Closed
fubinzh opened this issue Jul 5, 2023 · 6 comments · Fixed by #9396
Labels
affects-6.1 affects-6.5 affects-7.1 area/ticdc Issues or PRs related to TiCDC. found/automation Bugs found by automation cases severity/critical This is a critical bug. type/bug This is a bug.


fubinzh commented Jul 5, 2023

What did you do?

  1. Deploy a TiDB cluster with 3 CDC nodes.
  2. Create a MySQL sink changefeed.
  3. Pause the changefeed.
  4. Run sysbench oltp_update_non_index prepare:
sysbench --db-driver=mysql --mysql-host=upstream-tidb.cdc-testbed-tps-1816047-1-21 --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=100 --table-size=500000 --create_secondary=off --time=3600 --threads=100 oltp_update_non_index prepare
  5. Resume the changefeed.
  6. Run oltp_update_non_index, and at the same time inject network loss on a random CDC node for 10 s every 30 minutes (4 chaos injections in total over 2 hours):
sysbench --db-driver=mysql --mysql-host=upstream-tidb.cdc-testbed-tps-1816047-1-21 --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=100 --table-size=500000 --create_secondary=off --time=3600 --threads=100 oltp_update_non_index run
  7. Send finishmark and wait for it to sync to the downstream.
  8. Run a data consistency check between upstream and downstream.
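For reference, the cadence in step 6 (a 10 s loss window every 30 minutes over 2 hours) works out to 4 injections. A minimal sketch of that schedule, purely illustrative (the actual chaos tooling used by the test is not shown in this report):

```python
def chaos_windows(total_s, interval_s, loss_s):
    """Return (start, end) second offsets for each injected network-loss window."""
    return [(t, t + loss_s) for t in range(interval_s, total_s + 1, interval_s)]

# 10 s of network loss every 30 minutes over a 2-hour run
windows = chaos_windows(total_s=2 * 3600, interval_s=30 * 60, loss_s=10)
print(windows)  # 4 windows, starting at the 30, 60, 90, and 120 minute marks
```

Each window corresponds to one injection on a randomly chosen CDC node, which is why the report counts exactly 4 chaos injections.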

What did you expect to see?

The data consistency check should pass.

What did you see instead?

Data inconsistency was seen: one table out of the 100 is inconsistent, with only 2 rows differing.

[image attachment: consistency-check output]
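Conceptually, the consistency check reduces to comparing a per-row digest between upstream and downstream and reporting rows whose digests differ. A minimal sketch, assuming in-memory tables keyed by primary key as made-up fixtures (the real check was presumably done with a chunk-checksum tool such as sync-diff-inspector, not row by row like this):

```python
import hashlib

def row_digest(row):
    # Stable digest of one row's column values; sorting makes it
    # independent of column order.
    return hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()

def diff_rows(upstream, downstream):
    """Return primary keys whose rows differ or exist on only one side."""
    keys = upstream.keys() | downstream.keys()
    return sorted(
        k for k in keys
        if k not in upstream or k not in downstream
        or row_digest(upstream[k]) != row_digest(downstream[k])
    )

# Hypothetical fixtures: 2 of 3 rows replicated incorrectly.
up = {1: {"c": "a"}, 2: {"c": "b"}, 3: {"c": "x"}}
down = {1: {"c": "a"}, 2: {"c": "B"}, 3: {"c": "y"}}
print(diff_rows(up, down))  # -> [2, 3]
```

A report like the one above ("2 rows different" in one table) corresponds to a diff list of length 2 for that table.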

Versions of the cluster

[2023/07/04 23:09:33.079 +00:00] [INFO] [version.go:47] ["Welcome to Change Data Capture (CDC)"] [release-version=v7.3.0-alpha] [git-hash=41fda209483e6d0d94cd5e09ce784b692b2a614a] [git-branch=heads/refs/tags/v7.3.0-alpha] [utc-build-time="2023-07-04 11:03:24"] [go-version="go version go1.20.5 linux/amd64"] [failpoint-build=false]

@fubinzh fubinzh added area/ticdc Issues or PRs related to TiCDC. type/bug This is a bug. labels Jul 5, 2023
@github-actions github-actions bot added this to Need Triage in Question and Bug Reports Jul 5, 2023

fubinzh commented Jul 5, 2023

/found automation
/severity critical


fubinzh commented Jul 10, 2023

This issue is also seen when running the case tikv_scale_kafka_sync, in which TiKV/CDC scale-out/in is done while the changefeed is running:
Steps:

  1. Create 2 Kafka changefeeds, one for Open Protocol and one for Canal-JSON.
  2. Run the Kafka consumer.
  3. Run sysbench prepare: "sysbench --db-driver=mysql --mysql-host=$(nslookup upstream-tidb.cdc-testbed-tps-1816831-1-572 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g) --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=30 --table-size=1000000 --create_secondary=off --debug=true --threads=20 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only prepare"
  4. Scale TiKV from 3 to 6, and scale CDC from 3 to 1.
  5. Scale TiKV from 6 to 3 and CDC from 1 to 3 while the sysbench workload is running: "sysbench --db-driver=mysql --mysql-host=$(nslookup upstream-tidb.cdc-testbed-tps-1816831-1-572 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g) --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=30 --table-size=1000000 --create_secondary=off --time=7200 --debug=true --threads=20 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only run"
  6. Send finishmark, and check data consistency for Open Protocol when the CDC sync is done.
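The last step gates the consistency check on the finishmark reaching the downstream. A minimal polling sketch, where the `check` callable, timeout, and poll interval are assumptions rather than details from this report:

```python
import time

def wait_for_finishmark(check, timeout_s=600, poll_s=5, sleep=time.sleep):
    """Poll until check() reports the finishmark has replicated downstream.

    check: callable returning True once the finishmark row is visible
           downstream (e.g. a SELECT against the sink database).
    Returns True on success, False if the timeout elapses first.
    """
    waited = 0
    while waited <= timeout_s:
        if check():
            return True
        sleep(poll_s)
        waited += poll_s
    return False

# Example with a fake check that succeeds on the third poll.
calls = iter([False, False, True])
assert wait_for_finishmark(lambda: next(calls), sleep=lambda s: None)
```

Only after this returns True does comparing upstream and downstream make sense; diffing earlier would report spurious inconsistencies from in-flight replication.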


fubinzh commented Jul 17, 2023

This issue was also seen during 7.1.1 release testing.


fubinzh commented Jul 17, 2023

/label affects-7.1


fubinzh commented Jul 19, 2023

This issue is also seen in the tikv_scale_kafka_sync case (same steps as in the comment above). #9410 was created to track that case separately, as it has a different root cause per analysis.
