add index status is rollback done after inject network partition between ddl owner and one of tikv #50417
Labels
affects-7.6
component/ddl
This issue is related to DDL of TiDB.
severity/major
type/bug
The issue is confirmed as a bug.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
1、tidb_enable_dist_task='on' and enable global sort
2、add index
3、inject network partition between ddl owner and one of tikv last for 3mins and recover
2. What did you expect to see? (Required)
after fault recover,add index can success
3. What did you see instead (Required)
after fault recover,add index status is rollback done
the status of ddl job is not synced or done or running or queueing (now: 2024-01-13 19:21:28, jobId: 491, job type: add index /* ingest cloud /, state: rollback done)
operatorLogs:
[2024-01-13 19:03:00] ###### start adding index
alter table sbtest1 add index index_test_1705143780092 (c)
[2024-01-13 19:03:00] ###### wait for ddl job finish
[2024-01-13 19:21:28] ###### wait for ddl job to finish failed
select job_id, job_type, state from information_schema.ddl_jobs where query = 'alter table sbtest1 add index index_test_1705143780092 (c)'
jobId: 491, job type: add index / ingest cloud */, state: rollback done
4. What is your TiDB version? (Required)
./tidb-server -V
Release Version: v7.6.0
Edition: Community
Git Commit Hash: 2df8bd1
Git Branch: heads/refs/tags/v7.6.0
UTC Build Time: 2024-01-12 02:51:35
GoVersion: go1.21.5
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-01-13T18:46:17.819+0800
tidb logs:
[2024/01/13 19:21:09.480 +08:00] [INFO] [handle.go:186] ["task not resumable"] [taskKey=ddl/backfill/491]
[2024/01/13 19:21:09.782 +08:00] [ERROR] [handle.go:92] ["task reverted"] [task-id=210029] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:09.783 +08:00] [WARN] [index.go:2203] ["cannot get task"] [category=ddl] [task_key=ddl/backfill/491] [error="task not found"]
[2024/01/13 19:21:09.784 +08:00] [WARN] [reorg.go:232] ["run reorg job done"] [category=ddl] ["handled rows"=35232986] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:09.790 +08:00] [WARN] [ddl_worker.go:1118] ["run DDL job error"] [worker="worker 2, tp add index"] [category=ddl] [jobID=491] [conn=710937990] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:09.797 +08:00] [INFO] [ddl_worker.go:982] ["run DDL job failed, sleeps a while then retries it."] [worker="worker 2, tp add index"] [category=ddl] [jobID=491] [conn=710937990] [waitTime=1s] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:10.797 +08:00] [INFO] [ddl_worker.go:1367] ["schema version doesn't change"] [category=ddl]
[2024/01/13 19:21:10.807 +08:00] [INFO] [ddl_worker.go:1156] ["run DDL job"] [worker="worker 3, tp add index"] [category=ddl] [jobID=491] [conn=710937990] [category=ddl] [job="ID:491, Type:add index, State:running, SchemaState:write reorganization, SchemaID:104, TableID:245, RowCount:35232986, ArgLen:0, start time: 2024-01-13 19:03:00.09 +0800 CST, Err:[tikv:9005]Region is unavailable, ErrCount:510, SnapshotVersion:446993211166556275, LocalMode: false, UniqueWarnings:0"]
[2024/01/13 19:21:10.808 +08:00] [INFO] [index.go:888] ["index backfill state running"] [category=ddl] ["job ID"=491] [table=sbtest1] ["ingest mode"=true] [index=index_test_1705143780092]
[2024/01/13 19:21:10.813 +08:00] [INFO] [handle.go:186] ["task not resumable"] [taskKey=ddl/backfill/491]
[2024/01/13 19:21:11.115 +08:00] [ERROR] [handle.go:92] ["task reverted"] [task-id=210029] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:11.116 +08:00] [WARN] [index.go:2203] ["cannot get task"] [category=ddl] [task_key=ddl/backfill/491] [error="task not found"]
[2024/01/13 19:21:11.116 +08:00] [WARN] [reorg.go:232] ["run reorg job done"] [category=ddl] ["handled rows"=35232986] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:11.122 +08:00] [WARN] [ddl_worker.go:1118] ["run DDL job error"] [worker="worker 3, tp add index"] [category=ddl] [jobID=491] [conn=710937990] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:11.129 +08:00] [INFO] [ddl_worker.go:1367] ["schema version doesn't change"] [category=ddl]
[2024/01/13 19:21:11.139 +08:00] [INFO] [ddl_worker.go:1156] ["run DDL job"] [worker="worker 2, tp add index"] [category=ddl] [jobID=491] [conn=710937990] [category=ddl] [job="ID:491, Type:add index, State:running, SchemaState:write reorganization, SchemaID:104, TableID:245, RowCount:35232986, ArgLen:0, start time: 2024-01-13 19:03:00.09 +0800 CST, Err:[tikv:9005]Region is unavailable, ErrCount:511, SnapshotVersion:446993211166556275, LocalMode: false, UniqueWarnings:0"]
[2024/01/13 19:21:11.140 +08:00] [INFO] [index.go:888] ["index backfill state running"] [category=ddl] ["job ID"=491] [table=sbtest1] ["ingest mode"=true] [index=index_test_1705143780092]
[2024/01/13 19:21:11.146 +08:00] [INFO] [handle.go:186] ["task not resumable"] [taskKey=ddl/backfill/491]
[2024/01/13 19:21:11.448 +08:00] [ERROR] [handle.go:92] ["task reverted"] [task-id=210029] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:11.450 +08:00] [WARN] [index.go:2203] ["cannot get task"] [category=ddl] [task_key=ddl/backfill/491] [error="task not found"]
[2024/01/13 19:21:11.450 +08:00] [WARN] [reorg.go:232] ["run reorg job done"] [category=ddl] ["handled rows"=35232986] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:11.461 +08:00] [WARN] [index.go:1087] ["run add index job failed, convert job to rollback"] [category=ddl] [job="ID:491, Type:add index, State:running, SchemaState:write reorganization, SchemaID:104, TableID:245, RowCount:35232986, ArgLen:6, start time: 2024-01-13 19:03:00.09 +0800 CST, Err:[tikv:9005]Region is unavailable, ErrCount:511, SnapshotVersion:446993211166556275, LocalMode: false, UniqueWarnings:0"] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:11.501 +08:00] [WARN] [ddl_worker.go:1118] ["run DDL job error"] [worker="worker 2, tp add index"] [category=ddl] [jobID=491] [conn=710937990] [error="[tikv:9005]Region is unavailable"]
[2024/01/13 19:21:11.511 +08:00] [INFO] [domain.go:272] ["diff load InfoSchema success"] [currentSchemaVersion=436] [neededSchemaVersion=437] ["start time"=1.220424ms] [gotSchemaVersion=437] [phyTblIDs="[245]"] [actionTypes="[7]"] [diffTypes="["add index"]"]
[2024/01/13 19:21:11.546 +08:00] [INFO] [domain.go:873] ["mdl gets lock, update to owner"] [jobID=491] [version=437]
tidb-0.tar.gz
tidb-1.tar.gz
The text was updated successfully, but these errors were encountered: