add index status was cancelling when inject pd leader network partition #51846

Closed
Lily2025 opened this issue Mar 18, 2024 · 2 comments · Fixed by #52315
Labels: affects-8.0, component/ddl, severity/major, type/bug

Lily2025 commented Mar 18, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. Set tidb_enable_dist_task='off'.
2. Run a workload.
3. Add an index to one of the tables.
4. Inject a network partition on the PD leader that lasts for 3 minutes.

2. What did you expect to see? (Required)

The add index job succeeds.

3. What did you see instead (Required)

The add index job was in the cancelling state:

the status of ddl job is not synced or done or running or queueing (now: 2024-03-17 20:19:07, jobId: 488, job type: add index /* ingest */, state: cancelling)
operatorLogs:
[2024-03-17 20:16:37] ###### start adding index
ALTER TABLE sbtest1 ADD INDEX index_test_1710677797027(c)
[2024-03-17 20:16:37] ###### wait for ddl job finish
[2024-03-17 20:19:07] ###### wait for ddl job to finish failed
select job_id, job_type, state from information_schema.ddl_jobs where query = 'ALTER TABLE sbtest1 ADD INDEX index_test_1710677797027(c)'
jobId: 488, job type: add index /* ingest */, state: cancelling
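
The harness message above rejects `cancelling` because it falls outside the set of states it treats as healthy. A minimal sketch of that kind of check, assuming only the four accepted states named in the message; `acceptableStates` and `checkJobState` are hypothetical names and are not taken from the test harness:

```go
package main

import "fmt"

// acceptableStates holds the DDL job states named in the harness message.
// Hypothetical: the real harness may define this set differently.
var acceptableStates = map[string]bool{
	"synced":   true,
	"done":     true,
	"running":  true,
	"queueing": true,
}

// checkJobState returns an error when state is not one of acceptableStates.
func checkJobState(jobID int64, state string) error {
	if acceptableStates[state] {
		return nil
	}
	return fmt.Errorf("job %d: unexpected state %q", jobID, state)
}

func main() {
	// A running job passes the check; a cancelling job is reported.
	fmt.Println(checkJobState(488, "running"))
	fmt.Println(checkJobState(488, "cancelling"))
}
```

Under this check, job 488 fails as soon as it is observed in `cancelling`, even though it has not yet reached a terminal state.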

logs
endless-ha-test-add-index-tps-7531330-1-929-tidb-1.tar.gz
endless-ha-test-add-index-tps-7531330-1-929-tidb-0.tar.gz

[2024/03/17 20:20:07.139 +08:00] [INFO] [checkpoint.go:229] ["close checkpoint manager"] [category=ddl-ingest] [jobID=488] [indexIDs="[37]"]
[2024/03/17 20:20:07.139 +08:00] [INFO] [backend_mgr.go:209] ["close one backend for DDL job"] [category=ddl-ingest] ["job ID"=488] ["current memory usage"=176] ["max memory quota"=8589934592]
[2024/03/17 20:20:07.148 +08:00] [WARN] [index.go:986] ["run reorg job failed, convert job to rollback"] [category=ddl] [job="ID:488, Type:add index, State:rollingback, SchemaState:delete only, SchemaID:350, TableID:245, RowCount:70002201, ArgLen:3, start time: 2024-03-17 20:16:37.029 +0800 CST, Err:[ddl:8201]TiDB server is not a DDL owner, ErrCount:2, SnapshotVersion:448443920489906203, LocalMode: false, UniqueWarnings:0"] [error="[ddl:8214]Cancelled DDL job"]
[2024/03/17 20:20:07.149 +08:00] [INFO] [rollingback.go:547] ["the DDL job is cancelled normally"] [worker="worker 3, tp add index"] [category=ddl] [jobID=488] [conn=3405777252] [job="ID:488, Type:add index, State:rollingback, SchemaState:delete only, SchemaID:350, TableID:245, RowCount:70002201, ArgLen:3, start time: 2024-03-17 20:16:37.029 +0800 CST, Err:[ddl:8201]DDL job rollback, error msg: TiDB server is not a DDL owner, ErrCount:3, SnapshotVersion:448443920489906203, LocalMode: false, UniqueWarnings:0"] [error="[ddl:8214]Cancelled DDL job"]
[2024/03/17 20:20:07.154 +08:00] [INFO] [ddl_worker.go:1381] ["schema version doesn't change"] [category=ddl] [jobID=488]
[2024/03/17 20:20:07.165 +08:00] [INFO] [ddl_worker.go:1170] ["run DDL job"] [worker="worker 3, tp add index"] [category=ddl] [jobID=488] [conn=3405777252] [category=ddl] [job="ID:488, Type:add index, State:rollingback, SchemaState:delete only, SchemaID:350, TableID:245, RowCount:70002201, ArgLen:0, start time: 2024-03-17 20:16:37.029 +0800 CST, Err:[ddl:8201]DDL job rollback, error msg: TiDB server is not a DDL owner, ErrCount:3, SnapshotVersion:448443920489906203, LocalMode: false, UniqueWarnings:0"]
[2024/03/17 20:20:07.196 +08:00] [INFO] [domain.go:278] ["diff load InfoSchema success"] [currentSchemaVersion=406] [neededSchemaVersion=410] ["start time"=7.323718ms] [gotSchemaVersion=410] [phyTblIDs="[245,245]"] [actionTypes="[7,7]"] [diffTypes="["add index","add index"]"]
[2024/03/17 20:20:07.221 +08:00] [INFO] [domain.go:883] ["mdl gets lock, update self version to owner"] [jobID=488] [version=410]
[2024/03/17 20:20:07.236 +08:00] [INFO] [ddl_worker.go:1418] ["wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [category=ddl] [ver=410] ["take time"=54.241791ms] [job="ID:488, Type:add index, State:rollingback, SchemaState:delete reorganization, SchemaID:350, TableID:245, RowCount:70002201, ArgLen:2, start time: 2024-03-17 20:16:37.029 +0800 CST, Err:[ddl:8201]DDL job rollback, error msg: TiDB server is not a DDL owner, ErrCount:3, SnapshotVersion:448443920489906203, LocalMode: false, UniqueWarnings:0"]
[2024/03/17 20:20:07.254 +08:00] [INFO] [ddl_worker.go:1170] ["run DDL job"] [worker="worker 3, tp add index"] [category=ddl] [jobID=488] [conn=3405777252] [category=ddl] [job="ID:488, Type:add index, State:rollingback, SchemaState:delete reorganization, SchemaID:350, TableID:245, RowCount:70002201, ArgLen:0, start time: 2024-03-17 20:16:37.029 +0800 CST, Err:[ddl:8201]DDL job rollback, error msg: TiDB server is not a DDL owner, ErrCount:3, SnapshotVersion:448443920489906203, LocalMode: false, UniqueWarnings:0"]
[2024/03/17 20:20:07.276 +08:00] [INFO] [domain.go:278] ["diff load InfoSchema success"] [currentSchemaVersion=410] [neededSchemaVersion=411] ["start time"=3.084732ms] [gotSchemaVersion=411] [phyTblIDs="[245]"] [actionTypes="[7]"] [diffTypes="["add index"]"]
[2024/03/17 20:20:07.321 +08:00] [INFO] [domain.go:883] ["mdl gets lock, update self version to owner"] [jobID=488] [version=411]
[2024/03/17 20:20:07.322 +08:00] [INFO] [syncer.go:352] ["syncer check all versions, someone is not synced"] [category=ddl] [info="instance ip tc-tidb-1.tc-tidb-peer.endless-ha-test-add-index-tps-7531330-1-929.svc, port 4000, id 21cc742d-548b-4f94-85ed-1d242bcdd88f"] ["ddl job id"=488] [ver=411]
[2024/03/17 20:20:07.344 +08:00] [INFO] [ddl_worker.go:1418] ["wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [category=ddl] [ver=411] ["take time"=76.167243ms] [job="ID:488, Type:add index, State:rollback done, SchemaState:none, SchemaID:350, TableID:245, RowCount:70002201, ArgLen:2, start time: 2024-03-17 20:16:37.029 +0800 CST, Err:[ddl:8201]DDL job rollback, error msg: TiDB server is not a DDL owner, ErrCount:3, SnapshotVersion:448443920489906203, LocalMode: false, UniqueWarnings:0"]
[2024/03/17 20:20:07.361 +08:00] [INFO] [delete_range.go:423] ["insert into delete-range indices"] [category=ddl] [jobID=488] [tableID=245] [indexIDs="[37,9223090561878065189]"] [comment="add index: physical table ID(s)"]
[2024/03/17 20:20:07.366 +08:00] [INFO] [delete_range.go:113] ["add job into delete-range table"] [category=ddl] [jobID=488] [jobType="add index"]
[2024/03/17 20:20:07.367 +08:00] [INFO] [ddl_worker.go:734] ["finish DDL job"] [worker="worker 3, tp add index"] [category=ddl] [jobID=488] [conn=3405777252] [job="ID:488, Type:add index, State:rollback done, SchemaState:none, SchemaID:350, TableID:245, RowCount:70002201, ArgLen:2, start time: 2024-03-17 20:16:37.029 +0800 CST, Err:[ddl:8201]DDL job rollback, error msg: TiDB server is not a DDL owner, ErrCount:3, SnapshotVersion:448443920489906203, LocalMode: false, UniqueWarnings:0"]
[2024/03/17 20:20:07.377 +08:00] [INFO] [ddl_worker.go:1381] ["schema version doesn't change"] [category=ddl] [jobID=488]
[2024/03/17 20:20:07.379 +08:00] [INFO] [ddl.go:1296] ["DDL job is failed"] [category=ddl] [jobID=488]

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v8.0.0-alpha
Edition: Community
Git Commit Hash: e158c21
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-03-16 11:43:20
GoVersion: go1.21.6
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-03-17T20:16:30.066+0800

Lily2025 added the type/bug label Mar 18, 2024

Lily2025 (Author)

/assign ywqzzy
/type bug
/severity major

ywqzzy (Contributor) commented Mar 18, 2024

[2024/03/17 20:18:58.498 +08:00] [ERROR] [misc.go:113] ["panic in the recoverable goroutine"] [label=ddl-worker] [funcInfo="worker 2, tp add index runDDLJob"] [r="runtime error: invalid memory address or nil pointer dereference"] [stack="github.com/pingcap/tidb/pkg/util.Recover\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/misc.go:117\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:914\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:261\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:861\ngithub.com/pingcap/tidb/pkg/ddl.(*worker).mergeWarningsIntoJob\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/reorg.go:323\ngithub.com/pingcap/tidb/pkg/ddl.(*worker).runReorgJob\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/reorg.go:260\ngithub.com/pingcap/tidb/pkg/ddl.runReorgJobAndHandleErr\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/index.go:1083\ngithub.com/pingcap/tidb/pkg/ddl.runIngestReorgJob\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/index.go:983\ngithub.com/pingcap/tidb/pkg/ddl.doReorgWorkForCreateIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/index.go:913\ngithub.com/pingcap/tidb/pkg/ddl.(*worker).onCreateIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/index.go:724\ngithub.com/pingcap/tidb/pkg/ddl.(*worker).runDDLJob\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl_worker.go:1241\ngithub.com/pingcap/tidb/pkg/ddl.(*worker).HandleDDLJobTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl_worker.go:931\ngithub.com/pingcap/tidb/pkg/ddl.(*ddl).delivery2Worker.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/job_table.go:462\ngithub.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).Run.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/wait_group_wrapper.go:157"]
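
The log line above records a nil pointer dereference inside `mergeWarningsIntoJob` that was caught by a recover wrapper instead of crashing the process. As a rough illustration of that pattern only, and not TiDB's code: `reorgCtx`, `mergeWarnings`, `mergeWarningsSafe`, and `runRecoverable` below are hypothetical stand-ins, and the nil guard is one possible defence, not necessarily the change made in #52315.

```go
package main

import "fmt"

// reorgCtx is a hypothetical stand-in for per-job reorg state.
type reorgCtx struct {
	warnings []string
}

// mergeWarnings reads a field of rc, so a nil rc triggers a runtime panic.
func mergeWarnings(rc *reorgCtx, jobWarnings []string) []string {
	return append(jobWarnings, rc.warnings...)
}

// mergeWarningsSafe returns jobWarnings unchanged when rc is nil.
func mergeWarningsSafe(rc *reorgCtx, jobWarnings []string) []string {
	if rc == nil {
		return jobWarnings
	}
	return mergeWarnings(rc, jobWarnings)
}

// runRecoverable runs fn and converts any panic into an error,
// so the calling goroutine keeps running.
func runRecoverable(label string, fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic in %s: %v", label, r)
		}
	}()
	fn()
	return nil
}

func main() {
	// Unguarded call with a nil context: the panic is recovered and reported.
	err := runRecoverable("ddl-worker", func() { mergeWarnings(nil, nil) })
	fmt.Println(err)

	// Guarded call with a nil context: no panic, no error.
	err = runRecoverable("ddl-worker", func() { mergeWarningsSafe(nil, nil) })
	fmt.Println(err)
}
```

Recovering keeps the worker alive, but the job still sees a failed step, which is why a guard at the dereference site is shown alongside the wrapper.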
