Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[global sort] add index using global sort can not finished with backend context not found error #49770

Closed
aytrack opened this issue Dec 25, 2023 · 0 comments · Fixed by #49866
Assignees
Labels
affects-7.5 component/ddl This issue is related to DDL of TiDB. severity/critical type/bug This issue is a bug.

Comments

@aytrack
Copy link
Contributor

aytrack commented Dec 25, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. deploy a cluster and set global sort url
  2. add index (maybe need many times to reproduce this)

2. What did you expect to see? (Required)

add index success

3. What did you see instead (Required)

for normal, it just need about 10s, but this time it's already running for 16min.

[19:34:40]TiDB root:(none)> admin show ddl jobs;
+--------+---------+------------+---------------------------------+----------------------+-----------+----------+-----------+---------------------+---------------------+---------------------+---------+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE                        | SCHEMA_STATE         | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME         | START_TIME          | END_TIME            | STATE   |
+--------+---------+------------+---------------------------------+----------------------+-----------+----------+-----------+---------------------+---------------------+---------------------+---------+
| 125    | test    | xxx  | add index /* ingest cloud */    | write reorganization | 2         | 123      | 0         | 2023-12-25 11:18:48 | 2023-12-25 11:18:48 | <null>              | running |
[2023/12/25 11:18:48.734 +00:00] [INFO] [ddl_worker.go:253] ["add DDL jobs"] [category=ddl] ["batch count"=1] [jobs="ID:125, Type:add index, State:queueing, SchemaState:none, SchemaID:2, TableID:123, RowCount:0, ArgLen:6, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err
:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0; "] [table=true]
[2023/12/25 11:18:48.734 +00:00] [INFO] [ddl.go:1068] ["start DDL job"] [category=ddl] [job="ID:125, Type:add index, State:queueing, SchemaState:none, SchemaID:2, TableID:123, RowCount:0, ArgLen:6, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, Snap
shotVersion:0, UniqueWarnings:0"] [query="alter table xxx add index p_id(p_id);"]
[2023/12/25 11:18:48.747 +00:00] [INFO] [ddl_worker.go:1000] ["run DDL job"] [worker="worker 2, tp add index"] [category=ddl] [jobID=125] [conn=3403678664] [category=ddl] [job="ID:125, Type:add index, State:queueing, SchemaState:none, SchemaID:2, TableID:123, RowCount:0,
ArgLen:0, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0"]
[2023/12/25 11:18:48.750 +00:00] [INFO] [index.go:629] ["run add index job"] [category=ddl] [job="ID:125, Type:add index, State:running, SchemaState:none, SchemaID:2, TableID:123, RowCount:0, ArgLen:6, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0,
SnapshotVersion:0, UniqueWarnings:0"] [indexInfo="{\"id\":15,\"idx_name\":{\"O\":\"p_id\",\"L\":\"p_id\"},\"tbl_name\":{\"O\":\"\",\"L\":\"\"},\"idx_cols\":[{\"name\":{\"O\":\"p_id\",\"L\":\"p_id\"},\"offset\":3,\"length\":-1}],\"state\":0,\"backfi
ll_state\":0,\"comment\":\"\",\"index_type\":1,\"is_unique\":false,\"is_primary\":false,\"is_invisible\":false,\"is_global\":false,\"mv_index\":false}"]
[2023/12/25 11:18:48.765 +00:00] [INFO] [domain.go:272] ["diff load InfoSchema success"] [currentSchemaVersion=106] [neededSchemaVersion=107] ["start time"=2.62694ms] [gotSchemaVersion=107] [phyTblIDs="[123]"] [actionTypes="[7]"] [diffTypes="[\"add index\"]"]
[2023/12/25 11:18:48.768 +00:00] [INFO] [domain.go:873] ["mdl gets lock, update to owner"] [jobID=125] [version=107]
[2023/12/25 11:18:48.812 +00:00] [INFO] [ddl_worker.go:1234] ["wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [category=ddl] [ver=107] ["take time"=52.29732ms] [job="ID:125, Type:add index, State:running, SchemaState:delet
e only, SchemaID:2, TableID:123, RowCount:0, ArgLen:6, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0"]
[2023/12/25 11:18:48.826 +00:00] [INFO] [ddl_worker.go:1000] ["run DDL job"] [worker="worker 2, tp add index"] [category=ddl] [jobID=125] [conn=3403678664] [category=ddl] [job="ID:125, Type:add index, State:running, SchemaState:delete only, SchemaID:2, TableID:123, RowCou
nt:0, ArgLen:0, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0"]
[2023/12/25 11:18:48.843 +00:00] [INFO] [domain.go:272] ["diff load InfoSchema success"] [currentSchemaVersion=107] [neededSchemaVersion=108] ["start time"=2.674613ms] [gotSchemaVersion=108] [phyTblIDs="[123]"] [actionTypes="[7]"] [diffTypes="[\"add index\"]"]
[2023/12/25 11:18:48.846 +00:00] [INFO] [domain.go:873] ["mdl gets lock, update to owner"] [jobID=125] [version=108]
[2023/12/25 11:18:48.891 +00:00] [INFO] [ddl_worker.go:1234] ["wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [category=ddl] [ver=108] ["take time"=52.297209ms] [job="ID:125, Type:add index, State:running, SchemaState:writ
e only, SchemaID:2, TableID:123, RowCount:0, ArgLen:6, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0"]
[2023/12/25 11:18:48.905 +00:00] [INFO] [ddl_worker.go:1000] ["run DDL job"] [worker="worker 2, tp add index"] [category=ddl] [jobID=125] [conn=3403678664] [category=ddl] [job="ID:125, Type:add index, State:running, SchemaState:write only, SchemaID:2, TableID:123, RowCoun
t:0, ArgLen:0, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0"]
[2023/12/25 11:18:48.924 +00:00] [INFO] [domain.go:272] ["diff load InfoSchema success"] [currentSchemaVersion=108] [neededSchemaVersion=109] ["start time"=2.432188ms] [gotSchemaVersion=109] [phyTblIDs="[123]"] [actionTypes="[7]"] [diffTypes="[\"add index\"]"]
[2023/12/25 11:18:48.926 +00:00] [INFO] [domain.go:873] ["mdl gets lock, update to owner"] [jobID=125] [version=109]
[2023/12/25 11:18:48.972 +00:00] [INFO] [ddl_worker.go:1234] ["wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)"] [category=ddl] [ver=109] ["take time"=53.344865ms] [job="ID:125, Type:add index, State:running, SchemaState:writ
e reorganization, SchemaID:2, TableID:123, RowCount:0, ArgLen:6, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0"]
[2023/12/25 11:18:48.984 +00:00] [INFO] [ddl_worker.go:1000] ["run DDL job"] [worker="worker 2, tp add index"] [category=ddl] [jobID=125] [conn=3403678664] [category=ddl] [job="ID:125, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:12
3, RowCount:0, ArgLen:0, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0"]
[2023/12/25 11:18:48.986 +00:00] [INFO] [index.go:888] ["index backfill state running"] [category=ddl] ["job ID"=125] [table=xxx] ["ingest mode"=true] [index=p_id]
[2023/12/25 11:18:48.988 +00:00] [INFO] [region_request.go:1532] ["send request meet region error without retry"] [req-ts=446563124237828102] [req-type=Scan] [region="{ region id: 24, ver: 103, confVer: 1 }"] [replica-read-type=leader] [stale-read=false] [request-sender="
{rpcError:<nil>,replicaSelector: replicaSelector{selectorStateStr: accessKnownLeader, cacheRegionIsValid: false, replicaStatus: [peer: 25, store: 1, isEpochStale: false, attempts: 1, replica-epoch: 0, store-epoch: 0, store-state: resolved, store-liveness-state: reachable]
}}"] [retry-times=0] [total-backoff-ms=0] [total-backoff-times=0] [max-exec-timeout-ms=60000] [total-region-errors=25-epoch_not_match:1]
[2023/12/25 11:18:48.989 +00:00] [INFO] [reorg.go:635] ["get noop table range"] [category=ddl] [table="&{123 xxx utf8mb4 utf8mb4_bin [0xc013aa8580 0xc013aa86e0 0xc013aa8840 0xc013aa89a0 0xc013aa8b00 0xc013aa8c60 0xc013aa8dc0 0xc013aa8f20 0xc013aa9080 0xc013aa91e0 0x
c013aa9340 0xc013aa94a0 0xc013aa9600 0xc013aa9760 0xc013aa98c0 0xc013aa9a20 0xc013aa9b80 0xc013aa9ce0 0xc013aa9e40 0xc014012000 0xc014012160 0xc0140122c0 0xc014012420 0xc014012580 0xc0140126e0] [0xc013aff4d0 0xc013aff710 0xc013aff950 0xc013affb90 0xc013affdd0 0xc014014090
 0xc0140142d0 0xc014014510 0xc014014750 0xc014014990 0xc014014bd0 0xc014014e10 0xc014015050 0xc014015320 0xc0140155f0] [] [] public false true 1  0 0 0 0 25 15 0 0 446563124211875854 0 0 0 0 0 0 <nil>  <nil> <nil> <nil> 5 <nil> false  disable <nil> <nil> <nil> <nil>}"] ["
table/partition ID"=123] ["start key"=] ["end key"=] ["is empty table"=true]
[2023/12/25 11:18:48.989 +00:00] [INFO] [reorg.go:708] ["job get table range"] [category=ddl] [jobID=125] [physicalTableID=123] [startKey=] [endKey=]
[2023/12/25 11:18:48.998 +00:00] [INFO] [ddl_worker.go:1211] ["schema version doesn't change"] [category=ddl]
[2023/12/25 11:18:49.008 +00:00] [INFO] [ddl_worker.go:1000] ["run DDL job"] [worker="worker 2, tp add index"] [category=ddl] [jobID=125] [conn=3403678664] [category=ddl] [job="ID:125, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:12
3, RowCount:0, ArgLen:0, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:446563124237828102, UniqueWarnings:0"]
[2023/12/25 11:18:49.011 +00:00] [INFO] [index.go:888] ["index backfill state running"] [category=ddl] ["job ID"=125] [table=xxx] ["ingest mode"=true] [index=p_id]
[2023/12/25 11:18:49.017 +00:00] [INFO] [index.go:2121] ["adjusted add-index task concurrency"] [worker-cnt=4] [task-concurrency=4] [task-key=ddl/backfill/125]
[2023/12/25 11:18:51.884 +00:00] [INFO] [scheduler.go:127] ["execute one task"] [task-id=10] [task-type=backfill] [state=pending] [concurrency=4]
[2023/12/25 11:18:52.386 +00:00] [INFO] [scheduler.go:531] [onNextStage] [task-id=10] [task-type=backfill] [current-step=-1] [next-step=1]
[2023/12/25 11:18:52.386 +00:00] [INFO] [scheduler.go:558] ["eligible instances"] [task-id=10] [task-type=backfill] [num=1]
[2023/12/25 11:18:52.389 +00:00] [INFO] [backfilling_dist_scheduler.go:90] ["on next subtasks batch"] [type=backfill] [task-id=10] [curr-step=init] [next-step=read-index]
[2023/12/25 11:18:52.391 +00:00] [INFO] [reorg.go:635] ["get noop table range"] [category=ddl] [table="&{123 xxx utf8mb4 utf8mb4_bin [0xc0161086e0 0xc016108840 0xc0161089a0 0xc016108b00 0xc016108c60 0xc016108dc0 0xc016108f20 0xc016109080 0xc0161091e0 0xc016109340 0x
c0161094a0 0xc016109600 0xc016109760 0xc0161098c0 0xc016109a20 0xc016109b80 0xc016109ce0 0xc016109e40 0xc01611e000 0xc01611e160 0xc01611e2c0 0xc01611e420 0xc01611e580 0xc01611e6e0 0xc01611e840] [0xc016122240 0xc016122480 0xc0161226c0 0xc016122900 0xc016122b40 0xc016122d80
 0xc016122fc0 0xc016123200 0xc016123440 0xc016123680 0xc0161238c0 0xc016123b00 0xc016123d40 0xc016124090 0xc016124360] [] [] public false true 1  0 0 0 0 25 15 0 0 446563124211875854 0 0 0 0 0 0 <nil>  <nil> <nil> <nil> 5 <nil> false  disable <nil> <nil> <nil> <nil>}"] ["
table/partition ID"=123] ["start key"=] ["end key"=] ["is empty table"=true]
[2023/12/25 11:18:52.391 +00:00] [INFO] [scheduler.go:591] ["schedule subtasks"] [task-id=10] [task-type=backfill] [state=pending] [step=-1] [concurrency=4] [subtasks=0]
[2023/12/25 11:18:52.892 +00:00] [INFO] [scheduler.go:531] [onNextStage] [task-id=10] [task-type=backfill] [current-step=1] [next-step=2]
[2023/12/25 11:18:52.892 +00:00] [INFO] [scheduler.go:558] ["eligible instances"] [task-id=10] [task-type=backfill] [num=1]
[2023/12/25 11:18:52.895 +00:00] [INFO] [backfilling_dist_scheduler.go:90] ["on next subtasks batch"] [type=backfill] [task-id=10] [curr-step=read-index] [next-step=merge-sort]
[2023/12/25 11:18:52.897 +00:00] [INFO] [backfilling_dist_scheduler.go:427] ["skip merge sort"] [type=backfill] [task-id=10] [curr-step=read-index] [next-step=merge-sort]
[2023/12/25 11:18:52.897 +00:00] [INFO] [scheduler.go:591] ["schedule subtasks"] [task-id=10] [task-type=backfill] [state=running] [step=1] [concurrency=4] [subtasks=0]
[2023/12/25 11:18:53.391 +00:00] [INFO] [scheduler.go:531] [onNextStage] [task-id=10] [task-type=backfill] [current-step=2] [next-step=3]
[2023/12/25 11:18:53.391 +00:00] [INFO] [scheduler.go:558] ["eligible instances"] [task-id=10] [task-type=backfill] [num=1]
[2023/12/25 11:18:53.394 +00:00] [INFO] [backfilling_dist_scheduler.go:90] ["on next subtasks batch"] [type=backfill] [task-id=10] [curr-step=merge-sort] [next-step=write&ingest]
[2023/12/25 11:18:53.397 +00:00] [INFO] [s3.go:407] ["succeed to get bucket region from s3"] ["bucket region"=Beijing]
[2023/12/25 11:18:53.397 +00:00] [WARN] [scheduler.go:565] ["generate part of subtasks failed"] [task-id=10] [task-type=backfill] [error="backend context not found"]
[2023/12/25 11:18:53.397 +00:00] [WARN] [scheduler.go:640] ["generate plan failed"] [task-id=10] [task-type=backfill] [error="backend context not found"] [state=running]
[2023/12/25 11:18:53.397 +00:00] [INFO] [scheduler.go:219] ["schedule task meet err, reschedule it"] [task-id=10] [task-type=backfill] [error="backend context not found"]
[2023/12/25 11:18:53.891 +00:00] [INFO] [scheduler.go:531] [onNextStage] [task-id=10] [task-type=backfill] [current-step=2] [next-step=3]
[2023/12/25 11:18:53.891 +00:00] [INFO] [scheduler.go:558] ["eligible instances"] [task-id=10] [task-type=backfill] [num=1]
[2023/12/25 11:18:53.894 +00:00] [INFO] [backfilling_dist_scheduler.go:90] ["on next subtasks batch"] [type=backfill] [task-id=10] [curr-step=merge-sort] [next-step=write&ingest]
[2023/12/25 11:18:53.898 +00:00] [INFO] [s3.go:407] ["succeed to get bucket region from s3"] ["bucket region"=Beijing]
[2023/12/25 11:18:53.898 +00:00] [WARN] [scheduler.go:565] ["generate part of subtasks failed"] [task-id=10] [task-type=backfill] [error="backend context not found"]
[2023/12/25 11:18:53.898 +00:00] [WARN] [scheduler.go:640] ["generate plan failed"] [task-id=10] [task-type=backfill] [error="backend context not found"] [state=running]
[2023/12/25 11:18:53.898 +00:00] [INFO] [scheduler.go:219] ["schedule task meet err, reschedule it"] [task-id=10] [task-type=backfill] [error="backend context not found"]
[2023/12/25 11:18:54.015 +00:00] [INFO] [reorg.go:264] ["run reorg job wait timeout"] [category=ddl] ["wait time"=5s] ["total added row count"=0]
[2023/12/25 11:18:54.017 +00:00] [INFO] [ddl_worker.go:1211] ["schema version doesn't change"] [category=ddl]
[2023/12/25 11:18:54.026 +00:00] [INFO] [ddl_worker.go:1000] ["run DDL job"] [worker="worker 2, tp add index"] [category=ddl] [jobID=125] [conn=3403678664] [category=ddl] [job="ID:125, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:12
3, RowCount:0, ArgLen:0, start time: 2023-12-25 11:18:48.709 +0000 UTC, Err:<nil>, ErrCount:0, SnapshotVersion:446563124237828102, UniqueWarnings:0"]
[2023/12/25 11:18:54.028 +00:00] [INFO] [index.go:888] ["index backfill state running"] [category=ddl] ["job ID"=125] [table=xxx] ["ingest mode"=true] [index=p_id]
[2023/12/25 11:18:54.390 +00:00] [INFO] [scheduler.go:531] [onNextStage] [task-id=10] [task-type=backfill] [current-step=2] [next-step=3]
[2023/12/25 11:18:54.390 +00:00] [INFO] [scheduler.go:558] ["eligible instances"] [task-id=10] [task-type=backfill] [num=1]
[2023/12/25 11:18:54.394 +00:00] [INFO] [backfilling_dist_scheduler.go:90] ["on next subtasks batch"] [type=backfill] [task-id=10] [curr-step=merge-sort] [next-step=write&ingest]
[2023/12/25 11:18:54.397 +00:00] [INFO] [s3.go:407] ["succeed to get bucket region from s3"] ["bucket region"=Beijing]
[2023/12/25 11:18:54.397 +00:00] [WARN] [scheduler.go:565] ["generate part of subtasks failed"] [task-id=10] [task-type=backfill] [error="backend context not found"]
[2023/12/25 11:18:54.397 +00:00] [WARN] [scheduler.go:640] ["generate plan failed"] [task-id=10] [task-type=backfill] [error="backend context not found"] [state=running]
[2023/12/25 11:18:54.397 +00:00] [INFO] [scheduler.go:219] ["schedule task meet err, reschedule it"] [task-id=10] [task-type=backfill] [error="backend context not found"]
[2023/12/25 11:18:54.891 +00:00] [INFO] [scheduler.go:531] [onNextStage] [task-id=10] [task-type=backfill] [current-step=2] [next-step=3]
[2023/12/25 11:18:54.892 +00:00] [INFO] [scheduler.go:558] ["eligible instances"] [task-id=10] [task-type=backfill] [num=1]

4. What is your TiDB version? (Required)

cb7d2b7

@aytrack aytrack added type/bug This issue is a bug. severity/critical component/ddl This issue is related to DDL of TiDB. labels Dec 25, 2023
@aytrack aytrack changed the title [global sort] add index using global sort can not finished [global sort] add index using global sort can not finished with backend context not found error Dec 25, 2023
@tangenta tangenta self-assigned this Dec 28, 2023
@jebter jebter removed the affects-7.6 label Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-7.5 component/ddl This issue is related to DDL of TiDB. severity/critical type/bug This issue is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants