Bug Report
When add-index runs via the DDL distributed framework (DXF) with ingest, the backfill step can fail with:
[Lightning:KV:ErrCreateKVClient] create kv client error: context deadline exceeded
- etcd client:
dial tcp: missing address
In short, ingest backfill fails because the etcd client used for the safe point KV is created with empty PD endpoints.
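To show where the "missing address" text comes from, here is a minimal sketch using only the Go standard library. It does not touch TiDB or the etcd client; `dialEmpty` is a hypothetical helper name, and the sketch assumes the empty PD endpoint eventually reaches a plain TCP dial as an empty address string.

```go
package main

import (
	"fmt"
	"net"
)

// dialEmpty is a hypothetical helper for illustration only.
// It dials TCP with an empty address; the standard library rejects this
// before any network traffic is attempted.
func dialEmpty() error {
	conn, err := net.Dial("tcp", "")
	if err != nil {
		return err
	}
	// Not expected to be reached with an empty address.
	return conn.Close()
}

func main() {
	// The returned error carries the same "missing address" text
	// that shows up in the etcd client log.
	fmt.Println(dialEmpty())
}
```

Because the dial fails immediately and is retried, the caller tends to see only the outer context deadline rather than this underlying error.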
Introduced by: #55433 (ddl: directly use BackendConfig rather than use lightning config)
That PR switched DDL ingest to build local.BackendConfig via genConfig() in pkg/ddl/ingest/config.go instead of the full Lightning config. genConfig() does not set PDAddr; it only sets fields like LocalStoreDir, KeyspaceName, concurrency, etc. So in the DDL ingest path, BackendConfig.PDAddr is always the zero value (empty string).
pkg/lightning/backend/local/local.go then had two uses of PD addresses:
- PD client: it used pdAddrs from pdSvcDiscovery.GetServiceURLs() when pdSvcDiscovery != nil (DDL path), so the PD client gets valid addresses.
- Etcd safe point KV: it used config.PDAddr for NewEtcdSafePointKV(), which in the DDL path is never set and stays empty.
So when DXF runs add-index and creates a local backend on an executor node, the etcd client is created with empty endpoints → "missing address" and the step fails (often reported as context deadline exceeded).
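The mismatch between the two address sources can be sketched as follows. This is not the change made in #59757 and not code from local.go; `resolveEtcdEndpoints` and `errNoEndpoints` are hypothetical names used only to illustrate one way a caller could prefer discovered PD URLs over an unset PDAddr and fail fast when both are empty.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNoEndpoints is a hypothetical sentinel error for this sketch.
var errNoEndpoints = errors.New("no PD endpoints available for etcd client")

// resolveEtcdEndpoints is a hypothetical helper, not a TiDB function.
// It prefers addresses reported by service discovery and falls back to the
// comma-separated pdAddr string only when discovery returned nothing.
// Empty entries are dropped so an empty endpoint is never handed to a client.
func resolveEtcdEndpoints(discovered []string, pdAddr string) ([]string, error) {
	candidates := discovered
	if len(candidates) == 0 {
		// Note: strings.Split("", ",") yields one empty string, not an empty slice.
		candidates = strings.Split(pdAddr, ",")
	}
	endpoints := make([]string, 0, len(candidates))
	for _, c := range candidates {
		if c = strings.TrimSpace(c); c != "" {
			endpoints = append(endpoints, c)
		}
	}
	if len(endpoints) == 0 {
		return nil, errNoEndpoints
	}
	return endpoints, nil
}

func main() {
	// DDL path: PDAddr is empty, but service discovery knows the PD URLs.
	eps, err := resolveEtcdEndpoints([]string{"http://pd-0:2379"}, "")
	fmt.Println(eps, err)

	// Both sources empty: return a clear error instead of dialing "".
	eps, err = resolveEtcdEndpoints(nil, "")
	fmt.Println(eps, err)
}
```

The point of the sketch is that an explicit empty-endpoint check surfaces the misconfiguration directly, instead of letting it appear later as a context deadline.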
Note: Already fixed on master by #59757 (*: upgrade to the latest client-go)
1. Minimal reproduce step (Required)
- Deploy a TiDB cluster (e.g. 8.5.x) with PD and TiKV.
- Create a table with enough data so that add-index uses the ingest path (e.g. tens of thousands of rows or more).
- Ensure add-index runs via the distributed reorg path (e.g. tidb_enable_dist_task / DDL distributed framework enabled, or conditions that make the job use IsDistReorg).
- Run ALTER TABLE t ADD INDEX idx(c); (or add primary key / other index that uses ingest).
- Observe the backfill step on the executor node: it creates a local backend and then fails.
2. What did you expect to see? (Required)
Add-index backfill (ingest) should complete successfully: the executor creates a local backend, connects to PD/etcd with valid addresses, and the backfill step succeeds.
3. What did you see instead? (Required)
The backfill step fails with:
- TiDB log:
[Lightning:KV:ErrCreateKVClient] create kv client error: context deadline exceeded
- TiDB log:
["build ingest backend failed"] ["job ID"=...] [error="[Lightning:KV:ErrCreateKVClient]create kv client error: context deadline exceeded"]
- etcd client log (if visible):
dial tcp: missing address and/or retrying of unary invoker failed ... latest balancer error: last connection error: ... dial tcp: missing address
The step often hits the step context deadline (~5s) and retries repeatedly with the same error.
4. What is your TiDB version? (Required)
v8.5.2