lightning failed with '503 Service Unavailable' when inject pd leader network partition #49744
Labels
component/lightning: This issue is related to Lightning of TiDB.
severity/major
type/bug: The issue is confirmed as a bug.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
1. Run Lightning.
2. Inject a network partition that isolates the PD leader from all other pods.
2. What did you expect to see? (Required)
Lightning should succeed when the PD leader is network-partitioned.
3. What did you see instead (Required)
Lightning failed when a network partition was injected on the PD leader.
Verbose debug logs will be written to /tmp/lightning.log.2023-12-23T02.30.44Z
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| # | CHECK ITEM | TYPE | PASSED |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 1 | Source data files size is proper | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 2 | the checkpoints are valid | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 3 | table schemas are valid | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 4 | all importing tables on the target are empty | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 5 | Cluster version check passed | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 6 | Lightning has the correct storage permission | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 7 | local disk resources are rich, estimate sorted data size 26.05GiB, local available is 3.399TiB | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 8 | The storage space is rich, which TiKV/Tiflash is 5.368TiB/0B. The estimated storage space is 78.16GiB/0B. | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 9 | Cluster doesn't have too many empty regions | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 10 | Cluster region distribution is balanced | performance | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
| 11 | no CDC or PiTR task found | critical | true |
+----+-----------------------------------------------------------------------------------------------------------+-------------+--------+
{"level":"warn","ts":"2023-12-23T02:32:12.363771Z","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001f321c0/tc-pd-2.tc-pd-peer.ha-test-lightning-tps-5340957-1-769.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:32:12.364291Z","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001ce6000/tc-pd:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:32:28.370028Z","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001f321c0/tc-pd-2.tc-pd-peer.ha-test-lightning-tps-5340957-1-769.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:32:48.400951Z","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001ce6000/tc-pd:2379","attempt":0,"error":"rpc error: code = Unknown desc = context deadline exceeded"}
{"level":"warn","ts":"2023-12-23T02:35:54.402124Z","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001f321c0/tc-pd-2.tc-pd-peer.ha-test-lightning-tps-5340957-1-769.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
tidb lightning encountered error: [Lightning:Restore:ErrRestoreTable]restore table `sysbench`.`user_data1` failed: request pd http api failed with status: '503 Service Unavailable'
tidb logs:
tidb-0.log
tidb-1.log
request pd http api failed with status: '503 Service Unavailable'
github.com/tikv/pd/client/http.(*clientInner).doRequest
/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20231219031951-25f48f0bdd27/http/client.go:242
github.com/tikv/pd/client/http.(*clientInner).requestWithRetry
/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20231219031951-25f48f0bdd27/http/client.go:156
github.com/tikv/pd/client/http.(*client).request
/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20231219031951-25f48f0bdd27/http/client.go:414
github.com/tikv/pd/client/http.(*client).SetRegionLabelRule
/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20231219031951-25f48f0bdd27/http/interface.go:540
github.com/pingcap/tidb/br/pkg/pdutil.pauseSchedulerByKeyRangeWithTTL
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:1002
github.com/pingcap/tidb/br/pkg/pdutil.PauseSchedulersByKeyRange
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:970
github.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).ImportEngine
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/local.go:1564
github.com/pingcap/tidb/br/pkg/lightning/backend.(*ClosedEngine).Import
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/backend.go:366
github.com/pingcap/tidb/br/pkg/lightning/importer.(*TableImporter).importKV
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:1343
github.com/pingcap/tidb/br/pkg/lightning/importer.(*TableImporter).importEngine
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:919
github.com/pingcap/tidb/br/pkg/lightning/importer.(*TableImporter).importEngines.func3
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/importer/table_import.go:525
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1650
4. What is your TiDB version? (Required)
./tidb-server -V
Release Version: v7.6.0-alpha
Edition: Community
Git Commit Hash: 5c279d8
Git Branch: heads/refs/tags/v7.6.0-alpha
UTC Build Time: 2023-12-21 07:56:10
GoVersion: go1.21.5
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2023-12-23T04:16:24.782+0800