BR: retry for PD request error and TiKV IO error. #27787

Closed
Little-Wallace opened this issue Sep 3, 2021 · 2 comments · Fixed by #27803
Labels
component/br This issue is related to BR of TiDB. type/bug This issue is a bug.

Comments

@Little-Wallace
Contributor

Bug Report

I used BR to back up data from TiKV to S3, and the BR task failed with the following log:

["failed to backup"] [error="msg:\"Io(Custom { kind: Other, error: \\\"failed to put object timeout after 15mins for upload in s3 storage\\\" })\" : [BR:KV:ErrKVUnknown]unknown tikv error"] [errorVerbose="[BR:KV:ErrKVUnknown]unknown tikv error\nmsg:\"Io(Custom { kind: Other, error: \\\"failed to put object timeout after 15mins for upload in s3 storage\\\" })\"

I checked the code and found that IO::Custom is not among the errors for which the request is allowed to be retried.

https://github.com/pingcap/tidb/blob/master/br/pkg/backup/push.go#L162

But I think requests to S3 usually fail or break simply because of a timeout, since users often deploy an S3 cluster on ordinary disks instead of NVMe. So we should retry this kind of error to avoid failing the whole BR task.
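To make the proposal concrete, here is a minimal sketch of how such an error could be classified as retryable by matching on the message text from the log above. It is not the code at the linked line: `isRetryableIOError` and `retryableIOMessages` are hypothetical names, and the substring list is an assumption based only on the one message seen in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// retryableIOMessages is a hypothetical list of substrings that mark a
// TiKV IO error as transient. Only the S3 upload timeout from this issue
// is listed; a real list would need to be agreed on by the maintainers.
var retryableIOMessages = []string{
	"failed to put object timeout",
}

// isRetryableIOError reports whether msg looks like a TiKV IO error
// (it contains "Io(") and matches one of the transient messages.
func isRetryableIOError(msg string) bool {
	if !strings.Contains(msg, "Io(") {
		return false
	}
	for _, s := range retryableIOMessages {
		if strings.Contains(msg, s) {
			return true
		}
	}
	return false
}

func main() {
	msg := `Io(Custom { kind: Other, error: "failed to put object timeout after 15mins for upload in s3 storage" })`
	fmt.Println(isRetryableIOError(msg))                // true
	fmt.Println(isRetryableIOError("region not found")) // false
}
```

Matching on message text is fragile, so this is only meant to show where the S3 timeout would fit into a retry decision.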

We also need to retry the PD request here:

https://github.com/pingcap/tidb/blob/master/br/pkg/backup/client.go#L464
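For the PD request, the retry could look roughly like the sketch below: a bounded number of attempts with a doubling wait between them. `withRetry`, `runDemo`, and the fake request are hypothetical and only illustrate the idea; they are not BR's or PD's actual API, and the attempt count and wait times are placeholder values.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// withRetry calls fn up to attempts times, doubling the wait after each
// failure. It stops early if ctx is cancelled while waiting.
func withRetry(ctx context.Context, attempts int, initial time.Duration, fn func(context.Context) error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	wait := initial
	for i := 0; i < attempts; i++ {
		if err = fn(ctx); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point in sleeping after the last attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(wait):
		}
		wait *= 2
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// runDemo simulates a PD request that fails twice and then succeeds,
// and returns how many calls were made together with the final error.
func runDemo() (int, error) {
	calls := 0
	fakePDRequest := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("pd request failed")
		}
		return nil
	}
	err := withRetry(context.Background(), 5, time.Millisecond, fakePDRequest)
	return calls, err
}

func main() {
	calls, err := runDemo()
	fmt.Println(calls, err)
}
```

With a wrapper like this, a single failed PD request would no longer abort the whole backup unless every attempt fails.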

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

BR should retry the request automatically.

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

TiDB v4.0.6, TiDB v4.0.13

Little-Wallace added the type/bug label on Sep 3, 2021
Little-Wallace changed the title from "BR shall retry for PD request error and TiKV IO error." to "BR: retry for PD request error and TiKV IO error." on Sep 3, 2021
Little-Wallace added the component/br label on Sep 3, 2021
@joccau
Member

joccau commented Sep 3, 2021

assign @joccau

@github-actions

github-actions bot commented Sep 6, 2021

Please check whether the issue should be labeled with 'affects-x.y' or 'backport-x.y.z',
and then remove 'needs-more-info' label.

2 participants