Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry to gRPC stream client getter #390

Merged
merged 2 commits into from
Mar 29, 2020

Conversation

amyangfei
Copy link
Contributor

What problem does this PR solve?

Fix #387

What is changed and how it works?

  • Add retry to gRPC stream client getter
  • To deal with the store permanently down scenario, try to relocate the active store, so retry the mainloop of dispatchRequest when getting stream client failed.

Check List

Tests

  • Unit test
  • Integration test

@amyangfei amyangfei added type/bugfix This PR fixes a bug. component/kv-client TiKV kv log client component. labels Mar 28, 2020
@amyangfei
Copy link
Contributor Author

/run-integration-tests

@codecov-io
Copy link

codecov-io commented Mar 28, 2020

Codecov Report

Merging #390 into master will increase coverage by 2.0226%.
The diff coverage is 19.9152%.

@@               Coverage Diff                @@
##             master       #390        +/-   ##
================================================
+ Coverage   29.2417%   31.2643%   +2.0226%     
================================================
  Files            59         60         +1     
  Lines          5328       5655       +327     
================================================
+ Hits           1558       1768       +210     
- Misses         3629       3730       +101     
- Partials        141        157        +16     

@amyangfei
Copy link
Contributor Author

/run-integration-tests

@amyangfei amyangfei added the status/ptal Could you please take a look? label Mar 28, 2020
log.Warn("get grpc stream client failed", zap.Error(err))
bo := tikv.NewBackoffer(ctx, tikvRequestMaxBackoff)
c.regionCache.OnSendFail(bo, rpcCtx, needReloadRegion(sri.failStoreIDs, rpcCtx), err)
continue MainLoop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tikv is not active, will we fall into an endless loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.regionCache.OnSendFail(bo, rpcCtx, needReloadRegion(sri.failStoreIDs, rpcCtx), err) has set this tikv as failed store

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the server is all down?

Copy link
Contributor Author

@amyangfei amyangfei Mar 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cdc retries to connect to tikv until some TiKVs are up and the cluster is available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact this error only happens when gRPC connection is not established, which means at the CDC startup procedure.

@zier-one zier-one merged commit d5795ec into pingcap:master Mar 29, 2020
@zier-one zier-one added LGT1 and removed status/ptal Could you please take a look? labels Mar 29, 2020
5kbpers pushed a commit to 5kbpers/ticdc that referenced this pull request Aug 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/kv-client TiKV kv log client component. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cdc server exits because of some RPC error
3 participants