tikvclient: fix a bug that double close channels. #10991

Merged
merged 31 commits into from
Jul 5, 2019

Conversation

hicqu
Contributor

@hicqu hicqu commented Jun 30, 2019

Signed-off-by: qupeng <qupeng@pingcap.com>

What problem does this PR solve?

The tikvclient module has a bug that closes channels twice, which causes some threads to panic while holding a lock. Although these threads can recover from the panic, they then block on that lock and can no longer do any work.
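To make the failure mode concrete, here is a minimal sketch, not the tikvclient code itself. The `pending` type, `failUnsafe`, and `callRecovered` are hypothetical names used only for illustration. It shows that closing an already closed channel panics, and that recovering from the panic does not release a mutex that was locked without `defer`.

```go
package main

import (
	"fmt"
	"sync"
)

// pending is a hypothetical stand-in for a batch of in-flight requests.
type pending struct {
	mu   sync.Mutex
	done chan struct{}
}

// failUnsafe closes done while holding mu, without defer.
// If done is already closed, close panics and Unlock is never reached.
func (p *pending) failUnsafe() {
	p.mu.Lock()
	close(p.done)
	p.mu.Unlock()
}

// callRecovered runs f and reports whether it panicked.
func callRecovered(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	p := &pending{done: make(chan struct{})}
	fmt.Println("first call panicked:", callRecovered(p.failUnsafe))
	fmt.Println("second call panicked:", callRecovered(p.failUnsafe))
	// TryLock (Go 1.18+) fails here because the panicking call never unlocked mu.
	fmt.Println("lock still held:", !p.mu.TryLock())
}
```

Any later caller that tries to take the same lock in the usual blocking way would wait forever, which matches the symptom described above.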

What is changed and how it works?

This PR fixes the double close of channels and uses defer to make the locking logic clearer.
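As a rough illustration of the two ideas (guarding against a second close, and releasing the lock with `defer`), a sketch could look like the following. It is an assumption about the general shape of such a fix, not the code merged in this PR. The `pendingSafe` type and its `closed` flag are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// pendingSafe is a hypothetical type; closed records whether done was closed.
type pendingSafe struct {
	mu     sync.Mutex
	closed bool
	done   chan struct{}
}

// fail closes done at most once and reports whether this call closed it.
// The deferred Unlock runs on every return path, including a panic.
func (p *pendingSafe) fail() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return false
	}
	p.closed = true
	close(p.done)
	return true
}

func main() {
	p := &pendingSafe{done: make(chan struct{})}
	fmt.Println("first call closed the channel:", p.fail())
	fmt.Println("second call closed the channel:", p.fail())
	// The lock is free again after both calls.
	fmt.Println("lock is free:", p.mu.TryLock())
}
```

With `defer`, the unlock no longer depends on every statement between `Lock` and `Unlock` succeeding, so even an unexpected panic cannot leave the mutex held.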

Check List

Tests

  • It's hard to add unit tests. We need to cover the case in our tikv client test framework.

Related changes

  • Need to cherry-pick to the release branch

Signed-off-by: qupeng <qupeng@pingcap.com>
@hicqu hicqu requested review from lysu and zz-jason June 30, 2019 12:19
@hicqu hicqu added the type/bug-fix This PR fixes a bug. label Jun 30, 2019
Member

@zz-jason zz-jason left a comment


Please add a unit test.

@hicqu
Contributor Author

hicqu commented Jun 30, 2019

@zz-jason I guess you didn't read the initial comment. Unit tests don't work for these cases because we need real TiKV services to run the tikvclient code. Currently we test tikvclient.go in 2 schrodinger test cases; we can add more cases there instead of here.

@siddontang
Member

siddontang commented Jun 30, 2019

You can use faketikv.

It seems we also need to take care of client-go, @disksing.

@lysu lysu requested a review from tiancaiamao July 1, 2019 12:28
store/tikv/client.go (outdated, resolved)
@lysu
Collaborator

lysu commented Jul 1, 2019

Almost LGTM, but I have a question.

Can we skip calling failPendingRequests in sendLoop for errors other than io.EOF?

It seems that when sendMsg fails, it finishes the csAttempt: https://github.com/grpc/grpc-go/blob/master/stream.go#L670

That triggers either a retry or http2Client#closeStream (which writes the error to recvBuffer), so it seems recvMsg always gets an error?

... but maybe doing it on both sides is safer.

@lysu lysu added the priority/release-blocker This PR blocks a release. Please review it ASAP. label Jul 2, 2019
@hicqu
Contributor Author

hicqu commented Jul 3, 2019

@lysu I think it's better not to depend on gRPC's internal behavior. So let's call failPendingRequests in a locked context.

@codecov

codecov bot commented Jul 3, 2019

Codecov Report

Merging #10991 into master will decrease coverage by 0.4708%.
The diff coverage is 72.2222%.

@@               Coverage Diff                @@
##             master     #10991        +/-   ##
================================================
- Coverage   81.5977%   81.1269%   -0.4709%     
================================================
  Files           420        420                
  Lines         90902      89466      -1436     
================================================
- Hits          74174      72581      -1593     
- Misses        11421      11630       +209     
+ Partials       5307       5255        -52

@hicqu
Contributor Author

hicqu commented Jul 3, 2019

PTAL @lysu @zz-jason @lonng thanks!

store/tikv/client.go (outdated, resolved)
@tiancaiamao
Contributor

Please fix CI @hicqu

@tiancaiamao
Contributor

/run-all-tests

@tiancaiamao
Contributor

/run-all-tests

@tiancaiamao
Contributor

/run-all-tests

@tiancaiamao tiancaiamao added status/all tests passed status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 5, 2019
@tiancaiamao tiancaiamao merged commit 21d2590 into pingcap:master Jul 5, 2019
@hicqu hicqu deleted the fix-tikvclient-double-close branch July 5, 2019 08:53
hicqu added a commit to hicqu/tidb that referenced this pull request Jul 5, 2019
Labels
component/tikv priority/release-blocker This PR blocks a release. Please review it ASAP. status/LGT2 Indicates that a PR has LGTM 2. type/bug-fix This PR fixes a bug.

7 participants