ticlient: Add keep alive #7099

Merged
coocood merged 3 commits into pingcap:release-2.0 from breeswish:wenxuan/keepalive_2.0 on Jul 19, 2018

@breeswish
Member

commented Jul 18, 2018

What have you changed? (mandatory)

This PR adds keep alive settings for ticlient, using the same configuration as TiKV's (time = 10s, timeout = 3s).

Adding keep alive prevents firewalls from dropping our inactive connections, which would otherwise cause SQL queries to fail.

Since the client hit this issue on our release-2.0 branch, this fix is proposed against release-2.0 instead of master. It will be cherry-picked to master later.
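
To make the settings concrete, here is a minimal sketch of how second-based keep alive settings map to durations, and what they imply for detecting a dead connection. It assumes gRPC's usual client keep alive behaviour (a ping is sent after `time` of inactivity, and the connection is closed if no ack arrives within `timeout`). The `KeepAlive` type and its methods are hypothetical names for illustration, not TiDB's actual config struct.

```go
package main

import "time"

// KeepAlive holds keep alive settings in whole seconds, the way a config
// file would. Hypothetical type, used only for this illustration.
type KeepAlive struct {
	TimeSec    uint // idle time before a keep alive ping is sent
	TimeoutSec uint // how long to wait for the ping ack
}

// Durations converts the second counts into time.Duration values.
func (k KeepAlive) Durations() (idle, timeout time.Duration) {
	idle = time.Duration(k.TimeSec) * time.Second
	timeout = time.Duration(k.TimeoutSec) * time.Second
	return idle, timeout
}

// DetectionBound estimates how long an idle, silently dropped connection
// can survive: the idle period before the ping plus the ack timeout.
func (k KeepAlive) DetectionBound() time.Duration {
	idle, timeout := k.Durations()
	return idle + timeout
}
```

With time = 10s and timeout = 3s, `KeepAlive{TimeSec: 10, TimeoutSec: 3}.DetectionBound()` gives 13s, so under the assumption above an idle connection that was silently dropped should be noticed and closed within roughly that window.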

What is the type of the changes? (mandatory)

  • Bug fix (non-breaking change which fixes an issue)

How has this PR been tested? (mandatory)

To test whether this fix is effective, we first need to reproduce the issue in our own environment.

Since I don't have such a firewall, I simulated one using the following script:

echo "*filter" > rule
echo ":INPUT ACCEPT [549:776388]" >> rule
echo ":FORWARD ACCEPT [0:0]" >> rule
echo ":OUTPUT ACCEPT [596:577866]" >> rule
netstat -anp | grep tidb | grep ESTABLISHED | grep tcp | grep 20160 | awk '{ print $4 }' | awk -F':' '{ print $2 }' | sort | uniq | awk '{ print "-A INPUT -p tcp --dport " $1 " -j DROP"; print "-A OUTPUT -p tcp --sport " $1 " -j DROP" }' >> rule
echo "COMMIT" >> rule
cat rule | tee /etc/sysconfig/iptables
service iptables restart

This script captures the source ports of all established connections between the current host's TiDB and other TiKVs. These source ports are added to the iptables rules as DROP rules. After execution, all future packets on these ports (connections) are dropped, just as the firewall would do.
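
For readers less familiar with awk, the rule-generation step can be sketched in Go as follows. `DropRules` and `RuleFile` are hypothetical helper names; the rule text and chain headers are copied from the script above.

```go
package main

import (
	"sort"
	"strings"
)

// DropRules mirrors the `sort | uniq | awk` step of the script: it
// de-duplicates and sorts the local ports, then emits one INPUT and one
// OUTPUT DROP rule per port.
func DropRules(ports []string) []string {
	seen := make(map[string]bool)
	var uniq []string
	for _, p := range ports {
		if p != "" && !seen[p] {
			seen[p] = true
			uniq = append(uniq, p)
		}
	}
	sort.Strings(uniq) // lexical order, like plain `sort`

	rules := make([]string, 0, 2*len(uniq))
	for _, p := range uniq {
		rules = append(rules,
			"-A INPUT -p tcp --dport "+p+" -j DROP",
			"-A OUTPUT -p tcp --sport "+p+" -j DROP")
	}
	return rules
}

// RuleFile assembles the whole file that the script writes to
// /etc/sysconfig/iptables.
func RuleFile(ports []string) string {
	lines := []string{
		"*filter",
		":INPUT ACCEPT [549:776388]",
		":FORWARD ACCEPT [0:0]",
		":OUTPUT ACCEPT [596:577866]",
	}
	lines = append(lines, DropRules(ports)...)
	lines = append(lines, "COMMIT")
	return strings.Join(lines, "\n") + "\n"
}
```

Because the rules match on the local port of each existing connection, connections opened later from different ephemeral ports are not affected, which is what allows the client to recover by reconnecting.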

For the current TiDB master as well as release-2.0 branch

On TiDB, once the drop rules took effect, the existing gRPC connections were still used to send requests (which would never receive a response), so all queries from this TiDB took a very long time and its QPS dropped to 0:

image

image

Sysbench will fail:

image

According to netstat, these dead connections were kept for more than 15 minutes, counted from when we started another sysbench after dropping them. After that, they were destroyed and new connections were established, so everything went back to normal.
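
One plausible explanation for the 15-minute figure, which this PR does not confirm, is ordinary TCP retransmission: with Linux defaults (`tcp_retries2` = 15, retransmission timeout starting at 200 ms and doubling up to a 120 s cap), the kernel gives up on an unacknowledged connection after roughly 924.6 seconds. The sketch below only reproduces that arithmetic. `RetransmitGiveUp` is a hypothetical helper, and the parameter values are assumed defaults, not measurements from this test.

```go
package main

import "time"

// RetransmitGiveUp sums the waiting intervals of an exponential-backoff
// retransmission timer: the timeout starts at minRTO, doubles after every
// attempt, and is capped at maxRTO. There are retries+1 intervals in total
// (one before each retransmission, plus the final wait before giving up).
func RetransmitGiveUp(retries int, minRTO, maxRTO time.Duration) time.Duration {
	total := time.Duration(0)
	rto := minRTO
	for i := 0; i <= retries; i++ {
		total += rto
		rto *= 2
		if rto > maxRTO {
			rto = maxRTO
		}
	}
	return total
}
```

`RetransmitGiveUp(15, 200*time.Millisecond, 120*time.Second)` returns 15m24.6s, which is in line with the "more than 15 minutes" observed above.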

For this fixed version (Test 1)

I started a sysbench immediately after these connections were dropped by iptables:

image

image

We can see that QPS was affected initially (note that we deployed multiple TiDBs and only one was affected). After about 30 seconds it recovered. This is far better than the previous 15-minute recovery. Also, sysbench did not fail.

For this fixed version (Test 2)

I started a sysbench 1 minute after these connections were dropped by iptables:

image

We can see that QPS was not affected at all.

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

No.

Does this PR affect tidb-ansible update? (mandatory)

pingcap/tidb-ansible#469

Does this PR need to be added to the release notes? (mandatory)

No.

Refer to a related PR or issue link (optional)

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

@zhexuany
Member

left a comment

LGTM

@@ -389,6 +389,8 @@ func setGlobalVars() {
	if cfg.TiKVClient.GrpcConnectionCount > 0 {
		tikv.MaxConnectionCount = cfg.TiKVClient.GrpcConnectionCount
	}
	tikv.GrpcKeepAliveTime = time.Duration(cfg.TiKVClient.GrpcKeepAliveTime) * time.Second

@zz-jason

zz-jason Jul 19, 2018

Member

I think we should check whether the configuration is valid. For example, the configured time duration should be greater than zero.

@breeswish

breeswish Jul 19, 2018

Author Member

It seems that the other configurations are not checked either, except for the one config above (GrpcConnectionCount). I think it would be better to leave this to another PR.

@zz-jason

zz-jason Jul 19, 2018

Member

OK, could you file a GitHub issue about this?

@breeswish

breeswish Jul 19, 2018

Author Member

@zz-jason Yes! I just created one: #7103
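
For reference, the kind of check discussed in this thread could look roughly like the sketch below. `ValidateKeepAlive` and its signed-integer parameters are assumptions made for illustration; this is not the change tracked in #7103.

```go
package main

import (
	"errors"
	"time"
)

// ValidateKeepAlive rejects non-positive settings and converts valid ones
// (given in seconds) to durations. Hypothetical helper, not TiDB code.
func ValidateKeepAlive(timeSec, timeoutSec int) (idle, timeout time.Duration, err error) {
	if timeSec <= 0 {
		return 0, 0, errors.New("grpc keep alive time must be greater than zero")
	}
	if timeoutSec <= 0 {
		return 0, 0, errors.New("grpc keep alive timeout must be greater than zero")
	}
	idle = time.Duration(timeSec) * time.Second
	timeout = time.Duration(timeoutSec) * time.Second
	return idle, timeout, nil
}
```

Returning an error instead of silently falling back to a default keeps a misconfiguration visible at startup; whether that is the preferred behaviour here is a design choice for the follow-up issue.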

@ngaut

Member

commented Jul 19, 2018

Well done.

@coocood

Member

commented Jul 19, 2018

/run-all-tests tidb-test=release-2.0 tikv=release-2.0 pd=release-2.0

@coocood

Member

commented Jul 19, 2018

LGTM

@breeswish

Member Author

commented Jul 19, 2018

/run-all-tests tidb-test=release-2.0 tikv=release-2.0 pd=release-2.0

@zhexuany

Member

commented Jul 19, 2018

/run-all-tests tidb-test=release-2.0 tikv=release-2.0 pd=release-2.0

@breeswish

Member Author

commented Jul 19, 2018

/run-integration-common-test tidb-test=release-2.0 tikv=release-2.0 pd=release-2.0

@breeswish

Member Author

commented Jul 19, 2018

/run-integration-common-test tidb-test=release-2.0 tikv=release-2.0 pd=release-2.0

@zhexuany
Member

left a comment

LGTM

@coocood coocood merged commit 5c61f4c into pingcap:release-2.0 Jul 19, 2018

11 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/travis-ci/pr The Travis CI build passed
jenkins-ci-tidb/build Jenkins job succeeded.
jenkins-ci-tidb/common-test Jenkins job succeeded.
jenkins-ci-tidb/integration-common-test Jenkins job succeeded.
jenkins-ci-tidb/integration-compatibility-test Jenkins job succeeded.
jenkins-ci-tidb/integration-ddl-test Jenkins job succeeded.
jenkins-ci-tidb/mybatis-test Jenkins job succeeded.
jenkins-ci-tidb/sqllogic-test Jenkins job succeeded.
jenkins-ci-tidb/unit-test Jenkins job succeeded.
license/cla Contributor License Agreement is signed.

@breeswish breeswish deleted the breeswish:wenxuan/keepalive_2.0 branch Jul 19, 2018
