
TiDB cluster is much slower using sysbench #5328

Closed · zzx8170 opened this issue Dec 6, 2017 · 19 comments

zzx8170 commented Dec 6, 2017

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    Followed the docs: https://github.com/pingcap/docs-cn/blob/master/benchmark/sysbench.md
    Deployment: 2 tidb-server / 3 PD / 12 TiKV (3 virtual machines, 4 TiKV instances on each).
    Test script: new_oltp.sh from https://github.com/pingcap/tidb-bench/tree/cwen/not_prepared_statement/sysbench
    Results: at concurrency levels 1, 2, 4, 8, 16, 32, 64, 128, and 256, the peak was about 530 TPS and 10,000 QPS. A single MySQL instance on one of these VMs reaches roughly 3,000 TPS and 50,000 QPS at 256 concurrency, more than five times TiDB, which is far from your published results. IO wait stayed low during the tests, so IO is not the bottleneck. What could the cause be?

  2. What did you expect to see?

  3. What did you see instead?

  4. What version of TiDB are you using (tidb-server -V)?
    1.0
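
For reference, the run boils down to something like the following (a sketch of what new_oltp.sh drives; the exact flags depend on the sysbench version, and TIDB_HOST is a placeholder for one of the tidb-server endpoints):

# 16 tables x 1,000,000 rows (sizes given later in this thread);
# concurrency swept from 1 to 256.
for threads in 1 2 4 8 16 32 64 128 256; do
  sysbench oltp_read_write \
    --mysql-host="$TIDB_HOST" --mysql-port=4000 \
    --mysql-user=root --mysql-db=sbtest \
    --tables=16 --table-size=1000000 \
    --threads="$threads" --time=600 --report-interval=10 \
    run
done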

shenli (Member) commented Dec 6, 2017

@zzx8170 We do not encourage running PD/TiKV on virtual machines.

Please provide your hardware info. How many tables are there in your test and how many rows are there in a table?

Please refer to our test results: https://github.com/pingcap/docs/blob/master/benchmark/sysbench.md

zzx8170 (Author) commented Dec 6, 2017

tcount=16
tsize=1000000

VM configuration: 10 CPU cores @ 2299.996 MHz, 32 GB RAM, SSD storage.

dbjoa (Contributor) commented Dec 6, 2017

@zzx8170,

Would you check the value of "innodb_flush_log_at_trx_commit" in your MySQL configuration and the value of "sync-log" in your TiKV configuration, so that the comparison is fair?

If the value of "innodb_flush_log_at_trx_commit" is "0" or "2", the value of "sync-log" should be "false".

[raftstore]
sync-log = false

If you don't want to change the default value of "sync-log", the value of "innodb_flush_log_at_trx_commit" should be "1".

innodb_flush_log_at_trx_commit = 1
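
A quick way to confirm both settings before re-running (a sketch; the host and the config file path are placeholders):

# MySQL side: 1 = fsync on every commit (durable, comparable to sync-log = true);
# 0 or 2 = relaxed flushing (comparable to sync-log = false).
mysql -h "$MYSQL_HOST" -u root -e \
  "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"

# TiKV side: look for sync-log under [raftstore] in each instance's config.
grep -n 'sync-log' /path/to/tikv.toml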

zzx8170 (Author) commented Dec 6, 2017

The TiKV configuration file is as follows; sync-log = false:

more tikv.toml

# TiKV config template
#  Human-readable big numbers:
#   File size(based on byte): KB, MB, GB, TB, PB
#    e.g.: 1_048_576 = "1MB"
#   Time(based on ms): ms, s, m, h
#    e.g.: 78_000 = "1.3m"

# log level: trace, debug, info, warn, error, off.
# log-level = "info"
# file to store log, write to stderr if it's empty.
# log-file = ""

[server]
labels = { host = "tikv2" }

[storage]

[pd]
# This section will be overwritten by command line parameters

[metric]
address = "10.1.13.210:9091"
interval = "15s"
job = "tikv"

[raftstore]
raftdb-path = ""
sync-log = false

[rocksdb]
wal-dir = ""

[rocksdb.defaultcf]

[rocksdb.lockcf]

[rocksdb.writecf]
block-cache-size = "1GB"

[raftdb]

[raftdb.defaultcf]                   

MySQL kept the default innodb_flush_log_at_trx_commit = 1. Even so, its results are far higher than TiDB's, and setting it to 0 or 2 widens the gap further. Also, the results with 12 TiKV nodes now are almost identical to the earlier 3-node setup. Why? Adding tidb-server instances does scale throughput roughly linearly, but the gap with MySQL remains huge. Could there be other causes? You can reach me on QQ to discuss details: 41517897

zzx8170 (Author) commented Dec 8, 2017

Is anyone able to help with this?

shenli (Member) commented Dec 8, 2017

@zzx8170 Could I access your Grafana to see where the bottleneck of your cluster is?
There are some documents you should refer to: https://github.com/pingcap/docs/blob/master/op-guide/recommendation.md and https://github.com/pingcap/docs/blob/master/op-guide/tune-TiKV.md.

zzx8170 (Author) commented Dec 8, 2017

@shenli Of course, but it is on an internal IP and not reachable from outside. You can add me on QQ (41517897) and I will send you the monitoring screenshots.

zzx8170 (Author) commented Dec 11, 2017

@shenli I have read those two documents, but they did not really help. Is there anything else I can try?

shenli (Member) commented Dec 12, 2017

@zzx8170 Some advice:

  • Deploy tidb-server and pd-server on separate machines.
  • The hardware for tikv-server is not very good. Please upgrade it, or deploy only one tikv-server per machine.
  • Show me the Thread CPU graph on the TiKV panel of Grafana (or check per-thread CPU directly, as in the sketch below).
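
If sharing Grafana is difficult, per-thread CPU can also be checked on a TiKV machine from the shell (a sketch; it assumes the process is named tikv-server):

# Per-thread CPU for every tikv-server process on this machine.
# A raftstore, apply, or gRPC thread pinned near 100% identifies
# the saturated component even without the Grafana panel.
top -H -p "$(pgrep -d, tikv-server)"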

zzx8170 (Author) commented Dec 12, 2017

[five Grafana monitoring screenshots]

zzx8170 (Author) commented Dec 12, 2017

@shenli

top output from one of the machines:
top - 16:11:54 up 32 days, 10 min, 1 user, load average: 9.82, 7.15, 5.23
Tasks: 168 total, 1 running, 167 sleeping, 0 stopped, 0 zombie
%Cpu0 : 33.4 us, 9.7 sy, 0.0 ni, 46.9 id, 0.0 wa, 0.0 hi, 0.0 si, 10.0 st
%Cpu1 : 20.5 us, 6.2 sy, 0.0 ni, 57.5 id, 0.0 wa, 0.0 hi, 0.3 si, 15.6 st
%Cpu2 : 38.1 us, 10.0 sy, 0.0 ni, 41.2 id, 0.0 wa, 0.0 hi, 0.0 si, 10.7 st
%Cpu3 : 18.1 us, 5.0 sy, 0.0 ni, 63.8 id, 0.0 wa, 0.0 hi, 0.0 si, 13.1 st
%Cpu4 : 7.6 us, 3.5 sy, 0.0 ni, 73.2 id, 0.0 wa, 0.0 hi, 0.3 si, 15.3 st
%Cpu5 : 30.6 us, 7.8 sy, 0.0 ni, 50.3 id, 0.0 wa, 0.0 hi, 0.0 si, 11.2 st
%Cpu6 : 30.1 us, 11.8 sy, 0.0 ni, 42.3 id, 0.0 wa, 0.0 hi, 8.6 si, 7.2 st
%Cpu7 : 26.7 us, 6.2 sy, 0.0 ni, 55.5 id, 0.0 wa, 0.0 hi, 0.0 si, 11.6 st
KiB Mem : 8010432 total, 138708 free, 6701080 used, 1170644 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 791200 avail Mem

sysbench OLTP results:
[ 470s ] thds: 1024 tps: 420.90 qps: 8089.79 (r/w/o: 5662.36/1519.12/908.31) lat (ms,95%): 4599.99 err/s: 0.00 reconn/s: 0.00
[ 480s ] thds: 1024 tps: 419.90 qps: 8435.41 (r/w/o: 5883.71/1637.50/914.20) lat (ms,95%): 3982.86 err/s: 0.00 reconn/s: 0.00
[ 490s ] thds: 1024 tps: 324.00 qps: 6763.25 (r/w/o: 4782.36/1262.79/718.09) lat (ms,95%): 5709.50 err/s: 0.00 reconn/s: 0.00
[ 500s ] thds: 1024 tps: 314.40 qps: 6162.97 (r/w/o: 4330.45/1152.01/680.51) lat (ms,95%): 7479.98 err/s: 0.00 reconn/s: 0.00
...

shenli (Member) commented Dec 12, 2017

TiKV uses Raft to replicate data and 2PC for distributed transactions, so in your scenario TiDB cannot be faster than single-node MySQL.
You can add more nodes to your cluster to get higher throughput; TiDB's throughput grows roughly linearly as nodes are added. See: https://github.com/pingcap/docs/blob/master/benchmark/sysbench.md#scenario-two-tidb-horizontal-scalability-test
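
To put rough numbers on that overhead (illustrative values, not measurements from this cluster): every write transaction pays a prewrite round and a commit round, and each round is acknowledged only after a Raft majority has replicated it, so

% Back-of-envelope lower bound for one TiDB write-transaction commit,
% assuming roughly 0.5 ms per network round trip (illustrative only):
T_{\text{txn}} \gtrsim \underbrace{2}_{\text{2PC rounds}} \times
  \bigl( \underbrace{\mathrm{RTT}_{\text{tidb} \to \text{tikv}}}_{\approx 0.5\,\text{ms}}
       + \underbrace{\mathrm{RTT}_{\text{Raft majority}}}_{\approx 0.5\,\text{ms}} \bigr)
  \approx 2\,\text{ms}

A single-node MySQL commit needs no network hop at all, so per-connection latency always favors MySQL; TiDB's advantage is aggregate throughput as nodes and clients are added.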

zzx8170 (Author) commented Dec 13, 2017

"TiKV uses Raft to replicate data and 2PC for distributed transactions, so in your scenario TiDB cannot be faster than single-node MySQL."

@shenli Thanks for the reply, but that explanation does not tell me why my setup is slower than a single instance. In what kind of environment do Raft and 2PC become fast?

shenli (Member) commented Dec 14, 2017

Your data set is not very large: 16 tables × 1,000,000 rows comes to only a few GB, so MySQL can cache most of it in memory. In that scenario it can beat any distributed database system.

zzx8170 (Author) commented Dec 14, 2017

OK, I'll retry on higher-performance machines. Thanks again!

zzx8170 closed this as completed Dec 14, 2017