
TiDB cluster is much slower using sysbench #5328

Closed · zzx8170 opened this issue Dec 6, 2017 · 19 comments

zzx8170 commented Dec 6, 2017

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    Followed the docs: https://github.com/pingcap/docs-cn/blob/master/benchmark/sysbench.md
    Deployment: 2 tidb-server / 3 PD / 12 TiKV (3 virtual machines, 4 TiKV instances on each).
    Test script: new_oltp.sh from https://github.com/pingcap/tidb-bench/tree/cwen/not_prepared_statement/sysbench
    Results: at concurrency levels 1, 2, 4, 8, 16, 32, 64, 128, and 256, the peak was about 530 TPS and 10,000 QPS. A single MySQL instance on one of these VMs reaches roughly 3,000 TPS and 50,000 QPS at 256 concurrency, more than five times TiDB, which is far from your published results. IO wait stayed low during the tests, so IO is not the bottleneck. What could the cause be?

  2. What did you expect to see?

  3. What did you see instead?

  4. What version of TiDB are you using (tidb-server -V)?
    1.0
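
For reference, the run boils down to something like the following (a sketch of what new_oltp.sh drives; the exact flags depend on the sysbench version, and TIDB_HOST is a placeholder for one of the tidb-server endpoints):

# 16 tables x 1,000,000 rows (sizes given later in this thread);
# concurrency swept from 1 to 256.
for threads in 1 2 4 8 16 32 64 128 256; do
  sysbench oltp_read_write \
    --mysql-host="$TIDB_HOST" --mysql-port=4000 \
    --mysql-user=root --mysql-db=sbtest \
    --tables=16 --table-size=1000000 \
    --threads="$threads" --time=600 --report-interval=10 \
    run
done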

shenli (Member) commented Dec 6, 2017

@zzx8170 We do not encourage running PD/TiKV on virtual machines.

Please provide your hardware info. How many tables are there in your test and how many rows are there in a table?

Please refer to our test results: https://github.com/pingcap/docs/blob/master/benchmark/sysbench.md

zzx8170 (Author) commented Dec 6, 2017

tcount=16
tsize=1000000

VM configuration: 10 CPU cores @ 2299.996 MHz, 32 GB RAM, SSD storage.

dbjoa (Contributor) commented Dec 6, 2017

@zzx8170,

Would you check the value of "innodb_flush_log_at_trx_commit" in your MySQL configuration and the value of "sync-log" in your TiKV configuration, so that the comparison is fair?

If the value of "innodb_flush_log_at_trx_commit" is "0" or "2", the value of "sync-log" should be "false".

[raftstore]
sync-log = false

If you don't want to change the default value of "sync-log", the value of "innodb_flush_log_at_trx_commit" should be "1".

innodb_flush_log_at_trx_commit = 1
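
A quick way to confirm both settings before re-running (a sketch; the host and the config file path are placeholders):

# MySQL side: 1 = fsync on every commit (durable, comparable to sync-log = true);
# 0 or 2 = relaxed flushing (comparable to sync-log = false).
mysql -h "$MYSQL_HOST" -u root -e \
  "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"

# TiKV side: look for sync-log under [raftstore] in each instance's config.
grep -n 'sync-log' /path/to/tikv.toml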

zzx8170 (Author) commented Dec 6, 2017

The TiKV configuration file is as follows; sync-log = false:

more tikv.toml

# TiKV config template
#  Human-readable big numbers:
#   File size(based on byte): KB, MB, GB, TB, PB
#    e.g.: 1_048_576 = "1MB"
#   Time(based on ms): ms, s, m, h
#    e.g.: 78_000 = "1.3m"

# log level: trace, debug, info, warn, error, off.
# log-level = "info"
# file to store log, write to stderr if it's empty.
# log-file = ""

[server]
labels = { host = "tikv2" }

[storage]

[pd]
# This section will be overwritten by command line parameters

[metric]
address = "10.1.13.210:9091"
interval = "15s"
job = "tikv"

[raftstore]
raftdb-path = ""
sync-log = false

[rocksdb]
wal-dir = ""

[rocksdb.defaultcf]

[rocksdb.lockcf]

[rocksdb.writecf]
block-cache-size = "1GB"

[raftdb]

[raftdb.defaultcf]                   

MySQL kept the default innodb_flush_log_at_trx_commit = 1. Even so, its results are far higher than TiDB's, and setting it to 0 or 2 widens the gap further. Also, the results with 12 TiKV nodes now are almost identical to the earlier 3-node setup. Why? Adding tidb-server instances does scale throughput roughly linearly, but the gap with MySQL remains huge. Could there be other causes? You can reach me on QQ to discuss details: 41517897

zzx8170 (Author) commented Dec 8, 2017

Is anyone able to help with this?

shenli (Member) commented Dec 8, 2017

@zzx8170 Could I access your Grafana to see where the bottleneck of your cluster is?
There are some documents you should refer to: https://github.com/pingcap/docs/blob/master/op-guide/recommendation.md and https://github.com/pingcap/docs/blob/master/op-guide/tune-TiKV.md.

zzx8170 (Author) commented Dec 8, 2017

@shenli Of course, but it is on an internal IP and not reachable from outside. You can add me on QQ (41517897) and I will send you the monitoring screenshots.

zzx8170 (Author) commented Dec 11, 2017

@shenli I have read those two documents, but they did not really help. Is there anything else I can try?

shenli (Member) commented Dec 12, 2017

@zzx8170 Some advice:

  • Deploy tidb-server and pd-server on separate machines.
  • The hardware for tikv-server is not very good. Please upgrade it, or deploy only one tikv-server per machine.
  • Show me the Thread CPU graph on the TiKV panel of Grafana (or check per-thread CPU directly, as in the sketch below).
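
If sharing Grafana is difficult, per-thread CPU can also be checked on a TiKV machine from the shell (a sketch; it assumes the process is named tikv-server):

# Per-thread CPU for every tikv-server process on this machine.
# A raftstore, apply, or gRPC thread pinned near 100% identifies
# the saturated component even without the Grafana panel.
top -H -p "$(pgrep -d, tikv-server)"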

zzx8170 (Author) commented Dec 12, 2017

[five Grafana monitoring screenshots]

zzx8170 (Author) commented Dec 12, 2017

@shenli

top output from one of the machines:
top - 16:11:54 up 32 days, 10 min, 1 user, load average: 9.82, 7.15, 5.23
Tasks: 168 total, 1 running, 167 sleeping, 0 stopped, 0 zombie
%Cpu0 : 33.4 us, 9.7 sy, 0.0 ni, 46.9 id, 0.0 wa, 0.0 hi, 0.0 si, 10.0 st
%Cpu1 : 20.5 us, 6.2 sy, 0.0 ni, 57.5 id, 0.0 wa, 0.0 hi, 0.3 si, 15.6 st
%Cpu2 : 38.1 us, 10.0 sy, 0.0 ni, 41.2 id, 0.0 wa, 0.0 hi, 0.0 si, 10.7 st
%Cpu3 : 18.1 us, 5.0 sy, 0.0 ni, 63.8 id, 0.0 wa, 0.0 hi, 0.0 si, 13.1 st
%Cpu4 : 7.6 us, 3.5 sy, 0.0 ni, 73.2 id, 0.0 wa, 0.0 hi, 0.3 si, 15.3 st
%Cpu5 : 30.6 us, 7.8 sy, 0.0 ni, 50.3 id, 0.0 wa, 0.0 hi, 0.0 si, 11.2 st
%Cpu6 : 30.1 us, 11.8 sy, 0.0 ni, 42.3 id, 0.0 wa, 0.0 hi, 8.6 si, 7.2 st
%Cpu7 : 26.7 us, 6.2 sy, 0.0 ni, 55.5 id, 0.0 wa, 0.0 hi, 0.0 si, 11.6 st
KiB Mem : 8010432 total, 138708 free, 6701080 used, 1170644 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 791200 avail Mem

sysbench OLTP results:
[ 470s ] thds: 1024 tps: 420.90 qps: 8089.79 (r/w/o: 5662.36/1519.12/908.31) lat (ms,95%): 4599.99 err/s: 0.00 reconn/s: 0.00
[ 480s ] thds: 1024 tps: 419.90 qps: 8435.41 (r/w/o: 5883.71/1637.50/914.20) lat (ms,95%): 3982.86 err/s: 0.00 reconn/s: 0.00
[ 490s ] thds: 1024 tps: 324.00 qps: 6763.25 (r/w/o: 4782.36/1262.79/718.09) lat (ms,95%): 5709.50 err/s: 0.00 reconn/s: 0.00
[ 500s ] thds: 1024 tps: 314.40 qps: 6162.97 (r/w/o: 4330.45/1152.01/680.51) lat (ms,95%): 7479.98 err/s: 0.00 reconn/s: 0.00
...

shenli (Member) commented Dec 12, 2017

TiKV uses Raft to replicate data and 2PC for distributed transactions, so in your scenario TiDB cannot be faster than single-node MySQL.
You can add more nodes to your cluster to get higher throughput; TiDB's throughput grows roughly linearly as nodes are added. See: https://github.com/pingcap/docs/blob/master/benchmark/sysbench.md#scenario-two-tidb-horizontal-scalability-test
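
To put rough numbers on that overhead (illustrative values, not measurements from this cluster): every write transaction pays a prewrite round and a commit round, and each round is acknowledged only after a Raft majority has replicated it, so

% Back-of-envelope lower bound for one TiDB write-transaction commit,
% assuming roughly 0.5 ms per network round trip (illustrative only):
T_{\text{txn}} \gtrsim \underbrace{2}_{\text{2PC rounds}} \times
  \bigl( \underbrace{\mathrm{RTT}_{\text{tidb} \to \text{tikv}}}_{\approx 0.5\,\text{ms}}
       + \underbrace{\mathrm{RTT}_{\text{Raft majority}}}_{\approx 0.5\,\text{ms}} \bigr)
  \approx 2\,\text{ms}

A single-node MySQL commit needs no network hop at all, so per-connection latency always favors MySQL; TiDB's advantage is aggregate throughput as nodes and clients are added.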

zzx8170 (Author) commented Dec 13, 2017

"TiKV uses Raft to replicate data and 2PC for distributed transactions, so in your scenario TiDB cannot be faster than single-node MySQL."

@shenli Thanks for the reply, but that explanation does not tell me why my setup is slower than a single instance. In what kind of environment do Raft and 2PC become fast?

shenli (Member) commented Dec 14, 2017

Your data set is not very large: 16 tables × 1,000,000 rows comes to only a few GB, so MySQL can cache most of it in memory. In that scenario it can beat any distributed database system.

zzx8170 (Author) commented Dec 14, 2017

OK, I'll retry on higher-performance machines. Thanks again!

zzx8170 closed this as completed Dec 14, 2017