
BENCHMARK : Pipy 0.90 Multi Threads HTTP1.1

CaiShu edited this page Jan 27, 2023 · 11 revisions

Yesterday Flomesh released pipy 0.90. The main change in this release is the addition of multi-threading support. See the Release Notes for details.

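Pipy was originally designed and developed as a sidecar proxy. As it evolved, it came to be used in more and more non-sidecar scenarios. For example, MQTT support, introduced in version 0.50, was added to meet users' need to connect hundreds of millions of IoT devices. The multi-threaded mode introduced in the current 0.90 release is meant for users who build their own load balancers with Pipy on high-performance hardware. Running Pipy on high-performance white-box servers can deliver load-balancing capacity comparable to commercial hardware products such as F5 BIG-IP, at a much lower overall cost.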

Pipy's multi-threading is built on asio's threading support and uses the Linux kernel's port reuse (SO_REUSEPORT) to balance load across threads.
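To illustrate the kernel mechanism only, here is a minimal Python sketch of port reuse. It is not Pipy's implementation (Pipy is written in C++ on asio); the helper name `make_reuseport_listener` and the choice of four listeners are assumptions made for the example. Several sockets set `SO_REUSEPORT` and bind to the same port, after which the kernel distributes incoming connections among them.

```python
import socket

def make_reuseport_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Create a TCP listening socket with SO_REUSEPORT enabled (Linux 3.9+)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every socket sharing the port must set this option before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock

# The first listener asks the kernel for any free port (port 0);
# the remaining listeners then bind to that same port.
first = make_reuseport_listener(0)
port = first.getsockname()[1]
workers = [first] + [make_reuseport_listener(port) for _ in range(3)]

# All four sockets report the same local port.
bound_ports = {s.getsockname()[1] for s in workers}
print(f"{len(workers)} listeners share port {port}")

for s in workers:
    s.close()
```

In a real server each listener would be owned by its own worker thread or process, so no user-space dispatcher is needed to spread connections.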

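In this benchmark we focus on three questions: whether the number of HTTP/1.1 requests pipy can handle grows linearly as the thread count increases in multi-threaded mode; what resources are needed to reach one million RPS on the given hardware; and whether there is any obvious memory leak while processing basic HTTP/1.1 traffic under high load.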

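For hardware, we started with a single-socket Intel Xeon Gold 6144 server. Launched in 2017 as a high-end processor of its time, it has 8 cores, 16 threads, and 24.75 MB of L3 cache. We bought this second-hand server for testing mainly to keep costs down. We tested pipy with a gradually increasing thread count: 1, 2, 4, 8, 10, and 12 threads. The load generator is wrk. At first we ran pipy and wrk on the same server; for tests with 8 or more pipy threads we also needed to run wrk on a separate AMD Ryzen 5 5600G desktop, connected to the Intel server running pipy over 10G fiber.

This is a test of the most basic HTTP/1.1 protocol parsing and network I/O, so memory requirements are low. The Intel server has 32 GB of memory and the AMD desktop has 64 GB, but the tests used very little of it.

HTTP/1.1 is a protocol with a great many details and is among the most widely used protocols on the Internet. This is a basic test: pipy returns "hi" directly through PipyJS, much like a hello-world test. It does not cover complex scenarios; those can be built on top of this baseline.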

Test Hardware Specifications

The Intel server running pipy:

[root@localhost ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6144 CPU @ 3.50GHz
Stepping:              4
CPU MHz:               3500.000
BogoMIPS:              7000.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp_epp
[root@localhost ~]# free
              total        used        free      shared  buff/cache   available
Mem:       32253728      839788    22518024       17416     8895916    30993832
Swap:      16252924           0    16252924

The AMD desktop running wrk, captured during the 12-thread test:

root@pve8:~# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           80
Model name:                      AMD Ryzen 5 5600G with Radeon Graphics
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         782.422
CPU max MHz:                     4463.6709
CPU min MHz:                     1400.0000
BogoMIPS:                        7785.53
Virtualization:                  AMD-V
L1d cache:                       192 KiB
L1i cache:                       192 KiB
L2 cache:                        3 MiB
L3 cache:                        16 MiB
NUMA node0 CPU(s):               0-11
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
root@pve8:~# free
               total        used        free      shared  buff/cache   available
Mem:        61615912    48749632     5226164       59712     7640116    12094084
Swap:        8388604       54016     8334588

During the 12-thread test, the Intel server and the AMD desktop were connected over 10G fiber. The ping latency between them was:

root@pve8:~# ping 10.10.6.1
PING 10.10.6.1 (10.10.6.1) 56(84) bytes of data.
64 bytes from 10.10.6.1: icmp_seq=1 ttl=64 time=0.084 ms
64 bytes from 10.10.6.1: icmp_seq=2 ttl=64 time=0.093 ms
64 bytes from 10.10.6.1: icmp_seq=3 ttl=64 time=0.094 ms

Test Software Environment

The server running pipy uses CentOS 7.9:

[root@localhost conf]# cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)
[root@localhost conf]# uname -a
Linux localhost.localdomain 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

In the 12-thread test case, the AMD desktop running wrk uses Debian 11.1:

root@pve8:~# cat /etc/debian_version 
11.1
root@pve8:~# uname -a
Linux pve8 5.13.19-2-pve #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100) x86_64 GNU/Linux

pipy was downloaded directly from the GitHub release page and run with the following command:

pipy -e 'pipy().listen(8080).serveHTTP(new Message("hi"))' --reuse-port --threads=12

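In the tests we tried --threads=1, 2, 4, 8, 10, and 12, with wrk thread counts of 1, 2, 4, 6, 10, 12, and 16. With pipy at 8 threads, the local run used 6 wrk threads on the same host as pipy. For pipy at 8 threads and above, wrk also ran on the AMD desktop, using 8, 10, and 12 threads in the runs shown below. The pipy and wrk versions are as follows: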

[root@localhost conf]# pipy -v
Version     : 0.90.0-18
Commit      : d0ffc6f7613f8b6c4bf79461ea6b546eeb80b378
Commit Date : Thu, 26 Jan 2023 09:36:30 +0800
Host        : Linux-5.15.0-1031-azure x86_64
OpenSSL     : OpenSSL 1.1.1q  5 Jul 2022
Builtin GUI : No
Samples     : No
[root@localhost conf]# wrk -v
wrk 4.2.0 [epoll] Copyright (C) 2012 Will Glozer
Usage: wrk <options> <url>                            
  Options:                                            
    -c, --connections <N>  Connections to keep open   
    -d, --duration    <T>  Duration of test           
    -t, --threads     <N>  Number of threads to use   
                                                      
    -s, --script      <S>  Load Lua script file       
    -H, --header      <H>  Add header to request      
        --latency          Print latency statistics   
        --timeout     <T>  Socket/request timeout     
    -v, --version          Print version details      
                                                      
  Numeric arguments may include a SI unit (1k, 1M, 1G)
  Time arguments may include a time unit (2s, 2m, 2h)
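The wrk invocations recorded in the test cases on this page follow a simple pattern: 100 connections per pipy thread, plus a wrk thread count and a duration. The sketch below rebuilds those command lines; `wrk_command` is a hypothetical helper written for this page, not part of wrk, and the 100-connections-per-thread rule is read off the logged commands rather than stated by the author.

```python
def wrk_command(pipy_threads: int, wrk_threads: int, url: str, duration_s: int) -> str:
    """Rebuild a wrk command line in the shape used by the test cases."""
    # Observed in the logs: -c is always 100 x the pipy thread count.
    connections = 100 * pipy_threads
    return f"wrk -c{connections} -t{wrk_threads} -d{duration_s} {url} --latency"

# (pipy threads, wrk threads, target URL, duration in seconds), as logged.
cases = [
    (1, 1, "http://localhost:8080/", 30),    # TC1, local
    (8, 6, "http://localhost:8080/", 30),    # TC4, local
    (10, 10, "http://10.10.6.1:8080/", 20),  # TC5, from the AMD desktop
]
for case in cases:
    print(wrk_command(*case))
```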

Throughout the tests we did not tune any kernel parameters other than raising the maximum number of open files.

[root@localhost conf]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 120116
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 120116
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
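Each open connection consumes a file descriptor, which is why the open-files limit matters once a test uses hundreds or thousands of connections. A minimal sketch of a pre-flight check using Python's standard `resource` module follows; the function name `fd_limit_ok` and the headroom of 64 descriptors are assumptions chosen for illustration.

```python
import resource

def fd_limit_ok(connections: int, headroom: int = 64) -> bool:
    """Return True if the soft open-files limit covers the planned connections."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means no limit is enforced at all.
    if soft == resource.RLIM_INFINITY:
        return True
    # Leave some headroom for stdio, log files, and listening sockets.
    return soft >= connections + headroom

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft_limit} hard={hard_limit}")
print("enough for 1200 connections:", fd_limit_ok(1200))
```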

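We did not apply NUMA binding in these tests (the server here has a single NUMA node). In other tests we have observed that NUMA binding benefits performance, and on certain CPUs it can improve performance by nearly 30%.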

[root@localhost conf]# numastat
                           node0
numa_hit                46354014
numa_miss                      0
numa_foreign                   0
interleave_hit             30100
local_node              46354014
other_node                     0
[root@localhost conf]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 32304 MB
node 0 free: 21986 MB
node distances:
node   0 
  0:  10 

Test Process and Results

We ran each test three times to check stability.

TC1 : 1 Thread

[root@localhost ~]# wrk -c100 -t1 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   718.77us   22.50us   2.31ms   97.36%
    Req/Sec   139.51k     2.16k  143.51k    90.33%
  Latency Distribution
     50%  719.00us
     75%  724.00us
     90%  728.00us
     99%  751.00us
  4167073 requests in 30.00s, 254.34MB read
Requests/sec: 138888.90
Transfer/sec:      8.48MB
[root@localhost ~]# wrk -c100 -t1 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   708.90us   24.64us   2.28ms   97.74%
    Req/Sec   141.43k     2.14k  145.43k    90.00%
  Latency Distribution
     50%  709.00us
     75%  715.00us
     90%  720.00us
     99%  740.00us
  4224342 requests in 30.00s, 257.83MB read
Requests/sec: 140794.98
Transfer/sec:      8.59MB
[root@localhost ~]# wrk -c100 -t1 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   722.16us   26.61us   2.28ms   97.41%
    Req/Sec   138.90k     3.05k  144.70k    71.00%
  Latency Distribution
     50%  726.00us
     75%  735.00us
     90%  739.00us
     99%  749.00us
  4147754 requests in 30.00s, 253.16MB read
Requests/sec: 138245.50
Transfer/sec:      8.44MB
top - 23:39:57 up 1 day, 25 min,  3 users,  load average: 0.82, 0.59, 0.35
Tasks: 249 total,   1 running, 248 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  6.3 us, 40.5 sy,  0.0 ni, 34.6 id,  0.0 wa,  0.0 hi, 18.5 si,  0.0 st
%Cpu4  : 42.1 us, 31.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 26.2 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32253728 total, 22542776 free,   813532 used,  8897420 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 31020088 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                    
 93877 root      20   0  157160   7528   3060 S  99.7  0.0   1:10.63 pipy9                                      
 93892 root      20   0  187816   3852   1320 S  83.4  0.0   0:09.05 wrk     
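One way to judge the stability of the three TC1 runs is to look at their mean and relative spread. The sketch below uses the three Requests/sec values reported by wrk above; measuring spread as (max - min) / mean is a choice made for this example, not a metric the author used.

```python
from statistics import mean

# Requests/sec from the three TC1 runs above.
tc1_rps = [138888.90, 140794.98, 138245.50]

tc1_mean = mean(tc1_rps)
# Relative spread: distance between best and worst run, as a share of the mean.
tc1_spread_pct = (max(tc1_rps) - min(tc1_rps)) / tc1_mean * 100

print(f"mean RPS: {tc1_mean:.0f}")
print(f"spread:   {tc1_spread_pct:.2f}%")
```

With a spread under 2% across the runs, the single-thread result is a reasonable baseline for the multi-thread comparisons.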

TC2 : 2 Threads

[root@localhost ~]# wrk -c200 -t2 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   713.81us  101.03us  22.77ms   95.89%
    Req/Sec   140.50k     4.80k  152.62k    91.83%
  Latency Distribution
     50%  681.00us
     75%  761.00us
     90%  771.00us
     99%    1.08ms
  8391393 requests in 30.00s, 512.17MB read
Requests/sec: 279667.72
Transfer/sec:     17.07MB
[root@localhost ~]# wrk -c200 -t2 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   696.38us   35.87us  11.98ms   97.95%
    Req/Sec   144.00k     3.42k  152.96k    95.85%
  Latency Distribution
     50%  695.00us
     75%  702.00us
     90%  708.00us
     99%  739.00us
  8627069 requests in 30.10s, 526.55MB read
Requests/sec: 286570.16
Transfer/sec:     17.49MB
[root@localhost ~]# wrk -c200 -t2 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   686.39us   45.86us   2.30ms   75.95%
    Req/Sec   146.05k     2.40k  153.49k    92.19%
  Latency Distribution
     50%  697.00us
     75%  727.00us
     90%  733.00us
     99%  749.00us
  8751690 requests in 30.10s, 534.16MB read
Requests/sec: 290723.14
Transfer/sec:     17.74MB
top - 23:43:42 up 1 day, 29 min,  3 users,  load average: 0.58, 0.47, 0.35
Threads: 270 total,   5 running, 265 sleeping,   0 stopped,   0 zombie
%Cpu0  : 42.4 us, 35.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 22.2 si,  0.0 st
%Cpu1  :  1.7 us,  7.6 sy,  0.0 ni, 86.9 id,  0.0 wa,  0.0 hi,  3.8 si,  0.0 st
%Cpu2  :  4.0 us, 20.2 sy,  0.0 ni, 66.5 id,  0.0 wa,  0.0 hi,  9.2 si,  0.0 st
%Cpu3  :  1.3 us,  6.0 sy,  0.0 ni, 88.9 id,  0.0 wa,  0.0 hi,  3.7 si,  0.0 st
%Cpu4  :  7.2 us, 36.9 sy,  0.0 ni, 37.3 id,  0.0 wa,  0.0 hi, 18.5 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 42.2 us, 33.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 24.3 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32253728 total, 22544080 free,   812548 used,  8897100 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 31021072 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                     
 93900 root      20   0  230892   7060   3060 R 99.9  0.0   0:11.72 pipy9                                       
 93901 root      20   0  230892   7060   3060 R 99.7  0.0   0:11.71 pipy9                                       
 93906 root      20   0  262492   4376   1408 R 72.5  0.0   0:08.80 wrk                                         
 93905 root      20   0  262492   4376   1408 R 72.2  0.0   0:08.42 wrk       
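As a sanity check on the wrk numbers, Little's law relates concurrency, throughput, and latency. wrk keeps one request in flight per connection, so throughput should be roughly connections divided by mean latency. The sketch below applies this to the first run of TC1 and TC2; the function name `predicted_rps` is an assumption made for the example.

```python
def predicted_rps(connections: int, mean_latency_us: float) -> float:
    """Little's law for a closed loop: throughput = concurrency / latency."""
    return connections / (mean_latency_us * 1e-6)

# (label, connections, mean latency in microseconds, Requests/sec reported by wrk)
runs = [
    ("TC1 run 1", 100, 718.77, 138888.90),
    ("TC2 run 1", 200, 713.81, 279667.72),
]

relative_errors = []
for label, connections, latency_us, observed in runs:
    predicted = predicted_rps(connections, latency_us)
    error = abs(predicted - observed) / observed
    relative_errors.append(error)
    print(f"{label}: predicted {predicted:.0f}, observed {observed:.0f}, diff {error:.2%}")
```

The predicted and observed values agree to well within 1%, which suggests the reported latency and throughput are internally consistent.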

TC3 : 4 Threads

[root@localhost ~]# wrk -c400 -t4 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  4 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   706.09us  252.17us  47.47ms   99.02%
    Req/Sec   142.48k     6.71k  157.61k    86.33%
  Latency Distribution
     50%  682.00us
     75%  730.00us
     90%  770.00us
     99%    0.96ms
  17015503 requests in 30.00s, 1.01GB read
Requests/sec: 567093.25
Transfer/sec:     34.61MB
[root@localhost ~]# wrk -c400 -t4 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  4 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   683.24us   58.77us  15.00ms   79.55%
    Req/Sec   146.56k     6.43k  157.10k    77.08%
  Latency Distribution
     50%  679.00us
     75%  714.00us
     90%  741.00us
     99%  797.00us
  17563955 requests in 30.11s, 1.05GB read
Requests/sec: 583406.21
Transfer/sec:     35.61MB
[root@localhost ~]# wrk -c400 -t4 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  4 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   671.29us   67.94us   8.17ms   68.98%
    Req/Sec   149.04k     7.32k  164.91k    73.59%
  Latency Distribution
     50%  678.00us
     75%  725.00us
     90%  763.00us
     99%  792.00us
  17861568 requests in 30.10s, 1.06GB read
Requests/sec: 593334.86
Transfer/sec:     36.21MB
top - 00:17:33 up 1 day,  1:03,  3 users,  load average: 1.51, 0.37, 0.17
Threads: 274 total,   9 running, 265 sleeping,   0 stopped,   0 zombie
%Cpu0  :  5.4 us, 21.4 sy,  0.0 ni, 61.8 id,  0.0 wa,  0.0 hi, 11.4 si,  0.0 st
%Cpu1  : 44.7 us, 33.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 22.0 si,  0.0 st
%Cpu2  : 42.9 us, 34.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 22.6 si,  0.0 st
%Cpu3  : 10.9 us, 38.3 sy,  0.0 ni, 30.5 id,  0.0 wa,  0.0 hi, 20.3 si,  0.0 st
%Cpu4  :  3.8 us, 14.1 sy,  0.0 ni, 75.5 id,  0.0 wa,  0.0 hi,  6.6 si,  0.0 st
%Cpu5  : 43.9 us, 34.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 21.6 si,  0.0 st
%Cpu6  : 42.7 us, 34.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 22.5 si,  0.0 st
%Cpu7  :  4.5 us, 16.3 sy,  0.0 ni, 72.3 id,  0.0 wa,  0.0 hi,  6.9 si,  0.0 st
%Cpu8  :  2.8 us, 12.6 sy,  0.0 ni, 78.6 id,  0.0 wa,  0.0 hi,  6.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  6.5 us, 27.7 sy,  0.0 ni, 51.8 id,  0.0 wa,  0.0 hi, 14.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  6.4 us, 27.0 sy,  0.0 ni, 52.8 id,  0.0 wa,  0.0 hi, 13.8 si,  0.0 st
KiB Mem : 32253728 total, 22530596 free,   826040 used,  8897092 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 31007580 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                     
 93937 root      20   0  378356  19384   3068 R 99.9  0.1   0:21.62 pipy9                                       
 93939 root      20   0  378356  19384   3068 R 99.9  0.1   0:21.62 pipy9                                       
 93938 root      20   0  378356  19384   3068 R 99.7  0.1   0:21.61 pipy9                                       
 93936 root      20   0  378356  19384   3068 R 99.3  0.1   0:21.61 pipy9                                       
 93944 root      20   0  412008   5684   1420 R 78.4  0.0   0:16.63 wrk                                         
 93945 root      20   0  412008   5684   1420 R 78.4  0.0   0:16.62 wrk                                         
 93946 root      20   0  412008   5684   1420 R 77.4  0.0   0:16.62 wrk                                         
 93943 root      20   0  412008   5684   1420 R 77.1  0.0   0:16.75 wrk   
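To check the linear-scaling question against the local runs so far, scaling efficiency can be computed as mean RPS at N threads divided by N times the single-thread mean. The sketch below uses the Requests/sec values from TC1 to TC3 above; the variable names are illustrative.

```python
from statistics import mean

# Requests/sec for each of the three runs, keyed by pipy thread count.
rps_runs = {
    1: [138888.90, 140794.98, 138245.50],
    2: [279667.72, 286570.16, 290723.14],
    4: [567093.25, 583406.21, 593334.86],
}

baseline = mean(rps_runs[1])
# Efficiency of 1.0 means perfectly linear scaling relative to one thread.
efficiency = {n: mean(runs) / (n * baseline) for n, runs in rps_runs.items()}

for n, eff in efficiency.items():
    print(f"{n} thread(s): mean {mean(rps_runs[n]):.0f} RPS, efficiency {eff:.3f}")
```

For 2 and 4 threads the efficiency comes out slightly above 1.0, so throughput in these local runs grows at least linearly with thread count, within run-to-run variation.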

TC4 : 8 Threads

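Starting at 8 threads we add tests driven from the AMD desktop, because the pipy thread count plus the wrk thread count now reaches or exceeds the total number of hardware threads on the Intel server.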

Local test results:

[root@localhost ~]# wrk -c800 -t6 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  6 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.90ms  213.88us  10.58ms   62.50%
    Req/Sec   147.38k     6.66k  159.94k    64.50%
  Latency Distribution
     50%    0.93ms
     75%    1.04ms
     90%    1.17ms
     99%    1.32ms
  26394824 requests in 30.01s, 1.57GB read
Requests/sec: 879636.03
Transfer/sec:     53.69MB
[root@localhost ~]# wrk -c800 -t6 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  6 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.92ms  227.16us  15.83ms   60.63%
    Req/Sec   144.33k     7.16k  160.47k    65.83%
  Latency Distribution
     50%    1.00ms
     75%    1.09ms
     90%    1.15ms
     99%    1.23ms
  25853073 requests in 30.01s, 1.54GB read
Requests/sec: 861582.11
Transfer/sec:     52.59MB
[root@localhost ~]# wrk -c800 -t6 -d30 http://localhost:8080/ --latency
Running 30s test @ http://localhost:8080/
  6 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.90ms  214.27us  19.25ms   55.80%
    Req/Sec   147.55k     6.99k  162.61k    75.17%
  Latency Distribution
     50%    0.96ms
     75%    1.06ms
     90%    1.15ms
     99%    1.24ms
  26428474 requests in 30.01s, 1.58GB read
Requests/sec: 880760.83
Transfer/sec:     53.76MB
top - 00:21:24 up 1 day,  1:07,  3 users,  load average: 1.78, 1.05, 0.50
Threads: 280 total,  16 running, 264 sleeping,   0 stopped,   0 zombie
%Cpu0  :  7.0 us,  8.4 sy,  0.0 ni, 81.2 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
%Cpu1  : 19.6 us, 43.3 sy,  0.0 ni, 18.6 id,  0.0 wa,  0.0 hi, 18.6 si,  0.0 st
%Cpu2  : 19.0 us, 44.3 sy,  0.0 ni, 16.3 id,  0.0 wa,  0.0 hi, 20.4 si,  0.0 st
%Cpu3  : 38.3 us, 36.9 sy,  0.0 ni,  4.0 id,  0.0 wa,  0.0 hi, 20.8 si,  0.0 st
%Cpu4  : 27.2 us, 31.3 sy,  0.0 ni, 23.5 id,  0.0 wa,  0.0 hi, 18.0 si,  0.0 st
%Cpu5  : 15.0 us, 51.7 sy,  0.0 ni, 11.6 id,  0.0 wa,  0.0 hi, 21.8 si,  0.0 st
%Cpu6  : 13.2 us, 35.9 sy,  0.0 ni, 35.3 id,  0.0 wa,  0.0 hi, 15.6 si,  0.0 st
%Cpu7  : 19.2 us, 32.3 sy,  0.0 ni, 34.7 id,  0.0 wa,  0.0 hi, 13.8 si,  0.0 st
%Cpu8  : 49.7 us, 33.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 16.9 si,  0.0 st
%Cpu9  : 15.0 us, 51.9 sy,  0.0 ni,  9.8 id,  0.0 wa,  0.0 hi, 23.3 si,  0.0 st
%Cpu10 : 53.2 us, 29.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 17.6 si,  0.0 st
%Cpu11 : 44.6 us, 36.6 sy,  0.0 ni,  1.7 id,  0.0 wa,  0.0 hi, 17.1 si,  0.0 st
%Cpu12 : 48.7 us, 32.3 sy,  0.0 ni,  0.3 id,  0.0 wa,  0.0 hi, 18.7 si,  0.0 st
%Cpu13 : 36.7 us, 39.1 sy,  0.0 ni,  4.4 id,  0.0 wa,  0.0 hi, 19.9 si,  0.0 st
%Cpu14 : 50.5 us, 30.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 18.9 si,  0.0 st
%Cpu15 : 51.5 us, 32.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 16.3 si,  0.0 st
KiB Mem : 32253728 total, 22531304 free,   825596 used,  8896828 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 31008024 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                     
 93961 root      20   0  673284  17168   3060 R 99.9  0.1   0:11.76 pipy9                                       
 93966 root      20   0  673284  17168   3060 R 99.9  0.1   0:11.76 pipy9                                       
 93967 root      20   0  673284  17168   3060 R 99.9  0.1   0:11.76 pipy9                                       
 93962 root      20   0  673284  17168   3060 R 99.7  0.1   0:11.75 pipy9                                       
 93963 root      20   0  673284  17168   3060 R 99.7  0.1   0:11.75 pipy9                                       
 93964 root      20   0  673284  17168   3060 R 99.7  0.1   0:11.75 pipy9                                       
 93965 root      20   0  673284  17168   3060 R 99.7  0.1   0:11.75 pipy9                                       
 93968 root      20   0  673284  17168   3060 R 99.7  0.1   0:11.75 pipy9                                       
 93973 root      20   0  561720   7988   1384 R 95.0  0.0   0:10.71 wrk                                         
 93976 root      20   0  561720   7988   1384 R 95.0  0.0   0:10.72 wrk                                         
 93972 root      20   0  561720   7988   1384 R 94.7  0.0   0:10.81 wrk                                         
 93974 root      20   0  561720   7988   1384 R 94.4  0.0   0:10.67 wrk                                         
 93975 root      20   0  561720   7988   1384 R 94.4  0.0   0:10.67 wrk                                         
 93971 root      20   0  561720   7988   1384 R 91.7  0.0   0:10.82 wrk        

Test results from the AMD desktop:

Note that starting at 8 pipy threads, results above one million RPS begin to appear.

root@pve8:~# wrk -c800 -t8 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  8 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.85ms  523.28us  52.97ms   91.47%
    Req/Sec   116.81k     9.25k  173.89k    76.25%
  Latency Distribution
     50%  789.00us
     75%    1.03ms
     90%    1.30ms
     99%    2.00ms
  18598311 requests in 20.03s, 1.11GB read
Requests/sec: 928379.69
Transfer/sec:     56.66MB
root@pve8:~# wrk -c800 -t8 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  8 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   779.32us  515.44us  19.19ms   87.39%
    Req/Sec   124.69k     9.15k  179.50k    69.52%
  Latency Distribution
     50%  639.00us
     75%    0.99ms
     90%    1.30ms
     99%    1.76ms
  19864558 requests in 20.10s, 1.18GB read
Requests/sec: 988290.11
Transfer/sec:     60.32MB
root@pve8:~# wrk -c800 -t8 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  8 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   752.40us  593.56us  46.36ms   91.47%
    Req/Sec   130.36k     9.02k  164.74k    70.50%
  Latency Distribution
     50%  573.00us
     75%    0.93ms
     90%    1.18ms
     99%    1.98ms
  20756097 requests in 20.03s, 1.24GB read
Requests/sec: 1036149.19
Transfer/sec:     63.24MB
top - 00:32:01 up 1 day,  1:17,  3 users,  load average: 1.64, 2.82, 1.86
Threads: 273 total,   9 running, 264 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.0 us,  1.0 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 37.3 us, 25.3 sy,  0.0 ni,  1.3 id,  0.0 wa,  0.0 hi, 36.0 si,  0.0 st
%Cpu2  : 58.7 us, 32.7 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi,  7.7 si,  0.0 st
%Cpu3  : 52.5 us, 41.4 sy,  0.0 ni,  5.8 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu4  : 12.4 us, 10.7 sy,  0.0 ni, 74.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0 st
%Cpu5  : 43.5 us, 33.8 sy,  0.0 ni, 10.4 id,  0.0 wa,  0.0 hi, 12.4 si,  0.0 st
%Cpu6  :  2.3 us,  2.7 sy,  0.0 ni, 95.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 54.8 us, 30.2 sy,  0.0 ni,  2.7 id,  0.0 wa,  0.0 hi, 12.3 si,  0.0 st
%Cpu8  :  3.3 us,  2.9 sy,  0.0 ni, 92.6 id,  0.0 wa,  0.0 hi,  1.2 si,  0.0 st
%Cpu9  : 54.8 us, 33.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 12.0 si,  0.0 st
%Cpu10 : 44.5 us, 25.8 sy,  0.0 ni,  4.7 id,  0.0 wa,  0.0 hi, 25.1 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  6.8 us,  5.2 sy,  0.0 ni, 85.5 id,  0.0 wa,  0.0 hi,  2.4 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 : 36.6 us, 23.1 sy,  0.0 ni, 19.0 id,  0.0 wa,  0.0 hi, 21.4 si,  0.0 st
KiB Mem : 32253728 total, 22513244 free,   843416 used,  8897068 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 30990204 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                     
 93963 root      20   0  673284  39556   3064 R 99.7  0.1   3:12.04 pipy9                                       
 93966 root      20   0  673284  39556   3064 R 99.7  0.1   3:12.06 pipy9                                       
 93961 root      20   0  673284  39556   3064 R 99.3  0.1   3:12.00 pipy9                                       
 93962 root      20   0  673284  39556   3064 R 99.3  0.1   3:12.03 pipy9                                       
 93964 root      20   0  673284  39556   3064 R 99.3  0.1   3:12.06 pipy9                                       
 93967 root      20   0  673284  39556   3064 R 99.3  0.1   3:12.01 pipy9                                       
 93968 root      20   0  673284  39556   3064 R 99.3  0.1   3:12.04 pipy9                                       
 93965 root      20   0  673284  39556   3064 R 99.0  0.1   3:12.04 pipy9    
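The two 8-thread setups can be compared directly: wrk sharing the Intel server with pipy versus wrk running on the AMD desktop. The sketch below averages the three Requests/sec values from each setup above and reports the relative gain; it is a rough comparison, since the remote runs also differ in wrk thread count (8 instead of 6) and duration (20s instead of 30s).

```python
from statistics import mean

# Requests/sec for pipy with 8 threads, three runs per setup.
local_rps = [879636.03, 861582.11, 880760.83]     # wrk on the same host
remote_rps = [928379.69, 988290.11, 1036149.19]   # wrk on the AMD desktop

local_mean = mean(local_rps)
remote_mean = mean(remote_rps)
gain_pct = (remote_mean / local_mean - 1) * 100

print(f"local mean:  {local_mean:.0f} RPS")
print(f"remote mean: {remote_mean:.0f} RPS")
print(f"gain from moving wrk off the server: {gain_pct:.1f}%")
```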

TC5 : 10 Threads

With 10 pipy threads, pipy consistently handles more than one million RPS. The third run reaches about 1.166 million RPS.

root@pve8:~# wrk -c1000 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.01ms    1.06ms  64.50ms   96.78%
    Req/Sec   106.25k    11.39k  143.87k    66.90%
  Latency Distribution
     50%    0.87ms
     75%    1.10ms
     90%    1.40ms
     99%    5.57ms
  21170346 requests in 20.08s, 1.26GB read
Requests/sec: 1054249.43
Transfer/sec:     64.35MB
root@pve8:~# wrk -c1000 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.97ms    1.06ms  47.04ms   96.60%
    Req/Sec   111.95k    10.86k  153.71k    70.25%
  Latency Distribution
     50%  812.00us
     75%    1.04ms
     90%    1.36ms
     99%    5.98ms
  22303769 requests in 20.08s, 1.33GB read
Requests/sec: 1110522.81
Transfer/sec:     67.78MB
root@pve8:~# wrk -c1000 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.93ms    1.18ms  59.22ms   96.78%
    Req/Sec   117.45k    12.51k  155.26k    68.40%
  Latency Distribution
     50%  766.00us
     75%    1.02ms
     90%    1.33ms
     99%    6.58ms
  23406622 requests in 20.08s, 1.40GB read
Requests/sec: 1165900.45
Transfer/sec:     71.16MB
top - 00:38:09 up 1 day,  1:24,  3 users,  load average: 1.68, 1.80, 1.72
Tasks: 248 total,   1 running, 247 sleeping,   0 stopped,   0 zombie
%Cpu0  :  7.3 us,  5.0 sy,  0.0 ni, 85.7 id,  0.0 wa,  0.0 hi,  2.0 si,  0.0 st
%Cpu1  : 35.6 us, 19.7 sy,  0.0 ni, 33.1 id,  0.0 wa,  0.0 hi, 11.6 si,  0.0 st
%Cpu2  : 33.0 us, 23.3 sy,  0.0 ni, 37.3 id,  0.0 wa,  0.0 hi,  6.3 si,  0.0 st
%Cpu3  : 41.6 us, 25.3 sy,  0.0 ni, 15.2 id,  0.0 wa,  0.0 hi, 17.9 si,  0.0 st
%Cpu4  : 13.1 us, 10.4 sy,  0.0 ni, 73.8 id,  0.0 wa,  0.0 hi,  2.7 si,  0.0 st
%Cpu5  : 29.9 us, 20.7 sy,  0.0 ni, 36.9 id,  0.0 wa,  0.0 hi, 12.5 si,  0.0 st
%Cpu6  :  8.3 us,  5.6 sy,  0.0 ni, 83.4 id,  0.0 wa,  0.0 hi,  2.7 si,  0.0 st
%Cpu7  : 53.8 us, 32.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 14.0 si,  0.0 st
%Cpu8  : 32.6 us, 26.7 sy,  0.0 ni, 34.4 id,  0.0 wa,  0.0 hi,  6.3 si,  0.0 st
%Cpu9  : 54.6 us, 31.5 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.0 hi, 13.2 si,  0.0 st
%Cpu10 : 32.6 us, 21.9 sy,  0.0 ni, 41.6 id,  0.0 wa,  0.0 hi,  3.9 si,  0.0 st
%Cpu11 : 53.5 us, 32.2 sy,  0.0 ni,  0.3 id,  0.0 wa,  0.0 hi, 14.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 : 38.3 us, 25.9 sy,  0.0 ni, 28.5 id,  0.0 wa,  0.0 hi,  7.3 si,  0.0 st
%Cpu14 : 52.2 us, 34.3 sy,  0.0 ni,  1.7 id,  0.0 wa,  0.0 hi, 11.8 si,  0.0 st
%Cpu15 : 51.5 us, 32.6 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi, 15.0 si,  0.0 st
KiB Mem : 32253728 total, 22524340 free,   831944 used,  8897444 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 31001676 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                    
 94022 root      20   0  820748  30468   3064 S 990.1  0.1   5:14.37 pipy9    

TC6 : 12 Threads

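At 12 threads, wrk starts to report Socket errors (all of them connect errors, 191 in each run), mainly because the AMD desktop running wrk has only 12 hardware threads. Even so, RPS holds steadily above 1.2 million at 12 threads.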
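
As a quick check of the "steadily above 1.2 million" reading, the sketch below averages the three Requests/sec figures reported by the TC6 runs and measures how far apart they are. The helper name `summarize` and the spread metric (max minus min, relative to the mean) are illustrative choices, not part of the original test procedure.

```python
from statistics import mean

# Requests/sec reported by the three TC6 wrk runs (12 pipy threads).
tc6_rps = [1214173.16, 1203599.89, 1219031.76]


def summarize(runs):
    """Return (mean, spread_percent), where spread is (max - min) / mean."""
    avg = mean(runs)
    spread = (max(runs) - min(runs)) / avg * 100
    return avg, spread


avg, spread = summarize(tc6_rps)
print(f"mean RPS: {avg:,.0f}, run-to-run spread: {spread:.2f}%")
```

The mean comes out at roughly 1.21 million RPS with a run-to-run spread a little over 1%, and the slowest run is still above 1.2 million.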

root@pve8:~# wrk -c1200 -t12 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  12 threads and 1200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.89ms    1.22ms  28.10ms   96.60%
    Req/Sec   111.20k    32.79k  167.96k    89.23%
  Latency Distribution
     50%  722.00us
     75%    0.88ms
     90%    1.11ms
     99%    7.20ms
  24400093 requests in 20.10s, 1.45GB read
  Socket errors: connect 191, read 0, write 0, timeout 0
Requests/sec: 1214173.16
Transfer/sec:     74.11MB
root@pve8:~# wrk -c1200 -t12 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  12 threads and 1200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.99ms    1.54ms  34.87ms   95.07%
    Req/Sec   101.08k    26.68k  166.28k    65.23%
  Latency Distribution
     50%  703.00us
     75%    0.90ms
     90%    1.18ms
     99%    9.06ms
  24176741 requests in 20.09s, 1.44GB read
  Socket errors: connect 191, read 0, write 0, timeout 0
Requests/sec: 1203599.89
Transfer/sec:     73.46MB
root@pve8:~# wrk -c1200 -t12 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  12 threads and 1200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.97ms    1.49ms  42.10ms   95.13%
    Req/Sec   102.44k    27.64k  163.32k    74.67%
  Latency Distribution
     50%  697.00us
     75%    0.88ms
     90%    1.15ms
     99%    9.04ms
  24497260 requests in 20.10s, 1.46GB read
  Socket errors: connect 191, read 0, write 0, timeout 0
Requests/sec: 1219031.76
Transfer/sec:     74.40MB
top - 00:41:10 up 1 day,  1:27,  2 users,  load average: 2.29, 1.91, 1.79
Tasks: 246 total,   1 running, 245 sleeping,   0 stopped,   0 zombie
%Cpu0  : 40.6 us, 21.5 sy,  0.0 ni, 26.8 id,  0.0 wa,  0.0 hi, 11.1 si,  0.0 st
%Cpu1  : 48.3 us, 31.3 sy,  0.0 ni,  3.7 id,  0.0 wa,  0.0 hi, 16.7 si,  0.0 st
%Cpu2  : 42.3 us, 26.6 sy,  0.0 ni, 25.9 id,  0.0 wa,  0.0 hi,  5.2 si,  0.0 st
%Cpu3  : 45.2 us, 31.0 sy,  0.0 ni, 10.2 id,  0.0 wa,  0.0 hi, 13.6 si,  0.0 st
%Cpu4  : 17.5 us, 12.7 sy,  0.0 ni, 64.6 id,  0.0 wa,  0.0 hi,  5.2 si,  0.0 st
%Cpu5  : 41.3 us, 27.4 sy,  0.0 ni, 18.4 id,  0.0 wa,  0.0 hi, 12.8 si,  0.0 st
%Cpu6  : 33.6 us, 17.4 sy,  0.0 ni, 40.9 id,  0.0 wa,  0.0 hi,  8.1 si,  0.0 st
%Cpu7  : 51.9 us, 29.6 sy,  0.0 ni,  6.1 id,  0.0 wa,  0.0 hi, 12.5 si,  0.0 st
%Cpu8  : 40.0 us, 24.6 sy,  0.0 ni, 19.3 id,  0.0 wa,  0.0 hi, 16.1 si,  0.0 st
%Cpu9  : 55.7 us, 30.0 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi, 13.3 si,  0.0 st
%Cpu10 : 33.7 us, 20.4 sy,  0.0 ni, 36.9 id,  0.0 wa,  0.0 hi,  9.0 si,  0.0 st
%Cpu11 : 54.3 us, 30.5 sy,  0.0 ni,  2.0 id,  0.0 wa,  0.0 hi, 13.2 si,  0.0 st
%Cpu12 :  1.0 us,  0.7 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu13 : 39.3 us, 27.9 sy,  0.0 ni, 24.5 id,  0.0 wa,  0.0 hi,  8.3 si,  0.0 st
%Cpu14 : 51.4 us, 31.4 sy,  0.0 ni,  5.1 id,  0.0 wa,  0.0 hi, 12.2 si,  0.0 st
%Cpu15 : 43.9 us, 28.7 sy,  0.0 ni, 12.2 id,  0.0 wa,  0.0 hi, 15.2 si,  0.0 st
KiB Mem : 32253728 total, 22529704 free,   826476 used,  8897548 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 31007172 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                    
 94036 root      20   0  968212  26276   3060 S  1172  0.1   3:00.04 pipy9            

TC7 : 16 Threads

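Finally, we set the number of pipy threads to the CPU's maximum thread count (16). TC6 already showed that the AMD host running wrk is close to its performance limit, but we pushed on to an even more extreme test anyway. The result: with the wrk host saturated, pipy's throughput held steadily above 1.3 million RPS.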

root@pve8:~# wrk -c1600 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  10 threads and 1600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   667.16us    0.91ms  48.94ms   98.03%
    Req/Sec   133.92k    22.06k  178.08k    81.39%
  Latency Distribution
     50%  571.00us
     75%  725.00us
     90%    0.88ms
     99%    4.01ms
  26672324 requests in 20.09s, 1.59GB read
  Socket errors: connect 589, read 0, write 0, timeout 0
Requests/sec: 1327589.21
Transfer/sec:     81.03MB
root@pve8:~# wrk -c1600 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  10 threads and 1600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   611.91us  733.13us  28.98ms   98.19%
    Req/Sec   134.61k    28.27k  195.17k    84.54%
  Latency Distribution
     50%  498.00us
     75%  721.00us
     90%    0.92ms
     99%    2.64ms
  26798275 requests in 20.04s, 1.60GB read
  Socket errors: connect 589, read 0, write 0, timeout 0
Requests/sec: 1337142.66
Transfer/sec:     81.61MB
root@pve8:~# wrk -c1600 -t10 -d20 http://10.10.6.1:8080/ --latency
Running 20s test @ http://10.10.6.1:8080/
  10 threads and 1600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   671.95us    0.89ms  34.32ms   97.99%
    Req/Sec   131.52k    27.67k  187.85k    78.93%
  Latency Distribution
     50%  572.00us
     75%  741.00us
     90%    0.89ms
     99%    4.11ms
  26185447 requests in 20.08s, 1.56GB read
  Socket errors: connect 589, read 0, write 0, timeout 0
Requests/sec: 1303802.77
Transfer/sec:     79.58MB
top - 00:52:04 up 1 day,  1:37,  2 users,  load average: 6.93, 3.79, 2.74
Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
%Cpu0  : 52.2 us, 29.2 sy,  0.0 ni,  4.4 id,  0.0 wa,  0.0 hi, 14.2 si,  0.0 st
%Cpu1  : 46.0 us, 30.9 sy,  0.0 ni,  4.0 id,  0.0 wa,  0.0 hi, 19.1 si,  0.0 st
%Cpu2  : 46.6 us, 32.2 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi, 18.1 si,  0.0 st
%Cpu3  : 46.3 us, 31.2 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi, 19.5 si,  0.0 st
%Cpu4  : 48.1 us, 30.6 sy,  0.0 ni,  3.4 id,  0.0 wa,  0.0 hi, 17.8 si,  0.0 st
%Cpu5  : 47.7 us, 31.0 sy,  0.0 ni,  2.0 id,  0.0 wa,  0.0 hi, 19.3 si,  0.0 st
%Cpu6  : 53.2 us, 30.6 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi, 13.1 si,  0.0 st
%Cpu7  : 51.4 us, 31.4 sy,  0.0 ni,  2.7 id,  0.0 wa,  0.0 hi, 14.5 si,  0.0 st
%Cpu8  : 45.8 us, 31.6 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi, 19.5 si,  0.0 st
%Cpu9  : 54.4 us, 28.0 sy,  0.0 ni,  4.1 id,  0.0 wa,  0.0 hi, 13.5 si,  0.0 st
%Cpu10 : 47.0 us, 30.5 sy,  0.0 ni,  4.4 id,  0.0 wa,  0.0 hi, 18.1 si,  0.0 st
%Cpu11 : 50.9 us, 30.4 sy,  0.0 ni,  5.1 id,  0.0 wa,  0.0 hi, 13.7 si,  0.0 st
%Cpu12 : 52.0 us, 31.8 sy,  0.0 ni,  2.7 id,  0.0 wa,  0.0 hi, 13.5 si,  0.0 st
%Cpu13 : 47.0 us, 31.1 sy,  0.0 ni,  3.4 id,  0.0 wa,  0.0 hi, 18.6 si,  0.0 st
%Cpu14 : 52.5 us, 30.5 sy,  0.0 ni,  2.7 id,  0.0 wa,  0.0 hi, 14.2 si,  0.0 st
%Cpu15 : 45.8 us, 31.6 sy,  0.0 ni,  2.4 id,  0.0 wa,  0.0 hi, 20.2 si,  0.0 st
KiB Mem : 32253728 total, 22509992 free,   846128 used,  8897608 buff/cache
KiB Swap: 16252924 total, 16252924 free,        0 used. 30987524 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                    
 94055 root      20   0 1263272  46092   3064 S  1531  0.1  18:29.62 pipy9    
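
The connect errors reported in TC6 and TC7 can be read against the configured connection counts. The sketch below subtracts the reported connect errors from the `-c` value of each test to estimate how many connections wrk actually opened. The helper name `established` is illustrative.

```python
# (configured connections, reported connect errors) from the wrk output above.
runs = {
    "TC6 (-c1200)": (1200, 191),
    "TC7 (-c1600)": (1600, 589),
}


def established(configured, connect_errors):
    """Connections that wrk actually managed to open."""
    return configured - connect_errors


for name, (configured, errors) in runs.items():
    print(f"{name}: {established(configured, errors)} connections established")
```

Both tests end up with just over 1000 established connections (1009 and 1011) regardless of the `-c` value. That pattern would be consistent with a fixed cap on the load generator side, for example a per-process open-file limit near 1024, but this is only an assumption: the limits configured on the wrk host are not reported here.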

Summary

This is a simple HTTP1.1 benchmark. From the results we can draw the following conclusions:

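  • As the number of pipy threads increases, throughput (RPS) grows close to linearly
  • On the given Intel processor, pipy first reaches 1 million RPS at 8 threads, and holds steadily above 1 million RPS at 10 threads
  • Throughout the test, the P50, P75, P90, and P99 latencies recorded by wrk are stably distributed with little fluctuation, and almost no long tail was observed
  • Throughout the test, pipy's memory usage stays stable, so we can largely conclude that there is no memory leak in this test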
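
A rough cross-check of the scaling and memory conclusions, using only figures from the 10-, 12-, and 16-thread runs shown above. Two assumptions are labeled in the code: the run ending in 1054249.43 Requests/sec is taken to be a 10-thread run like the two that follow it, and the RES column of each `top` snapshot is taken as pipy's memory footprint. The helper name `per_thread` is illustrative.

```python
from statistics import mean

# Requests/sec per wrk run, keyed by pipy thread count.
# Assumption: the first 10-thread value belongs to the same configuration
# as the two runs that follow it.
rps_by_threads = {
    10: [1054249.43, 1110522.81, 1165900.45],
    12: [1214173.16, 1203599.89, 1219031.76],
    16: [1327589.21, 1337142.66, 1303802.77],
}

# Resident memory (RES, in KiB) of the pipy process in each top snapshot.
# Assumption: RES is a fair proxy for pipy's memory footprint.
res_kib = {10: 30468, 12: 26276, 16: 46092}


def per_thread(runs, threads):
    """Mean Requests/sec divided by the number of pipy threads."""
    return mean(runs) / threads


for threads, runs in rps_by_threads.items():
    print(
        f"{threads:>2} threads: mean {mean(runs):,.0f} RPS, "
        f"{per_thread(runs, threads):,.0f} RPS per thread, "
        f"RES {res_kib[threads] / 1024:.1f} MiB"
    )
```

Per-thread throughput falls from about 111k RPS at 10 threads to about 83k at 16. The 12- and 16-thread figures are best read as lower bounds, because the wrk host was already saturated in those tests, and the 16 logical CPUs are 8 physical cores with two hardware threads each. Resident memory stays below 50 MiB in all three snapshots.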