
UDP relay throughput drops drastically as the sending rate increases #877

Merged · 1 commit merged into shadowsocks:master on Jun 28, 2022

Conversation

wangjian1009
Contributor

Test environment: two laptops connected to the same Wi-Fi network.

iperf3 -s and ss-server run on one laptop.
The iperf3 client (iperf3 -c) and ss-local run on the other laptop.

Test command: iperf3 -c 127.0.0.1 -p 9700 -b 100M -u -l 1260
As the -b parameter increases, throughput drops drastically.

iperf3 direct connection (no proxy)

| -b | c->s bps | c->s size | c->s lost | s->c bps | s->c size | s->c lost |
| --- | --- | --- | --- | --- | --- | --- |
| 100M | 100M | 119M | 0% | 8M | 9M | 1% |
| 200M | 107M | 128M | 0% | 8M | 9M | 1% |
| 400M | 111M | 132M | 0% | 8M | 9M | 1% |
| 800M | 113M | 134M | 0% | 8M | 9M | 1% |

ss-rust

| -b | c->s bps | c->s size | c->s lost | s->c bps | s->c size | s->c lost |
| --- | --- | --- | --- | --- | --- | --- |
| 100M | 100M | 119M | 0% | 4M | 5M | 1% |
| 200M | 200M | 238M | 0% | 593K | 732K | 1% |
| 400M | 400M | 477M | 0% | 666K | 826K | 1% |
| 800M | 800M | 954M | 0% | 602K | 747K | 1% |

ss-libev

| -b | c->s bps | c->s size | c->s lost | s->c bps | s->c size | s->c lost |
| --- | --- | --- | --- | --- | --- | --- |
| 100M | 100M | 119M | 0% | 15M | 18M | 1% |
| 200M | 200M | 238M | 0% | 7M | 9M | 1% |
| 400M | 400M | 477M | 0% | 7M | 9M | 1% |
| 800M | 800M | 954M | 0% | 7M | 9M | 1% |

@zonyitoo
Collaborator

zonyitoo commented Jun 28, 2022

  1. Which version are you using?
  2. How to reproduce, which method are you using?
  3. How about CPU consumption during tests?

Here are some benchmarks: shadowsocks/shadowsocks-org#194 (comment)

@wangjian1009
Contributor Author

My assessment is that this is a tokio task scheduling problem: when there are a large number of uplink UDP forwarding tasks, scheduling is distributed unevenly across them.
After the two changes in this commit, the situation improves dramatically:

| t-bps | c->s bps | c->s size | c->s lost | s->c bps | s->c size | s->c lost |
| --- | --- | --- | --- | --- | --- | --- |
| 100M | 100M | 119M | 0% | 15M | 18M | 1% |
| 200M | 200M | 238M | 0% | 9M | 11M | 0% |
| 400M | 400M | 477M | 0% | 2M | 2M | 1% |
| 800M | 800M | 954M | 0% | 2M | 2M | 1% |
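
For context, here is a minimal sketch (not the actual diff merged in this PR) of the kind of change described above: a UDP forwarding loop that periodically calls tokio::task::yield_now() so a continuously ready socket cannot monopolize a scheduler worker. Function and buffer names are illustrative, not shadowsocks-rust internals.

```rust
use std::io;
use std::sync::Arc;
use tokio::net::UdpSocket;

/// Forward datagrams from `inbound` to `outbound`, handing control back to the
/// tokio scheduler every few packets so other relay tasks get polled.
/// Illustrative sketch only; not the code merged in this PR.
async fn relay_udp(inbound: Arc<UdpSocket>, outbound: Arc<UdpSocket>) -> io::Result<()> {
    let mut buf = vec![0u8; 65536];
    let mut forwarded: u32 = 0;

    loop {
        let (n, _peer) = inbound.recv_from(&mut buf).await?;
        outbound.send(&buf[..n]).await?;

        forwarded += 1;
        if forwarded % 16 == 0 {
            // Without an explicit yield, a socket that is always ready can keep
            // this task running and starve other tasks on the same worker.
            tokio::task::yield_now().await;
        }
    }
}
```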

@zonyitoo zonyitoo merged commit fd4f1f1 into shadowsocks:master Jun 28, 2022
@zonyitoo
Collaborator

zonyitoo commented Jun 28, 2022

https://github.com/shadowsocks/shadowsocks-rust/blob/master/crates/shadowsocks-service/src/server/udprelay.rs#L166

@wangjian1009 Please try again by shrinking the number of UDP recv tasks to half of the CPU cores.
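
A hedged sketch of what that suggestion might look like in isolation: a fixed pool of recv tasks, sized to half the available cores, all polling one shared UdpSocket. The bind address and the use of std::thread::available_parallelism are placeholder choices, not how shadowsocks-rust actually configures its workers.

```rust
use std::sync::Arc;
use tokio::net::UdpSocket;

/// Illustrative only: bind one UDP socket and poll it from a bounded
/// number of tasks (half the available cores) instead of one task per core.
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let socket = Arc::new(UdpSocket::bind("127.0.0.1:9701").await?);

    // Stand-in for however the real server decides its worker count.
    let cores = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let recv_tasks = (cores / 2).max(1);

    let mut handles = Vec::with_capacity(recv_tasks);
    for _ in 0..recv_tasks {
        let socket = Arc::clone(&socket);
        handles.push(tokio::spawn(async move {
            let mut buf = vec![0u8; 65536];
            loop {
                // Each task competes for readiness on the same socket; fewer
                // tasks means less scheduler churn under heavy UDP load.
                if let Ok((n, peer)) = socket.recv_from(&mut buf).await {
                    // Placeholder: decrypt / relay the datagram here.
                    let _ = (n, peer);
                }
            }
        }));
    }

    for h in handles {
        let _ = h.await;
    }
    Ok(())
}
```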

@wangjian1009
Contributor Author

The issue is even more pronounced in a single-machine setup. In the tests below, iperf3 -s, the iperf3 client, ss-server and ss-local all run on the same machine.

Device: MacBook Pro
Chip: Apple M1 Pro
Total cores: 10 (8 performance + 2 efficiency)
Memory: 32 GB
ss-rust: v1.15.0-alpha.5 (built from master)
ss-libev: 3.3.5 (installed via brew)
method: aes-256-gcm

  1. 20 Gbps was chosen because it already reaches the performance limit of iperf3 itself.
  2. UDP forwarding in the Rust ss-local has a problem.
  3. Since the bottleneck is on the local side, increasing the worker count on the server side does not improve the numbers.
  4. Adding yield in socks5.rs / udprelay.rs does not help.

| svr-app | svr-version | svr-worker | cli-app | cli-version | protocol | t-bps | s->c bps | s->c size | s->c lost | CPU state |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| direct | | | | | | 20G | 5G | 5G | 0% | 3 core 100% |
| libev | origin | | rust | origin-v1.15.0-alpha.5 | ss | 20G | 1M | 2M | 1% | 3 core 90% / 2 core 50% |
| libev | origin | | libev | origin | ss | 20G | 361M | 430M | 1% | 4 core 100% / 4 core 50% |
| rust | origin-v1.15.0-alpha.5 | | rust | origin-v1.15.0-alpha.5 | ss | 20G | 1M | 1M | 0% | 3 core 90% / 2 core 50% |
| rust | origin-v1.15.0-alpha.5 | 2 | libev | origin | ss | 20G | 344M | 410M | 1% | 4 core 100% |
| rust | origin-v1.15.0-alpha.5 | 4 | libev | origin | ss | 20G | 352M | 420M | 1% | 4 core 100% |
| rust | origin-v1.15.0-alpha.5 | 2 | rust | origin-v1.15.0-alpha.5 | ss | 20G | 1M | 1M | 0% | 3 core 90% / 2 core 50% |
| rust | origin-v1.15.0-alpha.5 | 4 | rust | origin-v1.15.0-alpha.5 | ss | 20G | 1M | 1M | 1% | 3 core 90% / 2 core 50% |
| rust | origin-v1.15.0-alpha.5-yield | | rust | origin-v1.15.0-alpha.5-yield | ss | 20G | 1M | 1M | 1% | 3 core 90% / 2 core 50% |

@zonyitoo
Collaborator

zonyitoo commented Jun 29, 2022

How can you use iperf3 with sslocal's SOCKS5 frontend?

Why doesn't it match the result of our previous test? shadowsocks/shadowsocks-org#194 (comment)

@wangjian1009
Contributor Author

My mistake: in the iperf3 test scenario I was actually using the tunnel interface, not the SOCKS5 frontend.
After adding yield in tunnel/udprelay.rs there is some improvement:

| svr-app | svr-version | svr-worker | cli-app | cli-version | protocol | t-bps | s->c bps | s->c size | s->c lost | CPU state |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| direct | | | | | | 20G | 5G | 6G | 0% | 3 core 100% |
| libev | origin | | libev | origin | ss | 20G | 378M | 451M | 1% | 4 core 100% / 4 core 50% |
| libev | origin | | rust | origin-v1.15.0-alpha.5 | ss | 20G | 1M | 1M | 0% | 3 core 90% / 2 core 50% |
| rust | origin-v1.15.0-alpha.5 | | rust | origin-v1.15.0-alpha.5 | ss | 20G | 1M | 2M | 1% | 3 core 90% / 2 core 50% |
| rust | origin-v1.15.0-alpha.5 | 2 | libev | origin | ss | 20G | 355M | 423M | 1% | 4 core 100% |
| rust | origin-v1.15.0-alpha.5 | 4 | libev | origin | ss | 20G | 345M | 412M | 1% | 4 core 100% |
| rust | origin-v1.15.0-alpha.5 | 2 | rust | origin-v1.15.0-alpha.5 | ss | 20G | 2M | 2M | 1% | 3 core 90% / 2 core 50% |
| rust | origin-v1.15.0-alpha.5 | 4 | rust | origin-v1.15.0-alpha.5 | ss | 20G | 1M | 1M | 0% | 3 core 90% / 2 core 50% |
| rust | origin-v1.15.0-alpha.5-yield | | rust | origin-v1.15.0-alpha.5-yield | ss | 20G | 105M | 125M | 1% | 3 core 90% / 2 core 50% |

@wangjian1009
Contributor Author

Even a direct iperf3 UDP test cannot reach the throughput described in the documentation.

To investigate this, I tried increasing the number of parallel streams; the results are in the table below. Conclusions:

  1. When multiple streams are sending data, the difference between having and not having the yield call disappears.
  2. iperf3 itself starts losing performance once parallelism increases.
  3. There is still a performance gap between ss-rust and ss-libev.

| svr-app | svr-version | cli-app | cli-version | protocol | parallel | t-bps | s->c bps | s->c size | s->c lost |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| direct | | | | | | 2G | 2G | 2G | 0% |
| direct | | | | | 10 | 2G | 725M | 864M | 0% |
| direct | | | | | 100 | 2G | 67M | 80M | 0% |
| libev | origin | libev | origin | ss | | 2G | 380M | 453M | 1% |
| libev | origin | libev | origin | ss | 10 | 2G | 31M | 37M | 1% |
| libev | origin | libev | origin | ss | 100 | 2G | 1M | 1M | 1% |
| rust | origin-v1.15.0-alpha.5 | rust | origin-v1.15.0-alpha.5 | ss | | 2G | 109M | 131M | 1% |
| rust | origin-v1.15.0-alpha.5 | rust | origin-v1.15.0-alpha.5 | ss | 10 | 2G | 1M | 1M | 1% |
| rust | origin-v1.15.0-alpha.5 | rust | origin-v1.15.0-alpha.5 | ss | 100 | 2G | 978K | 1M | 1% |
| rust | origin-v1.15.0-alpha.5-yield | rust | origin-v1.15.0-alpha.5-yield | ss | | 2G | 110M | 131M | 1% |
| rust | origin-v1.15.0-alpha.5-yield | rust | origin-v1.15.0-alpha.5-yield | ss | 10 | 2G | 9M | 11M | 1% |
| rust | origin-v1.15.0-alpha.5-yield | rust | origin-v1.15.0-alpha.5-yield | ss | 100 | 2G | 798K | 975K | 1% |

@database64128
Contributor

I also cannot reproduce your results.

Platform: Linux 5.18.6
shadowsocks-rust: Built from latest commit on master.
Assumed MTU: 1500

ssserver config:

{
    "server": "::",
    "server_port": 30000,
    "method": "2022-blake3-aes-256-gcm",
    "password": "rGoUo0rmtp+p2rQQRSbkPkXzm5kQsEHaHJjiXDbYMG0=",
    "mode": "tcp_and_udp",
    "no_delay": true,
    "ipv6_first": true,
    "fast_open": true
}

sslocal config:

{
    "locals": [
        {
            "local_address": "::",
            "local_port": 30001,
            "protocol": "tunnel",
            "forward_address": "::1",
            "forward_port": 5201
        }
    ],
    "servers": [
        {
            "name": "localhost-test",
            "address": "::1",
            "port": 30000,
            "password": "rGoUo0rmtp+p2rQQRSbkPkXzm5kQsEHaHJjiXDbYMG0=",
            "method": "2022-blake3-aes-256-gcm"
        }
    ],
    "mode": "tcp_and_udp",
    "no_delay": true,
    "ipv6_first": true,
    "fast_open": true
}

Downlink test result:

❯ iperf3 -c ::1 -p 30001 -Rub 0 -l 1382
Connecting to host ::1, port 30001
Reverse mode, remote host ::1 is sending
[  5] local ::1 port 51523 connected to ::1 port 30001
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   111 MBytes   934 Mbits/sec  0.012 ms  152792/237250 (64%)  
[  5]   1.00-2.00   sec   121 MBytes  1.02 Gbits/sec  0.044 ms  133863/225845 (59%)  
[  5]   2.00-3.00   sec  98.3 MBytes   824 Mbits/sec  0.018 ms  149654/224216 (67%)  
[  5]   3.00-4.00   sec   119 MBytes   998 Mbits/sec  0.030 ms  147974/238236 (62%)  
[  5]   4.00-5.00   sec   112 MBytes   944 Mbits/sec  0.015 ms  142838/228194 (63%)  
[  5]   5.00-6.00   sec   106 MBytes   885 Mbits/sec  0.014 ms  159582/239642 (67%)  
[  5]   6.00-7.00   sec   101 MBytes   850 Mbits/sec  0.012 ms  154972/231845 (67%)  
[  5]   7.00-8.00   sec   117 MBytes   981 Mbits/sec  0.012 ms  145861/234625 (62%)  
[  5]   8.00-9.00   sec   126 MBytes  1.06 Gbits/sec  0.013 ms  139377/235085 (59%)  
[  5]   9.00-10.00  sec  98.4 MBytes   826 Mbits/sec  0.038 ms  156356/231031 (68%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.04  sec  3.01 GBytes  2.57 Gbits/sec  0.000 ms  0/2337110 (0%)  sender
[  5]   0.00-10.00  sec  1.08 GBytes   932 Mbits/sec  0.038 ms  1483269/2325969 (64%)  receiver

iperf Done.

Uplink test result:

❯ iperf3 -c ::1 -p 30001 -ub 0 -l 1390
Connecting to host ::1, port 30001
[  5] local ::1 port 55916 connected to ::1 port 30001
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   250 MBytes  2.10 Gbits/sec  188510  
[  5]   1.00-2.00   sec   279 MBytes  2.34 Gbits/sec  210560  
[  5]   2.00-3.00   sec   276 MBytes  2.32 Gbits/sec  208300  
[  5]   3.00-4.00   sec   277 MBytes  2.33 Gbits/sec  209250  
[  5]   4.00-5.00   sec   285 MBytes  2.39 Gbits/sec  215070  
[  5]   5.00-6.00   sec   283 MBytes  2.38 Gbits/sec  213600  
[  5]   6.00-7.00   sec   281 MBytes  2.36 Gbits/sec  211840  
[  5]   7.00-8.00   sec   305 MBytes  2.56 Gbits/sec  230110  
[  5]   8.00-9.00   sec   275 MBytes  2.31 Gbits/sec  207550  
[  5]   9.00-10.00  sec   269 MBytes  2.26 Gbits/sec  202890  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  2.72 GBytes  2.33 Gbits/sec  0.000 ms  0/2097680 (0%)  sender
[  5]   0.00-10.00  sec   913 MBytes   766 Mbits/sec  0.051 ms  1407338/2095857 (67%)  receiver

iperf Done.

Assuming an MTU of 1500, download speed reaches 932 Mbps and upload speed 766 Mbps.

@wangjian1009
Contributor Author

ssserver -s 0.0.0.0:9701 -U -m aes-256-gcm -k 123456
sslocal -b 0.0.0.0:9700 -U --protocol=tunnel --forward-addr=127.0.0.1:5201 -s 127.0.0.1:9701 -m aes-256-gcm -k 123456

iperf3 -c 127.0.0.1 -p 9700 -b 0G -u -l 1260

Connecting to host 127.0.0.1, port 9700
[ 5] local 127.0.0.1 port 56378 connected to 127.0.0.1 port 9700
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 746 MBytes 6.25 Gbits/sec 620480
[ 5] 1.00-2.00 sec 674 MBytes 5.65 Gbits/sec 560750
[ 5] 2.00-3.00 sec 796 MBytes 6.67 Gbits/sec 662080
[ 5] 3.00-4.00 sec 836 MBytes 7.02 Gbits/sec 696000
[ 5] 4.00-5.00 sec 828 MBytes 6.94 Gbits/sec 688830
[ 5] 5.00-6.00 sec 759 MBytes 6.37 Gbits/sec 631710
[ 5] 6.00-7.00 sec 734 MBytes 6.16 Gbits/sec 610670
[ 5] 7.00-8.00 sec 766 MBytes 6.42 Gbits/sec 637390
[ 5] 8.00-9.00 sec 812 MBytes 6.81 Gbits/sec 675950
[ 5] 9.00-10.00 sec 788 MBytes 6.61 Gbits/sec 656030


[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.00 sec 7.56 GBytes 6.49 Gbits/sec 0.000 ms 0/6439890 (0%) sender
[ 5] 0.00-10.16 sec 1.25 MBytes 1.03 Mbits/sec 0.147 ms 523/1561 (34%) receiver
iperf Done.

@database64128
Contributor

Something's wrong with your test environment. What's your iperf3 version?

iperf3 --version
iperf 3.11 (cJSON 1.7.13)
Linux localhost.localdomain 5.18.6-1-default #1 SMP PREEMPT_DYNAMIC Thu Jun 23 05:46:18 UTC 2022 (5aa0763) x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing, bind to device, support IPv4 don't fragment

@wangjian1009
Contributor Author

iperf 3.11 (cJSON 1.7.13)
Darwin MacBook-Pro 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000 arm64
Optional features available: sendfile / zerocopy, authentication, support IPv4 don't fragment

@wangjian1009
Contributor Author

Maybe it's an ss-rust build configuration error. What's your build script?

@database64128
Contributor

I have the environment variable RUSTFLAGS globally set to -Ctarget-cpu=native in my shell, and I build shadowsocks-rust with cargo build --release --bin ssservice --bin ssurl --features aead-cipher-2022-extra.

@database64128
Contributor

Just came across this article today: https://tokio.rs/blog/2020-04-preemption

Automatic cooperative task yielding was added in tokio-rs/tokio#2160 and released in v0.2.14.

@wangjian1009 Can you still reproduce this issue? If so, I think it's likely to be a bug in the tokio runtime on your specific platform (macOS ARM64).
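
For illustration, a self-contained sketch of what that automatic yielding buys (assumes a tokio 1.x runtime; the mpsc workload is invented for the demo): a task draining an always-ready channel on a single-threaded runtime still lets a sibling timer task run, because each poll of a tokio resource consumes the task's cooperative budget and forces a yield once it is exhausted.

```rust
use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Pre-fill an unbounded channel so the consumer's recv() is always ready.
    let (tx, mut rx) = mpsc::unbounded_channel::<u64>();
    for i in 0..5_000_000u64 {
        tx.send(i).unwrap();
    }
    drop(tx);

    // If the consumer below never yielded, this task could not run until the
    // consumer finished draining all five million messages.
    let watchdog = tokio::spawn(async {
        for _ in 0..5 {
            tokio::time::sleep(Duration::from_millis(50)).await;
            println!("watchdog still gets scheduled");
        }
    });

    // The consumer never calls yield_now() itself; tokio's per-task budget
    // (added in 0.2.14) makes recv() return Pending periodically, handing the
    // single worker thread back to the scheduler.
    let consumer = tokio::spawn(async move {
        let mut n = 0u64;
        while rx.recv().await.is_some() {
            n += 1;
        }
        println!("consumer drained {n} messages");
    });

    let _ = tokio::join!(watchdog, consumer);
}
```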
