This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Out-of-tree MPTCP uses only 8 interfaces out of 16 #406

Closed
arter97 opened this issue Jan 27, 2021 · 12 comments

Comments

@arter97
Contributor

arter97 commented Jan 27, 2021

Possibly related to #128, but the description and comments there don't quite match what I'm seeing.

We recently had the opportunity to upgrade the server environment from 8 Ethernet ports to 16, but MPTCP doesn’t scale beyond 8 interfaces.

As the server has real users/clients, it's quite hard to run experiments on it, so I created 2 VMs to replicate the issue. The same problem shows up on the VMs as well.

VM 1 has 17 virtio NICs (eth0-16), each throttled to 30 Mbps.
VM 2 has 1 virtio NIC (eth0), unthrottled.
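
(For reference, a per-NIC cap like that can also be approximated on the host side with tc; the snippet below is only a rough sketch with an example tap device name, the libvirt definitions linked at the end of this comment are what I actually used.)

# throttle one guest NIC's host-side tap device to roughly 30 Mbps
tc qdisc add dev vnet0 root tbf rate 30mbit burst 32k latency 400ms
# remove the cap again
tc qdisc del dev vnet0 root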

VM 1:

# ifconfig|grep 'eth[0-9]\|192'
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.216  netmask 255.255.255.0  broadcast 192.168.122.255
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.221  netmask 255.255.255.0  broadcast 192.168.122.255
eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.222  netmask 255.255.255.0  broadcast 192.168.122.255
eth3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.223  netmask 255.255.255.0  broadcast 192.168.122.255
eth4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.224  netmask 255.255.255.0  broadcast 192.168.122.255
eth5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.225  netmask 255.255.255.0  broadcast 192.168.122.255
eth6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.226  netmask 255.255.255.0  broadcast 192.168.122.255
eth7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.227  netmask 255.255.255.0  broadcast 192.168.122.255
eth8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.228  netmask 255.255.255.0  broadcast 192.168.122.255
eth9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.229  netmask 255.255.255.0  broadcast 192.168.122.255
eth10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.236  netmask 255.255.255.0  broadcast 192.168.122.255
eth11: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.237  netmask 255.255.255.0  broadcast 192.168.122.255
eth12: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.238  netmask 255.255.255.0  broadcast 192.168.122.255
eth13: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.239  netmask 255.255.255.0  broadcast 192.168.122.255
eth14: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.240  netmask 255.255.255.0  broadcast 192.168.122.255
eth15: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.241  netmask 255.255.255.0  broadcast 192.168.122.255
eth16: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.242  netmask 255.255.255.0  broadcast 192.168.122.255

VM 2:

# ifconfig|grep 'eth[0-9]\|192'
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.211  netmask 255.255.255.0  broadcast 192.168.122.255

VM 1 initiates an MPTCP connection to VM 2 via SSH:

# ssh arter97@192.168.122.211 cat /dev/urandom | pv > /dev/null
 211MiB 0:00:08 [27.0MiB/s] [                 <=>                                             ]

For some reason, MPTCP uses eth0, eth1 and eth10-15, but nothing else.
(Checked via ifconfig's TX packet counters.)

The issue happens on both mptcp_v0.95 (Linux v4.19) and mptcp_trunk (Linux v5.4).
Linux v5.10's upstream MPTCP (v1) uses only 1 interface (eth0) and the throughput is capped at 3.41 MiB/s.
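
A quick way to double-check the interface usage is to sample the per-interface TX counters from sysfs before and after the transfer:

# snapshot the TX byte counters of every ethN interface; run once before and
# once after the transfer to see which counters actually moved
for dev in /sys/class/net/eth*; do
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/statistics/tx_bytes")"
done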

Here are the relevant kernel configs:

CONFIG_MPTCP=y
CONFIG_MPTCP_PM_ADVANCED=y
CONFIG_MPTCP_FULLMESH=y
CONFIG_MPTCP_NDIFFPORTS=y
CONFIG_MPTCP_BINDER=y
CONFIG_MPTCP_NETLINK=y
CONFIG_DEFAULT_MPTCP_PM="fullmesh"
CONFIG_MPTCP_SCHED_ADVANCED=y
# CONFIG_MPTCP_BLEST is not set
CONFIG_MPTCP_ROUNDROBIN=y
CONFIG_MPTCP_REDUNDANT=y
# CONFIG_MPTCP_ECF is not set
CONFIG_DEFAULT_MPTCP_SCHED="default"
# grep . /proc/sys/net/mptcp/*
/proc/sys/net/mptcp/mptcp_checksum:1
/proc/sys/net/mptcp/mptcp_debug:1
/proc/sys/net/mptcp/mptcp_enabled:1
/proc/sys/net/mptcp/mptcp_path_manager:fullmesh
/proc/sys/net/mptcp/mptcp_scheduler:default
/proc/sys/net/mptcp/mptcp_syn_retries:3
/proc/sys/net/mptcp/mptcp_version:0
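
For completeness, the path manager, scheduler and debug output can also be switched at runtime through the sysctls listed above (new connections should pick up the change; I simply set the defaults in the kernel config):

sysctl -w net.mptcp.mptcp_enabled=1
sysctl -w net.mptcp.mptcp_path_manager=fullmesh
sysctl -w net.mptcp.mptcp_scheduler=default
sysctl -w net.mptcp.mptcp_debug=1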

Here are the logs after turning on mptcp_debug.
VM 1:

[ 1410.105894] mptcp_alloc_mpcb: created mpcb with token 0x17fc0de1
[ 1410.106836] mptcp_add_sock: token 0x17fc0de1 pi 1, src_addr:192.168.122.216:50656 dst_addr:192.168.122.211:22
[ 1410.108194] mptcp_add_sock: token 0x17fc0de1 pi 2, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 1410.109259] __mptcp_init4_subsockets: token 0x17fc0de1 pi 2 src_addr:192.168.122.241:0 dst_addr:192.168.122.211:22 ifidx: 17
[ 1410.110444] mptcp_add_sock: token 0x17fc0de1 pi 3, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 1410.112260] __mptcp_init4_subsockets: token 0x17fc0de1 pi 3 src_addr:192.168.122.221:0 dst_addr:192.168.122.211:22 ifidx: 3
[ 1410.113987] mptcp_add_sock: token 0x17fc0de1 pi 4, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 1410.115746] __mptcp_init4_subsockets: token 0x17fc0de1 pi 4 src_addr:192.168.122.236:0 dst_addr:192.168.122.211:22 ifidx: 12
[ 1410.116987] mptcp_add_sock: token 0x17fc0de1 pi 5, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 1410.118232] __mptcp_init4_subsockets: token 0x17fc0de1 pi 5 src_addr:192.168.122.237:0 dst_addr:192.168.122.211:22 ifidx: 13
[ 1410.119481] mptcp_add_sock: token 0x17fc0de1 pi 6, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 1410.120748] __mptcp_init4_subsockets: token 0x17fc0de1 pi 6 src_addr:192.168.122.238:0 dst_addr:192.168.122.211:22 ifidx: 14
[ 1410.122033] mptcp_add_sock: token 0x17fc0de1 pi 7, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 1410.123242] __mptcp_init4_subsockets: token 0x17fc0de1 pi 7 src_addr:192.168.122.239:0 dst_addr:192.168.122.211:22 ifidx: 15
[ 1410.124547] mptcp_add_sock: token 0x17fc0de1 pi 8, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0
[ 1410.125695] __mptcp_init4_subsockets: token 0x17fc0de1 pi 8 src_addr:192.168.122.240:0 dst_addr:192.168.122.211:22 ifidx: 16

SSH process interrupted (^C):

[ 1417.701211] mptcp_close: Close of meta_sk with tok 0x17fc0de1
[ 1417.702439] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:8 state 7 is_meta? 0
[ 1417.703854] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:7 state 7 is_meta? 0
[ 1417.704928] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:4 state 7 is_meta? 0
[ 1417.706020] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:6 state 7 is_meta? 0
[ 1417.707097] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:2 state 7 is_meta? 0
[ 1417.708414] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:3 state 7 is_meta? 0
[ 1417.709302] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:1 state 7 is_meta? 0
[ 1417.710163] mptcp_del_sock: Removing subsock tok 0x17fc0de1 pi:5 state 7 is_meta? 0
[ 1417.711122] mptcp_sock_destruct destroying meta-sk token 0x17fc0de1

VM 2:

[ 1465.399436] mptcp_alloc_mpcb: created mpcb with token 0x735227f5
[ 1465.399525] mptcp_add_sock: token 0x735227f5 pi 1, src_addr:192.168.122.211:22 dst_addr:192.168.122.216:50656
[ 1465.405560] mptcp_add_sock: token 0x735227f5 pi 2, src_addr:192.168.122.211:22 dst_addr:192.168.122.241:44461
[ 1465.408522] mptcp_add_sock: token 0x735227f5 pi 3, src_addr:192.168.122.211:22 dst_addr:192.168.122.221:52675
[ 1465.411203] mptcp_add_sock: token 0x735227f5 pi 4, src_addr:192.168.122.211:22 dst_addr:192.168.122.236:47681
[ 1465.413732] mptcp_add_sock: token 0x735227f5 pi 5, src_addr:192.168.122.211:22 dst_addr:192.168.122.237:46163
[ 1465.416097] mptcp_add_sock: token 0x735227f5 pi 6, src_addr:192.168.122.211:22 dst_addr:192.168.122.238:50525
[ 1465.418678] mptcp_add_sock: token 0x735227f5 pi 7, src_addr:192.168.122.211:22 dst_addr:192.168.122.239:39503
[ 1465.418951] mptcp_add_sock: token 0x735227f5 pi 8, src_addr:192.168.122.211:22 dst_addr:192.168.122.240:57097

SSH process interrupted (^C):

[ 1472.993924] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:8 state 7 is_meta? 0
[ 1472.994392] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:7 state 7 is_meta? 0
[ 1472.994442] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:6 state 7 is_meta? 0
[ 1472.994475] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:4 state 7 is_meta? 0
[ 1472.994505] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:3 state 7 is_meta? 0
[ 1472.994551] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:2 state 7 is_meta? 0
[ 1472.994596] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:1 state 7 is_meta? 0
[ 1472.994622] mptcp_del_sock: Removing subsock tok 0x735227f5 pi:5 state 7 is_meta? 0
[ 1472.994653] mptcp_close: Close of meta_sk with tok 0x735227f5
[ 1472.994710] mptcp_sock_destruct destroying meta-sk token 0x735227f5

Here are the libvirt definitions for both VMs, in case you want to try this setup:
VM 1: https://pastebin.com/VeWCLmac
VM 2: https://pastebin.com/NXXmz9tj

Thanks in advance :)

@arter97
Contributor Author

arter97 commented Jan 27, 2021

Mainline kernel's MPTCP config:

# cat /boot/config-5.10.10-051010-generic | grep -i mptcp
CONFIG_MPTCP=y
CONFIG_INET_MPTCP_DIAG=m
CONFIG_MPTCP_IPV6=y
# cat /proc/sys/net/mptcp/enabled 
1

@matttbe
Member

matttbe commented Jan 27, 2021

Hello,

I see that you are using the Fullmesh PM. This PM has a hard limit: https://github.com/multipath-tcp/mptcp/blob/mptcp_v0.95/net/mptcp/mptcp_fullmesh.c#L23

Is your goal to use more than 8 addresses per connection? We already talked about that in the past, and it was hard for us to find a realistic use case for so many subflows :-)

You can check the addresses picked by the PM by looking at /proc/net/mptcp_fullmesh. Does it correspond to what you see?

@arter97
Contributor Author

arter97 commented Jan 28, 2021

Is your goal to use more than 8 addresses per connection?

Yup.

We already talked about that in the past, and it was hard for us to find a realistic use case for so many subflows :-)

Yeah, I admit my use-case won't be the primary example of MPTCP.

You can check the addresses picked by the PM by looking at /proc/net/mptcp_fullmesh. Does it correspond to what you see?

Yup, it matches.

I see that you are using the Fullmesh PM. This PM has a hard limit: https://github.com/multipath-tcp/mptcp/blob/mptcp_v0.95/net/mptcp/mptcp_fullmesh.c#L23

Thanks for the pointer.
I played around with it for a few hours and managed to raise the limit to 16.

The throughput of SSH increased linearly, now reaching 54.0 MiB/s.

I can see why the limit of 8 was chosen: struct mptcp_cb's u8 mptcp_pm[MPTCP_PM_SIZE] grows quite drastically, from 608 to 720 bytes.
The same principle applies as described here: https://github.com/multipath-tcp/mptcp_net-next/wiki#overview

sk_buff structure size can't get bigger. It's already large and, if anything, the maintainers hope to reduce its size. Changes to the data structure size are amplified by the large number of instances in a busy system.

So I can understand that 8 is a reasonable limit.

For those who're interested though, I'll leave the commit here:
arter97/x86-kernel@443fcdf
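
If you'd rather locate the relevant spots in the tree yourself, something like this should get you close (MPTCP_MAX_ADDR is an approximate name from memory; MPTCP_PM_SIZE is the one mentioned above, and the commit has the exact change):

# find the fullmesh address cap and the per-connection PM state size
# (macro names are approximate; see the commit above for the real change)
grep -rnE "MPTCP_MAX_ADDR|MPTCP_PM_SIZE" include/net/ net/mptcp/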

Thanks for the help!

arter97 closed this as completed Jan 28, 2021
@matttbe
Member

matttbe commented Jan 28, 2021

Thank you for trying this and for sharing the modified code! It can help others :)

By any chance, could you share your use case? Maintaining more than 8 addresses, with possibly 8x8 subflows, is a lot :-)

@arter97
Contributor Author

arter97 commented Feb 2, 2021

Hey, sorry for the late reply, got caught up with work recently.

I don't think I can share the details of the company's internal networking infrastructure, but if I were to make an analogy, we're in the odd position of being able to get as many IP addresses from the ISP as we want, but with each limited to < 50 Mbps.

We know for a fact that the total switching capacity well exceeds the combined throughput of all those addresses, so we deployed an MPTCP setup that relays traffic through a SOCKS5 proxy server running on an unthrottled machine outside, to get faster Internet access.

We're currently using WireGuard with MPTCP, microsocks and redsocks2 for the entire setup.
It works well(ish), but when it doesn't, it's usually microsocks's or redsocks2's fault, not MPTCP's :)
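
Roughly, the redirection glue looks like the following; this is just a generic sketch with example addresses and ports, not our exact rules. redsocks2 listens locally and forwards into microsocks (the SOCKS5 server on the remote end, reachable over the MPTCP/WireGuard path), and iptables pushes outgoing TCP into it:

# redirect locally-originated TCP to redsocks2's listener (example port 12345),
# leaving LAN traffic and the SOCKS5 server itself (example 10.0.0.2) alone
iptables -t nat -N REDSOCKS
iptables -t nat -A REDSOCKS -d 192.168.0.0/16 -j RETURN
iptables -t nat -A REDSOCKS -d 10.0.0.2 -j RETURN
iptables -t nat -A REDSOCKS -p tcp -j REDIRECT --to-ports 12345
iptables -t nat -A OUTPUT -p tcp -j REDSOCKS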

@matttbe
Member

matttbe commented Feb 2, 2021

I see why you need to use more addresses now. Thank you for the explanation, it's an interesting use-case!

And nice to see it works well with all these proxies! Can we force WireGuard to use TCP? Or I guess MPTCP is in a tunnel managed by WireGuard.

@arter97
Contributor Author

arter97 commented Feb 4, 2021

Yeah, MPTCP is living inside WireGuard tunnels.

I haven't run an experiment yet to see which is better: "multiple WireGuard-tunneled interfaces with MPTCP and an unencrypted microsocks proxy" or "unencrypted interfaces with MPTCP and an encrypted SOCKS5 proxy (e.g., ssh or shadowsocks)".

I opted for WireGuard as it naturally parallelizes across multiple CPU cores, but who knows, maybe the latter can outperform it ¯\_(ツ)_/¯

I should experiment with that sooner or later.

@arter97
Contributor Author

arter97 commented May 2, 2021

Just leaving here an update on our use-case :)

We settled on using WireGuard + MPTCP + shadowsocks-rust (without encryption: plain), and it has been rock solid for months now.

If we don't use WireGuard, something goes wrong with shadowsocks-rust and TCP connections randomly hang, which I don't believe is the fault of either MPTCP or shadowsocks-rust itself.
Setting up WireGuard and forcing our Internet connections to go through UDP fixed everything. Since the connections are already encrypted by WireGuard, we simply switched shadowsocks-rust to the plain (no encryption) cipher in its configuration.

@matttbe
Member

matttbe commented May 3, 2021

Thank you for sharing this, it's always useful from our development point of view to know how MPTCP is used :)

@starkovv

@arter97 What versions of the kernel and MPTCP do you use in your setup?

@arter97
Contributor Author

arter97 commented May 20, 2021

@starkovv I use a custom kernel based on v5.4 with the mptcp_trunk branch merged in.

The notable change is arter97/x86-kernel@443fcdf, as mentioned in the comment above.

https://github.com/arter97/x86-kernel/tree/5.4
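
For anyone wanting to reproduce the tree, the rough recipe is just merging the out-of-tree branch into a v5.4-based kernel tree (conflict resolution depends on the base tree and isn't shown here):

git remote add mptcp https://github.com/multipath-tcp/mptcp.git
git fetch mptcp mptcp_trunk
git merge FETCH_HEAD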

@arinc9
Contributor

arinc9 commented Dec 31, 2021

Just leaving here an update on our use-case :)

We settled on using WireGuard + MPTCP + shadowsocks-rust (without encryption: plain), and it has been rock solid for months now.

If we don't use WireGuard, something goes wrong with shadowsocks-rust and TCP connections randomly hang, which I don't believe is the fault of either MPTCP or shadowsocks-rust itself. Setting up WireGuard and forcing our Internet connections to go through UDP fixed everything. Since the connections are already encrypted by WireGuard, we simply switched shadowsocks-rust to the plain (no encryption) cipher in its configuration.

This is more or less the setup I have at home. I can get as many 100 Mbps links as I want from the ISP, so I plan to use 10 subflows to get a 1 Gbps connection.

I use WireGuard to take care of all the non-TCP traffic over the most stable link (especially helpful for encrypting DNS traffic and for delay-sensitive use cases). iptables picks up TCP traffic and forwards it to the proxy (I use v2ray's vless for that), which goes over multiple links in plaintext.

The reason I use a little-known Chinese protocol is that my home router cannot handle high throughput with encryption. And where I live, I'm pretty sure the ISP uses their firewall to track SOCKS traffic, so I believe using an obscure protocol like vless keeps me under the radar.

@arter97 @matttbe
