MPTCP uses only 4 interfaces out of 8 #128

matthiasATzema · 2016-06-20T12:59:03Z

Hi experts,

I have two servers, each equipped with two quad-port 1 Gb Ethernet adapters and two onboard 1 Gb ports (10 ports at 1 Gb each per server).

I want to use 2 ports for the local LAN (no multipath) and bundle the other 8 ports into an 8 Gb link between the two servers.

To check the connection speed I started "iperf -s" on the first machine and "iperf -c 10.250.240.20" on the second machine. Watching with "nload", I can see that only 4 interfaces are in use at the same time. What can I do to use all 8 interfaces at the same time?

How can I debug this issue?

Many thanks in advance!
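One way to confirm what nload shows is to compare the per-interface transmit counters in /proc/net/dev before and after an iperf run. The sketch below is only an illustration: the function names `tx_bytes` and `active_interfaces` and the 1 MB threshold are assumptions made for this example, not something used in the thread.

```shell
#!/bin/sh
# Print "<interface> <tx_bytes>" for every interface in a /proc/net/dev-style file.
# The first two lines of that file are headers; after the interface name come
# 8 receive columns, so the transmit byte counter is the 10th field.
tx_bytes() (
    dev_file="${1:-/proc/net/dev}"
    awk 'NR > 2 { sub(/:/, " "); print $1, $10 }' "$dev_file"
)

# Compare two snapshots written by tx_bytes and print the interfaces whose
# TX counter grew by more than min_bytes (hypothetical default: 1000000).
active_interfaces() (
    before="$1"
    after="$2"
    min_bytes="${3:-1000000}"
    awk -v min="$min_bytes" '
        NR == FNR { prev[$1] = $2; next }
        ($1 in prev) && ($2 - prev[$1] > min) { print $1 }
    ' "$before" "$after"
)

# Possible usage while iperf is running:
#   tx_bytes > /tmp/before; sleep 5; tx_bytes > /tmp/after
#   active_interfaces /tmp/before /tmp/after
```

If fewer than 8 of the enp* interfaces show up in the output, the missing ones carried no significant traffic during the interval.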

GinesGarcia · 2016-06-20T13:09:38Z

Hi,
Are you using the "fullmesh" path manager? Have you configured all the routing tables needed by MPTCP (one per interface)? If so, can you provide more information about your configuration?
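A quick way to answer the first question is to read the MPTCP settings from /proc/sys. The sketch below assumes the out-of-tree MPTCP kernel, which is expected to expose `net.mptcp.mptcp_enabled` and `net.mptcp.mptcp_path_manager`; the helper name `mptcp_setting` is made up for this example, and on a kernel without those entries it simply prints "unavailable".

```shell
#!/bin/sh
# Read one MPTCP sysctl value from /proc/sys/net/mptcp (assumed location on
# the out-of-tree MPTCP kernel). Prints "unavailable" if the entry is missing.
mptcp_setting() (
    name="$1"
    sysctl_dir="${2:-/proc/sys/net/mptcp}"
    if [ -r "$sysctl_dir/$name" ]; then
        cat "$sysctl_dir/$name"
    else
        echo "unavailable"
    fi
)

echo "mptcp_enabled:      $(mptcp_setting mptcp_enabled)"
echo "mptcp_path_manager: $(mptcp_setting mptcp_path_manager)"
```

For the setup discussed here, the second line would be expected to report fullmesh.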

matthiasATzema · 2016-06-21T06:17:07Z

Yes, I use the fullmesh path manager.

My hardware setup looks like this:
serverA & serverB:
HP ProLiant SE326M1R2 2x Intel Xeon L5640 Six Core 2.26 GHz (12 cores total per server)
2x Intel(R) PRO/1000 e1000e (quad port gigabit, 8x 1Gb direct connection between serverA and serverB)
Linux serverA 3.18.25.jessiemptcp #3 SMP Wed Jan 13 05:33:31 UTC 2016 x86_64 GNU/Linux

I configured this manually on serverA:
ip rule add from 10.250.240.20 table 1
ip rule add from 10.250.241.20 table 2
ip rule add from 10.250.242.20 table 3
ip rule add from 10.250.243.20 table 4
ip rule add from 10.250.244.20 table 5
ip rule add from 10.250.245.20 table 6
ip rule add from 10.250.246.20 table 7
ip rule add from 10.250.247.20 table 8
ip route add 10.250.240.0/24 dev enp5s0f0 scope link table 1
ip route add 10.250.241.0/24 dev enp5s0f1 scope link table 2
ip route add 10.250.242.0/24 dev enp6s0f0 scope link table 3
ip route add 10.250.243.0/24 dev enp6s0f1 scope link table 4
ip route add 10.250.244.0/24 dev enp12s0f0 scope link table 5
ip route add 10.250.245.0/24 dev enp12s0f1 scope link table 6
ip route add 10.250.246.0/24 dev enp13s0f0 scope link table 7
ip route add 10.250.247.0/24 dev enp13s0f1 scope link table 8

This on serverB:
ip rule add from 10.250.240.21 table 1
ip rule add from 10.250.241.21 table 2
ip rule add from 10.250.242.21 table 3
ip rule add from 10.250.243.21 table 4
ip rule add from 10.250.244.21 table 5
ip rule add from 10.250.245.21 table 6
ip rule add from 10.250.246.21 table 7
ip rule add from 10.250.247.21 table 8
ip route add 10.250.240.0/24 dev enp5s0f0 scope link table 1
ip route add 10.250.241.0/24 dev enp5s0f1 scope link table 2
ip route add 10.250.242.0/24 dev enp6s0f0 scope link table 3
ip route add 10.250.243.0/24 dev enp6s0f1 scope link table 4
ip route add 10.250.244.0/24 dev enp12s0f0 scope link table 5
ip route add 10.250.245.0/24 dev enp12s0f1 scope link table 6
ip route add 10.250.246.0/24 dev enp13s0f0 scope link table 7
ip route add 10.250.247.0/24 dev enp13s0f1 scope link table 8
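The two configurations above differ only in the last octet of the source address, so they can be generated from one loop. This is a sketch that only prints the commands; the function name `gen_routing` and the `IFACES` variable are assumptions made for this example, while the interface names, subnets and table numbers are copied from the listing above.

```shell
#!/bin/sh
# Interface names in table order (table 1 .. table 8), as listed above.
IFACES="enp5s0f0 enp5s0f1 enp6s0f0 enp6s0f1 enp12s0f0 enp12s0f1 enp13s0f0 enp13s0f1"

# Print one "ip rule" and one "ip route" command per interface.
# $1 is the host octet: 20 for serverA, 21 for serverB.
gen_routing() (
    host="$1"
    table=1
    for dev in $IFACES; do
        subnet=$((239 + table))   # table 1 -> 10.250.240.0/24, table 8 -> 10.250.247.0/24
        echo "ip rule add from 10.250.$subnet.$host table $table"
        echo "ip route add 10.250.$subnet.0/24 dev $dev scope link table $table"
        table=$((table + 1))
    done
)

# Dry run for serverA; nothing is changed on the system.
gen_routing 20
```

Piping the output into `sh` as root would apply the commands; printing them first makes it easy to compare against the manual configuration.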

According to /var/log/kern.log on serverA, a subflow is created from each IP address to each IP address, so this does look like a fullmesh. But maybe that is the problem: serverA and serverB are directly connected via 8 Ethernet cables, so most of these subflows will be dead ends.

cpaasch · 2016-06-22T05:53:29Z

MPTCP creates a fullmesh across all IP addresses, and in addition the maximum number of subflows is limited to 32, so in your case the host can end up not using all interfaces.

With each host having 8 interfaces, a total of 64 subflows would be created (if there were no limit on the number of subflows).

You should install iptables rules with target REJECT to prevent these unnecessary subflows.
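The arithmetic in the reply above can be written out as a short calculation. The variable names are chosen for this example; the values 8 and 32 come from the thread.

```shell
#!/bin/sh
n_ifaces=8        # interfaces per host used for the MPTCP link
max_subflows=32   # subflow limit quoted in the reply above

# A fullmesh pairs every local address with every remote address.
fullmesh=$((n_ifaces * n_ifaces))

echo "fullmesh subflows:        $fullmesh"
echo "subflows over the limit:  $((fullmesh - max_subflows))"
# With direct cabling only same-subnet pairs are reachable: one per interface.
echo "usable same-subnet pairs: $n_ifaces"
```

Since only 8 of the 64 candidate pairs share a cable, blocking the remaining 56 keeps the useful subflows well under the limit.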

matthiasATzema · 2016-06-23T08:25:43Z

My boss canceled the project, so sadly I don't have any time left to test this. :(

serverA
iptables -A INPUT -s 10.250.240.20 -d 10.250.240.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.240.20 -j REJECT
iptables -A INPUT -s 10.250.241.20 -d 10.250.241.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.241.20 -j REJECT
iptables -A INPUT -s 10.250.242.20 -d 10.250.242.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.242.20 -j REJECT
iptables -A INPUT -s 10.250.243.20 -d 10.250.243.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.243.20 -j REJECT
iptables -A INPUT -s 10.250.244.20 -d 10.250.244.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.244.20 -j REJECT
iptables -A INPUT -s 10.250.245.20 -d 10.250.245.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.245.20 -j REJECT
iptables -A INPUT -s 10.250.246.20 -d 10.250.246.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.246.20 -j REJECT
iptables -A INPUT -s 10.250.247.20 -d 10.250.247.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.247.20 -j REJECT

serverB
iptables -A INPUT -s 10.250.240.21 -d 10.250.240.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.240.21 -j REJECT
iptables -A INPUT -s 10.250.241.21 -d 10.250.241.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.241.21 -j REJECT
iptables -A INPUT -s 10.250.242.21 -d 10.250.242.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.242.21 -j REJECT
iptables -A INPUT -s 10.250.243.21 -d 10.250.243.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.243.21 -j REJECT
iptables -A INPUT -s 10.250.244.21 -d 10.250.244.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.244.21 -j REJECT
iptables -A INPUT -s 10.250.245.21 -d 10.250.245.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.245.21 -j REJECT
iptables -A INPUT -s 10.250.246.21 -d 10.250.246.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.246.21 -j REJECT
iptables -A INPUT -s 10.250.247.21 -d 10.250.247.0/24 -j ACCEPT
iptables -A INPUT -s 10.250.247.21 -j REJECT
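The rule lists above follow one pattern per address: accept traffic to the matching /24, reject everything else from that source. The sketch below generates the same pattern from a loop and only prints the commands. The function name `gen_filter` is made up for this example. The rules in the thread append to INPUT; because packets that carry a host's own source address are generated locally, this sketch defaults to the OUTPUT chain, which is an assumption about the intent and, like the original rules, has not been tested.

```shell
#!/bin/sh
# Print an ACCEPT/REJECT rule pair for each of the 8 link addresses.
# $1 is the host octet (20 = serverA, 21 = serverB).
# $2 is the chain; OUTPUT is an assumed default, pass INPUT to reproduce
# the rules exactly as posted in the thread.
gen_filter() (
    host="$1"
    chain="${2:-OUTPUT}"
    subnet=240
    while [ "$subnet" -le 247 ]; do
        addr="10.250.$subnet.$host"
        # Allow the directly cabled peer subnet ...
        echo "iptables -A $chain -s $addr -d 10.250.$subnet.0/24 -j ACCEPT"
        # ... and reject every other destination from this source address.
        echo "iptables -A $chain -s $addr -j REJECT"
        subnet=$((subnet + 1))
    done
)

# Dry run for serverA; nothing is installed.
gen_filter 20
```

Running `gen_filter 21 INPUT` prints the serverB list exactly as posted, which makes it easy to compare the two variants.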


cpaasch added the question label Jun 21, 2016

cpaasch closed this as completed Jun 23, 2016

arter97 mentioned this issue Jan 27, 2021: Out-of-tree MPTCP uses only 8 interfaces out of 16 #406 (closed)
