see mptcp options when running iperf locally #247

Closed
teto opened this issue Dec 7, 2021 · 11 comments

@teto

teto commented Dec 7, 2021

I would like to capture some MPTCP traffic locally (I have set sysctl net.mptcp.enabled = 1), yet when I run an iperf3 transfer over localhost, I don't see any MPTCP option in the TCP headers (capturing localhost with Wireshark 3.4.9). I run kernel 5.15.4. The out-of-tree Linux MPTCP from multipath-tcp.org had some code to disable MPTCP on localhost (and I had patched it to run my tests). Is it the same with net-next? I had a quick look at the code but could not find anything similar.
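
For readers who want to check a capture programmatically, here is a minimal sketch of what "seeing an MPTCP option" means at the byte level. It assumes you already have the raw bytes of one TCP header (for example exported from a capture); the function names are hypothetical. The only protocol fact it relies on is that MPTCP uses TCP option kind 30.

```python
MPTCP_OPTION_KIND = 30  # TCP option kind used by MPTCP (RFC 8684)


def tcp_option_kinds(tcp_header: bytes) -> list:
    """Return the option kinds present in one raw TCP header (NOPs skipped)."""
    # The data offset (upper 4 bits of byte 12) gives the header length in 32-bit words.
    header_len = (tcp_header[12] >> 4) * 4
    options = tcp_header[20:header_len]
    kinds = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation: a single padding byte
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            break  # truncated or malformed option, stop parsing
        kinds.append(kind)
        i += options[i + 1]  # second byte is the total option length
    return kinds


def has_mptcp_option(tcp_header: bytes) -> bool:
    """True if the header carries at least one MPTCP option."""
    return MPTCP_OPTION_KIND in tcp_option_kinds(tcp_header)
```

A SYN from a plain TCP socket typically carries options such as MSS and window scale only, so `has_mptcp_option` would return False for it, which matches the symptom described above.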

@matttbe
Member

matttbe commented Dec 8, 2021

Hi Matthieu,

> I would like to capture some MPTCP traffic locally (I have set sysctl net.mptcp.enabled = 1), yet when I run an iperf3 transfer over localhost, I don't see any MPTCP option in the TCP headers (capturing localhost with Wireshark 3.4.9).

What did you do to make iperf3 create an MPTCP socket? I guess you noticed that your app has to create a specific MPTCP socket and not a regular TCP one. (See the "How to use MPTCP?" section in our wiki.)

When you use your modified/ld_preloaded iperf3 with another interface, does it work?
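
As an illustration of "a specific MPTCP socket", here is a minimal sketch of an application asking for MPTCP explicitly and falling back to TCP when the kernel refuses. The helper name is hypothetical; the value 262 is the Linux `IPPROTO_MPTCP` constant, which newer Python versions also expose as `socket.IPPROTO_MPTCP`.

```python
import socket

# Use the constant from the socket module when available, else the Linux value.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Try to open an MPTCP socket, fall back to plain TCP.

    Returns a (socket, is_mptcp) tuple.
    """
    try:
        sock = socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
        return sock, True
    except OSError:
        # Typically raised when the kernel lacks MPTCP support or when
        # net.mptcp.enabled is 0.
        sock = socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        return sock, False
```

An unmodified iperf3 does not do this, which is why a wrapper or a patched build is needed to get MPTCP on the wire.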

@teto
Author

teto commented Dec 9, 2021

Thanks again mattt for helping me ;)
Arf, I had completely missed the wiki info, thanks. I ran both client and server with mptcpize, like mptcpize iperf -s, yet Wireshark still showed a TCP-only transfer. Could anyone confirm whether they had success with mptcpize and iperf3?

I haven't tested https://github.com/pabeni/mptcp-tools.git yet but that would be my next try.

I might take a stab at #213 once I can get something running. I was looking in man tcp for a description of net.mptcp.enabled, but there is nothing even there. I hope I can at least fix that.
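
Since the sysctl came up, here is a small sketch for checking it from a script. It assumes the usual procfs mapping of `net.mptcp.enabled` to `/proc/sys/net/mptcp/enabled`, with 1 meaning MPTCP sockets can be created; the function name is hypothetical.

```python
from pathlib import Path

# Assumed procfs location of the net.mptcp.enabled sysctl.
MPTCP_SYSCTL = "/proc/sys/net/mptcp/enabled"


def mptcp_enabled(path=MPTCP_SYSCTL):
    """Return True or False for the sysctl value, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # File missing (kernel built without MPTCP, or not Linux) or unreadable.
        return None
```

A result of None suggests the running kernel has no MPTCP support at all, which is a different situation from the sysctl being set to 0.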

@vandit86

vandit86 commented Dec 10, 2021

Hello,
mptcpize does not work for me either. I use the wrapper from the wiki to generate MPTCP traffic.

@matttbe
Member

matttbe commented Dec 10, 2021

The best would be to use this patched version of IPerf: esnet/iperf#1166

If it works, please mention that on the IPerf PR, not sure why it has not been merged yet.

@pabeni

pabeni commented Dec 10, 2021 via email

@teto
Author

teto commented Dec 10, 2021

With the patched iperf I can make it work, thanks. I will give mptcpize another try, but I had launched it for both client and server as well.

@pabeni

pabeni commented Dec 10, 2021

> With the patched iperf I can make it work, thanks. I will give mptcpize another try, but I had launched it for both client and server as well.

Note that you need a recent version of the tool, specifically containing the commit:

commit 5633f08e35552295b2f9414ff32ca4e8e081b2f2
Author: Paolo Abeni <paolo.abeni@gmail.com>
Date:   Tue Oct 12 19:24:33 2021 +0200

    mptcpize: force MPTCP usage for IPPROTO_IP, too (#159)

No released version contains the above yet: you must use the current git snapshot.
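
To make the effect of that commit concrete, here is a rough sketch of the decision a socket() wrapper has to take. It is inferred from the commit subject only and is not the mptcpize source; the function name and the type mask are assumptions. The point is that a stream socket opened with protocol 0 (IPPROTO_IP) means "default protocol", which is TCP, so it has to be rewritten as well.

```python
import socket

IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)  # 262 on Linux
# Assumption: the low bits of the type argument hold the socket type, while
# flags such as SOCK_NONBLOCK or SOCK_CLOEXEC occupy higher bits.
SOCK_TYPE_MASK = 0xF


def rewrite_protocol(family, type_, protocol):
    """Return the protocol a wrapper would hand to the real socket() call."""
    if family not in (socket.AF_INET, socket.AF_INET6):
        return protocol  # only IPv4/IPv6 sockets can become MPTCP
    if (type_ & SOCK_TYPE_MASK) != socket.SOCK_STREAM:
        return protocol  # datagram and raw sockets are left alone
    if protocol in (socket.IPPROTO_IP, socket.IPPROTO_TCP):
        return IPPROTO_MPTCP  # explicit TCP and the default 0 both map to MPTCP
    return protocol
```

A wrapper that only matched `IPPROTO_TCP` would miss every application that passes 0, which would be consistent with the plain TCP transfers reported earlier in this thread.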

@teto
Author

teto commented Dec 10, 2021

Ha, that must have been it, I was using 0.8.
I've tried with the master version of mptcpd, and with vanilla iperf3 I get:

❯ result/bin/mptcpize run iperf3 -s 
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
iperf3: error - unable to set TCP/SCTP NODELAY: Operation not supported

while trying to connect with

result/bin/mptcpize run  iperf -c 192.168.1.17 --cport 2042 -4
iperf3: error - unable to set TCP/SCTP NODELAY: Operation not supported

Should I set cap_net_admin or some specific capability? I've tried with sudo result/bin/mptcpize run iperf3 -s, with the same effect.

@matttbe
Member

matttbe commented Dec 13, 2021

Is there an option for IPerf not to use NODELAY?
If not, maybe the easiest is to use a v5.16 kernel (or our export branch).
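
For applications you control, one way to cope with this on kernels that reject the option is to treat the failure as non-fatal. The sketch below is only an illustration of that idea (the helper name is hypothetical, and iperf3 itself treats the error as fatal): it attempts to set `TCP_NODELAY` and reports whether the kernel accepted it.

```python
import errno
import socket


def try_set_nodelay(sock):
    """Enable TCP_NODELAY when possible.

    Returns True on success and False when the kernel reports the option
    as unsupported for this socket; any other error is re-raised.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return True
    except OSError as exc:
        # "Operation not supported" corresponds to EOPNOTSUPP.
        if exc.errno in (errno.EOPNOTSUPP, errno.ENOPROTOOPT):
            return False
        raise
```

On a regular TCP socket this returns True; on an MPTCP socket the result depends on the kernel version.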

@teto
Author

teto commented Dec 13, 2021

There is a -N. For some reason, after installing mptcpize, I don't see the unable to set TCP/SCTP NODELAY error anymore, but iperf3 doesn't create an MPTCP session either. For now I am happy with using @pabeni's fork, so closing this. Thanks all for the help.
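
Besides looking at a capture, one way to tell whether a run created an MPTCP session is to compare the kernel's MPTCP counters before and after. The sketch below assumes the counters appear in `/proc/net/netstat` as a pair of `MPTcpExt:` lines (names, then values); the function names are hypothetical.

```python
def parse_mptcp_counters(netstat_text):
    """Extract the MPTcpExt counters from /proc/net/netstat-style text."""
    rows = [line.split() for line in netstat_text.splitlines()
            if line.startswith("MPTcpExt:")]
    if len(rows) < 2:
        return {}  # no MPTCP section in this text
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_mptcp_counters(path="/proc/net/netstat"):
    """Read the counters from procfs; returns {} when unavailable."""
    try:
        with open(path) as fh:
            return parse_mptcp_counters(fh.read())
    except OSError:
        return {}
```

If a counter such as `MPCapableSYNTX` does not increase across a client run, the connection was most likely plain TCP.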

teto closed this as completed Dec 13, 2021
jenkins-tessares pushed a commit that referenced this issue Jul 20, 2023
Add a big batch of test coverage to assert all aspects of the tcx opts
attach, detach and query API:

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  #238     tc_opts_after:OK
  #239     tc_opts_append:OK
  #240     tc_opts_basic:OK
  #241     tc_opts_before:OK
  #242     tc_opts_chain_classic:OK
  #243     tc_opts_demixed:OK
  #244     tc_opts_detach:OK
  #245     tc_opts_detach_after:OK
  #246     tc_opts_detach_before:OK
  #247     tc_opts_dev_cleanup:OK
  #248     tc_opts_invalid:OK
  #249     tc_opts_mixed:OK
  #250     tc_opts_prepend:OK
  #251     tc_opts_replace:OK
  #252     tc_opts_revision:OK
  Summary: 15/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-8-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
jenkins-tessares pushed a commit that referenced this issue Aug 11, 2023
Add a detachment test case with miniq present to assert that with and
without the miniq we get the same error.

  # ./test_progs -t tc_opts
  #244     tc_opts_after:OK
  #245     tc_opts_append:OK
  #246     tc_opts_basic:OK
  #247     tc_opts_before:OK
  #248     tc_opts_chain_classic:OK
  #249     tc_opts_delete_empty:OK
  #250     tc_opts_demixed:OK
  #251     tc_opts_detach:OK
  #252     tc_opts_detach_after:OK
  #253     tc_opts_detach_before:OK
  #254     tc_opts_dev_cleanup:OK
  #255     tc_opts_invalid:OK
  #256     tc_opts_mixed:OK
  #257     tc_opts_prepend:OK
  #258     tc_opts_replace:OK
  #259     tc_opts_revision:OK
  Summary: 16/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230804131112.11012-2-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
matttbe pushed a commit that referenced this issue Aug 17, 2023
Add several new tcx test cases to improve test coverage. This also includes
a few new tests with ingress instead of clsact qdisc, to cover the fix from
commit dc644b5 ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").

  # ./test_progs -t tc
  [...]
  #234     tc_links_after:OK
  #235     tc_links_append:OK
  #236     tc_links_basic:OK
  #237     tc_links_before:OK
  #238     tc_links_chain_classic:OK
  #239     tc_links_chain_mixed:OK
  #240     tc_links_dev_cleanup:OK
  #241     tc_links_dev_mixed:OK
  #242     tc_links_ingress:OK
  #243     tc_links_invalid:OK
  #244     tc_links_prepend:OK
  #245     tc_links_replace:OK
  #246     tc_links_revision:OK
  #247     tc_opts_after:OK
  #248     tc_opts_append:OK
  #249     tc_opts_basic:OK
  #250     tc_opts_before:OK
  #251     tc_opts_chain_classic:OK
  #252     tc_opts_chain_mixed:OK
  #253     tc_opts_delete_empty:OK
  #254     tc_opts_demixed:OK
  #255     tc_opts_detach:OK
  #256     tc_opts_detach_after:OK
  #257     tc_opts_detach_before:OK
  #258     tc_opts_dev_cleanup:OK
  #259     tc_opts_invalid:OK
  #260     tc_opts_mixed:OK
  #261     tc_opts_prepend:OK
  #262     tc_opts_replace:OK
  #263     tc_opts_revision:OK
  [...]
  Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
jenkins-tessares pushed a commit that referenced this issue Sep 8, 2023
'./test_progs -t test_local_storage' reported a splat:

[   27.137569] =============================
[   27.138122] [ BUG: Invalid wait context ]
[   27.138650] 6.5.0-03980-gd11ae1b16b0a #247 Tainted: G           O
[   27.139542] -----------------------------
[   27.140106] test_progs/1729 is trying to lock:
[   27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130
[   27.141834] other info that might help us debug this:
[   27.142437] context-{5:5}
[   27.142856] 2 locks held by test_progs/1729:
[   27.143352]  #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40
[   27.144492]  #1: ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0
[   27.145855] stack backtrace:
[   27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G           O       6.5.0-03980-gd11ae1b16b0a #247
[   27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   27.149127] Call Trace:
[   27.149490]  <TASK>
[   27.149867]  dump_stack_lvl+0x130/0x1d0
[   27.152609]  dump_stack+0x14/0x20
[   27.153131]  __lock_acquire+0x1657/0x2220
[   27.153677]  lock_acquire+0x1b8/0x510
[   27.157908]  local_lock_acquire+0x29/0x130
[   27.159048]  obj_cgroup_charge+0xf4/0x3c0
[   27.160794]  slab_pre_alloc_hook+0x28e/0x2b0
[   27.161931]  __kmem_cache_alloc_node+0x51/0x210
[   27.163557]  __kmalloc+0xaa/0x210
[   27.164593]  bpf_map_kzalloc+0xbc/0x170
[   27.165147]  bpf_selem_alloc+0x130/0x510
[   27.166295]  bpf_local_storage_update+0x5aa/0x8e0
[   27.167042]  bpf_fd_sk_storage_update_elem+0xdb/0x1a0
[   27.169199]  bpf_map_update_value+0x415/0x4f0
[   27.169871]  map_update_elem+0x413/0x550
[   27.170330]  __sys_bpf+0x5e9/0x640
[   27.174065]  __x64_sys_bpf+0x80/0x90
[   27.174568]  do_syscall_64+0x48/0xa0
[   27.175201]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   27.175932] RIP: 0033:0x7effb40e41ad
[   27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8
[   27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad
[   27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002
[   27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788
[   27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000
[   27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000
[   27.184958]  </TASK>

It complains about acquiring a local_lock while holding a raw_spin_lock.
It means it should not allocate memory while holding a raw_spin_lock
since it is not safe for RT.

raw_spin_lock is needed because bpf_local_storage supports tracing
context. In particular for task local storage, it is easy to
get a "current" task PTR_TO_BTF_ID in tracing bpf prog.
However, task (and cgroup) local storage has already been moved to
bpf mem allocator which can be used after raw_spin_lock.

The splat is for the sk storage. For sk (and inode) storage,
it has not been moved to bpf mem allocator. Using raw_spin_lock or not,
kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context.
However, the local storage helper requires a verifier accepted
sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running
a bpf prog in a kzalloc unsafe context and also able to hold a verifier
accepted sk pointer) could happen.

This patch avoids kzalloc after raw_spin_lock to silence the splat.
There is an existing kzalloc before the raw_spin_lock. At that point,
a kzalloc is very likely required because a lookup has just been done
before. Thus, this patch always does the kzalloc before acquiring
the raw_spin_lock and removes the later kzalloc usage after the
raw_spin_lock. After this change, it will have a charge and then
uncharge during the syscall bpf_map_update_elem() code path.
This patch opts for simplicity and does not continue the old
optimization to save one charge and uncharge.
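
The "allocate before taking the lock" idea can be illustrated outside the kernel. The sketch below is a loose user-space analogy in Python, not the kernel code: the class and its fields are hypothetical, a `threading.Lock` stands in for the raw_spin_lock, and building a dict stands in for kzalloc.

```python
import threading


class LocalStorage:
    """Toy key/value store that never allocates while holding its lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._elems = {}
        self.discarded = 0  # preallocated elements that turned out to be unused

    def update(self, key, value):
        # Allocate up front, before the lock is taken.
        new_elem = {"value": value}
        with self._lock:
            old = self._elems.get(key)
            if old is None:
                self._elems[key] = new_elem  # the preallocated element is used
                return
            old["value"] = value  # in-place update, nothing allocated under the lock
        # The element was allocated for nothing; this wasted work is the price
        # of the simpler scheme (compare the charge and uncharge mentioned above).
        self.discarded += 1
```

The trade-off is the same in spirit: one possibly wasted allocation in exchange for never allocating inside the critical section.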

This issue dates back to the very first commit of bpf_sk_storage
which had been refactored multiple times to create task, inode, and
cgroup storage. This patch uses a Fixes tag with a more recent
commit that should be easier to backport.

Fixes: b00fa38 ("bpf: Enable non-atomic allocations in local storage")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230901231129.578493-2-martin.lau@linux.dev
@Monika-Research

Hi,
I don't see this issue with iperf 3.9. Only versions after 3.9 report "unable to set TCP/SCTP NODELAY: Operation not supported". I can't use 3.9 because our devices might lose some features if we downgrade. Kindly let me know if there is a patch to get mptcpize working with iperf 3.10 and above, or what the best way to work around this issue would be.

I appreciate your support and help.

Thanks,
Monika
