Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WARNING in sk_stream_kill_queues #172

Closed
matttbe opened this issue Mar 12, 2021 · 10 comments
Closed

WARNING in sk_stream_kill_queues #172

matttbe opened this issue Mar 12, 2021 · 10 comments
Assignees
Labels
Projects

Comments

@matttbe
Copy link
Member

matttbe commented Mar 12, 2021

Today, I'm replacing Christoph in sending bug reports about new warnings reported by the kernel :)

This was spot by my CI when running selftests (mptcp_connect) with the latest export branch and a non debug kernel:

01:20:05.880 # selftests: net/mptcp: mptcp_connect.sh
01:20:05.880 [   12.551060] ip (497) used greatest stack depth: 12160 bytes left
01:20:05.880 [   12.660444] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
01:20:05.880 [   12.758865] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
01:20:05.880 [   12.867166] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
01:20:05.880 [   12.868582] ip (528) used greatest stack depth: 12080 bytes left
01:20:05.880 # INFO: set ns3-604b3609-VpnW3B dev ns3eth2: ethtool -K tso off
01:20:05.880 # Created /tmp/tmp.apDkKpCK6J (size 5643292	/tmp/tmp.apDkKpCK6J) containing data sent by client
01:20:05.880 # Created /tmp/tmp.N9yTFLG6v4 (size 2384924	/tmp/tmp.N9yTFLG6v4) containing data sent by server
01:20:05.880 # New MPTCP socket can be blocked via sysctl		[ OK ]
01:20:05.880 # setsockopt(..., TCP_ULP, "mptcp", ...) blocked	[ OK ]
01:20:05.881 # INFO: validating network environment with pings
01:20:05.881 [   13.603058] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
01:20:05.881 [   16.288011] netem: version 1.3
01:20:05.881 # INFO: Using loss of 0.99% delay 4 ms reorder 94% 99% with delay 1ms on ns3eth4
01:20:05.881 # ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration   101ms) [ OK ]
(...)
01:20:05.883 # ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP	[   26.909979] ------------[ cut here ]------------
01:20:05.883 [   26.910668] WARNING: CPU: 1 PID: 75 at net/core/stream.c:208 sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1)) 
01:20:06.191 [   26.911880] Modules linked in: sch_netem mptcp_token_test mptcp_crypto_test kunit
01:20:06.191 [   26.912811] CPU: 1 PID: 75 Comm: kworker/1:2 Not tainted 5.12.0-rc2+ #82
01:20:06.191 [   26.913624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
01:20:06.191 [   26.914660] Workqueue: events mptcp_worker
01:20:06.191 [   26.915262] RIP: 0010:sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1)) 
01:20:06.193 [ 26.915931] Code: 00 85 c0 75 1f 85 f6 75 21 5b 5d c3 48 89 df e8 d3 ed fe ff 8b 83 48 01 00 00 8b b3 00 01 00 00 85 c0 74 e1 0f 0b 85 f6 74 df <0f> 0b 5b 5d c3 0f 0b eb ac 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56
01:20:06.206 All code
01:20:06.258 ========
01:20:06.258    0:	00 85 c0 75 1f 85    	add    %al,-0x7ae08a40(%rbp)
01:20:06.258    6:	f6 75 21             	divb   0x21(%rbp)
01:20:06.258    9:	5b                   	pop    %rbx
01:20:06.258    a:	5d                   	pop    %rbp
01:20:06.258    b:	c3                   	retq   
01:20:06.258    c:	48 89 df             	mov    %rbx,%rdi
01:20:06.258    f:	e8 d3 ed fe ff       	callq  0xfffffffffffeede7
01:20:06.258   14:	8b 83 48 01 00 00    	mov    0x148(%rbx),%eax
01:20:06.258   1a:	8b b3 00 01 00 00    	mov    0x100(%rbx),%esi
01:20:06.258   20:	85 c0                	test   %eax,%eax
01:20:06.258   22:	74 e1                	je     0x5
01:20:06.258   24:	0f 0b                	ud2    
01:20:06.258   26:	85 f6                	test   %esi,%esi
01:20:06.258   28:	74 df                	je     0x9
01:20:06.258   2a:*	0f 0b                	ud2    		<-- trapping instruction
01:20:06.258   2c:	5b                   	pop    %rbx
01:20:06.258   2d:	5d                   	pop    %rbp
01:20:06.258   2e:	c3                   	retq   
01:20:06.258   2f:	0f 0b                	ud2    
01:20:06.258   31:	eb ac                	jmp    0xffffffffffffffdf
01:20:06.258   33:	0f 1f 40 00          	nopl   0x0(%rax)
01:20:06.258   37:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
01:20:06.258   3c:	41 57                	push   %r15
01:20:06.258   3e:	41 56                	push   %r14
01:20:06.258 
01:20:06.258 Code starting with the faulting instruction
01:20:06.258 ===========================================
01:20:06.258    0:	0f 0b                	ud2    
01:20:06.258    2:	5b                   	pop    %rbx
01:20:06.258    3:	5d                   	pop    %rbp
01:20:06.258    4:	c3                   	retq   
01:20:06.258    5:	0f 0b                	ud2    
01:20:06.258    7:	eb ac                	jmp    0xffffffffffffffb5
01:20:06.258    9:	0f 1f 40 00          	nopl   0x0(%rax)
01:20:06.258    d:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
01:20:06.258   12:	41 57                	push   %r15
01:20:06.258   14:	41 56                	push   %r14
01:20:06.258 [   26.918076] RSP: 0018:ffffae3e4025fdf8 EFLAGS: 00010206
01:20:06.259 [   26.918746] RAX: 0000000000000000 RBX: ffff8bd3835545c0 RCX: 0000000000000007
01:20:06.259 [   26.919598] RDX: ffffffff960f5fd0 RSI: 0000000000000578 RDI: ffff8bd3835545c0
01:20:06.260 [   26.920432] RBP: ffff8bd383554670 R08: 0000000000000000 R09: ffffffff95436100
01:20:06.260 [   26.921249] R10: ffff8bd3834e0800 R11: 0000000000000001 R12: ffffae3e4025fe10
01:20:06.260 [   26.922195] R13: ffff8bd383554b90 R14: ffff8bd3fdd2d100 R15: 0000000000000000
01:20:06.260 [   26.923197] FS:  0000000000000000(0000) GS:ffff8bd3fdd00000(0000) knlGS:0000000000000000
01:20:06.260 [   26.924269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
01:20:06.260 [   26.925062] CR2: 00005580c706e540 CR3: 0000000046c10002 CR4: 0000000000370ee0
01:20:06.260 [   26.925970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
01:20:06.260 [   26.926941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
01:20:06.260 [   26.927956] Call Trace:
01:20:06.260 [   26.928371] __mptcp_destroy_sock (./include/net/xfrm.h:1185 (discriminator 3)) 
01:20:06.504 [   26.929042] mptcp_worker (net/mptcp/protocol.c:2331) 
01:20:06.755 [   26.929630] process_one_work (./arch/x86/include/asm/jump_label.h:25) 
01:20:07.007 [   26.930224] worker_thread (./include/linux/list.h:282) 
01:20:07.266 [   26.930728] ? process_one_work (kernel/workqueue.c:2364) 
01:20:07.413 [   26.931310] kthread (kernel/kthread.c:292) 
01:20:07.659 [   26.931766] ? kthread_park (kernel/kthread.c:245) 
01:20:07.908 [   26.932269] ret_from_fork (arch/x86/entry/entry_64.S:300) 
01:20:08.160 [   26.932713] ---[ end trace 62b4db612e46011d ]---
01:20:08.160 [   26.933354] ------------[ cut here ]------------
01:20:08.160 [   26.934200] WARNING: CPU: 1 PID: 16 at net/ipv4/af_inet.c:157 inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1)) 
01:20:08.407 [   26.935407] Modules linked in: sch_netem mptcp_token_test mptcp_crypto_test kunit
01:20:08.407 [   26.936428] CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G        W         5.12.0-rc2+ #82
01:20:08.407 [   26.937633] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
01:20:08.407 [   26.938790] RIP: 0010:inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1)) 
01:20:08.409 [ 26.939486] Code: bc 24 30 01 00 00 5b 41 5c e9 d7 18 f3 ff 41 0f b6 44 24 12 3c 07 74 8f e9 b8 c2 21 00 4c 89 e7 e8 a0 85 f0 ff e9 71 ff ff ff <0f> 0b eb b6 0f 0b 41 8b 84 24 4c 01 00 00 85 c0 74 90 0f 0b 41 8b
01:20:08.411 All code
01:20:08.433 ========
01:20:08.433    0:	bc 24 30 01 00       	mov    $0x13024,%esp
01:20:08.433    5:	00 5b 41             	add    %bl,0x41(%rbx)
01:20:08.433    8:	5c                   	pop    %rsp
01:20:08.433    9:	e9 d7 18 f3 ff       	jmpq   0xfffffffffff318e5
01:20:08.433    e:	41 0f b6 44 24 12    	movzbl 0x12(%r12),%eax
01:20:08.433   14:	3c 07                	cmp    $0x7,%al
01:20:08.433   16:	74 8f                	je     0xffffffffffffffa7
01:20:08.433   18:	e9 b8 c2 21 00       	jmpq   0x21c2d5
01:20:08.433   1d:	4c 89 e7             	mov    %r12,%rdi
01:20:08.433   20:	e8 a0 85 f0 ff       	callq  0xfffffffffff085c5
01:20:08.433   25:	e9 71 ff ff ff       	jmpq   0xffffffffffffff9b
01:20:08.433   2a:*	0f 0b                	ud2    		<-- trapping instruction
01:20:08.433   2c:	eb b6                	jmp    0xffffffffffffffe4
01:20:08.433   2e:	0f 0b                	ud2    
01:20:08.433   30:	41 8b 84 24 4c 01 00 	mov    0x14c(%r12),%eax
01:20:08.433   37:	00 
01:20:08.433   38:	85 c0                	test   %eax,%eax
01:20:08.433   3a:	74 90                	je     0xffffffffffffffcc
01:20:08.433   3c:	0f 0b                	ud2    
01:20:08.433   3e:	41                   	rex.B
01:20:08.433   3f:	8b                   	.byte 0x8b
01:20:08.433 
01:20:08.433 Code starting with the faulting instruction
01:20:08.434 ===========================================
01:20:08.434    0:	0f 0b                	ud2    
01:20:08.434    2:	eb b6                	jmp    0xffffffffffffffba
01:20:08.434    4:	0f 0b                	ud2    
01:20:08.434    6:	41 8b 84 24 4c 01 00 	mov    0x14c(%r12),%eax
01:20:08.434    d:	00 
01:20:08.434    e:	85 c0                	test   %eax,%eax
01:20:08.434   10:	74 90                	je     0xffffffffffffffa2
01:20:08.434   12:	0f 0b                	ud2    
01:20:08.434   14:	41                   	rex.B
01:20:08.434   15:	8b                   	.byte 0x8b
01:20:08.434 [   26.941652] RSP: 0018:ffffae3e40093ad8 EFLAGS: 00010206
01:20:08.435 [   26.942336] RAX: 0000000000000578 RBX: ffff8bd383554670 RCX: 0000000000000028
01:20:08.435 [   26.943284] RDX: 0000000000000000 RSI: 0000000000000578 RDI: ffff8bd383554670
01:20:08.435 [   26.944208] RBP: ffff8bd3835545c0 R08: ffff8bd38355470c R09: 0000000000000000
01:20:08.435 [   26.945136] R10: 0000000001000000 R11: 0000000001000000 R12: ffff8bd3835545c0
01:20:08.435 [   26.945997] R13: ffff8bd382d08000 R14: ffff8bd384611d14 R15: ffff8bd384611cec
01:20:08.435 [   26.946839] FS:  0000000000000000(0000) GS:ffff8bd3fdd00000(0000) knlGS:0000000000000000
01:20:08.436 [   26.947762] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
01:20:08.436 [   26.948445] CR2: 00005580c706e540 CR3: 0000000046c10002 CR4: 0000000000370ee0
01:20:08.436 [   26.949293] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
01:20:08.436 [   26.950320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
01:20:08.436 [   26.951284] Call Trace:
01:20:08.436 [   26.951658] __sk_destruct (net/core/sock.c:1797 (discriminator 3)) 
01:20:08.672 [   26.952155] subflow_ulp_release (./include/net/sock.h:1803) 
01:20:08.931 [   26.952694] tcp_cleanup_ulp (net/ipv4/tcp_ulp.c:124) 
01:20:09.188 [   26.953183] tcp_v4_destroy_sock (net/ipv4/tcp_ipv4.c:2234) 
01:20:09.447 [   26.953794] tcp_v6_destroy_sock (net/ipv6/tcp_ipv6.c:1940) 
01:20:09.707 [   26.954421] inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:886 (discriminator 9)) 
01:20:10.010 [   26.955013] tcp_rcv_state_process (net/ipv4/tcp_input.c:6498) 
01:20:10.269 [   26.955735] ? security_sock_rcv_skb (security/security.c:2190 (discriminator 19)) 
01:20:10.524 [   26.956475] ? sk_filter_trim_cap (net/core/filter.c:156) 
01:20:10.783 [   26.957119] tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1522) 
01:20:11.042 [   26.957647] tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1730) 
01:20:11.301 [   26.958133] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:434 (discriminator 8)) 
01:20:11.550 [   26.958919] ip6_input_finish (./include/linux/rcupdate.h:74) 
01:20:11.800 [   26.959417] ip6_input (./include/linux/netfilter.h:301) 
01:20:12.048 [   26.959831] ? ip6_dst_check (net/ipv6/route.c:2646) 
01:20:12.298 [   26.960355] ? tcp_v6_early_demux (./include/net/dst.h:469) 
01:20:12.556 [   26.960989] ipv6_rcv (./include/linux/netfilter.h:302) 
01:20:12.804 [   26.961437] __netif_receive_skb_one_core (net/core/dev.c:5369) 
01:20:13.047 [   26.962113] process_backlog (./include/linux/rcupdate.h:74) 
01:20:13.304 [   26.962629] __napi_poll (net/core/dev.c:6893) 
01:20:13.547 [   26.963138] net_rx_action (net/core/dev.c:6962) 
01:20:13.807 [   26.963637] __do_softirq (./arch/x86/include/asm/jump_label.h:25) 
01:20:14.026 [   26.964117] run_ksoftirqd (./arch/x86/include/asm/irqflags.h:45) 
01:20:14.287 [   26.964637] smpboot_thread_fn (kernel/smpboot.c:165) 
01:20:14.541 [   26.965200] ? sort_range (kernel/smpboot.c:108) 
01:20:14.797 [   26.965728] kthread (kernel/kthread.c:292) 
01:20:14.799 [   26.966159] ? kthread_park (kernel/kthread.c:245) 
01:20:14.800 [   26.966767] ret_from_fork (arch/x86/entry/entry_64.S:300) 
01:20:14.802 [   26.967263] ---[ end trace 62b4db612e46011e ]---
01:20:14.802 (duration   265ms) [ OK ]

Maybe similar to #136 ?

@matttbe matttbe added the bug label Mar 12, 2021
@matttbe matttbe added this to Needs triage in MPTCP Bugs via automation Mar 12, 2021
@matttbe
Copy link
Member Author

matttbe commented Apr 16, 2021

My CI got the issue again:

# selftests: net/mptcp: mptcp_connect.sh
[   11.205424] ip (500) used greatest stack depth: 12160 bytes left
[   11.282956] ip (507) used greatest stack depth: 12096 bytes left
[   11.366341] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[   11.479448] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[   11.602172] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
# INFO: set ns3-60793e51-gEvPoy dev ns3eth2: ethtool -K tso off gso off gro off
# INFO: set ns4-60793e51-gEvPoy dev ns4eth3: ethtool -K tso off gro off
# Created /tmp/tmp.80GSDDtWgl (size 5136412	/tmp/tmp.80GSDDtWgl) containing data sent by client
# Created /tmp/tmp.tXHycNkXvN (size 5500956	/tmp/tmp.tXHycNkXvN) containing data sent by server
# New MPTCP socket can be blocked via sysctl		[ OK ]
# setsockopt(..., TCP_ULP, "mptcp", ...) blocked	[ OK ]
# INFO: validating network environment with pings
[   12.315354] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[   14.940148] netem: version 1.3
# INFO: Using loss of 0.31% delay 14 ms reorder 95% 51% with delay 3ms on ns3eth4
# ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration   260ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration    35ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration    34ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration    81ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP  	(duration    33ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP	(duration    35ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP	(duration    37ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP	(duration    37ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP	(duration    93ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP	(duration    83ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP	[   18.600946] ------------[ cut here ]------------
[   18.601837] WARNING: CPU: 0 PID: 23 at net/core/stream.c:208 sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1)) 
[   18.602964] Modules linked in: sch_netem mptcp_token_test mptcp_crypto_test kunit
[   18.603860] CPU: 0 PID: 23 Comm: kworker/0:1 Not tainted 5.12.0-rc6+ #1
[   18.604667] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   18.605813] Workqueue: events mptcp_worker
[   18.606475] RIP: 0010:sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1)) 
[ 18.607368] Code: 00 85 c0 75 1f 85 f6 75 21 5b 5d c3 48 89 df e8 83 ec fe ff 8b 83 48 01 00 00 8b b3 00 01 00 00 85 c0 74 e1 0f 0b 85 f6 74 df <0f> 0b 5b 5d c3 0f 0b eb ac 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56
All code
========
   0:	00 85 c0 75 1f 85    	add    %al,-0x7ae08a40(%rbp)
   6:	f6 75 21             	divb   0x21(%rbp)
   9:	5b                   	pop    %rbx
   a:	5d                   	pop    %rbp
   b:	c3                   	retq   
   c:	48 89 df             	mov    %rbx,%rdi
   f:	e8 83 ec fe ff       	callq  0xfffffffffffeec97
  14:	8b 83 48 01 00 00    	mov    0x148(%rbx),%eax
  1a:	8b b3 00 01 00 00    	mov    0x100(%rbx),%esi
  20:	85 c0                	test   %eax,%eax
  22:	74 e1                	je     0x5
  24:	0f 0b                	ud2    
  26:	85 f6                	test   %esi,%esi
  28:	74 df                	je     0x9
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	5b                   	pop    %rbx
  2d:	5d                   	pop    %rbp
  2e:	c3                   	retq   
  2f:	0f 0b                	ud2    
  31:	eb ac                	jmp    0xffffffffffffffdf
  33:	0f 1f 40 00          	nopl   0x0(%rax)
  37:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3c:	41 57                	push   %r15
  3e:	41 56                	push   %r14

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	5b                   	pop    %rbx
   3:	5d                   	pop    %rbp
   4:	c3                   	retq   
   5:	0f 0b                	ud2    
   7:	eb ac                	jmp    0xffffffffffffffb5
   9:	0f 1f 40 00          	nopl   0x0(%rax)
   d:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  12:	41 57                	push   %r15
  14:	41 56                	push   %r14
[   18.610067] RSP: 0018:ffffb4f8c00cfdf8 EFLAGS: 00010202
[   18.610956] RAX: 0000000000000000 RBX: ffff9eaf034a8740 RCX: 0000000000000000
[   18.612035] RDX: ffffffff8a4f6db0 RSI: 0000000000000940 RDI: ffff9eaf034a8740
[   18.612875] RBP: ffff9eaf034a87f0 R08: 0000000000000000 R09: 0000000000000101
[   18.613807] R10: ffff9eaf02e20000 R11: ffff9eaf02e38780 R12: ffffb4f8c00cfe10
[   18.614741] R13: ffff9eaf034a8d10 R14: ffff9eaf7dc2d100 R15: 0000000000000000
[   18.615661] FS:  0000000000000000(0000) GS:ffff9eaf7dc00000(0000) knlGS:0000000000000000
[   18.616609] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.617402] CR2: 0000557620ce02c0 CR3: 0000000002df6004 CR4: 0000000000370ef0
[   18.618306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.619186] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.620020] Call Trace:
[   18.620355] __mptcp_destroy_sock (./include/net/xfrm.h:1185 (discriminator 3)) 
[   18.620943] mptcp_worker (net/mptcp/protocol.c:2348) 
[   18.621446] process_one_work (./arch/x86/include/asm/jump_label.h:25) 
[   18.621988] worker_thread (./include/linux/list.h:282) 
[   18.622470] ? process_one_work (kernel/workqueue.c:2364) 
[   18.623050] kthread (kernel/kthread.c:292) 
[   18.623494] ? kthread_park (kernel/kthread.c:245) 
[   18.623997] ret_from_fork (arch/x86/entry/entry_64.S:294) 
[   18.624579] ---[ end trace 73e3a8cb76102604 ]---
[   18.625434] ------------[ cut here ]------------
[   18.626436] WARNING: CPU: 0 PID: 10 at net/ipv4/af_inet.c:157 inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1)) 
[   18.627561] Modules linked in: sch_netem mptcp_token_test mptcp_crypto_test kunit
[   18.628500] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.12.0-rc6+ #1
[   18.629584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   18.630935] RIP: 0010:inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1)) 
[ 18.632259] Code: bc 24 30 01 00 00 5b 41 5c e9 97 02 f3 ff 41 0f b6 44 24 12 3c 07 74 8f e9 fd ed 21 00 4c 89 e7 e8 80 6b f0 ff e9 71 ff ff ff <0f> 0b eb b6 0f 0b 41 8b 84 24 4c 01 00 00 85 c0 74 90 0f 0b 41 8b
All code
========
   0:	bc 24 30 01 00       	mov    $0x13024,%esp
   5:	00 5b 41             	add    %bl,0x41(%rbx)
   8:	5c                   	pop    %rsp
   9:	e9 97 02 f3 ff       	jmpq   0xfffffffffff302a5
   e:	41 0f b6 44 24 12    	movzbl 0x12(%r12),%eax
  14:	3c 07                	cmp    $0x7,%al
  16:	74 8f                	je     0xffffffffffffffa7
  18:	e9 fd ed 21 00       	jmpq   0x21ee1a
  1d:	4c 89 e7             	mov    %r12,%rdi
  20:	e8 80 6b f0 ff       	callq  0xfffffffffff06ba5
  25:	e9 71 ff ff ff       	jmpq   0xffffffffffffff9b
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb b6                	jmp    0xffffffffffffffe4
  2e:	0f 0b                	ud2    
  30:	41 8b 84 24 4c 01 00 	mov    0x14c(%r12),%eax
  37:	00 
  38:	85 c0                	test   %eax,%eax
  3a:	74 90                	je     0xffffffffffffffcc
  3c:	0f 0b                	ud2    
  3e:	41                   	rex.B
  3f:	8b                   	.byte 0x8b

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	eb b6                	jmp    0xffffffffffffffba
   4:	0f 0b                	ud2    
   6:	41 8b 84 24 4c 01 00 	mov    0x14c(%r12),%eax
   d:	00 
   e:	85 c0                	test   %eax,%eax
  10:	74 90                	je     0xffffffffffffffa2
  12:	0f 0b                	ud2    
  14:	41                   	rex.B
  15:	8b                   	.byte 0x8b
[   18.636246] RSP: 0018:ffffb4f8c005bb08 EFLAGS: 00010202
[   18.637276] RAX: 0000000000000940 RBX: ffff9eaf034a87f0 RCX: 0000000000000000
[   18.638850] RDX: 0000000000000000 RSI: 0000000000000940 RDI: ffff9eaf034a87f0
[   18.640333] RBP: ffff9eaf034a8740 R08: ffff9eaf034a888c R09: 0000000000000000
[   18.641840] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9eaf034a8740
[   18.643179] R13: ffff9eaf03781e00 R14: ffff9eaf034a6114 R15: ffff9eaf034a6100
[   18.645257] FS:  0000000000000000(0000) GS:ffff9eaf7dc00000(0000) knlGS:0000000000000000
[   18.647934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.649563] CR2: 0000557620ce02c0 CR3: 0000000050610005 CR4: 0000000000370ef0
[   18.651717] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.653879] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.655921] Call Trace:
[   18.656612] __sk_destruct (net/core/sock.c:1797 (discriminator 3)) 
[   18.657530] subflow_ulp_release (./include/net/sock.h:1810) 
[   18.658158] tcp_cleanup_ulp (net/ipv4/tcp_ulp.c:124) 
[   18.658659] tcp_v4_destroy_sock (net/ipv4/tcp_ipv4.c:2234) 
[   18.659217] inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:889 (discriminator 9)) 
[   18.659832] tcp_rcv_state_process (net/ipv4/tcp_input.c:6496) 
[   18.660526] ? security_sock_rcv_skb (security/security.c:2190 (discriminator 19)) 
[   18.661071] ? sk_filter_trim_cap (net/core/filter.c:156) 
[   18.661621] ? tcp_v4_inbound_md5_hash.constprop.0 (net/ipv4/tcp_ipv4.c:1416) 
[   18.662339] tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1716) 
[   18.662843] tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2077) 
[   18.663330] ip_protocol_deliver_rcu (net/ipv4/ip_input.c:204 (discriminator 1)) 
[   18.663903] ip_local_deliver_finish (./include/linux/rcupdate.h:74) 
[   18.664491] ip_local_deliver (./include/linux/netfilter.h:301) 
[   18.664991] ? tcp_v4_early_demux (./include/net/dst.h:469) 
[   18.665533] ? ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:342) 
[   18.666217] ip_rcv (./include/linux/netfilter.h:302) 
[   18.666617] __netif_receive_skb_one_core (net/core/dev.c:5436 (discriminator 4)) 
[   18.667297] process_backlog (./include/linux/rcupdate.h:74) 
[   18.667810] __napi_poll (net/core/dev.c:6961) 
[   18.668273] net_rx_action (net/core/dev.c:7030) 
[   18.668777] __do_softirq (./arch/x86/include/asm/jump_label.h:25) 
[   18.669303] run_ksoftirqd (./arch/x86/include/asm/irqflags.h:45) 
[   18.669811] smpboot_thread_fn (kernel/smpboot.c:165) 
[   18.670436] ? sort_range (kernel/smpboot.c:108) 
[   18.670971] kthread (kernel/kthread.c:292) 
[   18.671419] ? kthread_park (kernel/kthread.c:245) 
[   18.671879] ret_from_fork (arch/x86/entry/entry_64.S:294) 
[   18.672354] ---[ end trace 73e3a8cb76102605 ]---
(duration   239ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP	(duration  1585ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP	(duration  1408ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP	(duration   211ms) [ OK ]

@matttbe
Copy link
Member Author

matttbe commented Apr 22, 2021

@pabeni : I was able to reproduce it locally \o/
Do you have any suggestion on what would be useful to extract? :)

@pabeni
Copy link

pabeni commented May 6, 2021

@pabeni : I was able to reproduce it locally \o/
Do you have any suggestion on what would be useful to extract? :)

something as simple as:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 32a39774075b..1785acd8bfd5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2637,6 +2637,8 @@ static void __mptcp_destroy_sock(struct sock *sk)
        WARN_ON_ONCE(msk->wmem_reserved);
        WARN_ON_ONCE(msk->rmem_released);
        sk_stream_kill_queues(sk);
+       if (sk->sk_forward_alloc)
+               pr_warn("msk=%p fwd=%d", msk, sk->sk_forward_alloc);
        xfrm_sk_free_policy(sk);

        tcp_cleanup_congestion_control(sk);

@matttbe
Copy link
Member Author

matttbe commented May 6, 2021

@pabeni: thank you for the suggestion!

I got the debug line: MPTCP: msk=000000000469329e fwd=266

 $ ./mptcp_connect.sh -l 0.99% -d 4 -r '94% 99%'
[    6.413292] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[    6.836783] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[    7.018194] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[    7.244303] IPv6: ADDRCONF(NETDEV_CHANGE): ns4eth3: link becomes ready
[    7.249847] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
# INFO: set ns3-60942489-xeVvp0 dev ns3eth2: ethtool -K  gso off
# INFO: set ns4-60942489-xeVvp0 dev ns4eth3: ethtool -K tso off gso off
# Created /tmp/tmp.nhFXn1i8T6 (size 7654428	/tmp/tmp.nhFXn1i8T6) containing data sent by client
# Created /tmp/tmp.gPDCwooLju (size 7000092	/tmp/tmp.gPDCwooLju) containing data sent by server
# New MPTCP socket can be blocked via sysctl		[ OK ]
# setsockopt(..., TCP_ULP, "mptcp", ...) blocked	[ OK ]
# INFO: validating network environment with pings
[   12.091048] netem: version 1.3
# INFO: Using loss of 0.99% delay 4 ms reorder 94% 99% with delay 1ms on ns3eth4
# ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP	(duration   442ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP  	(duration   127ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP	(duration   218ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP	(duration   161ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP  	(duration   133ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP	(duration   114ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP	(duration   132ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP	(duration   139ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP	(duration   150ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP	(duration   158ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP	[   24.320970] ------------[ cut here ]------------
[   24.325588] WARNING: CPU: 0 PID: 171 at net/core/stream.c:208 sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1)) 
[   24.333582] Modules linked in: sch_netem
[   24.337218] CPU: 0 PID: 171 Comm: kworker/0:2 Not tainted 5.12.0+ #25
[   24.343110] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   24.351637] Workqueue: events mptcp_worker
[   24.355617] RIP: 0010:sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1)) 
[ 24.360406] Code: 00 85 c0 75 1f 85 f6 75 21 5b 5d c3 48 89 df e8 73 ec fe ff 8b 83 48 01 00 00 8b b3 00 01 00 00 85 c0 74 e1 0f 0b 85 f6 74 df <0f> 0b 5b 5d c3 0f 0b eb ac 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56
All code
========
   0:	00 85 c0 75 1f 85    	add    %al,-0x7ae08a40(%rbp)
   6:	f6 75 21             	divb   0x21(%rbp)
   9:	5b                   	pop    %rbx
   a:	5d                   	pop    %rbp
   b:	c3                   	retq   
   c:	48 89 df             	mov    %rbx,%rdi
   f:	e8 73 ec fe ff       	callq  0xfffffffffffeec87
  14:	8b 83 48 01 00 00    	mov    0x148(%rbx),%eax
  1a:	8b b3 00 01 00 00    	mov    0x100(%rbx),%esi
  20:	85 c0                	test   %eax,%eax
  22:	74 e1                	je     0x5
  24:	0f 0b                	ud2    
  26:	85 f6                	test   %esi,%esi
  28:	74 df                	je     0x9
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	5b                   	pop    %rbx
  2d:	5d                   	pop    %rbp
  2e:	c3                   	retq   
  2f:	0f 0b                	ud2    
  31:	eb ac                	jmp    0xffffffffffffffdf
  33:	0f 1f 40 00          	nopl   0x0(%rax)
  37:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3c:	41 57                	push   %r15
  3e:	41 56                	push   %r14

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	5b                   	pop    %rbx
   3:	5d                   	pop    %rbp
   4:	c3                   	retq   
   5:	0f 0b                	ud2    
   7:	eb ac                	jmp    0xffffffffffffffb5
   9:	0f 1f 40 00          	nopl   0x0(%rax)
   d:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  12:	41 57                	push   %r15
  14:	41 56                	push   %r14
[   24.377001] RSP: 0018:ffff97e74035fdf8 EFLAGS: 00010206
[   24.381861] RAX: 0000000000000000 RBX: ffff92aa42f695c0 RCX: 0000000000000000
[   24.388290] RDX: ffffffffbdcf8a50 RSI: 000000000000010a RDI: ffff92aa42f695c0
[   24.394599] RBP: ffff92aa42f69670 R08: 0000000000000000 R09: ffffffffbd049900
[   24.401117] R10: ffff92aa42cd6e00 R11: 0000000000000001 R12: ffff97e74035fe10
[   24.407650] R13: ffff92aa42f69b90 R14: ffff92aabdc2d100 R15: 0000000000000000
[   24.419952] FS:  0000000000000000(0000) GS:ffff92aabdc00000(0000) knlGS:0000000000000000
[   24.427317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.432644] CR2: 000055bfe9b3a9f0 CR3: 0000000002cc6000 CR4: 00000000003506f0
[   24.439157] Call Trace:
[   24.441590] __mptcp_destroy_sock (net/mptcp/protocol.c:2640 (discriminator 3)) 
[   24.445609] mptcp_worker (net/mptcp/protocol.c:2396) 
[   24.449112] process_one_work (kernel/workqueue.c:2275) 
[   24.453011] worker_thread (include/linux/list.h:282) 
[   24.456487] ? process_one_work (kernel/workqueue.c:2364) 
[   24.460430] kthread (kernel/kthread.c:313) 
[   24.463436] ? kthread_park (kernel/kthread.c:266) 
[   24.466792] ret_from_fork (arch/x86/entry/entry_64.S:294) 
[   24.470195] ---[ end trace 130998adc1f76743 ]---
[   24.474711] MPTCP: msk=000000000469329e fwd=266
[   24.475022] ------------[ cut here ]------------
[   24.484046] WARNING: CPU: 0 PID: 10 at net/ipv4/af_inet.c:157 inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1)) 
[   24.491830] Modules linked in: sch_netem
[   24.496108] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.12.0+ #25
[   24.503854] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   24.512904] RIP: 0010:inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1)) 
[ 24.517998] Code: bc 24 30 01 00 00 5b 41 5c e9 f7 eb f2 ff 41 0f b6 44 24 12 3c 07 74 8f e9 ea fb 21 00 4c 89 e7 e8 20 54 f0 ff e9 71 ff ff ff <0f> 0b eb b6 0f 0b 41 8b 84 24 4c 01 00 00 85 c0 74 90 0f 0b 41 8b
All code
========
   0:	bc 24 30 01 00       	mov    $0x13024,%esp
   5:	00 5b 41             	add    %bl,0x41(%rbx)
   8:	5c                   	pop    %rsp
   9:	e9 f7 eb f2 ff       	jmpq   0xfffffffffff2ec05
   e:	41 0f b6 44 24 12    	movzbl 0x12(%r12),%eax
  14:	3c 07                	cmp    $0x7,%al
  16:	74 8f                	je     0xffffffffffffffa7
  18:	e9 ea fb 21 00       	jmpq   0x21fc07
  1d:	4c 89 e7             	mov    %r12,%rdi
  20:	e8 20 54 f0 ff       	callq  0xfffffffffff05445
  25:	e9 71 ff ff ff       	jmpq   0xffffffffffffff9b
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	eb b6                	jmp    0xffffffffffffffe4
  2e:	0f 0b                	ud2    
  30:	41 8b 84 24 4c 01 00 	mov    0x14c(%r12),%eax
  37:	00 
  38:	85 c0                	test   %eax,%eax
  3a:	74 90                	je     0xffffffffffffffcc
  3c:	0f 0b                	ud2    
  3e:	41                   	rex.B
  3f:	8b                   	.byte 0x8b

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	eb b6                	jmp    0xffffffffffffffba
   4:	0f 0b                	ud2    
   6:	41 8b 84 24 4c 01 00 	mov    0x14c(%r12),%eax
   d:	00 
   e:	85 c0                	test   %eax,%eax
  10:	74 90                	je     0xffffffffffffffa2
  12:	0f 0b                	ud2    
  14:	41                   	rex.B
  15:	8b                   	.byte 0x8b
[   24.539219] RSP: 0018:ffff97e74005bb08 EFLAGS: 00010206
[   24.545482] RAX: 000000000000010a RBX: ffff92aa42f69670 RCX: 0000000000000010
[   24.561087] RDX: 0000000000000000 RSI: 000000000000010a RDI: ffff92aa42f69670
[   24.567758] RBP: ffff92aa42f695c0 R08: ffff92aa42f6970c R09: 0000000000000000
[   24.573760] R10: 0000000000000000 R11: 0000000000000000 R12: ffff92aa42f695c0
[   24.579685] R13: ffff92aa42f26400 R14: ffff92aa443c9914 R15: ffff92aa443c9900
[   24.585715] FS:  0000000000000000(0000) GS:ffff92aabdc00000(0000) knlGS:0000000000000000
[   24.592601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.597588] CR2: 000055bfe9b3a9f0 CR3: 000000004ea10000 CR4: 00000000003506f0
[   24.603650] Call Trace:
[   24.605858] __sk_destruct (net/core/sock.c:1795) 
[   24.609145] subflow_ulp_release (include/net/sock.h:1813) 
[   24.612801] tcp_cleanup_ulp (net/ipv4/tcp_ulp.c:124) 
[   24.616176] tcp_v4_destroy_sock (net/ipv4/tcp_ipv4.c:2234) 
[   24.619902] inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:887 (discriminator 9)) 
[   24.623763] tcp_rcv_state_process (net/ipv4/tcp_input.c:6496) 
[   24.627598] ? security_sock_rcv_skb (security/security.c:2204 (discriminator 19)) 
[   24.631437] ? sk_filter_trim_cap (net/core/filter.c:156) 
[   24.635152] ? tcp_v4_inbound_md5_hash.constprop.0 (net/ipv4/tcp_ipv4.c:1416) 
[   24.640109] tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1716) 
[   24.643409] tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2077) 
[   24.646483] ? icmpv6_rcv (include/linux/skbuff.h:4085) 
[   24.649811] ip_protocol_deliver_rcu (net/ipv4/ip_input.c:204 (discriminator 1)) 
[   24.653709] ip_local_deliver_finish (include/linux/rcupdate.h:74) 
[   24.657511] ip_local_deliver (include/linux/netfilter.h:301) 
[   24.660931] ? tcp_v4_early_demux (include/net/dst.h:469) 
[   24.664789] ? ip_rcv_finish_core.isra.0 (net/ipv4/ip_input.c:342) 
[   24.669087] ip_rcv (include/linux/netfilter.h:302) 
[   24.671775] __netif_receive_skb_one_core (net/core/dev.c:5440 (discriminator 4)) 
[   24.676139] process_backlog (include/linux/rcupdate.h:74) 
[   24.679614] __napi_poll (net/core/dev.c:6966) 
[   24.682659] net_rx_action (net/core/dev.c:7035) 
[   24.685918] __do_softirq (kernel/softirq.c:559) 
[   24.689150] run_ksoftirqd (arch/x86/include/asm/irqflags.h:45) 
[   24.692365] smpboot_thread_fn (kernel/smpboot.c:165) 
[   24.696047] ? sort_range (kernel/smpboot.c:108) 
[   24.699192] kthread (kernel/kthread.c:313) 
[   24.702002] ? kthread_park (kernel/kthread.c:266) 
[   24.705210] ret_from_fork (arch/x86/entry/entry_64.S:294) 
[   24.708340] ---[ end trace 130998adc1f76744 ]---
(duration   405ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP	(duration   166ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP	(duration   166ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP	(duration   190ms) [ OK ]
# ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP	(duration   431ms) [ OK ]
# ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP	(duration   226ms) [ OK ]
# ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP	(duration   138ms) [ OK ]
# ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP	(duration   123ms) [ OK ]
# ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP	(duration   299ms) [ OK ]
# ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP	(duration   775ms) [ OK ]
# ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP	(duration   169ms) [ OK ]
# ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP	(duration   882ms) [ OK ]
# ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP	(duration   273ms) [ OK ]
# ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP	(duration   185ms) [ OK ]
# ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP	(duration   335ms) [ OK ]
# ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP	(duration   176ms) [ OK ]
# ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP	(duration   376ms) [ OK ]
# ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP	(duration   871ms) [ OK ]
# ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP	(duration   197ms) [ OK ]
# ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP	(duration   194ms) [ OK ]
# ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP	(duration   122ms) [ OK ]
# ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP	(duration   139ms) [ OK ]
# ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP	(duration   241ms) [ OK ]
# ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP	(duration  1049ms) [ OK ]
# ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP	(duration  1094ms) [ OK ]
# ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP	(duration   222ms) [ OK ]
# ns4 MPTCP -> ns2 (10.0.2.1:10036      ) MPTCP	(duration   186ms) [ OK ]
# ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP	(duration  1527ms) [ OK ]
# ns4 MPTCP -> ns3 (10.0.2.2:10038      ) MPTCP	(duration   156ms) [ OK ]
# ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP	(duration   161ms) [ OK ]
# ns4 MPTCP -> ns3 (10.0.3.2:10040      ) MPTCP	(duration   147ms) [ OK ]
# ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP	(duration   154ms) [ OK ]
# INFO: with peek mode: saveWithPeek
# ns1 MPTCP -> ns1 (10.0.1.1:10042      ) MPTCP	(duration   166ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10043      ) TCP  	(duration   113ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10044      ) MPTCP	(duration   129ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10045) MPTCP	(duration   163ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10046) TCP  	(duration   122ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10047) MPTCP	(duration   114ms) [ OK ]
# INFO: with peek mode: saveAfterPeek
# ns1 MPTCP -> ns1 (10.0.1.1:10048      ) MPTCP	(duration   179ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10049      ) TCP  	(duration   139ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10050      ) MPTCP	(duration   117ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10051) MPTCP	(duration   164ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10052) TCP  	(duration   114ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10053) MPTCP	(duration   127ms) [ OK ]
# Time: 74 seconds
++ rc=0

@pabeni
Copy link

pabeni commented May 13, 2021

not sure if the following is the root cause, but looks like currently we could update sk_forward_memory without owning the mptcp_data_lock() in:

mptcp_worker() -> __mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_clean_una() [memory pressure] -> __mptcp_update_wmem()

@matttbe: could you please test the following patch:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2d21a4793d9d..f1eee770608a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2299,7 +2299,6 @@ static void __mptcp_retrans(struct sock *sk)
        struct sock *ssk;
        int ret;
 
-       __mptcp_clean_una_wakeup(sk);
        dfrag = mptcp_rtx_head(sk);
        if (!dfrag) {
                if (mptcp_data_fin_enabled(msk)) {

I think calling clean_una there is not required, will be done shortly later at msk socket lock release time.

Additionally, could you please add pr_warn() just around all *reclaim_partial() calls ?

@pabeni pabeni self-assigned this May 13, 2021
@matttbe
Copy link
Member Author

matttbe commented May 13, 2021

@pabeni : thank you for the patch!

After 10 attempts, I got this error:

+ ./mptcp_connect.sh -l 0.99% -d 4 -r '94% 99%'
[    5.179050] ip (163) used greatest stack depth: 12208 bytes left
[    5.377081] ip (169) used greatest stack depth: 12160 bytes left
[    5.412652] ip (170) used greatest stack depth: 12096 bytes left
[    5.712754] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[    6.039750] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[    6.350804] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
[    6.410865] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
# INFO: set ns3-609d424c-QKZygJ dev ns3eth2: ethtool -K  gso off gro off
# INFO: set ns4-609d424c-QKZygJ dev ns4eth3: ethtool -K  gro off
# Created /tmp/tmp.aQDAyu3nPM (size 466972      /tmp/tmp.aQDAyu3nPM) containing data sent by client
# Created /tmp/tmp.0zrgihCS8q (size 7264284     /tmp/tmp.0zrgihCS8q) containing data sent by server
# New MPTCP socket can be blocked via sysctl            [ OK ]
# setsockopt(..., TCP_ULP, "mptcp", ...) blocked        [ OK ]
# INFO: validating network environment with pings
[   10.671291] netem: version 1.3
# INFO: Using loss of 0.99% delay 4 ms reorder 94% 99% with delay 1ms on ns3eth4
# ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP (duration   111ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP   (duration    98ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP (duration   105ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration   150ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP   (duration   109ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP (duration   106ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP (duration   136ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP (duration   124ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP (duration   104ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP (duration   125ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP (duration   150ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP (duration   146ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP (duration   154ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP (duration   162ms) [ OK ]
# ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP write: Connection reset by peer
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# (duration 30147ms) [ FAIL ] client exit code 111, server 2
#
# netns ns4-609d424c-QKZygJ socket stat for 10014:
# Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# TcpPassiveOpens                 1                  0.0
# TcpEstabResets                  1                  0.0
# TcpInSegs                       27                 0.0
# TcpOutSegs                      51                 0.0
# TcpOutRsts                      12                 0.0
# TcpExtDelayedACKLost            2                  0.0
# TcpExtTCPHPHits                 2                  0.0
# TcpExtTCPPureAcks               6                  0.0
# TcpExtTCPDSACKOldSent           2                  0.0
# TcpExtTCPRcvCoalesce            1                  0.0
# TcpExtTCPOFOQueue               3                  0.0
# TcpExtTCPOrigDataSent           30                 0.0
# TcpExtTCPDelivered              10                 0.0
# MPTcpExtMPCapableSYNRX          1                  0.0
# MPTcpExtMPCapableACKRX          1                  0.0
#
# netns ns1-609d424c-QKZygJ socket stat for 10014:
# Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# TcpActiveOpens                  1                  0.0
# TcpEstabResets                  1                  0.0
# TcpInSegs                       32                 0.0
# TcpOutSegs                      43                 0.0
# TcpRetransSegs                  3                  0.0
# TcpExtTCPPureAcks               8                  0.0
# TcpExtTCPSackRecovery           1                  0.0
# TcpExtTCPFullUndo               1                  0.0
# TcpExtTCPFastRetrans            3                  0.0
# TcpExtTCPDSACKRecv              2                  0.0
# TcpExtTCPDSACKIgnoredNoUndo     2                  0.0
# TcpExtTCPSackMerged             1                  0.0
# TcpExtTCPSackShiftFallback      2                  0.0
# TcpExtTCPOrigDataSent           30                 0.0
# TcpExtTCPDelivered              19                 0.0
# TcpExtTCPDSACKRecvSegs          3                  0.0
# MPTcpExtMPCapableSYNTX          1                  0.0
# MPTcpExtMPCapableSYNACKRX       1                  0.0
#
# ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP (duration   152ms) [ OK ]
# ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP (duration   110ms) [ OK ]
# FAIL: Could not even run loopback test

It looks different, no?

matttbe pushed a commit that referenced this issue May 17, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
MPTCP Bugs automation moved this from Needs triage to Closed May 17, 2021
jenkins-tessares pushed a commit that referenced this issue May 18, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 19, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
@matttbe
Copy link
Member Author

matttbe commented May 19, 2021

When trying to reproduce #192 with perf, I got this warning after 14 hours of tests:

++ date -R
Wed, 19 May 2021 05:39:24 +0000
++ echo '=== 657 ==='
=== 657 ===
++ i=658
++ true
++ perf record -ag -e probe:subflow_error_report ./mptcp_connect.sh -l 0.99% -d 4 -r '94% 99%'
Couldn't synthesize bpf events.
[48785.949826] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[48786.267177] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[48786.575106] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
[48786.663944] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
INFO: set ns3-60a4a48d-NkQUko dev ns3eth2: ethtool -K tso off gso off
INFO: set ns4-60a4a48d-NkQUko dev ns4eth3: ethtool -K tso off gso off
Created /tmp/tmp.f8GXxnN2CE (size 8072220       /tmp/tmp.f8GXxnN2CE) containing data sent by client
Created /tmp/tmp.w9PXhxQXbf (size 8135708       /tmp/tmp.w9PXhxQXbf) containing data sent by server
New MPTCP socket can be blocked via sysctl              [ OK ]
setsockopt(..., TCP_ULP, "mptcp", ...) blocked  [ OK ]
INFO: validating network environment with pings
INFO: Using loss of 0.99% delay 4 ms reorder 94% 99% with delay 1ms on ns3eth4
ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP   (duration   184ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP     (duration   129ms) [ OK ]
ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP   (duration   130ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP   (duration   165ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP     (duration   104ms) [ OK ]
ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP   (duration   127ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP   (duration   112ms) [ OK ]
ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP   (duration   119ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP   (duration   207ms) [ OK ]
ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP   (duration   128ms) [ OK ]
ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP   (duration   216ms) [ OK ]
ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP   (duration   269ms) [ OK ]
ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP   (duration  1032ms) [ OK ]
ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP   (duration  1242ms) [ OK ]
ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP   (duration   248ms) [ OK ]
ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP   (duration   255ms) [ OK ]
ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP   (duration   177ms) [ OK ]
ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP   (duration   155ms) [ OK ]
ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP   (duration   610ms) [ OK ]
ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP   (duration   181ms) [ OK ]
ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP   (duration   186ms) [ OK ]
ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP   [48819.680850] ------------[ cut here ]------------
[48819.684756] WARNING: CPU: 0 PID: 20506 at net/core/stream.c:208 sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1))
[48819.691732] Modules linked in: sch_netem
[48819.695208] CPU: 0 PID: 20506 Comm: kworker/0:1 Tainted: G        W         5.12.0+ #535
[48819.701704] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[48819.709147] Workqueue: events mptcp_worker
[48819.712802] RIP: 0010:sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1))
[48819.717784] Code: 00 85 c0 75 1f 85 f6 75 21 5b 5d c3 48 89 df e8 83 ec fe ff 8b 83 48 01 00 00 8b b3 00 01 00 00 85 c0 74 e1 0f 0b 85 f6 74 df <0f> 0b 5b 5d c3 0f 0b eb ac 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56
All code
========
   0:   00 85 c0 75 1f 85       add    %al,-0x7ae08a40(%rbp)
   6:   f6 75 21                divb   0x21(%rbp)
   9:   5b                      pop    %rbx
   a:   5d                      pop    %rbp
   b:   c3                      retq
   c:   48 89 df                mov    %rbx,%rdi
   f:   e8 83 ec fe ff          callq  0xfffffffffffeec97
  14:   8b 83 48 01 00 00       mov    0x148(%rbx),%eax
  1a:   8b b3 00 01 00 00       mov    0x100(%rbx),%esi
  20:   85 c0                   test   %eax,%eax
  22:   74 e1                   je     0x5
  24:   0f 0b                   ud2
  26:   85 f6                   test   %esi,%esi
  28:   74 df                   je     0x9
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   5b                      pop    %rbx
  2d:   5d                      pop    %rbp
  2e:   c3                      retq
  2f:   0f 0b                   ud2
  31:   eb ac                   jmp    0xffffffffffffffdf
  33:   0f 1f 40 00             nopl   0x0(%rax)
  37:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  3c:   41 57                   push   %r15
  3e:   41 56                   push   %r14

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   5b                      pop    %rbx
   3:   5d                      pop    %rbp
   4:   c3                      retq
   5:   0f 0b                   ud2
   7:   eb ac                   jmp    0xffffffffffffffb5
   9:   0f 1f 40 00             nopl   0x0(%rax)
   d:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  12:   41 57                   push   %r15
  14:   41 56                   push   %r14
[48819.733525] RSP: 0018:ffffa9a3c2f83df8 EFLAGS: 00010202
[48819.738067] RAX: 0000000000000000 RBX: ffff9d84839d6c80 RCX: 0000000000000007
[48819.744233] RDX: ffffffff828f8570 RSI: 00000000000000d0 RDI: ffff9d84839d6c80
[48819.750358] RBP: ffff9d84839d6d30 R08: 0000000000000000 R09: 0000000000000001
[48819.756450] R10: 000000000000000c R11: 0000000000000000 R12: ffffa9a3c2f83e10
[48819.762587] R13: ffff9d84839d7250 R14: ffff9d84fdc2d100 R15: 0000000000000000
[48819.768785] FS:  0000000000000000(0000) GS:ffff9d84fdc00000(0000) knlGS:0000000000000000
[48819.775684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[48819.780749] CR2: 00007ffcec3ac010 CR3: 0000000003b20000 CR4: 00000000003506f0
[48819.786873] Call Trace:
[48819.789202] __mptcp_destroy_sock (include/net/xfrm.h:1185 (discriminator 3))
[48819.793072] mptcp_worker (net/mptcp/protocol.c:2444)
[48819.796312] process_one_work (kernel/workqueue.c:2275)
[48819.806351] worker_thread (include/linux/list.h:282)
[48819.809614] ? process_one_work (kernel/workqueue.c:2364)
[48819.813318] kthread (kernel/kthread.c:313)
[48819.816165] ? kthread_park (kernel/kthread.c:266)
[48819.819398] ret_from_fork (arch/x86/entry/entry_64.S:294)
[48819.822609] ---[ end trace 1e87d4d0a1c0c377 ]---
[48819.826807] ------------[ cut here ]------------
[48819.831222] WARNING: CPU: 0 PID: 10 at net/ipv4/af_inet.c:157 inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1))
[48819.838426] Modules linked in: sch_netem
[48819.842660] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.12.0+ #535
[48819.850990] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[48819.860224] RIP: 0010:inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1))
[48819.865453] Code: bc 24 30 01 00 00 5b 41 5c e9 c7 e4 f2 ff 41 0f b6 44 24 12 3c 07 74 8f e9 26 fb 21 00 4c 89 e7 e8 10 4d f0 ff e9 71 ff ff ff <0f> 0b eb b6 0f 0b 41 8b 84 24 4c 01 00 00 85 c0 74 90 0f 0b 41 8b
All code
========
   0:   bc 24 30 01 00          mov    $0x13024,%esp
   5:   00 5b 41                add    %bl,0x41(%rbx)
   8:   5c                      pop    %rsp
   9:   e9 c7 e4 f2 ff          jmpq   0xfffffffffff2e4d5
   e:   41 0f b6 44 24 12       movzbl 0x12(%r12),%eax
  14:   3c 07                   cmp    $0x7,%al
  16:   74 8f                   je     0xffffffffffffffa7
  18:   e9 26 fb 21 00          jmpq   0x21fb43
  1d:   4c 89 e7                mov    %r12,%rdi
  20:   e8 10 4d f0 ff          callq  0xfffffffffff04d35
  25:   e9 71 ff ff ff          jmpq   0xffffffffffffff9b
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   eb b6                   jmp    0xffffffffffffffe4
  2e:   0f 0b                   ud2
  30:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
  37:   00
  38:   85 c0                   test   %eax,%eax
  3a:   74 90                   je     0xffffffffffffffcc
  3c:   0f 0b                   ud2
  3e:   41                      rex.B
  3f:   8b                      .byte 0x8b

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   eb b6                   jmp    0xffffffffffffffba
   4:   0f 0b                   ud2
   6:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
   d:   00
   e:   85 c0                   test   %eax,%eax
  10:   74 90                   je     0xffffffffffffffa2
  12:   0f 0b                   ud2
  14:   41                      rex.B
  15:   8b                      .byte 0x8b
[48819.888362] RSP: 0018:ffffa9a3c005ba98 EFLAGS: 00010202
[48819.894348] RAX: 00000000000000d0 RBX: ffff9d84839d6d30 RCX: 000000000000cf44
[48819.902022] RDX: 0000000000000000 RSI: 00000000000000d0 RDI: ffff9d84839d6d30
[48819.908178] RBP: ffff9d84839d6c80 R08: ffff9d84839d6dcc R09: ffffffff81a53a00
[48819.914335] R10: ffff9d848393f680 R11: 0000000000000001 R12: ffff9d84839d6c80
[48819.920180] R13: ffff9d8483b4ccb8 R14: ffff9d8483b4ccb8 R15: ffff9d8483b4cc90
[48819.926053] FS:  0000000000000000(0000) GS:ffff9d84fdc00000(0000) knlGS:0000000000000000
[48819.932668] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[48819.937415] CR2: 00007ffcec3ac010 CR3: 0000000074c10000 CR4: 00000000003506f0
[48819.943321] Call Trace:
[48819.945526] __sk_destruct (net/core/sock.c:1808)
[48819.948678] subflow_ulp_release (include/net/sock.h:1813)
[48819.952089] tcp_cleanup_ulp (net/ipv4/tcp_ulp.c:124)
[48819.955265] tcp_v4_destroy_sock (net/ipv4/tcp_ipv4.c:2237)
[48819.958780] tcp_v6_destroy_sock (net/ipv6/tcp_ipv6.c:1947)
[48819.962307] inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:887 (discriminator 9))
[48819.966294] tcp_reset (arch/x86/include/asm/bitops.h:207)
[48819.969058] tcp_validate_incoming (net/ipv4/tcp_input.c:5656)
[48819.972750] tcp_rcv_state_process (net/ipv4/tcp_input.c:6384)
[48819.976474] ? security_sock_rcv_skb (security/security.c:2241 (discriminator 19))
[48819.980219] ? sk_filter_trim_cap (net/core/filter.c:156)
[48819.983777] tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1527)
[48819.986937] tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1736)
[48819.996929] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:424 (discriminator 8))
[48820.001045] ip6_input_finish (include/linux/rcupdate.h:74)
[48820.004679] ip6_input (include/linux/netfilter.h:301)
[48820.007449] ? ip6_dst_check (net/ipv6/route.c:2625)
[48820.010630] ? tcp_v6_early_demux (include/net/dst.h:469)
[48820.014248] ipv6_rcv (include/linux/netfilter.h:302)
[48820.016957] __netif_receive_skb_one_core (net/core/dev.c:5443)
[48820.021067] process_backlog (include/linux/rcupdate.h:74)
[48820.024301] __napi_poll (net/core/dev.c:6973)
[48820.027289] net_rx_action (net/core/dev.c:7042)
[48820.030477] __do_softirq (kernel/softirq.c:559)
[48820.033564] run_ksoftirqd (arch/x86/include/asm/irqflags.h:45)
[48820.036599] smpboot_thread_fn (kernel/smpboot.c:165)
[48820.040005] ? sort_range (kernel/smpboot.c:108)
[48820.043004] kthread (kernel/kthread.c:313)
[48820.045730] ? kthread_park (kernel/kthread.c:266)
[48820.048850] ret_from_fork (arch/x86/entry/entry_64.S:294)
[48820.051874] ---[ end trace 1e87d4d0a1c0c378 ]---
(duration   361ms) [ OK ]
ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP   (duration   454ms) [ OK ]
ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP   (duration   212ms) [ OK ]
ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP   (duration   247ms) [ OK ]
ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP   (duration   196ms) [ OK ]
ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP   (duration   263ms) [ OK ]
ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP   (duration   779ms) [ OK ]
ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP   (duration   228ms) [ OK ]
ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP   (duration   246ms) [ OK ]
ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP   (duration   185ms) [ OK ]
ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP   (duration   205ms) [ OK ]
ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP   (duration   388ms) [ OK ]
ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP   (duration   219ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP   (duration  1032ms) [ OK ]
ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP   (duration  1138ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.2.1:10036      ) MPTCP   (duration   243ms) [ OK ]
ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP   (duration   208ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.2.2:10038      ) MPTCP   (duration   181ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP   (duration   217ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10040      ) MPTCP   (duration   163ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP   (duration   165ms) [ OK ]
INFO: with peek mode: saveWithPeek
ns1 MPTCP -> ns1 (10.0.1.1:10042      ) MPTCP   (duration   175ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10043      ) TCP     (duration   316ms) [ OK ]
ns1 TCP   -> ns1 (10.0.1.1:10044      ) MPTCP   (duration   119ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10045) MPTCP   (duration   211ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10046) TCP     (duration   121ms) [ OK ]
ns1 TCP   -> ns1 (dead:beef:1::1:10047) MPTCP   (duration   116ms) [ OK ]
INFO: with peek mode: saveAfterPeek
ns1 MPTCP -> ns1 (10.0.1.1:10048      ) MPTCP   (duration   170ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10049      ) TCP     (duration   112ms) [ OK ]
ns1 TCP   -> ns1 (10.0.1.1:10050      ) MPTCP   (duration   115ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10051) MPTCP   (duration   180ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10052) TCP     (duration   123ms) [ OK ]
ns1 TCP   -> ns1 (dead:beef:1::1:10053) MPTCP   (duration   142ms) [ OK ]
Time: 77 seconds
[ perf record: Woken up 15 times to write data ]
failed to mmap file
[ perf record: Captured and wrote 3.747 MB perf.data ]
++ date -R
Wed, 19 May 2021 05:40:42 +0000
++ echo '=== 658 ==='
=== 658 ===
++ i=659
++ true
++ perf record -ag -e probe:subflow_error_report ./mptcp_connect.sh -l 0.99% -d 4 -r '94% 99%'
Couldn't synthesize bpf events.
[48863.720409] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[48864.019158] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth2: link becomes ready

On top of yesterday's export branch (t/upstream but that's the same):

66f500c3e329 (HEAD -> tmp) mptcp: update selftest for fallback due to OoO
2eda92d4479d mptcp: do not reset MP_CAPABLE subflow on mapping errors
e03f1c973fc8 mptcp: always parse mptcp options for MPC reqsk
6273c44bc200 (origin/t/upstream, t/upstream) tgupdate: merge t/upstream base into t/upstream

In perf.zip file, we can file:

  • perf.data: for the attempt 658, the one AFTER the warning but mptcp_connect.sh has failed
  • perf.data.old: attempt 657, the one with the warning

@matttbe
Copy link
Member Author

matttbe commented May 19, 2021

perf report doesn't report anything for perf.data.old but it does with perf.data:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2  of event 'probe:subflow_error_report'
# Event count (approx.): 2
#
# Children      Self  Command  Shared Object                Symbol                             
# ........  ........  .......  ...........................  ...................................
#
   100.00%   100.00%  swapper  [kernel.kallsyms]            [k] subflow_error_report
            |
            ---secondary_startup_64_no_verify
               |          
               |--50.00%--start_kernel
               |          cpu_startup_entry
               |          do_idle
               |          default_idle_call
               |          default_idle
               |          asm_sysvec_apic_timer_interrupt
               |          sysvec_apic_timer_interrupt
               |          irq_exit_rcu
               |          __do_softirq
               |          net_rx_action
               |          __napi_poll
               |          process_backlog
               |          __netif_receive_skb_one_core
               |          ipv6_rcv
               |          ip6_input
               |          ip6_input_finish
               |          ip6_protocol_deliver_rcu
               |          tcp_v6_rcv
               |          tcp_v6_do_rcv
               |          tcp_rcv_established
               |          tcp_validate_incoming
               |          subflow_error_report
               |          
                --50.00%--start_secondary
                          cpu_startup_entry
                          do_idle
                          default_idle_call
                          default_idle
                          asm_sysvec_apic_timer_interrupt
                          sysvec_apic_timer_interrupt
                          irq_exit_rcu
                          __do_softirq
                          net_rx_action
                          __napi_poll
                          process_backlog
                          __netif_receive_skb_one_core
                          ipv6_rcv
                          ip6_input
                          ip6_input_finish
                          ip6_protocol_deliver_rcu
                          tcp_v6_rcv
                          tcp_v6_do_rcv
                          tcp_rcv_established
                          tcp_validate_incoming
                          subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] secondary_startup_64_no_verify
            |
            ---secondary_startup_64_no_verify
               |          
               |--50.00%--start_kernel
               |          cpu_startup_entry
               |          do_idle
               |          default_idle_call
               |          default_idle
               |          asm_sysvec_apic_timer_interrupt
               |          sysvec_apic_timer_interrupt
               |          irq_exit_rcu
               |          __do_softirq
               |          net_rx_action
               |          __napi_poll
               |          process_backlog
               |          __netif_receive_skb_one_core
               |          ipv6_rcv
               |          ip6_input
               |          ip6_input_finish
               |          ip6_protocol_deliver_rcu
               |          tcp_v6_rcv
               |          tcp_v6_do_rcv
               |          tcp_rcv_established
               |          tcp_validate_incoming
               |          subflow_error_report
               |          
                --50.00%--start_secondary
                          cpu_startup_entry
                          do_idle
                          default_idle_call
                          default_idle
                          asm_sysvec_apic_timer_interrupt
                          sysvec_apic_timer_interrupt
                          irq_exit_rcu
                          __do_softirq
                          net_rx_action
                          __napi_poll
                          process_backlog
                          __netif_receive_skb_one_core
                          ipv6_rcv
                          ip6_input
                          ip6_input_finish
                          ip6_protocol_deliver_rcu
                          tcp_v6_rcv
                          tcp_v6_do_rcv
                          tcp_rcv_established
                          tcp_validate_incoming
                          subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] cpu_startup_entry
            |
            ---cpu_startup_entry
               do_idle
               default_idle_call
               default_idle
               asm_sysvec_apic_timer_interrupt
               sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] do_idle
            |
            ---do_idle
               default_idle_call
               default_idle
               asm_sysvec_apic_timer_interrupt
               sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] default_idle_call
            |
            ---default_idle_call
               default_idle
               asm_sysvec_apic_timer_interrupt
               sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] default_idle
            |
            ---default_idle
               asm_sysvec_apic_timer_interrupt
               sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] asm_sysvec_apic_timer_interrupt
            |
            ---asm_sysvec_apic_timer_interrupt
               sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] sysvec_apic_timer_interrupt
            |
            ---sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] irq_exit_rcu
            |
            ---irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] __do_softirq
            |
            ---__do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] net_rx_action
            |
            ---net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] __napi_poll
            |
            ---__napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] process_backlog
            |
            ---process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] __netif_receive_skb_one_core
            |
            ---__netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] ipv6_rcv
            |
            ---ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] ip6_input
            |
            ---ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] ip6_input_finish
            |
            ---ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] ip6_protocol_deliver_rcu
            |
            ---ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] tcp_v6_rcv
            |
            ---tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] tcp_v6_do_rcv
            |
            ---tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] tcp_rcv_established
            |
            ---tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

   100.00%     0.00%  swapper  [kernel.kallsyms]            [k] tcp_validate_incoming
            |
            ---tcp_validate_incoming
               subflow_error_report

    50.00%     0.00%  swapper  [kernel.kallsyms]            [k] start_secondary
            |
            ---start_secondary
               cpu_startup_entry
               do_idle
               default_idle_call
               default_idle
               asm_sysvec_apic_timer_interrupt
               sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report

    50.00%     0.00%  swapper  [kernel.kallsyms].init.text  [k] start_kernel
            |
            ---start_kernel
               cpu_startup_entry
               do_idle
               default_idle_call
               default_idle
               asm_sysvec_apic_timer_interrupt
               sysvec_apic_timer_interrupt
               irq_exit_rcu
               __do_softirq
               net_rx_action
               __napi_poll
               process_backlog
               __netif_receive_skb_one_core
               ipv6_rcv
               ip6_input
               ip6_input_finish
               ip6_protocol_deliver_rcu
               tcp_v6_rcv
               tcp_v6_do_rcv
               tcp_rcv_established
               tcp_validate_incoming
               subflow_error_report



#
# (Tip: Add -I to perf record to sample register values, which will be visible in perf report sample context.)
#

Not sure if this one is useful then :-/

@matttbe
Copy link
Member Author

matttbe commented May 19, 2021

I had it again, without perf, with @pabeni's 4th patch:

++ ./mptcp_connect.sh -l 0.99% -d 4 -r '94% 99%'
[ 2471.700771] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 2472.071206] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[ 2472.531101] IPv6: ADDRCONF(NETDEV_CHANGE): ns4eth3: link becomes ready
[ 2472.536847] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
# INFO: set ns3-60a4fa56-WfeBHV dev ns3eth2: ethtool -K  gro off
# INFO: set ns4-60a4fa56-WfeBHV dev ns4eth3: ethtool -K tso off gso off gro off
# Created /tmp/tmp.Ycdt9WOL5n (size 7319580     /tmp/tmp.Ycdt9WOL5n) containing data sent by client
# Created /tmp/tmp.ygogCWfJ1o (size 7500828     /tmp/tmp.ygogCWfJ1o) containing data sent by server
# New MPTCP socket can be blocked via sysctl            [ OK ]
# setsockopt(..., TCP_ULP, "mptcp", ...) blocked        [ OK ]
# INFO: validating network environment with pings
# INFO: Using loss of 0.99% delay 4 ms reorder 94% 99% with delay 1ms on ns3eth4
# ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP (duration   175ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP   (duration   140ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP (duration   142ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration   153ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP   (duration   118ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP (duration   160ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP (duration   135ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP (duration   139ms) [ OK ]
# ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP (duration   143ms) [ OK ]
# ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP (duration   142ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP (duration   276ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP (duration   182ms) [ OK ]
# ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP (duration   172ms) [ OK ]
# ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP (duration   356ms) [ OK ]
# ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP (duration   254ms) [ OK ]
# ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP (duration   444ms) [ OK ]
# ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP (duration   119ms) [ OK ]
# ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP (duration   133ms) [ OK ]
# ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP (duration   191ms) [ OK ]
# ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP [ 2501.608391] ------------[ cut here ]------------
[ 2501.613071] WARNING: CPU: 1 PID: 29458 at net/core/stream.c:208 sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1))
[ 2501.621115] Modules linked in: sch_netem
[ 2501.624986] CPU: 1 PID: 29458 Comm: kworker/1:3 Not tainted 5.12.0+ #538
[ 2501.630837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 2501.638693] Workqueue: events mptcp_worker
[ 2501.642367] RIP: 0010:sk_stream_kill_queues (net/core/stream.c:208 (discriminator 1))
[ 2501.646966] Code: 00 85 c0 75 1f 85 f6 75 21 5b 5d c3 48 89 df e8 83 ec fe ff 8b 83 48 01 00 00 8b b3 00 01 00 00 85 c0 74 e1 0f 0b 85 f6 74 df <0f> 0b 5b 5d c3 0f 0b eb ac 0f 1f 40 00 0f 1f 44 00 00 41 57 41 56
All code
========
   0:   00 85 c0 75 1f 85       add    %al,-0x7ae08a40(%rbp)
   6:   f6 75 21                divb   0x21(%rbp)
   9:   5b                      pop    %rbx
   a:   5d                      pop    %rbp
   b:   c3                      retq
   c:   48 89 df                mov    %rbx,%rdi
   f:   e8 83 ec fe ff          callq  0xfffffffffffeec97
  14:   8b 83 48 01 00 00       mov    0x148(%rbx),%eax
  1a:   8b b3 00 01 00 00       mov    0x100(%rbx),%esi
  20:   85 c0                   test   %eax,%eax
  22:   74 e1                   je     0x5
  24:   0f 0b                   ud2
  26:   85 f6                   test   %esi,%esi
  28:   74 df                   je     0x9
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   5b                      pop    %rbx
  2d:   5d                      pop    %rbp
  2e:   c3                      retq
  2f:   0f 0b                   ud2
  31:   eb ac                   jmp    0xffffffffffffffdf
  33:   0f 1f 40 00             nopl   0x0(%rax)
  37:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  3c:   41 57                   push   %r15
  3e:   41 56                   push   %r14

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   5b                      pop    %rbx
   3:   5d                      pop    %rbp
   4:   c3                      retq
   5:   0f 0b                   ud2
   7:   eb ac                   jmp    0xffffffffffffffb5
   9:   0f 1f 40 00             nopl   0x0(%rax)
   d:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  12:   41 57                   push   %r15
  14:   41 56                   push   %r14
[ 2501.662908] RSP: 0018:ffff97504353bdf8 EFLAGS: 00010206
[ 2501.667526] RAX: 0000000000000000 RBX: ffff8ec3c35b64c0 RCX: 0000000000000007
[ 2501.673705] RDX: ffffffffa4ef8570 RSI: 0000000000000dbb RDI: ffff8ec3c35b64c0
[ 2501.679935] RBP: ffff8ec3c35b6570 R08: 0000000000000000 R09: 0000000000000001
[ 2501.686117] R10: 000000000000000c R11: 0000000000000000 R12: ffff97504353be10
[ 2501.692257] R13: ffff8ec3c35b6a90 R14: ffff8ec43dd2d100 R15: 0000000000000000
[ 2501.698466] FS:  0000000000000000(0000) GS:ffff8ec43dd00000(0000) knlGS:0000000000000000
[ 2501.705498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2501.710512] CR2: 00007fffcd37aab8 CR3: 000000003c010000 CR4: 00000000003506e0
[ 2501.716750] Call Trace:
[ 2501.719024] __mptcp_destroy_sock (include/net/xfrm.h:1185 (discriminator 3))
[ 2501.722929] mptcp_worker (net/mptcp/protocol.c:2370)
[ 2501.726269] process_one_work (kernel/workqueue.c:2275)
[ 2501.729872] worker_thread (include/linux/list.h:282)
[ 2501.733147] ? process_one_work (kernel/workqueue.c:2364)
[ 2501.736909] kthread (kernel/kthread.c:313)
[ 2501.739739] ? kthread_park (kernel/kthread.c:266)
[ 2501.743020] ret_from_fork (arch/x86/entry/entry_64.S:294)
[ 2501.746257] ---[ end trace d0ac4fdcacad2849 ]---
[ 2501.750413] ------------[ cut here ]------------
[ 2501.754621] WARNING: CPU: 1 PID: 16 at net/ipv4/af_inet.c:157 inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1))
[ 2501.761994] Modules linked in: sch_netem
[ 2501.765762] CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G        W         5.12.0+ #538
[ 2501.772469] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 2501.780272] RIP: 0010:inet_sock_destruct (net/ipv4/af_inet.c:157 (discriminator 1))
[ 2501.784657] Code: bc 24 30 01 00 00 5b 41 5c e9 c7 e4 f2 ff 41 0f b6 44 24 12 3c 07 74 8f e9 26 fb 21 00 4c 89 e7 e8 10 4d f0 ff e9 71 ff ff ff <0f> 0b eb b6 0f 0b 41 8b 84 24 4c 01 00 00 85 c0 74 90 0f 0b 41 8b
All code
========
   0:   bc 24 30 01 00          mov    $0x13024,%esp
   5:   00 5b 41                add    %bl,0x41(%rbx)
   8:   5c                      pop    %rsp
   9:   e9 c7 e4 f2 ff          jmpq   0xfffffffffff2e4d5
   e:   41 0f b6 44 24 12       movzbl 0x12(%r12),%eax
  14:   3c 07                   cmp    $0x7,%al
  16:   74 8f                   je     0xffffffffffffffa7
  18:   e9 26 fb 21 00          jmpq   0x21fb43
  1d:   4c 89 e7                mov    %r12,%rdi
  20:   e8 10 4d f0 ff          callq  0xfffffffffff04d35
  25:   e9 71 ff ff ff          jmpq   0xffffffffffffff9b
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   eb b6                   jmp    0xffffffffffffffe4
  2e:   0f 0b                   ud2
  30:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
  37:   00
  38:   85 c0                   test   %eax,%eax
  3a:   74 90                   je     0xffffffffffffffcc
  3c:   0f 0b                   ud2
  3e:   41                      rex.B
  3f:   8b                      .byte 0x8b

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   eb b6                   jmp    0xffffffffffffffba
   4:   0f 0b                   ud2
   6:   41 8b 84 24 4c 01 00    mov    0x14c(%r12),%eax
   d:   00
   e:   85 c0                   test   %eax,%eax
  10:   74 90                   je     0xffffffffffffffa2
  12:   0f 0b                   ud2
  14:   41                      rex.B
  15:   8b                      .byte 0x8b
[ 2501.800279] RSP: 0018:ffff975040093a98 EFLAGS: 00010206
[ 2501.804968] RAX: 0000000000000dbb RBX: ffff8ec3c35b6570 RCX: 0000000000000ba4
[ 2501.811048] RDX: 0000000000000000 RSI: 0000000000000dbb RDI: ffff8ec3c35b6570
[ 2501.817247] RBP: ffff8ec3c35b64c0 R08: ffff8ec3c35b660c R09: ffffffffa4053a00
[ 2501.823364] R10: ffff8ec3c2c80080 R11: 0000000000000001 R12: ffff8ec3c35b64c0
[ 2501.829507] R13: ffff8ec3c46b6cb8 R14: ffff8ec3c46b6cb8 R15: ffff8ec3c46b6c90
[ 2501.835655] FS:  0000000000000000(0000) GS:ffff8ec43dd00000(0000) knlGS:0000000000000000
[ 2501.842610] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2501.847691] CR2: 00007fffcd37aab8 CR3: 000000003c010000 CR4: 00000000003506e0
[ 2501.853857] Call Trace:
[ 2501.856295] __sk_destruct (net/core/sock.c:1808)
[ 2501.859621] subflow_ulp_release (include/net/sock.h:1813)
[ 2501.863278] tcp_cleanup_ulp (net/ipv4/tcp_ulp.c:124)
[ 2501.866718] tcp_v4_destroy_sock (net/ipv4/tcp_ipv4.c:2237)
[ 2501.870491] tcp_v6_destroy_sock (net/ipv6/tcp_ipv6.c:1947)
[ 2501.874078] inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:887 (discriminator 9))
[ 2501.878122] tcp_reset (arch/x86/include/asm/bitops.h:207)
[ 2501.881144] tcp_validate_incoming (net/ipv4/tcp_input.c:5656)
[ 2501.885100] tcp_rcv_state_process (net/ipv4/tcp_input.c:6384)
[ 2501.889137] ? security_sock_rcv_skb (security/security.c:2241 (discriminator 19))
[ 2501.893147] ? sk_filter_trim_cap (net/core/filter.c:156)
[ 2501.897000] tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1527)
[ 2501.900387] tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1736)
[ 2501.903545] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:424 (discriminator 8))
[ 2501.907668] ip6_input_finish (include/linux/rcupdate.h:74)
[ 2501.911120] ip6_input (include/linux/netfilter.h:301)
[ 2501.914191] ? ip6_dst_check (net/ipv6/route.c:2625)
[ 2501.917620] ? tcp_v6_early_demux (include/net/dst.h:469)
[ 2501.921554] ipv6_rcv (include/linux/netfilter.h:302)
[ 2501.924479] __netif_receive_skb_one_core (net/core/dev.c:5443)
[ 2501.928817] process_backlog (include/linux/rcupdate.h:74)
[ 2501.932295] __napi_poll (net/core/dev.c:6973)
[ 2501.935462] net_rx_action (net/core/dev.c:7042)
[ 2501.938842] __do_softirq (kernel/softirq.c:559)
[ 2501.943153] run_ksoftirqd (arch/x86/include/asm/irqflags.h:45)
[ 2501.946383] smpboot_thread_fn (kernel/smpboot.c:165)
[ 2501.949982] ? sort_range (kernel/smpboot.c:108)
[ 2501.953131] kthread (kernel/kthread.c:313)
[ 2501.955991] ? kthread_park (kernel/kthread.c:266)
[ 2501.959268] ret_from_fork (arch/x86/entry/entry_64.S:294)
[ 2501.962471] ---[ end trace d0ac4fdcacad284a ]---
(duration   529ms) [ OK ]
# ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP (duration   183ms) [ OK ]
# ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP (duration   160ms) [ OK ]
# ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP (duration   212ms) [ OK ]
# ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP (duration   270ms) [ OK ]
# ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP (duration   406ms) [ OK ]
# ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP (duration   191ms) [ OK ]
# ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP (duration   220ms) [ OK ]
# ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP (duration  1551ms) [ OK ]
# ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP (duration   313ms) [ OK ]
# ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP (duration   187ms) [ OK ]
# ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP (duration   139ms) [ OK ]
# ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP (duration   178ms) [ OK ]
# ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP (duration   189ms) [ OK ]
# ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP (duration   214ms) [ OK ]
# ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP (duration   519ms) [ OK ]
# ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP (duration   212ms) [ OK ]
# ns4 MPTCP -> ns2 (10.0.2.1:10036      ) MPTCP (duration   324ms) [ OK ]
# ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP (duration   946ms) [ OK ]
# ns4 MPTCP -> ns3 (10.0.2.2:10038      ) MPTCP (duration   145ms) [ OK ]
# ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP (duration   165ms) [ OK ]
# ns4 MPTCP -> ns3 (10.0.3.2:10040      ) MPTCP (duration   188ms) [ OK ]
# ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP (duration   152ms) [ OK ]
# INFO: with peek mode: saveWithPeek
# ns1 MPTCP -> ns1 (10.0.1.1:10042      ) MPTCP (duration   170ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10043      ) TCP   (duration   118ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10044      ) MPTCP (duration   116ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10045) MPTCP (duration   186ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10046) TCP   (duration   144ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10047) MPTCP (duration   114ms) [ OK ]
# INFO: with peek mode: saveAfterPeek
# ns1 MPTCP -> ns1 (10.0.1.1:10048      ) MPTCP (duration   180ms) [ OK ]
# ns1 MPTCP -> ns1 (10.0.1.1:10049      ) TCP   (duration   151ms) [ OK ]
# ns1 TCP   -> ns1 (10.0.1.1:10050      ) MPTCP (duration   123ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10051) MPTCP (duration   167ms) [ OK ]
# ns1 MPTCP -> ns1 (dead:beef:1::1:10052) TCP   (duration   148ms) [ OK ]
# ns1 TCP   -> ns1 (dead:beef:1::1:10053) MPTCP (duration   133ms) [ OK ]
# Time: 77 seconds
++ rc=0

On top of yesterday's export branch with these 4 patches:

878282030ff0 (HEAD -> tmp) mptcp: drop skb cache
66f500c3e329 mptcp: update selftest for fallback due to OoO
2eda92d4479d mptcp: do not reset MP_CAPABLE subflow on mapping errors
e03f1c973fc8 mptcp: always parse mptcp options for MPC reqsk

@matttbe
Copy link
Member Author

matttbe commented May 19, 2021

I just created #195 not to mix this with what e31f807 (mptcp: fix sk_forward_memory corruption under memory pressure) fixed.

matttbe pushed a commit that referenced this issue May 20, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue May 20, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue May 20, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue May 20, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 21, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 22, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 24, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 25, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup() -> __mptcp_update_wmem()

We can hit the above only under memory pressure. When that happen,
forward memory accounting could be corrupted, as reported by Matt.

Address the issue creating a new variant of __mptcp_clean_una_wakeup()
which handle fwd memory updates with the proper care and invoking
such new helper in the relevant code path.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue May 25, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue May 25, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 26, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 27, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 28, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 29, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue May 30, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
zx2c4-bot pushed a commit to WireGuard/wireguard-linux that referenced this issue May 30, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: multipath-tcp/mptcp_net-next#172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
jenkins-tessares pushed a commit that referenced this issue May 31, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue May 31, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue Jun 1, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue Jun 2, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue Jun 3, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 3, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue Jun 4, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 4, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 4, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 4, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 4, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 4, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
jenkins-tessares pushed a commit that referenced this issue Jun 5, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 7, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 7, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 7, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 7, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
matttbe pushed a commit that referenced this issue Jun 7, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: #172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
hamadmarri pushed a commit to hamadmarri/linux that referenced this issue Jun 9, 2021
[ Upstream commit b5941f0 ]

MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: multipath-tcp/mptcp_net-next#172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts pushed a commit to woodsts/linux-stable that referenced this issue Jun 10, 2021
[ Upstream commit b5941f0 ]

MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: multipath-tcp/mptcp_net-next#172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Oct 9, 2021
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea ("mptcp: fix spurious retransmissions")
Closes: multipath-tcp/mptcp_net-next#172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
matttbe pushed a commit that referenced this issue May 20, 2024
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: #344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: #345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: #346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
MPTCP Bugs
  
Closed
Development

No branches or pull requests

2 participants