
[interop] watchdog: BUG: soft lockup - CPU#2 stuck for 22s #87

Closed
cpaasch opened this issue Sep 10, 2020 · 5 comments

cpaasch commented Sep 10, 2020

Running the test "bug_cheng_sbuf" (yeah, weird name - historic reasons ;-)) from https://github.com/multipath-tcp/mptcp-scripts/blob/master/testing/testing.py#L555 with net-next as the server and multipath-tcp.org as the client. The test configures a huge rmem on the server and configures the client to create 16 subflows over the same pair of IP addresses. Then the test starts an iperf.

[   56.303956] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
[   56.306069] Modules linked in:
[   56.306564] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 5.9.0-rc3.mptcp #118
[   56.307753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   56.309443] RIP: 0010:mptcp_subflow_data_available+0x1cd/0x8d0
[   56.310292] Code: 8b b5 88 04 00 00 83 e2 01 89 c6 01 ca 39 c8 0f 43 f2 41 89 f4 0f 1f 44 00 00 48 8b 55 30 48 8b 92 e0 01 00 00 48 85 d2 74 08 <65> 48 ff 82 a0 00 00 00 8b 95 6c 05 00 00 44 01 e2 89 95 6c 05 00
[   56.313021] RSP: 0018:ffffc900000e4a20 EFLAGS: 00000206
[   56.313814] RAX: 0000000000000000 RBX: ffff8881364f1900 RCX: 0000000000000b28
[   56.314862] RDX: 0000607ec4008fc0 RSI: 0000000000000000 RDI: 0000000000025ed0
[   56.315883] RBP: ffff888128cb3780 R08: ffff888135f24680 R09: 0000000000000001
[   56.316942] R10: 0000000000000000 R11: ffff888135f24688 R12: 0000000000000000
[   56.317988] R13: ffff888128ca8000 R14: ffff8881364f1900 R15: ffff888125e6f500
[   56.319051] FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   56.320267] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   56.321108] CR2: 00007fa8ff63b000 CR3: 0000000002010000 CR4: 00000000000006e0
[   56.322159] Call Trace:
[   56.322547]  <IRQ>
[   56.322856]  subflow_data_ready+0x52/0xa0
[   56.323477]  tcp_data_queue+0x455/0xc70
[   56.324050]  tcp_rcv_established+0x1c3/0x5b0
[   56.324715]  tcp_v4_do_rcv+0x126/0x1e0
[   56.325273]  tcp_v4_rcv+0xbd4/0xd00
[   56.325799]  ip_protocol_deliver_rcu+0x26/0x180
[   56.326490]  ip_local_deliver_finish+0x3f/0x50
[   56.327143]  ip_local_deliver+0xe0/0xf0
[   56.327748]  ? ip_protocol_deliver_rcu+0x180/0x180
[   56.328459]  ip_sublist_rcv_finish+0x33/0x50
[   56.329083]  ip_sublist_rcv+0x129/0x1b0
[   56.329753]  ? ip_rcv_finish_core.isra.0+0x390/0x390
[   56.330522]  ip_list_rcv+0x11e/0x142
[   56.331078]  __netif_receive_skb_list_core+0x238/0x260
[   56.331885]  netif_receive_skb_list_internal+0x18e/0x2a0
[   56.332704]  gro_normal_list.part.0+0x14/0x30
[   56.333342]  napi_complete_done+0x62/0x150
[   56.333971]  virtqueue_napi_complete+0x26/0x70
[   56.334679]  virtnet_poll+0x2d6/0x330
[   56.335200]  net_rx_action+0xe0/0x2a0
[   56.335765]  __do_softirq+0xd1/0x213
[   56.336303]  asm_call_on_stack+0x12/0x20
[   56.336891]  </IRQ>
[   56.337248]  do_softirq_own_stack+0x34/0x40
[   56.337886]  irq_exit_rcu+0x92/0xa0
[   56.338450]  common_interrupt+0x6c/0x130
[   56.339037]  asm_common_interrupt+0x1e/0x40
[   56.339698] RIP: 0010:default_idle+0xe/0x10
[   56.340295] Code: 8b 04 25 00 6d 01 00 f0 80 60 02 df c3 0f ae f0 0f ae 38 0f ae f0 eb b9 fb eb e3 90 e9 07 00 00 00 0f 00 2d 06 cb 49 00 fb f4 <c3> cc 65 8b 15 79 92 8a 7e 89 d2 48 8b 05 e0 e5 d3 00 48 03 04 d5
[   56.343049] RSP: 0018:ffffc9000007bed0 EFLAGS: 00000206
[   56.343821] RAX: ffffffff817680d0 RBX: 0000000000000002 RCX: ffff88813bd21a80
[   56.344902] RDX: 000000000003335a RSI: 0000000000000087 RDI: 0000000000000002
[   56.345964] RBP: ffff88813b111700 R08: 00000007863023ea R09: 00000000000003ae
[   56.347022] R10: ffff88813bd20f40 R11: 0000000000009000 R12: ffff88813b111700
[   56.348076] R13: ffff88813b111700 R14: 0000000000000000 R15: 0000000000000000
[   56.349126]  ? mwait_idle+0x70/0x70
[   56.349671]  default_idle_call+0x1f/0x30
[   56.350258]  do_idle+0x1d2/0x230
[   56.350775]  cpu_startup_entry+0x14/0x20
[   56.351355]  start_secondary+0x10a/0x150
[   56.351957]  secondary_startup_64+0xa4/0xb0
[   56.352582] Kernel panic - not syncing: softlockup: hung tasks
[   56.353453] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G             L    5.9.0-rc3.mptcp #118
[   56.354850] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   56.356530] Call Trace:
[   56.356895]  <IRQ>
[   56.357240]  dump_stack+0x57/0x70
[   56.357753]  panic+0xf6/0x2b3
[   56.358194]  watchdog_timer_fn.cold+0x7e/0x97
[   56.358849]  ? softlockup_fn+0x30/0x30
[   56.359419]  __hrtimer_run_queues+0xe5/0x180
[   56.360055]  hrtimer_interrupt+0x109/0x240
[   56.360699]  __sysvec_apic_timer_interrupt+0x55/0x60
[   56.361411]  sysvec_apic_timer_interrupt+0x29/0x80
[   56.362112]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[   56.362887] RIP: 0010:mptcp_subflow_data_available+0x1cd/0x8d0
[   56.363757] Code: 8b b5 88 04 00 00 83 e2 01 89 c6 01 ca 39 c8 0f 43 f2 41 89 f4 0f 1f 44 00 00 48 8b 55 30 48 8b 92 e0 01 00 00 48 85 d2 74 08 <65> 48 ff 82 a0 00 00 00 8b 95 6c 05 00 00 44 01 e2 89 95 6c 05 00
[   56.366476] RSP: 0018:ffffc900000e4a20 EFLAGS: 00000206
[   56.367248] RAX: 0000000000000000 RBX: ffff8881364f1900 RCX: 0000000000000b28
[   56.368284] RDX: 0000607ec4008fc0 RSI: 0000000000000000 RDI: 0000000000025ed0
[   56.369343] RBP: ffff888128cb3780 R08: ffff888135f24680 R09: 0000000000000001
[   56.370500] R10: 0000000000000000 R11: ffff888135f24688 R12: 0000000000000000
[   56.371549] R13: ffff888128ca8000 R14: ffff8881364f1900 R15: ffff888125e6f500
[   56.372611]  subflow_data_ready+0x52/0xa0
[   56.373218]  tcp_data_queue+0x455/0xc70
[   56.373789]  tcp_rcv_established+0x1c3/0x5b0
[   56.374447]  tcp_v4_do_rcv+0x126/0x1e0
[   56.374986]  tcp_v4_rcv+0xbd4/0xd00
[   56.375509]  ip_protocol_deliver_rcu+0x26/0x180
[   56.376171]  ip_local_deliver_finish+0x3f/0x50
[   56.376846]  ip_local_deliver+0xe0/0xf0
[   56.377432]  ? ip_protocol_deliver_rcu+0x180/0x180
[   56.378141]  ip_sublist_rcv_finish+0x33/0x50
[   56.378794]  ip_sublist_rcv+0x129/0x1b0
[   56.379345]  ? ip_rcv_finish_core.isra.0+0x390/0x390
[   56.380085]  ip_list_rcv+0x11e/0x142
[   56.380633]  __netif_receive_skb_list_core+0x238/0x260
[   56.381406]  netif_receive_skb_list_internal+0x18e/0x2a0
[   56.382177]  gro_normal_list.part.0+0x14/0x30
[   56.382825]  napi_complete_done+0x62/0x150
[   56.383432]  virtqueue_napi_complete+0x26/0x70
[   56.384134]  virtnet_poll+0x2d6/0x330
[   56.384715]  net_rx_action+0xe0/0x2a0
[   56.385257]  __do_softirq+0xd1/0x213
[   56.385780]  asm_call_on_stack+0x12/0x20
[   56.386343]  </IRQ>
[   56.386693]  do_softirq_own_stack+0x34/0x40
[   56.387300]  irq_exit_rcu+0x92/0xa0
[   56.387820]  common_interrupt+0x6c/0x130
[   56.388427]  asm_common_interrupt+0x1e/0x40
[   56.389048] RIP: 0010:default_idle+0xe/0x10
[   56.389700] Code: 8b 04 25 00 6d 01 00 f0 80 60 02 df c3 0f ae f0 0f ae 38 0f ae f0 eb b9 fb eb e3 90 e9 07 00 00 00 0f 00 2d 06 cb 49 00 fb f4 <c3> cc 65 8b 15 79 92 8a 7e 89 d2 48 8b 05 e0 e5 d3 00 48 03 04 d5
[   56.392442] RSP: 0018:ffffc9000007bed0 EFLAGS: 00000206
[   56.393204] RAX: ffffffff817680d0 RBX: 0000000000000002 RCX: ffff88813bd21a80
[   56.394282] RDX: 000000000003335a RSI: 0000000000000087 RDI: 0000000000000002
[   56.395328] RBP: ffff88813b111700 R08: 00000007863023ea R09: 00000000000003ae
[   56.396384] R10: ffff88813bd20f40 R11: 0000000000009000 R12: ffff88813b111700
[   56.397439] R13: ffff88813b111700 R14: 0000000000000000 R15: 0000000000000000
[   56.398473]  ? mwait_idle+0x70/0x70
[   56.398995]  default_idle_call+0x1f/0x30
[   56.399601]  do_idle+0x1d2/0x230
[   56.400086]  cpu_startup_entry+0x14/0x20
[   56.400695]  start_secondary+0x10a/0x150
[   56.401271]  secondary_startup_64+0xa4/0xb0

HEAD is at:
ecc3d085058b ("mptcp: Enable MPTCP when IPPROTO_MPTCP is set") (HEAD) (10 minutes ago)
fdc664ab314a ("mptcp: call tcp_cleanup_rbuf on subflows") (12 hours ago)
35d6b46 ("DO-NOT-MERGE: mptcp: enabled by default") (tag: export/20200909T050747, mptcp_net-next/export) (24 hours ago)
6e7290b ("DO-NOT-MERGE: mptcp: use kmalloc on kasan build") (24 hours ago)
8fd4ed4 ("tcp: propagate MPTCP skb extensions on xmit splits") (24 hours ago)
691048c ("mptcp: use _fast lock version in __mptcp_move_skbs") (24 hours ago)
ef14371 ("mptcp: adjust mptcp receive buffer limit if subflow has larger one") (24 hours ago)
74acdfd ("mptcp: simult flow self-tests") (24 hours ago)
a1975d0 ("mptcp: allow picking different xmit subflows") (24 hours ago)
6855e98 ("mptcp: allow creating non-backup subflows") (24 hours ago)
2c61627 ("mptcp: move address attribute into mptcp_addr_info") (24 hours ago)
b347db3 ("mptcp: add OoO related mibs") (24 hours ago)
91baf67 ("mptcp: cleanup mptcp_subflow_discard_data()") (24 hours ago)
685dab9 ("mptcp: move ooo skbs into msk out of order queue.") (24 hours ago)
c97925c ("mptcp: introduce and use mptcp_try_coalesce()") (24 hours ago)
904204e ("mptcp: basic sndbuf autotuning") (24 hours ago)
5d28362 ("mptcp: trigger msk processing even for OoO data") (24 hours ago)
227d369 ("mptcp: set data_ready status bit in subflow_check_data_avail()") (24 hours ago)
6f2c3af ("mptcp: rethink 'is writable' conditional") (24 hours ago)
ff4e207 ("mptcp: add accept_subflow re-check") (24 hours ago)
1cde79a ("selftests: mptcp: add ADD_ADDR mibs check function") (24 hours ago)
70671b7 ("mptcp: add ADD_ADDR related mibs") (24 hours ago)
6d37e97 ("mptcp: send out ADD_ADDR with echo flag") (24 hours ago)
a20a96f ("mptcp: add the incoming RM_ADDR support") (24 hours ago)
7103b2d ("mptcp: add the outgoing RM_ADDR support") (24 hours ago)
328bd7f ("mptcp: rename addr_signal and the related functions") (24 hours ago)
e8d556d ("selftests/mptcp: Better delay & reordering configuration") (24 hours ago)
38119e7 ("bpf:selftests: add bpf_mptcp_sock() verifier tests") (24 hours ago)
43b5943 ("bpf:selftests: add MPTCP test base") (24 hours ago)
c3de50c ("bpf: add 'bpf_mptcp_sock' structure and helper") (24 hours ago)
31e2af1 ("mptcp: attach subflow socket to parent cgroup") (24 hours ago)
36440f0 ("bpf: expose is_mptcp flag to bpf_tcp_sock") (24 hours ago)
f5499c6 ("nfc: pn533/usb.c: fix spelling of "functions"") (mptcp_net-next/net-next) (26 hours ago)

matttbe added this to Needs triage in MPTCP Bugs via automation Sep 10, 2020

pabeni commented Sep 11, 2020

I think we hit some corner case that triggers an infinite loop in subflow_check_data_avail().

@cpaasch could you please do a run without the automatic reboot on soft lockup and try enabling dynamic debug once it gets stuck?

echo 'file net/mptcp/* +fmp' > /sys/kernel/debug/dynamic_debug/control
sleep 0.3 # or less ;)
echo 'file net/mptcp/* -fmp' > /sys/kernel/debug/dynamic_debug/control

If the build doesn't have dynamic debug enabled, perhaps perf top could give some hints.


cpaasch commented Sep 17, 2020

I think I got something useful:

[  231.402331] mptcp:subflow_check_data_avail: MPTCP: msk=000000003f9f7e34 ssk=00000000c75f5c4a status=0
[  231.403262] mptcp:subflow_check_data_avail: MPTCP: msk ack_seq=726dfce200000c25 subflow ack_seq=726dfce100000c25
[  231.404310] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.405467] mptcp:subflow_check_data_avail: MPTCP: msk=000000003f9f7e34 ssk=00000000c75f5c4a status=0
[  231.406395] mptcp:subflow_check_data_avail: MPTCP: msk ack_seq=726dfce200000c25 subflow ack_seq=726dfce100000c25
[  231.407450] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.408615] mptcp:subflow_check_data_avail: MPTCP: msk=000000003f9f7e34 ssk=00000000c75f5c4a status=0
[  231.409547] mptcp:subflow_check_data_avail: MPTCP: msk ack_seq=726dfce200000c25 subflow ack_seq=726dfce100000c25
[  231.410596] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.411753] mptcp:subflow_check_data_avail: MPTCP: msk=000000003f9f7e34 ssk=00000000c75f5c4a status=0
[  231.412683] mptcp:subflow_check_data_avail: MPTCP: msk ack_seq=726dfce200000c25 subflow ack_seq=726dfce100000c25
[  231.413743] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.414894] mptcp:subflow_check_data_avail: MPTCP: msk=000000003f9f7e34 ssk=00000000c75f5c4a status=0
[  231.415820] mptcp:subflow_check_data_avail: MPTCP: msk ack_seq=726dfce200000c25 subflow ack_seq=726dfce100000c25
[  231.416866] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.418014] mptcp:subflow_check_data_avail: MPTCP: msk=000000003f9f7e34 ssk=00000000c75f5c4a status=0
[  231.418953] mptcp:subflow_check_data_avail: MPTCP: msk ack_seq=726dfce200000c25 subflow ack_seq=726dfce100000c25
[  231.419999] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.421168] mptcp:subflow_check_data_avail: MPTCP: msk=000000003f9f7e34 ssk=00000000c75f5c4a status=0
[  231.422087] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.423152] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629
[  231.424067] mptcp:mptcp_subflow_discard_data: MPTCP: discarding=0 len=2856 seq=341996629


pabeni commented Sep 17, 2020

@cpaasch thank you! The following tentative patch should fix the issue; could you please try it?

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 600f25a399c9..b0f0e8e81f11 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -820,7 +820,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
 }
 
 static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
-                                      unsigned int limit)
+                                      u64 limit)
 {
        struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
        bool fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
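
To make the failure mode concrete, here is a minimal, stand-alone sketch (illustration only, not part of the patch) of the truncation that widening 'limit' to u64 avoids. The sequence numbers are taken from the dynamic-debug trace above: the two ack_seq values differ by exactly 2^32, so the old 32-bit 'limit' parameter ends up as 0 and nothing is ever discarded.

/* truncation_demo.c -- illustration only, not kernel code */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* values from the trace above */
	uint64_t msk_ack_seq     = 0x726dfce200000c25ULL;
	uint64_t subflow_ack_seq = 0x726dfce100000c25ULL;

	uint64_t delta = msk_ack_seq - subflow_ack_seq; /* 0x100000000 == 2^32 */

	/* old prototype: the u64 delta is silently truncated to 32 bits */
	unsigned int truncated = (unsigned int)delta;   /* becomes 0 here */

	printf("64-bit delta: %llu bytes\n", (unsigned long long)delta);
	printf("32-bit limit: %u bytes -> nothing discarded, caller retries forever\n",
	       truncated);
	return 0;
}

With a u64 'limit', the full delta survives the call, the skb data is actually discarded, and subflow_check_data_avail() can make progress instead of reprocessing the same packet.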


cpaasch commented Sep 17, 2020

Yes, problem solved!!!

fengguang pushed a commit to 0day-ci/linux that referenced this issue Sep 17, 2020
Christoph reported an infinite loop in the subflow receive path
under stress conditions.

If there are multiple subflows, each of them using a large send
buffer, the delta between the sequence number used by
MPTCP-level retransmission and the current msk->ack_seq
can be greater than MAX_INT.

In the above scenario, when calling mptcp_subflow_discard_data(),
such delta will be truncated to int, and could result in a negative
number: no bytes will be dropped, and subflow_check_data_avail()
will try again to process the same packet, looping forever.

This change addresses the issue by expanding the 'limit' size to 64
bits, so that overflows are not possible anymore.

Closes: multipath-tcp/mptcp_net-next#87
Fixes: 6719331 ("mptcp: trigger msk processing even for OoO data")
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
alaahl pushed a commit to alaahl/linux that referenced this issue Sep 18, 2020
(Same commit message as above, with the additional trailers:)
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
matttbe closed this as completed Sep 24, 2020
MPTCP Bugs automation moved this from Needs triage to Closed Sep 24, 2020

matttbe commented Sep 24, 2020

linked to 1d39cd8

jenkins-tessares pushed a commit that referenced this issue Jul 24, 2021
During testing of f263a81 ("bpf: Track subprog poke descriptors correctly
and fix use-after-free") under various failure conditions, for example, when
jit_subprogs() fails and tries to clean up the program to be run under the
interpreter, we ran into the following freeze:

  [...]
  #127/8 tailcall_bpf2bpf_3:FAIL
  [...]
  [   92.041251] BUG: KASAN: slab-out-of-bounds in ___bpf_prog_run+0x1b9d/0x2e20
  [   92.042408] Read of size 8 at addr ffff88800da67f68 by task test_progs/682
  [   92.043707]
  [   92.044030] CPU: 1 PID: 682 Comm: test_progs Tainted: G   O   5.13.0-53301-ge6c08cb33a30-dirty #87
  [   92.045542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
  [   92.046785] Call Trace:
  [   92.047171]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.047773]  ? __bpf_prog_run_args32+0x8b/0xb0
  [   92.048389]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.049019]  ? ktime_get+0x117/0x130
  [...] // few hundred [similar] lines more
  [   92.659025]  ? ktime_get+0x117/0x130
  [   92.659845]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.660738]  ? __bpf_prog_run_args32+0x8b/0xb0
  [   92.661528]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.662378]  ? print_usage_bug+0x50/0x50
  [   92.663221]  ? print_usage_bug+0x50/0x50
  [   92.664077]  ? bpf_ksym_find+0x9c/0xe0
  [   92.664887]  ? ktime_get+0x117/0x130
  [   92.665624]  ? kernel_text_address+0xf5/0x100
  [   92.666529]  ? __kernel_text_address+0xe/0x30
  [   92.667725]  ? unwind_get_return_address+0x2f/0x50
  [   92.668854]  ? ___bpf_prog_run+0x15d4/0x2e20
  [   92.670185]  ? ktime_get+0x117/0x130
  [   92.671130]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.672020]  ? __bpf_prog_run_args32+0x8b/0xb0
  [   92.672860]  ? __bpf_prog_run_args64+0xc0/0xc0
  [   92.675159]  ? ktime_get+0x117/0x130
  [   92.677074]  ? lock_is_held_type+0xd5/0x130
  [   92.678662]  ? ___bpf_prog_run+0x15d4/0x2e20
  [   92.680046]  ? ktime_get+0x117/0x130
  [   92.681285]  ? __bpf_prog_run32+0x6b/0x90
  [   92.682601]  ? __bpf_prog_run64+0x90/0x90
  [   92.683636]  ? lock_downgrade+0x370/0x370
  [   92.684647]  ? mark_held_locks+0x44/0x90
  [   92.685652]  ? ktime_get+0x117/0x130
  [   92.686752]  ? lockdep_hardirqs_on+0x79/0x100
  [   92.688004]  ? ktime_get+0x117/0x130
  [   92.688573]  ? __cant_migrate+0x2b/0x80
  [   92.689192]  ? bpf_test_run+0x2f4/0x510
  [   92.689869]  ? bpf_test_timer_continue+0x1c0/0x1c0
  [   92.690856]  ? rcu_read_lock_bh_held+0x90/0x90
  [   92.691506]  ? __kasan_slab_alloc+0x61/0x80
  [   92.692128]  ? eth_type_trans+0x128/0x240
  [   92.692737]  ? __build_skb+0x46/0x50
  [   92.693252]  ? bpf_prog_test_run_skb+0x65e/0xc50
  [   92.693954]  ? bpf_prog_test_run_raw_tp+0x2d0/0x2d0
  [   92.694639]  ? __fget_light+0xa1/0x100
  [   92.695162]  ? bpf_prog_inc+0x23/0x30
  [   92.695685]  ? __sys_bpf+0xb40/0x2c80
  [   92.696324]  ? bpf_link_get_from_fd+0x90/0x90
  [   92.697150]  ? mark_held_locks+0x24/0x90
  [   92.698007]  ? lockdep_hardirqs_on_prepare+0x124/0x220
  [   92.699045]  ? finish_task_switch+0xe6/0x370
  [   92.700072]  ? lockdep_hardirqs_on+0x79/0x100
  [   92.701233]  ? finish_task_switch+0x11d/0x370
  [   92.702264]  ? __switch_to+0x2c0/0x740
  [   92.703148]  ? mark_held_locks+0x24/0x90
  [   92.704155]  ? __x64_sys_bpf+0x45/0x50
  [   92.705146]  ? do_syscall_64+0x35/0x80
  [   92.706953]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Turns out that the program rejection from e411901 ("bpf: allow for tailcalls
in BPF subprograms for x64 JIT") is buggy since env->prog->aux->tail_call_reachable
is never true. Commit ebf7d1f ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT") added a tracker into check_max_stack_depth() which propagates
the tail_call_reachable condition throughout the subprograms. This info is then
assigned to the subprogram's func[i]->aux->tail_call_reachable. However, in the
case of the rejection check upon JIT failure, env->prog->aux->tail_call_reachable
is used. func[0]->aux->tail_call_reachable which represents the main program's
information did not propagate this to the outer env->prog->aux, though. Add this
propagation into check_max_stack_depth() where it needs to belong so that the
check can be done reliably.

Fixes: ebf7d1f ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/618c34e3163ad1a36b1e82377576a6081e182f25.1626123173.git.daniel@iogearbox.net
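
As a hedged sketch of the propagation the last paragraph describes (my paraphrase, not the verbatim upstream hunk): once check_max_stack_depth() has finished marking each subprogram, the main subprogram's flag is mirrored onto the outer program so that the rejection check reading env->prog->aux->tail_call_reachable sees a correct value.

/* Paraphrase of the fix described above, not the exact upstream diff:
 * after the stack-depth walk has set subprog[i].tail_call_reachable for
 * each subprogram, copy the main program's (subprog[0]) result to the
 * outer prog aux, which the JIT-failure rejection path inspects. */
if (subprog[0].tail_call_reachable)
	env->prog->aux->tail_call_reachable = true;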