bpf: Fix tailcall hierarchy #6898

Closed

Conversation

Asphaltt
Contributor

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>

When the main prog has a tailcall but its subprogs have none,
prog->aux->tail_call_reachable is incorrect for this case.

In order to correct it, check subprog[0].has_tail_call at the point where
subprog[0].tail_call_reachable is checked in
check_max_stack_depth_subprog().
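
In other words, here is a hedged sketch of the idea (an illustration only,
not the exact hunk; every name below is taken from the description above):

/* A tailcall issued by the main prog (subprog 0) itself must count as
 * well, not only tailcalls reachable through its subprogs.
 */
if (subprog[0].has_tail_call || subprog[0].tail_call_reachable)
	prog->aux->tail_call_reachable = true;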

This prepares for the tailcall fix in a later patch of this series, which
relies on prog->aux->tail_call_reachable.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
In order to store tail_call_cnt on the stack of the bpf prog's caller,
introduce bpf_tail_call_run_ctx, which wraps the original ctx together with
a tail_call_cnt_ptr and is passed to the bpf prog as its new ctx.

To avoid hurting run time when the feature is unused, introduce
use_tail_call_run_ctx in prog->aux. This flag is set at load time when
prog->aux->tail_call_reachable is true, the prog is jited, and the arch
supports reading tail_call_cnt_ptr in the prog's prologue.
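
A minimal sketch of what such a wrapper can look like (the field names here
are assumptions for illustration, not necessarily the exact layout in the
patch):

#include <linux/types.h>

/* Wrapper handed to the bpf prog in place of its original ctx. */
struct bpf_tail_call_run_ctx {
	const void *ctx;	/* the original ctx of the bpf prog */
	u32 *tail_call_cnt;	/* tail_call_cnt on the caller's stack */
};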

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
This patch fixes a tailcall issue caused by abusing the tailcall in
bpf2bpf feature.

As we know, tail_call_cnt propagates in rax from caller to callee when a
subprog is called in a tailcall context. But, as the following example
shows, MAX_TAIL_CALL_CNT cannot be enforced because tail_call_cnt is never
propagated back from callee to caller.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_legacy.h"

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

int count = 0;

static __noinline
int subprog_tail1(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

static __noinline
int subprog_tail2(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	volatile int ret = 1;

	count++;
	subprog_tail1(skb);
	subprog_tail2(skb);

	return ret;
}

char __license[] SEC("license") = "GPL";

At run time, the tail_call_cnt in entry() is propagated to subprog_tail1()
and subprog_tail2(). But when bpf_tail_call_static() updates the
tail_call_cnt in subprog_tail1(), the tail_call_cnt in entry() is not
updated at the same time. As a result, when the tail_call_cnt in entry() is
still below MAX_TAIL_CALL_CNT and subprog_tail1() returns because it hit
the MAX_TAIL_CALL_CNT limit, bpf_tail_call_static() in subprog_tail2() is
still able to run, because the tail_call_cnt propagated from entry() to
subprog_tail2() is below MAX_TAIL_CALL_CNT.

So, how many tailcalls happen in this case if no error occurs?

Viewed top-down, the calls form a hierarchy, layer by layer: each re-entry
into entry() triggers two more tailcalls in the next layer, and each chain
stops only after 33 tailcalls.

With this hierarchy model, there are 2 + 4 + 8 + ... + 2^33 = 2^34 - 2 =
17,179,869,182 tailcalls for this case.

What if there are N subprog_tail() calls in entry()? The total becomes
N + N^2 + ... + N^33 = (N^34 - N) / (N - 1) tailcalls.
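
For the N = 2 case above, this small userspace program (an illustration
added here, not part of the patch set) confirms the figure:

#include <stdio.h>

int main(void)
{
	unsigned long long total = 0;
	int layer;

	/* 33 layers: layer k performs 2^k tailcalls before the inherited
	 * tail_call_cnt finally hits MAX_TAIL_CALL_CNT (33) on every chain.
	 */
	for (layer = 1; layer <= 33; layer++)
		total += 1ULL << layer;

	printf("%llu\n", total);	/* prints 17179869182 == 2^34 - 2 */
	return 0;
}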

This patch resolves the case on x86_64.

Instead of propagating tail_call_cnt itself from caller to callee, it
propagates a pointer to it: tail_call_cnt_ptr, tcc_ptr for short.

However, where is tail_call_cnt stored?

It is stored on the stack of the bpf prog's caller, using the mechanism
introduced in the previous patch "bpf: Introduce bpf tail call run ctx".
Then, in the bpf prog's prologue, tcc_ptr is loaded from
bpf_tail_call_run_ctx, and the original ctx is restored from
bpf_tail_call_run_ctx at the same time.

Then, when a tailcall runs, the tail_call_cnt accessed through tcc_ptr is
compared with MAX_TAIL_CALL_CNT and then incremented at tcc_ptr.
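
A C-level model of this flow may help (an illustration only; the real logic
is emitted by the x86_64 JIT, the helper name below is hypothetical, and
bpf_tail_call_run_ctx is assumed as sketched earlier):

#include <linux/types.h>

#define MAX_TAIL_CALL_CNT 33

/* The prologue unwraps the run ctx; every tailcall site then checks and
 * bumps the counter through tcc_ptr, so the whole call hierarchy shares
 * one tail_call_cnt living on the caller's stack.
 */
static bool tailcall_permitted(struct bpf_tail_call_run_ctx *run_ctx)
{
	u32 *tcc_ptr = run_ctx->tail_call_cnt;

	if (*tcc_ptr >= MAX_TAIL_CALL_CNT)
		return false;	/* limit reached: skip the tailcall */
	(*tcc_ptr)++;		/* visible to caller and callees alike */
	return true;
}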

Furthermore, when a trampoline is the caller of the bpf prog, the
trampoline is required to prepare tail_call_cnt and the tail call run ctx
on its own stack.

Finally, enable bpf_jit_supports_tail_call_cnt_ptr() so that
__bpf_prog_run() uses bpf_tail_call_run_ctx.
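
A rough sketch of the caller side (hedged: the exact hunks in
__bpf_prog_run() and the trampoline differ; use_tail_call_run_ctx and the
run ctx fields are assumed as above):

/* Keep tail_call_cnt on the caller's stack and hand the prog a wrapped
 * ctx only when the prog was jited to expect tcc_ptr in its prologue.
 */
u32 tail_call_cnt = 0;
struct bpf_tail_call_run_ctx run_ctx = {
	.ctx		= ctx,
	.tail_call_cnt	= &tail_call_cnt,
};

if (prog->aux->use_tail_call_run_ctx)
	ctx = &run_ctx;	/* the prog's prologue unwraps it */

ret = dfunc(ctx, prog->insnsi, prog->bpf_func);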

Fixes: ebf7d1f ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Like the fix in "bpf, x64: Fix tailcall hierarchy", this patch fixes the
same issue on arm64.

In the prologue, tail_call_cnt_ptr is loaded from bpf_tail_call_run_ctx
into the TCCNT_PTR register, and the original ctx is restored from
bpf_tail_call_run_ctx at the same time.

Then, when a tailcall runs:
1. load tail_call_cnt through the tail_call_cnt_ptr held in the TCCNT_PTR register
2. compare tail_call_cnt with MAX_TAIL_CALL_CNT
3. increment tail_call_cnt
4. store tail_call_cnt back through the tail_call_cnt_ptr held in the TCCNT_PTR register

Furthermore, when a trampoline is the caller of the bpf prog, the
trampoline is required to prepare tail_call_cnt and the tail call run ctx
on its own stack.

Finally, enable bpf_jit_supports_tail_call_cnt_ptr() so that
__bpf_prog_run() uses bpf_tail_call_run_ctx.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Add some test cases to confirm the tailcall hierarchy issue has been fixed.

On x64, the selftest results are:

cd tools/testing/selftests/bpf && ./test_progs -t tailcalls
319/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
319/19  tailcalls/tailcall_bpf2bpf_hierarchy_fentry:OK
319/20  tailcalls/tailcall_bpf2bpf_hierarchy_fexit:OK
319/21  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_fexit:OK
319/22  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_entry:OK
319/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
319/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
319     tailcalls:OK
Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>