bpf: Fix tailcall hierarchy #6898

Closed

Conversation

Asphaltt
Contributor

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>

When the main prog has a tailcall but its subprogs have none,
prog->aux->tail_call_reachable is incorrect for this case.

In order to correct it, check subprog[0].has_tail_call at the point where
subprog[0].tail_call_reachable is checked in
check_max_stack_depth_subprog().
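
In other words, here is a hedged sketch of the idea (an illustration only,
not the exact hunk; every name below is taken from the description above):

/* A tailcall issued by the main prog (subprog 0) itself must count as
 * well, not only tailcalls reachable through its subprogs.
 */
if (subprog[0].has_tail_call || subprog[0].tail_call_reachable)
	prog->aux->tail_call_reachable = true;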

This prepares for the tailcall fix in a later patch of this series, which
relies on prog->aux->tail_call_reachable.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
In order to store tail_call_cnt on the stack of the bpf prog's caller,
introduce bpf_tail_call_run_ctx, which wraps the original ctx together with
a tail_call_cnt_ptr and is passed to the bpf prog as its new ctx.

To avoid hurting run time when the feature is unused, introduce
use_tail_call_run_ctx in prog->aux. This flag is set at load time when
prog->aux->tail_call_reachable is true, the prog is jited, and the arch
supports reading tail_call_cnt_ptr in the prog's prologue.
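
A minimal sketch of what such a wrapper can look like (the field names here
are assumptions for illustration, not necessarily the exact layout in the
patch):

#include <linux/types.h>

/* Wrapper handed to the bpf prog in place of its original ctx. */
struct bpf_tail_call_run_ctx {
	const void *ctx;	/* the original ctx of the bpf prog */
	u32 *tail_call_cnt;	/* tail_call_cnt on the caller's stack */
};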

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
This patch fixes a tailcall issue caused by abusing the tailcall in
bpf2bpf feature.

As we know, tail_call_cnt propagates in rax from caller to callee when a
subprog is called in a tailcall context. But, as the following example
shows, MAX_TAIL_CALL_CNT cannot be enforced because tail_call_cnt is never
propagated back from callee to caller.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_legacy.h"

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

int count = 0;

static __noinline
int subprog_tail1(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

static __noinline
int subprog_tail2(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	volatile int ret = 1;

	count++;
	subprog_tail1(skb);
	subprog_tail2(skb);

	return ret;
}

char __license[] SEC("license") = "GPL";

At run time, the tail_call_cnt in entry() is propagated to subprog_tail1()
and subprog_tail2(). But when bpf_tail_call_static() updates the
tail_call_cnt in subprog_tail1(), the tail_call_cnt in entry() is not
updated at the same time. As a result, when the tail_call_cnt in entry() is
still below MAX_TAIL_CALL_CNT and subprog_tail1() returns because it hit
the MAX_TAIL_CALL_CNT limit, bpf_tail_call_static() in subprog_tail2() is
still able to run, because the tail_call_cnt propagated from entry() to
subprog_tail2() is below MAX_TAIL_CALL_CNT.

So, how many tailcalls happen in this case if no error occurs?

Viewed top-down, the calls form a hierarchy, layer by layer: each re-entry
into entry() triggers two more tailcalls in the next layer, and each chain
stops only after 33 tailcalls.

With this hierarchy model, there are 2 + 4 + 8 + ... + 2^33 = 2^34 - 2 =
17,179,869,182 tailcalls for this case.

What if there are N subprog_tail() calls in entry()? The total becomes
N + N^2 + ... + N^33 = (N^34 - N) / (N - 1) tailcalls.
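
For the N = 2 case above, this small userspace program (an illustration
added here, not part of the patch set) confirms the figure:

#include <stdio.h>

int main(void)
{
	unsigned long long total = 0;
	int layer;

	/* 33 layers: layer k performs 2^k tailcalls before the inherited
	 * tail_call_cnt finally hits MAX_TAIL_CALL_CNT (33) on every chain.
	 */
	for (layer = 1; layer <= 33; layer++)
		total += 1ULL << layer;

	printf("%llu\n", total);	/* prints 17179869182 == 2^34 - 2 */
	return 0;
}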

This patch resolves the case on x86_64.

Instead of propagating tail_call_cnt itself from caller to callee, it
propagates a pointer to it: tail_call_cnt_ptr, tcc_ptr for short.

However, where is tail_call_cnt stored?

It is stored on the stack of the bpf prog's caller, using the mechanism
introduced in the previous patch "bpf: Introduce bpf tail call run ctx".
Then, in the bpf prog's prologue, tcc_ptr is loaded from
bpf_tail_call_run_ctx, and the original ctx is restored from
bpf_tail_call_run_ctx at the same time.

Then, when a tailcall runs, the tail_call_cnt accessed through tcc_ptr is
compared with MAX_TAIL_CALL_CNT and then incremented at tcc_ptr.
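
A C-level model of this flow may help (an illustration only; the real logic
is emitted by the x86_64 JIT, the helper name below is hypothetical, and
bpf_tail_call_run_ctx is assumed as sketched earlier):

#include <linux/types.h>

#define MAX_TAIL_CALL_CNT 33

/* The prologue unwraps the run ctx; every tailcall site then checks and
 * bumps the counter through tcc_ptr, so the whole call hierarchy shares
 * one tail_call_cnt living on the caller's stack.
 */
static bool tailcall_permitted(struct bpf_tail_call_run_ctx *run_ctx)
{
	u32 *tcc_ptr = run_ctx->tail_call_cnt;

	if (*tcc_ptr >= MAX_TAIL_CALL_CNT)
		return false;	/* limit reached: skip the tailcall */
	(*tcc_ptr)++;		/* visible to caller and callees alike */
	return true;
}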

Furthermore, when a trampoline is the caller of the bpf prog, the
trampoline is required to prepare tail_call_cnt and the tail call run ctx
on its own stack.

Finally, enable bpf_jit_supports_tail_call_cnt_ptr() so that
__bpf_prog_run() uses bpf_tail_call_run_ctx.
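
A rough sketch of the caller side (hedged: the exact hunks in
__bpf_prog_run() and the trampoline differ; use_tail_call_run_ctx and the
run ctx fields are assumed as above):

/* Keep tail_call_cnt on the caller's stack and hand the prog a wrapped
 * ctx only when the prog was jited to expect tcc_ptr in its prologue.
 */
u32 tail_call_cnt = 0;
struct bpf_tail_call_run_ctx run_ctx = {
	.ctx		= ctx,
	.tail_call_cnt	= &tail_call_cnt,
};

if (prog->aux->use_tail_call_run_ctx)
	ctx = &run_ctx;	/* the prog's prologue unwraps it */

ret = dfunc(ctx, prog->insnsi, prog->bpf_func);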

Fixes: ebf7d1f ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Like the fix in "bpf, x64: Fix tailcall hierarchy", this patch fixes the
same issue on arm64.

In the prologue, tail_call_cnt_ptr is loaded from bpf_tail_call_run_ctx
into the TCCNT_PTR register, and the original ctx is restored from
bpf_tail_call_run_ctx at the same time.

Then, when a tailcall runs:
1. load tail_call_cnt through the tail_call_cnt_ptr held in the TCCNT_PTR register
2. compare tail_call_cnt with MAX_TAIL_CALL_CNT
3. increment tail_call_cnt
4. store tail_call_cnt back through the tail_call_cnt_ptr held in the TCCNT_PTR register

Furthermore, when a trampoline is the caller of the bpf prog, the
trampoline is required to prepare tail_call_cnt and the tail call run ctx
on its own stack.

Finally, enable bpf_jit_supports_tail_call_cnt_ptr() so that
__bpf_prog_run() uses bpf_tail_call_run_ctx.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Add some test cases to confirm the tailcall hierarchy issue has been fixed.

On x64, the selftest results are:

cd tools/testing/selftests/bpf && ./test_progs -t tailcalls
319/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
319/19  tailcalls/tailcall_bpf2bpf_hierarchy_fentry:OK
319/20  tailcalls/tailcall_bpf2bpf_hierarchy_fexit:OK
319/21  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_fexit:OK
319/22  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_entry:OK
319/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
319/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
319     tailcalls:OK
Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>