-
Notifications
You must be signed in to change notification settings - Fork 5
bpf: Don't redirect packets with pkt_len 0 #618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any skbs, that is, the flow->head is null. The root cause, as the [2] says, is because that bpf_prog_test_run_skb() run a bpf prog which redirects empty skbs. So we should determine whether the length of the packet modified by bpf prog or others like bpf_prog_test is 0 before forwarding it directly. LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5 LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com Signed-off-by: Di Zhu <zhudi2@huawei.com>
|
Master branch: 4429bdc |
|
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=651294 expired. Closing PR. |
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 14, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 14, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 15, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 15, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 15, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 16, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 16, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 17, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 19, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 19, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kernel-patches-daemon-bpf-rc bot
pushed a commit
that referenced
this pull request
Oct 19, 2025
With latest llvm22, I hit the verif_scale_strobemeta selftest failure below: $ ./test_progs -n 618 libbpf: prog 'on_event': BPF program load failed: -E2BIG libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- BPF program is too large. Processed 1000001 insn verification time 7019091 usec stack depth 488 processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -E2BIG libbpf: failed to load object 'strobemeta.bpf.o' scale_test:FAIL:expect_success unexpected error: -7 (errno 7) #618 verif_scale_strobemeta:FAIL But if I increase the verificaiton insn limit from 1M to 10M, the above test_progs run actually will succeed. The below is the result from veristat: $ ./veristat strobemeta.bpf.o Processing 'strobemeta.bpf.o'... File Program Verdict Duration (us) Insns States Program size Jited size ---------------- -------- ------- ------------- ------- ------ ------------ ---------- strobemeta.bpf.o on_event success 90250893 9777685 358230 15954 80794 ---------------- -------- ------- ------------- ------- ------ ------------ ---------- Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs. Further debugging shows the llvm commit [1] is responsible for the verificaiton failure as it tries to convert certain switch statement to if-condition. Such change may cause different transformation compared to original switch statement. In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2, ptr noundef %3, ptr noundef %4) #2 !dbg !535 { %6 = alloca ptr, align 8 %7 = alloca i64, align 8 %8 = alloca ptr, align 8 %9 = alloca ptr, align 8 %10 = alloca ptr, align 8 %11 = alloca ptr, align 8 %12 = alloca i32, align 4 ... %20 = icmp ne ptr %19, null, !dbg !561 br i1 %20, label %22, label %21, !dbg !562 21: ; preds = %5 store i32 1, ptr %12, align 4 br label %48, !dbg !563 22: %23 = load ptr, ptr %9, align 8, !dbg !564 ... 47: ; preds = %38, %22 store i32 0, ptr %12, align 4, !dbg !588 br label %48, !dbg !588 48: ; preds = %47, %21 call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588 %49 = load i32, ptr %12, align 4 switch i32 %49, label %51 [ i32 0, label %50 i32 1, label %50 ] 50: ; preds = %48, %48 ret void, !dbg !589 51: ; preds = %48 unreachable } Note that the above 'switch' statement is added by clang frontend. Without [1], the switch statement will survive until SelectionDag, so the switch statement acts like a 'barrier' and prevents some transformation involved with both 'before' and 'after' the switch statement. But with [1], the switch statement will be removed during middle end optimization and later middle end passes (esp. after inlining) have more freedom to reorder the code. The following is the related source code: static void *calc_location(struct strobe_value_loc *loc, void *tls_base): bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv); /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */ return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; In read_int_var() func, we have: void *location = calc_location(&cfg->int_locs[idx], tls_base); if (!location) return; bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); ... The static func calc_location() is called inside read_int_var(). The asm code without [1]: 77: .123....89 (85) call bpf_probe_read_user#112 78: ........89 (79) r1 = *(u64 *)(r10 -368) 79: .1......89 (79) r2 = *(u64 *)(r10 -8) 80: .12.....89 (bf) r3 = r2 81: .123....89 (0f) r3 += r1 82: ..23....89 (07) r2 += 1 83: ..23....89 (79) r4 = *(u64 *)(r10 -464) 84: ..234...89 (a5) if r2 < 0x2 goto pc+13 85: ...34...89 (15) if r3 == 0x0 goto pc+12 86: ...3....89 (bf) r1 = r10 87: .1.3....89 (07) r1 += -400 88: .1.3....89 (b4) w2 = 16 In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place, so the verifier actually prefers to do verification first at 'r1 = r10' etc. The asm code with [1]: 119: .123....89 (85) call bpf_probe_read_user#112 120: ........89 (79) r1 = *(u64 *)(r10 -368) 121: .1......89 (79) r2 = *(u64 *)(r10 -8) 122: .12.....89 (bf) r3 = r2 123: .123....89 (0f) r3 += r1 124: ..23....89 (07) r2 += -1 125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6 126: ........89 (05) goto pc+17 ... 144: ........89 (b4) w1 = 0 145: .1......89 (6b) *(u16 *)(r8 +80) = r1 In this case, if 'r2 < 0xfffffffe' is true, the control will go to non-null 'location' branch, so 'goto pc+17' will actually go to null 'location' branch. This seems causing tremendous amount of verificaiton state. To fix the issue, rewrite the following code return tls_ptr && tls_ptr != (void *)-1 ? tls_ptr + tls_index.offset : NULL; to if/then statement and hopefully these explicit if/then statements are sticky during middle-end optimizations. Test with llvm20 and llvm21 as well and all strobemeta related selftests are passed. [1] llvm/llvm-project#161000 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251014051639.1996331-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Pull request for series with
subject: bpf: Don't redirect packets with pkt_len 0
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=651294