bench: fast in-kernel triggering benchmarks #6616
Upstream branch: 4c2a26f
afd5f24 to bb08139
Upstream branch: af8d27b
451d0f6 to 29fa441
bb08139 to d9a2513
Upstream branch: af8d27b
29fa441 to 02ddb9d
d9a2513 to 12747d1
Upstream branch: 61df575
02ddb9d to 640391a
12747d1 to f0f6efd
Upstream branch: 476a5e9
640391a to ee05ab6
f0f6efd to a8f0c14
Upstream branch: a849750
ee05ab6 to d28abd0
a8f0c14 to 283e9c4
Upstream branch: 770546a
d28abd0 to 152fa34
283e9c4 to a4e5017
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=837193 expired. Closing PR.
a4e5017 to 388b228
Upstream branch: e0754c1
359d901 to 302d976
388b228 to 367dff7
Upstream branch: 14bb1e8
302d976 to dad662b
367dff7 to 73732be
Upstream branch: 2a702c2
dad662b to 833e090
73732be to 8d01f9b
Upstream branch: 2a720cc
833e090 to 1535650
8d01f9b to 0609f1e
Rename uprobe-base to the more precise usermode-count, which matches the other baseline-like benchmarks, kernel-count and syscall-count.

Also use the BENCH_TRIG_USERMODE() macro to define all usermode-based triggering benchmarks, which include usermode-count and the uprobe/uretprobe benchmarks.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Existing kprobe/fentry triggering benchmarks have a 1-to-1 mapping between one syscall execution and one BPF program run. While we use a fast getpgid() syscall, syscall overhead can still be non-trivial.

This patch adds a set of kprobe/fentry benchmarks that significantly amortize the cost of the syscall relative to the actual BPF triggering overhead. We do this by employing the BPF_PROG_TEST_RUN command to trigger a "driver" raw_tp program, which runs a tight parameterized loop calling a cheap BPF helper (bpf_get_numa_node_id()), to which the kprobe/fentry programs are attached for benchmarking.

This way one bpf() syscall causes N executions of the BPF program being benchmarked. N defaults to 100, but can be adjusted with the --trig-batch-iters CLI argument.

For comparison we also implement a new baseline program that, instead of triggering another BPF program, just does N atomic per-CPU counter increments, establishing the limit for all other types of program within this batched benchmarking setup.

Taking the final set of benchmarks added in this patch set (including tp/raw_tp/fmodret, added in a later patch), and keeping the "legacy" syscall-driven benchmarks for now, we can capture all triggering benchmarks in one place for comparison, before we remove the legacy ones (and rename xxx-batch into just xxx).

$ benchs/run_bench_trigger.sh
usermode-count         :   79.500 ± 0.024M/s
kernel-count           :   49.949 ± 0.081M/s
syscall-count          :    9.009 ± 0.007M/s
fentry-batch           :   31.002 ± 0.015M/s
fexit-batch            :   20.372 ± 0.028M/s
fmodret-batch          :   21.651 ± 0.659M/s
rawtp-batch            :   36.775 ± 0.264M/s
tp-batch               :   19.411 ± 0.248M/s
kprobe-batch           :   12.949 ± 0.220M/s
kprobe-multi-batch     :   15.400 ± 0.007M/s
kretprobe-batch        :    5.559 ± 0.011M/s
kretprobe-multi-batch  :    5.861 ± 0.003M/s
fentry-legacy          :    8.329 ± 0.004M/s
fexit-legacy           :    6.239 ± 0.003M/s
fmodret-legacy         :    6.595 ± 0.001M/s
rawtp-legacy           :    8.305 ± 0.004M/s
tp-legacy              :    6.382 ± 0.001M/s
kprobe-legacy          :    5.528 ± 0.003M/s
kprobe-multi-legacy    :    5.864 ± 0.022M/s
kretprobe-legacy       :    3.081 ± 0.001M/s
kretprobe-multi-legacy :    3.193 ± 0.001M/s

Note how the xxx-batch variants are measured with significantly higher throughput, even though the in-kernel overhead is exactly the same. As such, results can be compared only between benchmarks of the same kind (syscall vs batched):

fentry-legacy          :    8.329 ± 0.004M/s
fentry-batch           :   31.002 ± 0.015M/s

kprobe-multi-legacy    :    5.864 ± 0.022M/s
kprobe-multi-batch     :   15.400 ± 0.007M/s

Note also that syscall-count sets a theoretical limit for syscall-triggered benchmarks, while kernel-count sets a similar limit for the batch variants. usermode-count is a happy and unachievable case of user space counting without doing any syscalls, and is mostly a measure of CPU speed for such a trivial benchmark.

As was mentioned, tp/raw_tp/fmodret require a kernel-side kfunc to produce a similar benchmark, which we address in a separate patch.

Note that run_bench_trigger.sh allows overriding the list of benchmarks to run, which is very useful for performance work.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
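The throughput figures above are easier to compare once converted to an average time per run. The following is a minimal sketch of that arithmetic, not part of the patch set: the helper names are hypothetical, the numbers are copied from the benchmark output in this commit message, and subtracting the baseline assumes a simple additive cost model (baseline cost plus triggering cost), which is only an approximation.

```python
# Throughput in millions of runs per second, copied from the
# run_bench_trigger.sh output quoted in the commit message.
THROUGHPUT_MPS = {
    "kernel-count": 49.949,        # baseline for batched benchmarks
    "syscall-count": 9.009,        # baseline for syscall-driven benchmarks
    "fentry-batch": 31.002,
    "fentry-legacy": 8.329,
    "kprobe-multi-batch": 15.400,
    "kprobe-multi-legacy": 5.864,
}


def ns_per_run(mps: float) -> float:
    """Convert throughput in M/s into average nanoseconds per run."""
    return 1000.0 / mps


def overhead_ns(bench: str, baseline: str) -> float:
    """Extra nanoseconds per run of `bench` on top of `baseline`.

    Assumes costs simply add up, which is a simplification.
    """
    return ns_per_run(THROUGHPUT_MPS[bench]) - ns_per_run(THROUGHPUT_MPS[baseline])


if __name__ == "__main__":
    pairs = [
        ("fentry-batch", "kernel-count"),
        ("fentry-legacy", "syscall-count"),
        ("kprobe-multi-batch", "kernel-count"),
        ("kprobe-multi-legacy", "syscall-count"),
    ]
    for bench, baseline in pairs:
        print(f"{bench:22s} {ns_per_run(THROUGHPUT_MPS[bench]):8.2f} ns/run, "
              f"{overhead_ns(bench, baseline):7.2f} ns over {baseline}")
```

Each benchmark is paired with the baseline of its own kind, following the note above that results are only comparable between benchmarks of the same kind.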
Remove the "legacy" benchmarks triggered by syscalls in favor of the newly added in-kernel/batched benchmarks. Drop the -batch suffix now as well.

The next patch will restore "feature parity" by adding back tp/raw_tp/fmodret benchmarks based on the in-kernel kfunc approach.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
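A rough cost model illustrates why the batched benchmarks can supersede the syscall-driven ones. This is a sketch under stated assumptions, not something taken from the series: the function name and the example costs are hypothetical, and only the default batch size of 100 comes from the patch description.

```python
def per_run_cost_ns(syscall_ns: float, trigger_ns: float,
                    batch_iters: int = 100) -> float:
    """Average cost of one BPF program run when one syscall drives a batch.

    syscall_ns  -- hypothetical fixed cost of entering the kernel once
    trigger_ns  -- hypothetical cost of one triggered BPF program run
    batch_iters -- runs per syscall (100 is the default N in this series)
    """
    if batch_iters < 1:
        raise ValueError("batch_iters must be at least 1")
    # The syscall cost is paid once and shared by every run in the batch.
    return syscall_ns / batch_iters + trigger_ns


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    for n in (1, 10, 100, 1000):
        print(n, per_run_cost_ns(100.0, 20.0, n))
```

With a batch size of 1 the model degenerates to the syscall-per-run case, and as the batch size grows the per-run cost approaches the triggering cost alone.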
Instead of front-loading all possible benchmarking BPF programs for trigger benchmarks, explicitly specify which BPF programs a specific benchmark uses and load only those. This makes it easier to support older kernels, where some program types might not be possible to load (e.g., those that rely on a newly added kfunc).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add a simple bpf_modify_return_test_tp() kfunc, available to all program types, that is useful for various testing and benchmarking scenarios: it allows triggering most tracing BPF program types from the BPF side, which enables complex testing and benchmarking scenarios.

fmod_ret programs can also attach to it, making it a good and simple way to trigger an fmod_ret program under test/benchmark.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Utilize the bpf_modify_return_test_tp() kfunc to have a fast way to trigger tp/raw_tp/fmodret programs from another BPF program, which gives us batched benchmarks comparable to the (batched) kprobe/fentry benchmarks.

We don't switch the kprobe/fentry batched benchmarks to this kfunc, so that the bench tool remains usable on older kernels as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Upstream branch: 45a683b
1535650 to 4183c02
0609f1e to 8b25225
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=838467 is irrelevant now. Closing PR.
Pull request for series with
subject: bench: fast in-kernel triggering benchmarks
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193