Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bench: fast in-kernel triggering benchmarks #6616

Closed

Conversation

kernel-patches-daemon-bpf[bot]
Copy link

Pull request for series with
subject: bench: fast in-kernel triggering benchmarks
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 4c2a26f
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: af8d27b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: af8d27b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 61df575
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 476a5e9
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: a849750
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 770546a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=837193
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=837193 expired. Closing PR.

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: e0754c1
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=838467
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 14bb1e8
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=838467
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 2a702c2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=838467
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 2a720cc
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=838467
version: 2

Rename uprobe-base to more precise usermode-count (it will match other
baseline-like benchmarks, kernel-count and syscall-count). Also use
BENCH_TRIG_USERMODE() macro to define all usermode-based triggering
benchmarks, which include usermode-count and uprobe/uretprobe benchmarks.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Existing kprobe/fentry triggering benchmarks have 1-to-1 mapping between
one syscall execution and BPF program run. While we use a fast
get_pgid() syscall, syscall overhead can still be non-trivial.

This patch adds kprobe/fentry set of benchmarks significantly amortizing
the cost of syscall vs actual BPF triggering overhead. We do this by
employing BPF_PROG_TEST_RUN command to trigger "driver" raw_tp program
which does a tight parameterized loop calling cheap BPF helper
(bpf_get_numa_node_id()), to which kprobe/fentry programs are
attached for benchmarking.

This way 1 bpf() syscall causes N executions of BPF program being
benchmarked. N defaults to 100, but can be adjusted with
--trig-batch-iters CLI argument.

For comparison we also implement a new baseline program that instead of
triggering another BPF program just does N atomic per-CPU counter
increments, establishing the limit for all other types of program within
this batched benchmarking setup.

Taking the final set of benchmarks added in this patch set (including
tp/raw_tp/fmodret, added in later patch), and keeping for now "legacy"
syscall-driven benchmarks, we can capture all triggering benchmarks in
one place for comparison, before we remove the legacy ones (and rename
xxx-batched into just xxx).

$ benchs/run_bench_trigger.sh
usermode-count       :   79.500 ± 0.024M/s
kernel-count         :   49.949 ± 0.081M/s
syscall-count        :    9.009 ± 0.007M/s

fentry-batch         :   31.002 ± 0.015M/s
fexit-batch          :   20.372 ± 0.028M/s
fmodret-batch        :   21.651 ± 0.659M/s
rawtp-batch          :   36.775 ± 0.264M/s
tp-batch             :   19.411 ± 0.248M/s
kprobe-batch         :   12.949 ± 0.220M/s
kprobe-multi-batch   :   15.400 ± 0.007M/s
kretprobe-batch      :    5.559 ± 0.011M/s
kretprobe-multi-batch:    5.861 ± 0.003M/s

fentry-legacy        :    8.329 ± 0.004M/s
fexit-legacy         :    6.239 ± 0.003M/s
fmodret-legacy       :    6.595 ± 0.001M/s
rawtp-legacy         :    8.305 ± 0.004M/s
tp-legacy            :    6.382 ± 0.001M/s
kprobe-legacy        :    5.528 ± 0.003M/s
kprobe-multi-legacy  :    5.864 ± 0.022M/s
kretprobe-legacy     :    3.081 ± 0.001M/s
kretprobe-multi-legacy:   3.193 ± 0.001M/s

Note how xxx-batch variants are measured with significantly higher
throughput, even though it's exactly the same in-kernel overhead. As
such, results can be compared only between benchmarks of the same kind
(syscall vs batched):

fentry-legacy        :    8.329 ± 0.004M/s
fentry-batch         :   31.002 ± 0.015M/s

kprobe-multi-legacy  :    5.864 ± 0.022M/s
kprobe-multi-batch   :   15.400 ± 0.007M/s

Note also that syscall-count is setting a theoretical limit for
syscall-triggered benchmarks, while kernel-count is setting similar
limits for batch variants. usermode-count is a happy and unachievable
case of user space counting without doing any syscalls, and is mostly
the measure of CPU speed for such a trivial benchmark.

As was mentioned, tp/raw_tp/fmodret require kernel-side kfunc to produce
similar benchmark, which we address in a separate patch.

Note that run_bench_trigger.sh allows to override a list of benchmarks
to run, which is very useful for performance work.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Remove "legacy" benchmarks triggered by syscalls in favor of newly added
in-kernel/batched benchmarks. Drop -batched suffix now as well.
Next patch will restore "feature parity" by adding back
tp/raw_tp/fmodret benchmarks based on in-kernel kfunc approach.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Instead of front-loading all possible benchmarking BPF programs for
trigger benchmarks, explicitly specify which BPF programs are used by
specific benchmark and load only it.

This allows to be more flexible in supporting older kernels, where some
program types might not be possible to load (e.g., those that rely on
newly added kfunc).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add a simple bpf_modify_return_test_tp() kfunc, available to all program
types, that is useful for various testing and benchmarking scenarios, as
it allows to trigger most tracing BPF program types from BPF side,
allowing to do complex testing and benchmarking scenarios.

It is also attachable to for fmod_ret programs, making it a good and
simple way to trigger fmod_ret program under test/benchmark.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Utilize bpf_modify_return_test_tp() kfunc to have a fast way to trigger
tp/raw_tp/fmodret programs from another BPF program, which gives us
comparable batched benchmarks to (batched) kprobe/fentry benchmarks.

We don't switch kprobe/fentry batched benchmarks to this kfunc to make
bench tool usable on older kernels as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 45a683b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=838467
version: 2

@kernel-patches-daemon-bpf
Copy link
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=838467 irrelevant now. Closing PR.

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot deleted the series/837193=>bpf-next branch March 27, 2024 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant