
bpf: rstat: cgroup hierarchical stats #3421

Closed
wants to merge 9 commits

Conversation

kernel-patches-bot

Pull request for series with
subject: bpf: rstat: cgroup hierarchical stats
version: 6
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=664557

@kernel-patches-bot
Author

Master branch: 7193084
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=664557
version: 6

Kernel Patches Daemon and others added 5 commits August 1, 2022 13:18
…able

From: Benjamin Tissoires <benjamin.tissoires@redhat.com>

This allows declaring a kfunc as sleepable and prevents its use in
a non-sleepable program.
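
For illustration, a minimal sketch of how kfuncs might be registered
with the new flag (names follow the rstat patch later in this series;
treat this as illustrative rather than the exact diff):

  /* Sketch: register kfuncs for tracing programs, marking one of
   * them as sleepable with the new KF_SLEEPABLE flag.
   */
  BTF_SET8_START(bpf_rstat_kfunc_ids)
  BTF_ID_FLAGS(func, cgroup_rstat_updated)
  BTF_ID_FLAGS(func, cgroup_rstat_flush, KF_SLEEPABLE)
  BTF_SET8_END(bpf_rstat_kfunc_ids)

  static const struct btf_kfunc_id_set bpf_rstat_kfunc_set = {
          .owner = THIS_MODULE,
          .set   = &bpf_rstat_kfunc_ids,
  };

  static int __init bpf_rstat_kfunc_init(void)
  {
          return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
                                           &bpf_rstat_kfunc_set);
  }
  late_initcall(bpf_rstat_kfunc_init);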

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
From: Yosry Ahmed <yosryahmed@google.com>

cgroup_get_from_file() currently fails with -EBADF if called on cgroup
v1. However, the current implementation works on cgroup v1 as well, so
the restriction is unnecessary.

This enables cgroup_get_from_fd() to work on cgroup v1, which is the
only thing stopping bpf cgroup_iter from supporting cgroup v1.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
In bpf_seq_read, seq->op->next() could return an ERR and jump to
the label stop. However, the existing code in stop does not handle
the case when p (returned from next()) is an ERR. Add handling for
this case by converting p into an error and jumping to done.

Because none of the current next() implementations return an ERR,
this patch does not change behavior right now.
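
A sketch of the pattern described above (illustrative, not the exact
diff):

  stop:
          /* p may be an ERR pointer returned by seq->op->next();
           * convert it into an error and bail out instead of
           * treating it as a valid element.
           */
          if (IS_ERR(p)) {
                  err = PTR_ERR(p);
                  goto done;
          }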

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Cgroup_iter is a type of bpf_iter. It walks over cgroups in three modes:

 - walking a cgroup's descendants in pre-order.
 - walking a cgroup's descendants in post-order.
 - walking a cgroup's ancestors.

When attaching cgroup_iter, one can specify a cgroup for the
iter_link created from the attachment. This cgroup is passed as a
file descriptor and serves as the starting point of the walk. If no
cgroup is specified, the walk starts from the root cgroup.

For walking descendants, one can specify the order: either pre-order or
post-order. For walking ancestors, the walk starts at the specified
cgroup and ends at the root.

One can also terminate the walk early by returning 1 from the iter
program.

Note that because walking the cgroup hierarchy requires holding
cgroup_mutex, the iter program is called with cgroup_mutex held.
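
For illustration, a minimal cgroup_iter program might look like the
sketch below (modeled on the selftest added later in this series; the
bpf_iter__cgroup context type is what this patch proposes):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  SEC("iter/cgroup")
  int cgroup_id_printer(struct bpf_iter__cgroup *ctx)
  {
          struct seq_file *seq = ctx->meta->seq;
          struct cgroup *cgrp = ctx->cgroup;

          /* A NULL cgroup signals the end of the walk. */
          if (!cgrp)
                  return 0;

          BPF_SEQ_PRINTF(seq, "%8llu\n", cgrp->kn->id);

          /* Returning 1 here would terminate the walk early. */
          return 0;
  }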

Currently only one session is supported, which means that, depending
on the volume of data the bpf program intends to send to user space,
the number of cgroups that can be walked is limited. For example,
given that the current buffer size is 8 * PAGE_SIZE, if the program
sends 64B of data for each cgroup and PAGE_SIZE is 4KB, the total
number of cgroups that can be walked is 512. This is a limitation of
cgroup_iter. If the output data is larger than the buffer size, the
second read() will signal EOPNOTSUPP. To work around this, the user
may have to update their program to reduce the volume of data sent
to output, for example by skipping some uninteresting cgroups. In
the future, we may extend bpf_iter flags to allow customizing the
buffer size.

Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
@kernel-patches-bot
Copy link
Author

Master branch: 7193084
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=664557
version: 6

haoluo1022 and others added 4 commits August 1, 2022 13:18
Add a selftest for cgroup_iter. The selftest creates a mini cgroup tree
of the following structure:

    ROOT (working cgroup)
     |
   PARENT
  /      \
CHILD1  CHILD2

and tests the following scenarios:

 - invalid cgroup fd.
 - pre-order walk over descendants from PARENT.
 - post-order walk over descendants from PARENT.
 - walk of ancestors from PARENT.
 - early termination.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Hao Luo <haoluo@google.com>
From: Yosry Ahmed <yosryahmed@google.com>

Enable bpf programs to make use of rstat to collect cgroup hierarchical
stats efficiently:
- Add cgroup_rstat_updated() kfunc, for bpf progs that collect stats.
- Add cgroup_rstat_flush() sleepable kfunc, for bpf progs that read stats.
- Add an empty bpf_rstat_flush() hook that is called during rstat
  flushing, which bpf progs that flush stats can attach to. Attaching
  a bpf prog to this hook effectively registers it as a flush
  callback (see the sketch below).
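
Put together, a flusher program might look like this sketch (modeled
on the selftest later in this series; the kfunc declarations and the
bpf_rstat_flush signature are as proposed here):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  /* kfuncs added by this patch */
  extern void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
  extern void cgroup_rstat_flush(struct cgroup *cgrp) __ksym;

  /* Attaching here registers the program as a flush callback; it is
   * invoked once per (cgroup, cpu) pair that has pending updates.
   */
  SEC("fentry/bpf_rstat_flush")
  int BPF_PROG(my_flusher, struct cgroup *cgrp, struct cgroup *parent,
               int cpu)
  {
          /* aggregate this cpu's reading for cgrp and propagate it
           * to parent here
           */
          return 0;
  }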

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
From: Yosry Ahmed <yosryahmed@google.com>

This patch extends the bpf selftest cgroup_helpers in various ways:
- Add enable_controllers() that allows tests to enable all or a
  subset of controllers for a specific cgroup.
- Add join_cgroup_parent(). The cgroup workdir is based on the pid;
  therefore a spawned child cannot join the same cgroup hierarchy as
  the test through join_cgroup(). join_cgroup_parent() is used in
  child processes to join a cgroup under the parent's workdir.
- Add write_cgroup_file() and write_cgroup_file_parent() (similar to
  join_cgroup_parent() above).
- Add get_root_cgroup() for tests that need to do checks on the root
  cgroup.
- Distinguish relative and absolute cgroup paths in function
  arguments. Relative paths are now called relative_path, and
  absolute paths are called cgroup_path.
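
A rough usage sketch of the extended helpers (the exact signatures
and file names are assumptions based on the descriptions above):

  #include "cgroup_helpers.h"

  static int example(void)
  {
          if (setup_cgroup_environment())
                  return -1;

          /* enable the memory controller for the test's cgroups */
          if (enable_controllers("/", "memory"))
                  return -1;

          /* write to a cgroup file under the test workdir,
           * addressed by a relative path
           */
          if (write_cgroup_file("/child", "memory.reclaim", "1M"))
                  return -1;

          cleanup_cgroup_environment();
          return 0;
  }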

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
From: Yosry Ahmed <yosryahmed@google.com>

Add a selftest that tests the whole workflow for collecting,
aggregating (flushing), and displaying cgroup hierarchical stats.

TL;DR:
- Userspace program creates a cgroup hierarchy and induces memcg reclaim
  in parts of it.
- Whenever reclaim happens, vmscan_start and vmscan_end update
  per-cgroup percpu readings, and tell rstat which (cgroup, cpu) pairs
  have updates.
- When userspace tries to read the stats, vmscan_dump calls rstat to flush
  the stats, and outputs the stats in text format to userspace (similar
  to cgroupfs stats).
- rstat calls vmscan_flush once for every (cgroup, cpu) pair that has
  updates, vmscan_flush aggregates cpu readings and propagates updates
  to parents.
- Userspace program makes sure the stats are aggregated and read
  correctly.

Detailed explanation:
- The test loads tracing bpf programs, vmscan_start and vmscan_end, to
  measure the latency of cgroup reclaim. Per-cgroup readings are stored in
  percpu maps for efficiency. When a cgroup reading is updated on a cpu,
  cgroup_rstat_updated(cgroup, cpu) is called to add the cgroup to the
  rstat updated tree on that cpu.

- A cgroup_iter program, vmscan_dump, is loaded and pinned to a file, for
  each cgroup. Reading this file invokes the program, which calls
  cgroup_rstat_flush(cgroup) to ask rstat to propagate the updates for all
  cpus and cgroups that have updates in this cgroup's subtree. Afterwards,
  the stats are exposed to the user. vmscan_dump returns 1 to terminate
  iteration early, so that we only expose stats for one cgroup per read.
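
  The attach-and-pin step could look roughly like the sketch below
  (libbpf; the cgroup member of bpf_iter_link_info is what the
  cgroup_iter patch adds, so treat the field names as illustrative):

    #include <bpf/libbpf.h>

    /* Sketch: attach a cgroup_iter program to one cgroup and pin
     * the resulting link at pin_path.
     */
    static int pin_dumper(struct bpf_program *prog, int cgroup_fd,
                          const char *pin_path)
    {
            DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
            union bpf_iter_link_info linfo = {};
            struct bpf_link *link;

            linfo.cgroup.cgroup_fd = cgroup_fd;
            opts.link_info = &linfo;
            opts.link_info_len = sizeof(linfo);

            link = bpf_program__attach_iter(prog, &opts);
            if (!link)
                    return -1;

            return bpf_link__pin(link, pin_path);
    }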

- An ftrace program, vmscan_flush, is also loaded and attached to
  bpf_rstat_flush. When rstat flushing is ongoing, vmscan_flush is invoked
  once for each (cgroup, cpu) pair that has updates. cgroups are popped
  from the rstat tree in a bottom-up fashion, so calls will always be
  made for cgroups that have updates before their parents. The program
  aggregates percpu readings to a total per-cgroup reading, and also
  propagates them to the parent cgroup. After rstat flushing is over, all
  cgroups will have correct updated hierarchical readings (including all
  cpus and all their descendants).

- Finally, the test creates a cgroup hierarchy and induces memcg reclaim
  in parts of it, and makes sure that the stats collection, aggregation,
  and reading workflow works as expected.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
@kernel-patches-bot
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=664557 expired. Closing PR.

@kernel-patches-bot kernel-patches-bot deleted the series/641705=>bpf-next branch August 4, 2022 12:01