Add bpf programmable device #5731
This work adds a new, minimal BPF-programmable device called "meta" we recently presented at LSF/MM/BPF. The name derives from the Greek μετά, encompassing a wide array of meanings such as "on top of", "beyond". Given business logic is defined by BPF, this device can have many meanings. The core idea is that BPF programs are executed within the driver's xmit routine, which, for example in the case of containers/Pods, moves BPF processing closer to the source.

One of the goals was that, in the case of Pod egress traffic, this allows moving BPF programs from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms. For example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through the per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions and better performance.

In this initial version, the meta device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon device creation, the user can set the default policy to 'forward' or 'drop' for the case when no BPF program is attached.

Additionally, the device can be operated in L3 (default) or L2 mode. The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API/dependency controls as tcx. For details on the latter see commit 053c8e1 ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated.
Going forward, we plan to use meta devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management, and the peer will operate in default drop mode, so that no traffic leaves between the time when a Pod is brought up by the CNI plugin and the time programs are attached by the agent. Additionally, the programs we attach via tcx on the physical devices use bpf_redirect_peer() for inbound traffic into the meta device, hence the latter also supports the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing to the phys device directly. Also, BIG TCP is supported on the meta device. For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one.

An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series.

Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/borkmann/iproute2/commits/pr/meta
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
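The default-policy behaviour described in this commit message can be pictured with a small user-space model. This is only a sketch and not the driver code from the series: every `toy_*` name is hypothetical, the packet is reduced to a single flag, and the verdict values merely mirror tc's TC_ACT_OK (0), TC_ACT_SHOT (2) and TC_ACT_REDIRECT (7), on the assumption that tc BPF compatibility implies similar return codes.

```c
#include <stddef.h>

/* Hypothetical verdicts for this sketch only. The numeric values mirror
 * tc's TC_ACT_OK, TC_ACT_SHOT and TC_ACT_REDIRECT; whether the device
 * uses these exact values is an assumption. */
enum toy_verdict {
    TOY_FORWARD  = 0,
    TOY_DROP     = 2,
    TOY_REDIRECT = 7,
};

/* A packet reduced to the one property the example program looks at. */
struct toy_pkt {
    int leaves_node;
};

typedef enum toy_verdict (*toy_prog_fn)(const struct toy_pkt *pkt);

/* One end of the pair: a default policy plus an optional program. */
struct toy_dev {
    enum toy_verdict default_policy; /* TOY_FORWARD or TOY_DROP */
    toy_prog_fn prog;                /* NULL while nothing is attached */
};

/* Model of the xmit-time decision: with no program attached the default
 * policy decides, otherwise the program's verdict is used. */
static enum toy_verdict toy_xmit(const struct toy_dev *dev,
                                 const struct toy_pkt *pkt)
{
    if (!dev->prog)
        return dev->default_policy;
    return dev->prog(pkt);
}

/* Example program: redirect traffic that must leave the node,
 * forward everything else to the peer. */
static enum toy_verdict toy_egress_prog(const struct toy_pkt *pkt)
{
    return pkt->leaves_node ? TOY_REDIRECT : TOY_FORWARD;
}
```

With `default_policy` set to `TOY_DROP` and no program attached, `toy_xmit()` drops every packet, which corresponds to the window between Pod bring-up and program attachment mentioned above. Once `toy_egress_prog` is set, node-external traffic yields `TOY_REDIRECT`.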
This adds BPF link support for meta device (BPF_LINK_TYPE_META). As with tcx or XDP, the BPF link for meta contains the device.

The bpf_mprog API has been reused for its implementation. For details, see also commit e420bed ("bpf: Add fd-based tcx multi-prog infra with link support").

This is now the second user of bpf_mprog after tcx, and in the meta case the implementation is also a bit more straightforward since it does not need to deal with miniq.

The UAPI extensions for the BPF_LINK_CREATE command are similar to those for tcx, that is, relative_{fd,id} and expected_revision.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
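To illustrate what expected_revision is for, here is a toy user-space model of a revision-guarded attach. It is a sketch under assumptions rather than the bpf_mprog implementation: the `toy_*` names are hypothetical, programs are only ever appended (relative_{fd,id} placement is not modelled), and the starting revision of 1 and the -ESTALE result on mismatch reflect my understanding of bpf_mprog, not something stated in this series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PROGS 8

/* Hypothetical stand-in for a multi-prog attachment point. */
struct toy_mprog {
    uint64_t revision;                /* bumped on every successful change */
    uint32_t prog_ids[TOY_MAX_PROGS]; /* attached programs, in run order */
    size_t count;
};

static void toy_mprog_init(struct toy_mprog *m)
{
    m->revision = 1; /* assumed initial revision */
    m->count = 0;
}

/* Append prog_id to the list. An expected_revision of 0 means "do not
 * check"; any other value must match the current revision, otherwise
 * the caller's view is stale and the attach is refused. */
static int toy_mprog_attach(struct toy_mprog *m, uint32_t prog_id,
                            uint64_t expected_revision)
{
    if (expected_revision && expected_revision != m->revision)
        return -ESTALE;
    if (m->count == TOY_MAX_PROGS)
        return -ERANGE; /* toy choice of error code for a full list */

    m->prog_ids[m->count++] = prog_id;
    m->revision++;
    return 0;
}
```

The point of the check is optimistic concurrency: a manager that queried the list at revision N can pass N on attach and be told when another party changed the list in the meantime, instead of silently attaching at a position it did not intend.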
Sync if_link uapi header to the latest version as we need the refresher in tooling for meta device. Given it's been a while since the last sync and the diff is fairly big, it has been done as its own commit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds bpf_program__attach_meta() API to libbpf. Overall it is very similar to tcx. The API looks as follows:

  LIBBPF_API struct bpf_link *
  bpf_program__attach_meta(const struct bpf_program *prog, int ifindex,
                           bool peer_device, const struct bpf_meta_opts *opts);

The struct bpf_meta_opts is defined similarly to struct bpf_tcx_opts.

Compared to bpf_program__attach_tcx(), bpf_program__attach_meta() has one additional argument, peer_device. It denotes whether the program should be attached to the relative peer of ifindex or to ifindex itself.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
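The role of the peer_device argument, combined with the rule from the device patch that only the primary may manage programs, can be modelled in a few lines. This is a hypothetical user-space sketch and does not call libbpf: `toy_pair` and `toy_attach_target` are made-up names, and the error codes are a toy choice rather than the kernel's documented behaviour.

```c
#include <errno.h>

/* Hypothetical description of one meta pair by interface index. */
struct toy_pair {
    int primary_ifindex;
    int peer_ifindex;
};

/* Resolve which device an attach request would land on.
 *
 * ifindex must name the primary device; peer_device then selects
 * whether the program goes to the primary itself (0) or to its
 * peer (non-zero). Returns the target ifindex or a negative errno. */
static int toy_attach_target(const struct toy_pair *pair, int ifindex,
                             int peer_device)
{
    if (ifindex == pair->peer_ifindex)
        return -EACCES; /* toy choice: the peer cannot manage programs */
    if (ifindex != pair->primary_ifindex)
        return -ENODEV; /* toy choice: not part of this pair */

    return peer_device ? pair->peer_ifindex : pair->primary_ifindex;
}
```

For a pair with primary 18 and peer 19, `toy_attach_target(&pair, 18, 1)` resolves to 19, while a request issued against ifindex 19 directly is rejected. This mirrors the design where a process in hostns holding the primary controls both ends.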
Add support to dump meta link information to bpftool in a similar way as we have for XDP. The meta link info only exposes the ifindex.

Below is an example link dump output; a cgroup link is included for comparison, too:

  # bpftool link
  [...]
  10: cgroup  prog 2466
          cgroup_id 1  attach_type cgroup_inet6_post_bind
  [...]
  8: meta  prog 35
          ifindex meta1(18)
  [...]

Equivalent json output:

  # bpftool link --json
  [...]
  {
    "id": 10,
    "type": "cgroup",
    "prog_id": 2466,
    "cgroup_id": 1,
    "attach_type": "cgroup_inet6_post_bind"
  },
  [...]
  {
    "id": 12,
    "type": "meta",
    "prog_id": 61,
    "devname": "meta1",
    "ifindex": 21
  }
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Add support to dump BPF programs on meta via bpftool. This includes both the BPF link and attach ops programs. The dumped information contains the attach location, function entry name, program ID and link ID when applicable.

Example with tc BPF link:

  # ./bpftool net
  xdp:

  tc:
  meta1(22) meta/peer tc1 prog_id 43 link_id 12
  [...]

Example with json dump:

  # ./bpftool net --json | jq
  [
    {
      "xdp": [],
      "tc": [
        {
          "devname": "meta1",
          "ifindex": 18,
          "kind": "meta/primary",
          "name": "tc1",
          "prog_id": 29,
          "prog_flags": [],
          "link_id": 8,
          "link_flags": []
        }
      ],
      "flow_dissector": [],
      "netfilter": []
    }
  ]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Add a basic netlink helper library for the BPF selftests. This has been taken and cut down/cleaned up from iproute2. More can be added at some later point in time when needed, but for now this covers basics such as device creation which we need for BPF selftests / BPF CI.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a bigger batch of test coverage to assert correct operation of meta device and its BPF program management:

  # ./vmtest.sh -- ./test_progs -t tc_meta
  [...]
  ./test_progs -t tc_meta
  [    1.211407] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.211805] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.271692] tsc: Refined TSC clocksource calibration: 3407.989 MHz
  [    1.274015] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc9c9451, max_idle_ns: 440795361646 ns
  [    1.275241] clocksource: Switched to clocksource tsc
  #255     tc_meta_basic:OK
  #256     tc_meta_device:OK
  #257     tc_meta_multi_links:OK
  #258     tc_meta_multi_opts:OK
  #259     tc_meta_neigh_links:OK
  Summary: 5/0 PASSED, 0 SKIPPED, 0 FAILED
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=787524 expired. Closing PR.
Pull request for series with
subject: Add bpf programmable device
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=787524