Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bpf programmable device #5731

Closed

Conversation

kernel-patches-daemon-bpf[bot]
Copy link

Pull request for series with
subject: Add bpf programmable device
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=787524

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 0e73ef1
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=787524
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 0e73ef1
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=787524
version: 1

This work adds a new, minimal BPF-programmable device called "meta" we
recently presented at LSF/MM/BPF. The latter name derives from the Greek
μετά, encompassing a wide array of meanings such as "on top of", "beyond".
Given business logic is defined by BPF, this device can have many meanings.
The core idea is that BPF programs are executed within the drivers xmit
routine and therefore e.g. in case of containers/Pods moving BPF processing
closer to the source.

One of the goals was that in case of Pod egress traffic, this allows to
move BPF programs from hostns tcx ingress into the device itself, providing
earlier drop or forward mechanisms, for example, if the BPF program
determines that the skb must be sent out of the node, then a redirect to
the physical device can take place directly without going through per-CPU
backlog queue. This helps to shift processing for such traffic from softirq
to process context, leading to better scheduling decisions and better
performance.

In this initial version, the meta device ships as a pair, but we plan to
extend this further so it can also operate in single device mode. The pair
comes with a primary and a peer device. Only the primary device, typically
residing in hostns, can manage BPF programs for itself and its peer. The
peer device is designated for containers/Pods and cannot attach/detach
BPF programs. Upon the device creation, the user can set the default policy
to 'forward' or 'drop' for the case when no BPF program is attached.

Additionally, the device can be operated in L3 (default) or L2 mode. The
management of BPF programs is done via bpf_mprog, so that multi-attach is
supported right from the beginning with similar API/dependency controls as
tcx. For details on the latter see commit 053c8e1 ("bpf: Add generic
attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
so that existing programs can be easily migrated.

Going forward, we plan to use meta devices in Cilium as the main device type
for connecting Pods. They will be operated in L3 mode in order to simplify
a Pod's neighbor management and the peer will operate in default drop mode,
so that no traffic is leaving between the time when a Pod is brought up by
the CNI plugin and programs attached by the agent. Additionally, the programs
we attach via tcx on the physical devices are using bpf_redirect_peer()
for inbound traffic into meta device, hence the latter also supporting the
ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the
way out, pushing to phys device directly. Also, BIG TCP is supported on meta
device. For the follow-up work in single device mode, we plan to convert
Cilium's cilium_host/_net devices into a single one.

An extensive test suite for checking device operations and the BPF program
and link management API comes as BPF selftests in this series.

Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/borkmann/iproute2/commits/pr/meta
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
This adds BPF link support for meta device (BPF_LINK_TYPE_META). Similar
as with tcx or XDP, the BPF link for meta contains the device.

The bpf_mprog API has been reused for its implementation. For details, see
also commit e420bed ("bpf: Add fd-based tcx multi-prog infra with link
support").

This is now the second user of bpf_mprog after tcx, and in meta case the
implementation is also a bit more straight forward since it does not need
to deal with miniq.

The UAPI extensions for the BPF_LINK_CREATE command are similar as for tcx,
that is, relative_{fd,id} and expected_revision.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sync if_link uapi header to the latest version as we need the refresher
in tooling for meta device. Given it's been a while since the last sync
and the diff is fairly big, it has been done as its own commit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds bpf_program__attach_meta() API to libbpf. Overall it is very
similar to tcx. The API looks as following:

  LIBBPF_API struct bpf_link *
  bpf_program__attach_meta(const struct bpf_program *prog, int ifindex,
                           bool peer_device, const struct bpf_meta_opts *opts);

The struct bpf_meta_opts is done in similar way as struct bpf_tcx_opts.
bpf_program__attach_meta() compared to bpf_program__attach_tcx() has one
additional argument, that is peer_device. The latter denotes whether the
program should be attached to the relative peer of ifindex or whether it
should be attached to ifindex itself.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add support to dump meta link information to bpftool in similar way
as we have for XDP. The meta link info only exposes the ifindex.

Below shows an example link dump output, and a cgroup link is included
for comparison, too:

  # bpftool link
  [...]
  10: cgroup  prog 2466
        cgroup_id 1  attach_type cgroup_inet6_post_bind
  [...]
  8: meta  prog 35
        ifindex meta1(18)
  [...]

Equivalent json output:

  # bpftool link --json
  [...]
  {
    "id": 10,
    "type": "cgroup",
    "prog_id": 2466,
    "cgroup_id": 1,
    "attach_type": "cgroup_inet6_post_bind"
  },
  [...]
  {
    "id": 12,
    "type": "meta",
    "prog_id": 61,
    "devname": "meta1",
    "ifindex": 21
  }
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 9e09b75
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=787524
version: 1

Add support to dump BPF programs on meta via bpftool. This includes both
the BPF link and attach ops programs. Dumped information contain the attach
location, function entry name, program ID and link ID when applicable.

Example with tc BPF link:

  # ./bpftool net
  xdp:

  tc:
  meta1(22) meta/peer tc1 prog_id 43 link_id 12

  [...]

Example with json dump:

  # ./bpftool net --json | jq
  [
    {
      "xdp": [],
      "tc": [
        {
          "devname": "meta1",
          "ifindex": 18,
          "kind": "meta/primary",
          "name": "tc1",
          "prog_id": 29,
          "prog_flags": [],
          "link_id": 8,
          "link_flags": []
        }
      ],
      "flow_dissector": [],
      "netfilter": []
    }
  ]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Add a basic netlink helper library for the BPF selftests. This has been
taken and cut down/cleaned up from iproute2. More can be added at some
later point in time when needed, but for now this covers basics such as
device creation which we need for BPF selftests / BPF CI.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a bigger batch of test coverage to assert correct operation of
meta device and its BPF program management:

  # ./vmtest.sh -- ./test_progs -t tc_meta
  [...]
  ./test_progs -t tc_meta
  [    1.211407] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.211805] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.271692] tsc: Refined TSC clocksource calibration: 3407.989 MHz
  [    1.274015] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc9c9451, max_idle_ns: 440795361646 ns
  [    1.275241] clocksource: Switched to clocksource tsc
  #255     tc_meta_basic:OK
  #256     tc_meta_device:OK
  #257     tc_meta_multi_links:OK
  #258     tc_meta_multi_opts:OK
  #259     tc_meta_neigh_links:OK
  Summary: 5/0 PASSED, 0 SKIPPED, 0 FAILED
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@kernel-patches-daemon-bpf
Copy link
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=787524 expired. Closing PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant