Libbpf: the road to v1.0

Andrii Nakryiko edited this page Dec 3, 2021 · 14 revisions

NOTE: This wiki is adapted from the original Google doc, which also contains discussion in comments. This wiki is now the source of truth, but the Google doc will be kept around for historical reasons.

Libbpf has come a long way in the last few years. It became a mature and powerful library powering many BPF applications. So it’s only logical to finally reflect this maturity with the 1.0 version.

A jump from 0.x to 1.0 is a major version bump, though, which gives libbpf an opportunity to break some backwards compatibility (within reason, of course) and shed some of the cruft, inconsistencies, and sub-optimal (in retrospect) API choices: both to simplify the developer experience through a more coherent and uniform API and behavior, and to clean up and simplify some of the internal implementation details.

This document describes the changes I have in mind for 1.0 and how API breakage and backwards compatibility are going to be handled in the transition period. The 1.0 version bump won’t happen overnight. My plan is to have a few more 0.x minor version releases to give users time and opportunity to migrate their usage of deprecated APIs to new recommended APIs. No existing functionality (with one exception for the xsk.h part) is going to be removed without providing a way to achieve the same result with the new API. Deprecated APIs won’t be marked with __attribute__((deprecated)) until replacement APIs have been released as part of an official 0.x release. This is to avoid the situation of getting deprecation warnings before there is an official libbpf version providing replacement APIs.

Handling deprecation of APIs and functionality

There are three categories of deprecations proposed throughout this document.

  1. Deprecated APIs. Such APIs will get marked with __attribute__((deprecated)) some time before 1.0 release and will be completely removed in v1.0. For such APIs users are expected to migrate their code to use recommended APIs before 1.0 release. As mentioned above, we’ll go through a few more minor releases, so that gives at least a few months to perform migrations.

  2. Discouraged APIs. Some APIs are not broken, but contribute to non-uniform API and/or sub-optimal experiences, but otherwise don’t cause extra maintenance. Often such APIs are also pretty frequently used, so it can cause a lot of unnecessary code churn for existing projects to migrate without a clear benefit. For such APIs, the plan is to move their definitions into a new libbpf_legacy.h header to “hide” them from the “official” APIs in libbpf.h, bpf.h, and btf.h. Eventually, we might consider removing them completely to clean up the code base. With such an approach, new users won’t be using outdated and discouraged APIs, hopefully.

  3. Stricter or changing behaviors. Some changes and breakages are in the area of libbpf behavior and have no direct reflection in public APIs. One example is stricter handling of BPF program’s section name parsing. For such changes, where possible and appropriate, libbpf will log a warning about non-conforming behavior and recommendations on how to avoid such issues. In v1.0, instead of logging a warning, such “violations” will cause hard errors.

As we progress through API deprecation, the Libbpf 1.0 migration guide is being populated with short instructions on how to migrate off deprecated APIs, and all deprecation messages will link to the corresponding sections of that wiki. This should make migration as straightforward as possible.

High-level behavior changes

Low-level BPF APIs error reporting is changing

Status: done.

Migration guidelines: here.

Currently, some low-level BPF APIs (APIs in bpf.h, prefixed with bpf_) return errors following two different conventions:

  • -1 result and actual error set as errno (syscall convention);
  • while in other cases -Exxx is returned without setting errno (typical user-space convention, as well as kernel standard).

errno is notoriously inconvenient in practice and users often get it wrong (e.g., doing close(), printf(), etc, which might invalidate actual errno, before recording errno value). It’s much more convenient to get the actual error number directly. We’ll standardize on low-level APIs returning the value of -errno directly as a result. This matches the behavior of high-level API and is generally much less error-prone to handle. But, in addition, all low-level APIs will still set errno on every error. So if users prefer errno, they can still use it. Also, this will be compatible with high-level “constructor” APIs, returning pointers (see below).

This is potentially breaking if applications do an exact == -1 check followed by an errno check. -1 was never guaranteed (even syscall() documentation doesn’t state -1 will always be returned), but there is still code out there with such a pattern. Such applications would need to switch to < 0 checks. Given it’s impossible to smoothly transition from one convention to another, we’ll do our best to audit existing (open-source) code and make sure it does < 0 error checking. After that, we’ll start returning -errno directly, even before libbpf 1.0.
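
The convention described above can be sketched without any libbpf dependency. The following is a minimal illustration built around a plain open() call; open_ro() and try_open() are hypothetical helpers introduced only for this example and are not libbpf APIs. The point is that the return value already carries -errno, errno stays set for callers who prefer it, and the caller checks < 0 rather than == -1.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not a libbpf API): returns a file descriptor on
 * success, or -errno on failure. errno is left set by the failed open(). */
static int open_ro(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;
	return fd;
}

/* Caller-side pattern: test for < 0, never for == -1. */
static int try_open(const char *path)
{
	int fd = open_ro(path);

	if (fd < 0) {
		/* The error is in fd itself, so a later call that clobbers
		 * errno (fprintf(), close(), ...) cannot lose it. */
		fprintf(stderr, "open %s failed: %s\n", path, strerror(-fd));
		return fd;
	}
	close(fd);
	return 0;
}
```

A caller that still compares the result against -1 would silently miss the -ENOENT returned here, which is exactly the breakage the audit is meant to catch.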

All “constructor APIs” will return NULL on error

Status: done.

Migration guidelines: here.

All constructor-like APIs that return a new “object”, e.g., bpf_object__open() variants, btf__parse(), bpf_program__attach() returning bpf_link, etc., will start returning NULL, not an error code encoded as a pointer. The current convention is extremely surprising to people not well-versed in kernel development, so it leads to bad code like:

struct bpf_link *link = bpf_program__attach(prog);
if (!link) { /* handle error, but not really */ }

In practice, if an error happens, the link won’t be NULL and the program will proceed to (most probably) crash at runtime. For a lot of such code paths, an error is not very probable, so there is certainly a bunch of production code that is a ticking bomb due to this convention. Practice shows that it’s easy to forget about this convention even for kernel developers.

In most cases, whether the operation failed with -EINVAL or -ENOMEM is not that important, and libbpf will log human-friendly details on what went wrong (to the best of libbpf’s knowledge). Additionally, for cases where users do care about the exact error, such constructor APIs will set errno, just like low-level APIs do. Errno is expected to be rarely needed, as evidenced by BPF skeleton and perf_buffer/ring_buffer APIs, all of which return NULL on error.

As for handling the transition and minimizing the surprise factor, consider that in user-space all users of libbpf are supposed to use libbpf_get_error() API to check returned pointer for encoded error. Any application using their own PTR_ERR() implementation is technically guessing the implementation detail and is already broken in the strict sense of the word. So, taking that into account, libbpf_get_error() will start to return -EINVAL for NULL cases, handling both ERR_PTR() and NULL cases transparently. Everyone else is strongly encouraged to use libbpf_get_error() for the transition period.
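
To make the transitional behavior concrete, here is a minimal sketch of a checker that treats both an error encoded in a pointer and a plain NULL as failures. All names (sketch_err_ptr(), sketch_is_err(), sketch_get_error()) and the 4095 cutoff are assumptions made for illustration; this is not libbpf's implementation of libbpf_get_error(), only the idea described above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cutoff: the top 4095 addresses are treated as encoded errors. */
#define SKETCH_MAX_ERRNO 4095

/* Hypothetical stand-in for kernel-style ERR_PTR(): err is a negative
 * error code such as -ENOMEM. */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Hypothetical stand-in for IS_ERR(). */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Transitional checker: handles encoded errors and NULL transparently.
 * Returns 0 for a valid pointer, a negative error code otherwise. */
static long sketch_get_error(const void *ptr)
{
	if (sketch_is_err(ptr))
		return (long)(intptr_t)ptr;  /* decode the error code */
	if (!ptr)
		return -EINVAL;              /* NULL-returning convention */
	return 0;
}
```

Code that funnels every constructor result through a single checker like this keeps working regardless of which convention the library uses during the transition.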

libbpf_get_error() itself can be either deprecated and removed in v1.0 or become discouraged API and hidden away in libbpf_legacy.h. This can be discussed much later.

In summary, the error reporting approach across all APIs (low-level, high-level, both returning int and pointers) will be as follows:

  • for int-returning APIs that can fail, actual error is returned directly as -Exxx and errno is set to Exxx;
  • for pointer-returning (constructor) APIs, NULL is returned on error and errno is set to the underlying Exxx.

xsk.{c,h} is moving into libxdp

Status: issue #270

AF_XDP parts of libbpf (xsk.h and xsk.c) will be removed from libbpf v1.0 and will become a part of libxdp. The process is underway already. Toke Høiland-Jørgensen and Magnus Karlsson are working on this and should be done well before libbpf 1.0 is released. The intent is to make transition as painless as possible. The rationale of this change is that XSK parts of libbpf are more of a high-level user of libbpf APIs with its own high-level abstractions, rather than a fundamental BPF functionality that libbpf is striving to provide. It also is conceptually very close to XDP in general, so it will benefit users long term to have it be developed as part of libxdp.

Stricter and more uniform BPF program section name (SEC()) handling

Status: issue #271

Libbpf is sometimes pretty lax about BPF program section names and usually cares only about a recognizable program type prefix, ignoring everything else. Historically there was one valid case when this was necessary: multiple entry-level BPF programs of the same type (e.g., two programs attaching to the same tracepoint). Since libbpf started supporting multiple BPF programs per same section, there is no more justification for such use.

On the other hand, having stricter and more uniform section name conventions is helpful to make advanced parsing easier. There are tentative plans to allow “pluggable” BPF program section name parsing to allow other libraries to inject their custom parsing logic (e.g., perf event names parsing), so standardization is important. Besides, having less variation in section names is less confusing for new users trying to infer what’s going on from open-source examples.

So the proposal is to:

  1. Use ‘/’ as a separator consistently. So no more “perf_event_whatever”, only “perf_event/whatever”.
  2. For section names not having any extra “parameters”, don’t allow anything extra beyond BPF program type. So no more “xdp_my_prog”, only “xdp”. And no “cgroup_skb/ingress/garbage”, only “cgroup_skb/ingress”.

During the transition period, libbpf will still handle such non-conforming section names successfully, but will emit a warning log message at runtime.
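
The two rules above can be sketched as a small checker. The rule table and the function name sec_name_is_conforming() are hypothetical: the entries mirror the examples in this section and are not copied from libbpf's real section definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a recognized prefix and whether it may be followed
 * by extra "/parameters". */
struct sec_rule {
	const char *prefix;
	bool takes_params;
};

static const struct sec_rule sec_rules[] = {
	{ "xdp", false },
	{ "cgroup_skb/ingress", false },
	{ "perf_event", true },
};

/* Returns true if sec follows the stricter naming rules sketched here. */
static bool sec_name_is_conforming(const char *sec)
{
	size_t i;

	for (i = 0; i < sizeof(sec_rules) / sizeof(sec_rules[0]); i++) {
		const struct sec_rule *r = &sec_rules[i];
		size_t len = strlen(r->prefix);

		if (strncmp(sec, r->prefix, len) != 0)
			continue;
		if (sec[len] == '\0')
			return true;  /* exact match, nothing extra */
		/* Extras are allowed only after '/', and only if the
		 * program type accepts parameters at all. */
		if (r->takes_params && sec[len] == '/' && sec[len + 1] != '\0')
			return true;
	}
	return false;
}
```

With this table, "xdp" and "perf_event/whatever" pass, while "xdp_my_prog", "perf_event_whatever" and "cgroup_skb/ingress/garbage" are rejected. A transitional implementation would log a warning for the rejected names instead of failing.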

Drop support for legacy BPF map declaration syntax

Status: issue #272

Legacy fixed-layout (through struct bpf_map_def) BPF map declaration in BPF code, residing in SEC("maps") will be dropped. Only BTF-defined maps will be supported starting from v1.0.

So instead of

struct bpf_map_def SEC("maps") btf_map = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 4,
        .key_size = sizeof(int),
        .value_size = sizeof(struct ipv_counts),
};

only

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 4);
    __type(key, int);
    __type(value, struct ipv_counts);
} btf_map SEC(".maps");

will be supported.

In the same vein, BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts) support will be dropped.

Pinning path differences

Status: issue #273

Currently, BPF program pinning path is auto-derived from its section name, not its C function name. With multiple BPF programs per section it’s already broken. Further, given section name has special characters like ‘/’, it requires sanitization, making pin path still different from actual section name. All that just creates a convoluted behavior.

We’ll change this behavior to use the BPF program’s C function name for the filename part of the pinning path. This matches the behavior for maps and is more in line with multiple BPF programs per section. This is an unfortunate, potentially breaking change, but it’s almost inevitable if pinning is to be really usable as a generic mechanism.
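
The difference between the two derivations can be illustrated with a small sketch. Both helpers below are hypothetical and only model the behavior described in this section (in particular, replacing '/' with '_' is an assumed form of sanitization); the directory and program names are made up.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed legacy-style derivation: the file name comes from the section
 * name, with '/' replaced so it forms a single path component. */
static int pin_path_from_section(char *buf, size_t sz, const char *dir,
				 const char *sec_name)
{
	char *p;
	int n = snprintf(buf, sz, "%s/%s", dir, sec_name);

	if (n < 0 || (size_t)n >= sz)
		return -ENAMETOOLONG;
	/* Sanitize only the file-name part, i.e. everything after "dir/". */
	for (p = buf + strlen(dir) + 1; *p; p++)
		if (*p == '/')
			*p = '_';
	return 0;
}

/* Proposed derivation: the file name is the program's C function name,
 * which needs no sanitization and distinguishes programs that share a
 * section. */
static int pin_path_from_func(char *buf, size_t sz, const char *dir,
			      const char *func_name)
{
	int n = snprintf(buf, sz, "%s/%s", dir, func_name);

	return (n < 0 || (size_t)n >= sz) ? -ENAMETOOLONG : 0;
}
```

Two programs declared in the same SEC("kprobe/do_unlinkat") section would both map to ".../kprobe_do_unlinkat" under the first scheme, while function names such as handle_entry and handle_exit yield distinct paths under the second.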

Merging .rodata* and converged .data* + .bss

Status: issue #274

.rodata and other read-only sections (e.g., .rodata.str1.1) will be merged by libbpf into a final .rodata map (with accordingly adjusted BTF information). This will get rid of annoying warnings about .rodata.str1.1 and will allow developers to have custom .rodata sections without causing a multitude of BPF maps to be created. With BTF adjusted and BPF skeleton providing nice access to the contents of .rodata, there is little downside to such an approach.

Similarly, .data and .data.* will be merged together, just like .rodata.

Additionally, .bss is a historical legacy quirk of user-space static linkers (“Block Started by Symbol”…) and is quite confusing. Further, it’s annoying when BPF code changes zero initialization of a variable to non-zero value and suddenly one needs to update user-space references from skel->bss->my_var to skel->data->my_var. So to avoid such annoyances, .bss will be merged into .data by libbpf at open time (and in BPF skeleton), so all the non-read-only variables will be addressed through consistent skel->data->my_var.

During the transition period, skel->bss and skel->data will be the same pointer. Sometime around or after libbpf 1.0 release bpftool will be updated to stop generating skel->bss completely.

Drop object name prefix from special map names (.rodata, .data, .kconfig)

Status: issue #275

Currently, special internal maps that libbpf creates for things like global and Kconfig variables contain the (often truncated) object name along with the special name suffix. E.g., “my_ob.rodata”. This has some serious inconveniences in practice. E.g., it’s almost impossible to guess the name of the resulting map (is it “my_ob.rodata” or “my_obje.rodata”? It’s almost never something logical like “my_obj.rodata”). On the other hand, that truncated object name is often insufficient in practice to match a map to its BPF application. Further, with bpftool reporting the PID of the “owning” process for maps and programs, it becomes quite easy and convenient to match a given map to its BPF application, so the name doesn’t play a big role in that.

So the proposal is to do away with this truncated prefix and settle on obvious “.rodata”, “.data”, “.kconfig” names. For the transition period, libbpf will be able to handle both conventions. In 1.0 only new-style names will be supported.
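
Where the awkward truncated names come from, and how both conventions could be recognized during the transition, can be sketched as follows. The helpers legacy_map_name() and is_special_map() are hypothetical; the only external fact assumed is the 16-byte kernel limit on object names (BPF_OBJ_NAME_LEN, including the trailing NUL).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the kernel's BPF_OBJ_NAME_LEN (16 bytes, including the NUL). */
#define SKETCH_OBJ_NAME_LEN 16

/* Legacy-style name: the object name is cut so that "<prefix><suffix>"
 * fits into the name limit. */
static void legacy_map_name(char *out, size_t out_sz, const char *obj_name,
			    const char *suffix)
{
	size_t room = SKETCH_OBJ_NAME_LEN - 1;
	size_t sfx_len = strlen(suffix);
	size_t pfx_len = sfx_len < room ? room - sfx_len : 0;

	if (pfx_len > strlen(obj_name))
		pfx_len = strlen(obj_name);
	snprintf(out, out_sz, "%.*s%s", (int)pfx_len, obj_name, suffix);
}

/* Transitional matcher: accepts both "<obj prefix>.rodata" and the plain
 * new-style ".rodata". */
static bool is_special_map(const char *map_name, const char *suffix)
{
	size_t n = strlen(map_name), s = strlen(suffix);

	return n >= s && strcmp(map_name + n - s, suffix) == 0;
}
```

For an object named "my_object_file", the legacy scheme produces "my_objec.rodata": the prefix length depends on the suffix length, which is why the resulting name is so hard to guess.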

btf.h APIs

Starting from the smaller and simpler set of APIs, let’s look at BTF-related APIs and what we want to do with them.

  • #276 btf__finalize_data() will be deprecated. It is used only for internal libbpf needs. Shouldn’t have been exposed, probably. Unlikely anyone is even relying on it.

  • #277 The following APIs are only used internally or by BCC. We’ll adapt BCC to not rely on these APIs and deprecate them.

    • btf__get_map_kv_tids() doesn’t make any sense now that BPF_ANNOTATE_KV_PAIR() is not supported;
    • btf_ext__reloc_func_info() is already marked deprecated and is not compatible with libbpf’s support for multiple BPF programs per section;
    • btf_ext__reloc_line_info() is in the similar boat;
    • btf_ext__func_info_rec_size() is just a specialized “extractor” of record size for specific types of .BTF.ext sections. Anyone iterating and handling .BTF.ext on their own will be able to implement this trivially;
    • btf_ext__line_info_rec_size(), same as above.
  • #278 btf__load() and btf__get_from_id() are used to “upload” and “download” BTF into/from the kernel. I propose to deprecate these two naming variants and introduce two new ones, reflecting their relationship to the kernel. E.g., btf__load_into_kernel() and btf__load_from_kernel() (or btf__load_from_kernel_by_id()) would be less confusing. Given they are not used that frequently, longer names don’t seem to pose a problem for usability.

  • #279 btf__get_nr_types() and btf__get_raw_data() don’t follow the naming convention for getters of omitting “get” in the name. Appropriate names should be btf__nr_types() (btf__type_cnt()?) and btf__raw_data(). Further, btf__get_nr_types() confusingly returns the number of types except VOID (or, put another way, it returns the type ID of the last BTF type), which leads to a confusing and non-conventional iteration loop with <= instead of the natural <:

for (i = 0; i <= btf__get_nr_types(btf); i++) { ... }

So for the new btf__nr_types() I propose to fix this and return the number of all types that a user can iterate. Existing “get_” variants can stay as is as discouraged APIs in libbpf_legacy.h, they don’t need much maintenance.

  • #280 libbpf_find_kernel_btf() – its name should more correctly be libbpf_load_vmlinux_btf(), so the proposal is to add such a naming variant and discourage/deprecate the current one. Additionally, libbpf_load_module_btf(const char *module_name) will be added for completeness.

  • #281 struct btf_dedup_opts should be converted into an extensible OPTS struct, similar to those used throughout most of the libbpf APIs. The dont_resolve_fwds option will be dropped, as it’s never used. dedup_table_size can be dropped as well; it’s only used by selftests to force hash collisions, so we can drop it pretty safely. The only known user of the btf__dedup() API that accepts btf_dedup_opts is pahole, but it passes NULL for options. So there is little danger of breaking anyone. We can also use symbol versioning to introduce a new variant of btf__dedup(), accepting a new-style opts struct. The btf_ext argument would be supplied through opts (it is rarely provided).

  • #283 similarly, struct btf_dump_opts and btf_dump__new() will be converted to OPTS framework and new version of btf_dump__new() API will be introduced, looking like this:

struct btf_dump *btf_dump__new(const struct btf *btf,
                               btf_dump_printf_fn_t printf_fn,
                               const struct btf_dump_opts *opts);
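
For the btf__get_nr_types() item above, the off-by-one difference between the existing getter and the proposed one can be shown with a tiny stand-in. struct sketch_btf and all functions below are hypothetical and do not touch real BTF data; they only model "last valid type ID" versus "number of iterable type IDs, VOID included".

```c
#include <stddef.h>

/* Hypothetical stand-in for struct btf: tracks only how many real
 * (non-VOID) types exist. Type ID 0 is reserved for VOID. */
struct sketch_btf {
	unsigned int nr_real_types;
};

/* Existing-style getter: number of types excluding VOID, which is also
 * the ID of the last type. */
static unsigned int sketch_get_nr_types(const struct sketch_btf *btf)
{
	return btf->nr_real_types;
}

/* Proposed-style getter: number of all type IDs a user can iterate. */
static unsigned int sketch_nr_types(const struct sketch_btf *btf)
{
	return btf->nr_real_types + 1;
}

/* Existing iteration idiom: needs the unusual <= bound. */
static unsigned int count_ids_legacy(const struct sketch_btf *btf)
{
	unsigned int i, visited = 0;

	for (i = 0; i <= sketch_get_nr_types(btf); i++)
		visited++;
	return visited;
}

/* Proposed iteration idiom: the natural < bound. */
static unsigned int count_ids_new(const struct sketch_btf *btf)
{
	unsigned int i, visited = 0;

	for (i = 0; i < sketch_nr_types(btf); i++)
		visited++;
	return visited;
}
```

Both loops visit the same set of IDs; only the second one reads like an ordinary C loop, which is the motivation for the new getter.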

bpf.h low-level APIs

  • #282 Deprecate the whole zoo of bpf_create_map_xattr(), bpf_create_map_node(), bpf_create_map_name(), bpf_create_map(), bpf_create_map_in_map_node(), bpf_create_map_in_map() APIs in favor of a single unified OPTS-based one, with the name matching bpf syscall’s command (BPF_MAP_CREATE):
int bpf_map_create(enum bpf_map_type map_type,
                   const char *map_name,
                   unsigned int key_size,
                   unsigned int value_size,
                   unsigned int max_entries,
                   const struct bpf_map_create_opts *opts);
  • #284 Similarly, deprecate bpf_load_program_xattr(), bpf_load_program(), bpf_verify_program() in favor of a single unified new API:
int bpf_prog_load(enum bpf_prog_type type, const char *name,
                  const struct bpf_insn *insns, size_t insn_cnt,
                  const char *license,
                  const struct bpf_prog_load_opts *opts);
  • bpf_map_lookup_elem() and bpf_map_lookup_elem_flags() would probably stay the same. While it’s a bit inconsistent to have specific _flags variant, both APIs seem to be holding up just fine and it’s not likely they would need to be extended, so to reduce the migration churn, it makes sense to leave them as is. If we ever need to extend them, adding another OPTS-based variant would be a way to go.

  • Similarly, bpf_prog_detach() and bpf_prog_detach2() are not the most consistent APIs, but given they are sort of deprecated (due to bpf_link-based APIs), probably better to leave them as is for now.

  • #285 bpf_prog_attach() and bpf_prog_attach_xattr() are probably OK to stay as is, but bpf_prog_attach_xattr()'s naming makes less and less sense with dropping all the xattr APIs. So we’ll converge to the convention used in high-level APIs and use bpf_prog_attach_opts() API name. We’ll end up with non-OPTS bpf_prog_attach() and OPTS-based bpf_prog_attach_opts().

  • #286 bpf_prog_test_run_xattr() and bpf_prog_test_run() will be deprecated in favor of already existing bpf_prog_test_run_opts().

  • #419 bpf_load_btf() has inconsistent naming (similar to bpf_load_program() and bpf_create_map()), and it doesn't allow to specify log level. So deprecate bpf_load_btf() in favor of bpf_btf_load() with OPTS.

  • bpf_prog_query() and bpf_task_fd_query() will stay the same. They are rarely used and don’t suffer from constant expansion of API. I can be persuaded otherwise, though.
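
Several items in this list converge on OPTS-based variants. The general idea behind such extensible option structs can be sketched as follows: the struct records its own size as seen by the caller, so a library can tell which fields the caller actually knows about. Everything here (struct demo_create_opts, the DEMO_OPTS_* macros, demo_resolve_opts()) is hypothetical and simplified; libbpf's own helpers may differ in detail. The macros rely on __typeof__, so GCC or Clang is assumed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical options struct for an imaginary "create" API. */
struct demo_create_opts {
	size_t sz;              /* must be first: sizeof() as seen by the caller */
	unsigned int flags;
	unsigned int numa_node; /* pretend this was added in a later release */
};

/* Settings the imaginary library ends up using. */
struct demo_settings {
	unsigned int flags;
	unsigned int numa_node;
};

/* A field is usable only if the caller's struct is large enough to hold it. */
#define DEMO_OPTS_HAS(opts, field) \
	((opts) && (opts)->sz >= offsetof(__typeof__(*(opts)), field) + \
				 sizeof((opts)->field))
#define DEMO_OPTS_GET(opts, field, fallback) \
	(DEMO_OPTS_HAS(opts, field) ? (opts)->field : (fallback))

/* opts may be NULL, meaning "all defaults". */
static int demo_resolve_opts(const struct demo_create_opts *opts,
			     struct demo_settings *out)
{
	/* Simplification: reject callers built against a larger struct. */
	if (opts && opts->sz > sizeof(*opts))
		return -EINVAL;

	out->flags = DEMO_OPTS_GET(opts, flags, 0u);
	out->numa_node = DEMO_OPTS_GET(opts, numa_node, 0u);
	return 0;
}
```

An application compiled before numa_node existed passes a smaller sz, so the library falls back to the default for that field instead of reading garbage. This is what lets a single API grow new options without a new function name each time.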

libbpf.h high-level APIs

  • #287 bpf_object__open(), bpf_object__open_buffer(), bpf_object__open_xattr() – deprecated in favor of bpf_object__open_file() and bpf_object__open_mem().

  • #288 Remove few options from struct bpf_object_open_opts:

    • relaxed_core_relocs hasn’t been honored for a long time now, remove it;
    • remove attach_prog_fd, it only works for a case of bpf_object containing a single BPF program or if all BPF programs are attached to the same target. Neither is a typical situation, so this option just leads to more confusion. Use bpf_program__set_attach_target() on individual bpf_program after bpf_object is open.
  • #289 Deprecate bpf_object__load_xattr(). bpf_object__load() is enough. Anything else could be provided in open opts or set through setter APIs before the load.

  • #290 Deprecate bpf_object__unload(). BPF objects are not re-loadable after unload. Use bpf_object__close() to unload and free up resources in one operation.

  • #291 For completeness, add bpf_object__set_name(), to match bpf_object__name() getter.

  • #292 Deprecate bpf_object__find_program_by_title() as an API. Searching by title (i.e., section name) is ambiguous with multiple BPF programs per section. Use bpf_object__for_each_program() macro to loop over all BPF programs and compare using bpf_program__section_name().

  • #293 Deprecate bpf_object__next() API and bpf_object__for_each_safe() macro. There is little utility to them, and they are not thread-safe. Any application that needs to iterate over all its bpf_objects can keep track of them on its own easily.

  • #294 Deprecate bpf_object__set_priv()/bpf_object__priv(), bpf_program__set_priv()/bpf_program__priv() and bpf_map__set_priv()/bpf_map__priv(). libbpf entities are not arbitrary containers for user’s private data. In cases where users should provide callbacks, they will be able to provide their own context at the time of callback registration.

  • #295 libbpf_find_vmlinux_btf_id() supports only vmlinux BTF, but no kernel module support. Should we add another API that would search for types across kernel modules and return kernel module BTF FD, in addition to BTF type ID? Probably, but that could be done later. So, bottom line, just leave this API be for now.

  • #296 bpf_program__next() and bpf_program__prev() are confusingly named. They are really “methods” of bpf_object, so should be called bpf_object__next_program() and bpf_object__prev_program(). Luckily, they are mostly used through bpf_object__for_each_program() macro, so deprecate confusingly named APIs and update the macro to use new ones. bpf_map__next() and bpf_map__prev() suffer from the same problem. Deal with them similarly.

  • #297 Deprecate bpf_program__title() in favor of bpf_program__section_name(). “Title” term is confusing and unconventional, it’s SEC() in code and “section name” everywhere else.

  • #298 bpf_program__size() is mildly confusing, though not broken by any means. Add bpf_program__insn_cnt() and bpf_program__insns() getters to get access to finalized BPF assembly of each BPF program. This will help with some cases where deprecated bpf_program__set_prep() might have been used, see below.

  • #299 bpf_program__set_prep() is an obscure, less-known API which adds unnecessary complexity to the public API and internal implementation. Deprecate it. For cases when someone does still want to adjust and/or clone BPF programs, this can be achieved by using the new bpf_program__insns() and bpf_program__insn_cnt() APIs to get raw (but libbpf-processed for CO-RE, bpf-to-bpf calls, map relocations, etc) BPF instructions and proceeding with the low-level bpf_prog_load() API.

  • #300 Deprecate bpf_program__pin_instance(), bpf_program__unpin_instance() and bpf_program__nth_fd(). With no bpf_program__set_prep(), there is no concept of multiple bpf_program instances.

  • #301 Deprecate bpf_program__load(). In general, it’s impossible for libbpf to load an individual BPF program in isolation. libbpf fully embraced the concept of bpf_object as a collection of related bpf_programs and bpf_maps (and global variables, kconfig, ksym, etc). So this API is just confusing and impossible to support properly.

  • #302 Remove bpf_object__find_map_by_offset(). API created with simplistic assumptions about BPF map definitions. It hasn’t worked for a while, so just remove it finally.

  • #303 Remove bpf_map__for_each() macro in favor of better named bpf_object__for_each_map().

  • #304 Discourage bpf_map__resize(), which is an alias to more clearly named bpf_map__set_max_entries().

  • #305 Discourage bpf_map__def(). It is rarely used and non-extensible. Provide individual getter APIs to compensate (if we are still missing some of them).

  • #306 Deprecate bpf_map__is_offload_neutral(). It’s most probably broken already. PERF_EVENT_ARRAY isn’t the only map that’s not suitable for hardware offloading. It’s unlikely anyone is using this, and making it correct would be a maintenance burden.

  • #307 Discourage bpf_map__get_pin_path() and use consistent naming for getter, bpf_map__pin_path().

  • #308 Deprecate bpf_prog_load() and bpf_prog_load_xattr() in favor of bpf_object__open_{mem, file}() and bpf_object__load() combo.

  • #309 bpf_set_link_xdp_fd() / bpf_set_link_xdp_fd_opts() / bpf_get_link_xdp_id() / bpf_get_link_xdp_info() don’t follow libbpf naming guidelines. They are akin to object-less libbpf helpers, so should be called with libbpf_ prefix. They also have a distinct low-level feel. So:

    • Discourage them (move them to libbpf_legacy.h);
    • Introduce OPTS-based libbpf_xdp_set_prog_fd() and libbpf_xdp_get_info() or something along those lines. I hope XDP-using users can make good suggestions here. And probably move them to bpf.h.
  • #310 bpf_perf_event_read_simple() is a low-level and dangerous API that could be used to implement custom perf buffer consumption, but is hard to actually use correctly. Deprecate it. There is no reason to use it with perf_buffer__poll() and perf_buffer__consume() APIs available. If anyone really needs custom perf buffer consumption, re-implementing bpf_perf_event_read_simple() shouldn’t be a problem (for them) at that point.

  • #311 perf_buffer options are not OPTS-based. Also, perf_buffer constructor APIs (perf_buffer__new() and perf_buffer__new_raw()) could use a somewhat better design. But perf_buffer__new() is a very popular API used in many programs, so deprecating it will cause almost universal code churn. We also haven’t had a need to extend any of the opts yet (in more than 18 months now), so the most practical approach would be to leave them as is. But if we were to change those APIs, I’d switch perf_buffer_opts and perf_buffer_raw_opts into OPTS-based ones and make constructor APIs look like this:

struct perf_buffer *
perf_buffer__new(int map_fd, size_t page_cnt,
                 perf_buffer_sample_fn sample_cb,
                 perf_buffer_lost_fn lost_cb,
                 const struct perf_buffer_opts *opts);

struct perf_buffer *
perf_buffer__new_raw(int map_fd, size_t page_cnt,
                     struct perf_event_attr *attr,
                     perf_buffer_event_fn event_cb,
                     const struct perf_buffer_raw_opts *opts);

This doesn’t seem worthwhile, though, as it doesn’t buy much. I’d love to see strong opinions on this.

  • #312 I never had to use bpf_probe_prog_type() / bpf_probe_map_type() / bpf_probe_helper() / bpf_probe_large_insn_limit() in practice, but they don’t seem to cause many maintenance issues. So I’m inclined to leave them as is, unless someone objects strongly. But we need to have a wider discussion about libbpf's convenience probing APIs. There is also a niche demand for various xxx-to-string conversion APIs (e.g., to convert bpf_prog_type to its string representation), so we should decide whether such APIs are first-class citizens of libbpf and, if so, make the corresponding implementations high-quality and complete.

  • #313 bpf_prog_info_linear-related APIs. They completely fail libbpf naming guidelines (they use bpf_program__ prefix, but they don’t operate on bpf_program objects). bpf_program__get_prog_info_linear() doesn’t even declare that it is expecting enum bpf_prog_info_array and just goes with generic __u64. They seem to be only used in perf and bpftool, so updating all users seems doable. At the very least, I’d fix the naming to be libbpf_ prefixed. But I need input from Song, Arnaldo and others here.