
Local kptrs, BPF linked lists #3794

Closed
wants to merge 26 commits into from

Conversation

kernel-patches-bot

Pull request for series with
subject: Local kptrs, BPF linked lists
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=684373

@kernel-patches-bot
Author

Master branch: f6ac03e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=684373
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/netdevbpf/list/?series=684373
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am -3
  stdout: 'Applying: bpf: Document UAPI details for special BPF types
Applying: bpf: Allow specifying volatile type modifier for kptrs
Applying: bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID
Applying: bpf: Fix slot type check in check_stack_write_var_off
Applying: bpf: Drop reg_type_may_be_refcounted_or_null
Applying: bpf: Refactor kptr_off_tab into fields_tab
Applying: bpf: Consolidate spin_lock, timer management into fields_tab
Applying: bpf: Refactor map->off_arr handling
Applying: bpf: Support bpf_list_head in map values
Applying: bpf: Introduce local kptrs
Applying: bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs
Applying: bpf: Verify ownership relationships for owning types
Applying: bpf: Support locking bpf_spin_lock in local kptr
Applying: bpf: Allow locking bpf_spin_lock global variables
Applying: bpf: Rewrite kfunc argument handling
Applying: bpf: Drop kfunc bits from btf_check_func_arg_match
Applying: bpf: Support constant scalar arguments for kfuncs
Applying: bpf: Teach verifier about non-size constant arguments
Applying: bpf: Introduce bpf_kptr_new
Applying: bpf: Introduce bpf_kptr_drop
Applying: bpf: Permit NULL checking pointer with non-zero fixed offset
Applying: bpf: Introduce single ownership BPF linked list API
Applying: libbpf: Add support for private BSS map section
Applying: selftests/bpf: Add __contains macro to bpf_experimental.h
Applying: selftests/bpf: Add BPF linked list API tests
Using index info to reconstruct a base tree...
M	tools/testing/selftests/bpf/DENYLIST.s390x
Falling back to patching base and 3-way merge...
Auto-merging tools/testing/selftests/bpf/DENYLIST.s390x
CONFLICT (content): Merge conflict in tools/testing/selftests/bpf/DENYLIST.s390x
Patch failed at 0025 selftests/bpf: Add BPF linked list API tests
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
  stderr: 'error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch'

conflict:

diff --cc tools/testing/selftests/bpf/DENYLIST.s390x
index beef1232a47a,1a59524516df..000000000000
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@@ -76,4 -76,4 +76,8 @@@ lookup_ke
  verify_pkcs7_sig                         # JIT does not support calling kernel function                                (kfunc)
  kfunc_dynptr_param                       # JIT does not support calling kernel function                                (kfunc)
  deny_namespace                           # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
++<<<<<<< HEAD
 +libbpf_get_fd_by_id_opts                 # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
++=======
+ linked_list				 # JIT does not support calling kernel function                                (kfunc)
++>>>>>>> selftests/bpf: Add BPF linked list API tests

@kernel-patches-bot
Author

Master branch: d31ada3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=684982
version: 2

Kernel Patches Daemon and others added 20 commits October 13, 2022 08:14
The kernel recognizes some special BPF types in map values or local
kptrs. Document that only bpf_spin_lock and bpf_timer will preserve
backwards compatibility, and that kptr will preserve backwards
compatibility only for the operations on the pointer, not for the types
supported for such kptrs.

For local kptrs, document that there are no stability guarantees at all.

Finally, document that the 'bpf_' namespace is reserved for adding
future special fields; hence, BPF programs must not declare types with
such names and still expect backwards compatibility.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
This is useful in particular to mark the pointer as volatile, so that
the compiler treats each load and store to the field as a volatile
access.
The alternative is having to define and use READ_ONCE and WRITE_ONCE in
the BPF program.
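
For illustration, a hedged sketch of such a declaration (the value type
and field name here are made up, not taken from the patch):

struct map_value {
	/* Sketch only: the pointer variable itself is volatile, so each
	 * access compiles to a volatile load or store. */
	struct prog_test_ref_kfunc __kptr *volatile ptr;
};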

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
When support was added for spilled PTR_TO_BTF_ID to be accessed by
helper memory access, the stack slot was not overwritten to STACK_MISC
(and that too is only safe when env->allow_ptr_leaks is true).

This means that helpers that take ARG_PTR_TO_MEM and write to it may
essentially overwrite the value while the verifier continues to track
the slot as a spilled register.

This can cause issues when PTR_TO_BTF_ID is spilled to stack, and then
overwritten by helper write access, which can then be passed to BPF
helpers or kfuncs.

Handle this by falling back to the case introduced in a later commit,
which will also handle PTR_TO_BTF_ID along with other pointer types,
i.e. cd17d38 ("bpf: Permits pointers on stack for helper calls").

Finally, include a comment on why REG_LIVE_WRITTEN is not being set when
clobber is set to true. In short: when clobber is unset, we know that we
won't be writing; when it is true, we *may* write to any of the stack
slots in that range. The write may be partial or complete, to just one
or many stack slots.

We cannot be sure, hence to be conservative, we leave things as is and
never set REG_LIVE_WRITTEN for any stack slot. However, clobber still
needs to reset the slots to STACK_MISC, assuming writes happened. Read
marks still need to be propagated upwards from a liveness point of view,
as the parent stack slot's contents may still continue to matter to
child states.

Cc: Yonghong Song <yhs@meta.com>
Fixes: 1d68f22 ("bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundary")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
For the case where allow_ptr_leaks is false, the code checks whether the
slot type is STACK_INVALID or STACK_SPILL and rejects other cases. This
is a consequence of incorrectly checking for the register types
(NOT_INIT and SCALAR_VALUE respectively) instead of the slot types. Fix
the check.

Fixes: 01f810a ("bpf: Allow variable-offset stack access")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
It is not scalable to maintain a list of types that can have non-zero
ref_obj_id. It is never set for scalars anyway, so just remove the
conditional on register types and print it whenever it is non-zero.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To prepare the BPF verifier to handle special fields in both map values
and program allocated types coming from program BTF, we need to refactor
the kptr_off_tab handling code into something more generic and reusable
across both cases to avoid code duplication.

Later patches also require passing this data to helpers at runtime, so
that they can work on user defined types, initialize them, destruct
them, etc.

The main observation is that both map values and such allocated types
point to a type in program BTF, hence they can be handled similarly. We
can prepare a field metadata table for both cases and store them in
struct bpf_map or struct btf depending on the use case.

Hence, refactor the code into generic btf_type_fields and btf_field
member structs. The btf_type_fields represents the fields of a specific
btf_type in user BTF. The cnt indicates the number of special fields we
successfully recognized, and field_mask is a bitmask of fields that were
found, to enable quick determination of availability of a certain field.
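
For orientation, a minimal sketch of the shape of these two structs,
going only by the description above (member names and layout are
assumptions):

struct btf_field {
	u32 offset;               /* offset of the special field in the type */
	enum btf_field_type type; /* which special field was recognized */
};

struct btf_type_fields {
	u32 cnt;                  /* number of special fields recognized */
	u32 field_mask;           /* bitmask of field types that were found */
	struct btf_field fields[];
};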

Subsequently, refactor the rest of the code to work with these generic
types, remove assumptions about kptr and kptr_off_tab, rename variables
to more meaningful names, etc.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Now that kptr_off_tab has been refactored into fields_tab and can hold
more than one specific field type, accommodate bpf_spin_lock and
bpf_timer as well.

While they don't require any more metadata than offset, having all
special fields in one place allows us to share the same code for
allocated user defined types and handle both map values and these
allocated objects in a similar fashion.

As an optimization, we still keep spin_lock_off and timer_off offsets in
the btf_type_fields structure, just to avoid having to find the
btf_field struct each time their offset is needed. This is mostly needed
to manipulate such objects in a map value at runtime. It's ok to
hardcode just one offset as more than one field is disallowed.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Refactor map->off_arr handling into generic functions that can work on
their own without hardcoding map specific code. The btf_type_fields_off
structure is now returned from btf_parse_fields_off, which can be reused
later for types in program BTF.

All functions like copy_map_value, zero_map_value call generic
underlying functions so that they can also be reused later for copying
to values allocated in programs which encode specific fields.

Later, some helper functions will also require access to this off_arr
structure to be able to skip over special fields at runtime.
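
As a sketch of the idea (function and member names are assumptions), the
generic copy walks the sorted offset array and copies only the gaps
between special fields:

static void bpf_obj_memcpy(struct btf_type_fields_off *off_arr,
			   void *dst, void *src, u32 size)
{
	u32 cur = 0;
	int i;

	for (i = 0; i < off_arr->cnt; i++) {
		u32 next_off = off_arr->field_off[i];

		/* Copy up to the special field, then skip over it. */
		memcpy(dst + cur, src + cur, next_off - cur);
		cur = next_off + off_arr->field_sz[i];
	}
	memcpy(dst + cur, src + cur, size - cur);
}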

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add the basic support on the map side to parse, recognize, verify, and
build metadata table for a new special field of the type struct
bpf_list_head. To parameterize the bpf_list_head for a certain value
type and the list_node member it will accept in that value type, we use
BTF declaration tags.

The definition of bpf_list_head in a map value will be done as follows:

struct foo {
	struct bpf_list_node node;
	int data;
};

struct map_value {
	struct bpf_list_head head __contains(foo, node);
};

Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.

The 'contains' annotation is a BTF declaration tag composed of four
parts, "contains:kind:name:node", where the kind and name are then used
to look up the type in the map BTF. The node part names the member in
this type that has the type struct bpf_list_node, which is actually used
for linking into the linked list. For now, the 'kind' part is hardcoded
as struct.
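
Concretely, the __contains(foo, node) annotation above is expected to
expand to the raw declaration tag (a sketch; the convenience macro
hiding this comes later in the series):

struct map_value {
	struct bpf_list_head head __attribute__((btf_decl_tag("contains:struct:foo:node")));
};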

This allows building intrusive linked lists in BPF, using container_of
to obtain a pointer to the entry, while being completely type safe from the
perspective of the verifier. The verifier knows exactly the type of the
nodes, and knows that list helpers return that type at some fixed offset
where the bpf_list_node member used for this list exists. The verifier
also uses this information to disallow adding types that are not
accepted by a certain list.

For now, no elements can be added to such lists. Support for that is
coming in future patches; hence, draining and freeing items is left as
a TODO that will be resolved in a future patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
type tag in reg->type to avoid having to check btf_is_kernel when trying
to match argument types in helpers.

For now, these local kptrs will always be referenced in verifier
context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
to such objects, as long as the special fields are not touched (support
for which will be added in subsequent patches).

Hence, no PROBE_MEM handling is done, since they can never be in an
undefined state and their lifetime will always be valid.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Allow specifying bpf_spin_lock, bpf_list_head, bpf_list_node fields in a
local kptr.

A bpf_list_head allows implementing map-in-map style use cases, where a
local kptr with a bpf_list_head is linked into a list in a map value.
This requires embedding a bpf_list_node, support for which is also
included.

Lastly, while we strictly don't require holding a bpf_spin_lock while
manipulating the bpf_list_head of a local kptr (since when we have
access to it, we have complete ownership of the object), the locking
constraint is still kept and may be conditionally lifted in the future.

Note that the specification of such types can be done just like map
values, e.g.:

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
	struct bpf_list_node node;
};

struct map_value {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(foo, node);
};

To recognize such types in user BTF, we build a btf_struct_metas array
of metadata items corresponding to each BTF ID. This is done once during
the btf_parse stage to avoid having to recompute it each time the
verification process needs to inspect the metadata.

Moreover, the computed metadata needs to be passed to some helpers in
future patches, which requires allocating it and storing it in the BTF
that is pinned by the program itself, so that access to such data can be
assumed valid during program runtime.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Ensure that there can be no ownership cycles among different types by
way of having owning objects that can hold some other type as their
element. For instance, a map value can only hold local kptrs, but these
are allowed to have another bpf_list_head. To prevent unbounded
recursion while freeing resources, elements of a bpf_list_head in local
kptrs that are themselves part of a list in a map value can never have a
bpf_list_head.

Also, to make runtime destruction easier, once btf_struct_metas is fully
populated, we can stash the metadata of the value type directly in the
metadata of the list_head fields, as that allows easier access to the
value type's layout to destruct it at runtime from the btf_field entry
of the list head itself.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Allow locking a bpf_spin_lock embedded in a local kptr, in addition to
the already supported map value pointers. The handling is similar to
that of map values: we preserve the reg->id of local kptrs as well, and
adjust process_spin_lock to work with non-PTR_TO_MAP_VALUE registers,
remembering the id in verifier state.
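
A hedged usage sketch, reusing struct foo from the previous patch
(bpf_kptr_new and bpf_kptr_drop are only introduced later in this
series):

void prog(void)
{
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	/* The lock lives in the same allocation as the object. */
	bpf_spin_lock(&f->lock);
	/* ... manipulate f->head ... */
	bpf_spin_unlock(&f->lock);
	bpf_kptr_drop(f);
}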

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Global variables reside in maps accessible via direct_value_addr
callbacks, so giving each load instruction's rewrite a unique reg->id
prevents us from holding locks that are global.

This is not great, so refactor the active_spin_lock into two separate
fields, active_spin_lock_ptr and active_spin_lock_id, which is generic
enough to allow it for global variables, map lookups, and local kptr
registers at the same time.

Held vs non-held is indicated by active_spin_lock_ptr, which stores the
reg->map_ptr or reg->btf pointer of the register used for locking the
spin lock. The active_spin_lock_id also needs to be compared, to ensure
that bpf_spin_unlock is for the same register.

Next, pseudo load instructions are not given a unique reg->id, as they
perform a lookup of the same map value (max_entries is never greater
than 1).

Essentially, we consider that the tuple of (active_spin_lock_ptr,
active_spin_lock_id) will always be unique for any kind of argument to
bpf_spin_{lock,unlock}.

Note that this can be extended in the future to also remember offset
used for locking, so that we can introduce multiple bpf_spin_lock fields
in the same allocation.
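
With this change, a global lock becomes usable (a sketch; making libbpf
accept the global declaration relies on the .bss.private support added
later in this series):

struct bpf_spin_lock glock;

void prog(void)
{
	bpf_spin_lock(&glock);
	/* ... critical section ... */
	bpf_spin_unlock(&glock);
}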

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
As we continue to add more features, argument types, kfunc flags, and
different extensions to kfuncs, the code to verify the correctness of
the kfunc prototype wrt the passed in registers has become ad-hoc and
ugly to read.

To make life easier, and make a very clear split between different
stages of argument processing, move all the code into verifier.c and
refactor into easier to read helpers and functions.

This also makes it easier for kfunc argument processing to share code
within the verifier. This will be more and more useful in later patches,
as we move to implement very core BPF helpers as kfuncs, to keep them
experimental before baking them into UAPI.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Remove all kfunc related bits now from btf_check_func_arg_match, as
users have been converted away to refactored kfunc argument handling.

This is split into a separate commit to aid review, so that what has
been preserved can easily be compared against the removed bits, instead
of mixing removed hunks with the previous patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Allow passing known constant scalars as arguments to kfuncs that do not
represent a size parameter. This makes the search pruning optimization
of the verifier more conservative for such kfunc calls: any two
arguments that are not identical constants are considered
non-equivalent.

We will use this support to then expose a global bpf_kptr_alloc
function, which takes the local type ID in program BTF and returns a
PTR_TO_BTF_ID to the local type. These will be called local kptrs, and
allow programs to allocate their own objects.
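
As a sketch of what this enables, the allocator kfunc introduced later
in the series declares its type ID parameter as a constant scalar via
the __k naming convention (the exact prototype here is an assumption):

/* local_type_id must be a known constant; states are pruned only when
 * two calls pass the exact same value. */
void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign);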

However, this is still not completely safe, as the mark_chain_precision
logic is buggy without more work when the constant argument is not a
size but still needs precise marker propagation for pruning checks. The
next patch will fix this problem.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Currently, the verifier has support for various arguments that either
describe the size of the memory being passed in to a helper or describe
the size of the memory being returned. When a constant is passed in like
this, it is assumed for the purposes of precision tracking that if the
value in the already explored safe state is within the value in the
current state, it is fine to prune the search.

While this holds well for size arguments, arguments where each value may
denote a distinct meaning, and hence must be verified separately, need
more work: the search can only be pruned if both are constant values and
both are equal. In all other cases, it would be incorrect to treat those
two precise registers as equivalent just because the new value satisfies
the old one (i.e. old <= cur).

Hence, make the register precision marker tri-state. There are now three
values that reg->precise takes: NOT_PRECISE, PRECISE, EXACT.

Both PRECISE and EXACT are 'true' values. EXACT affects how regsafe
decides whether two registers are equivalent for the purposes of
verifier state equivalence. When it sees that one register has
reg->precise == EXACT, it returns true only when both registers are
known constants with the same value, and false otherwise. For the
PRECISE case, it falls back to the default check that is present now
(i.e. treating the values as sizes).
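
In sketch form (the enum name is an assumption):

enum bpf_reg_precise {
	NOT_PRECISE, /* no precision requirement */
	PRECISE,     /* range-based check suffices (e.g. sizes) */
	EXACT,       /* only identical constants are equivalent */
};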

This is required as a future patch introduces a BPF memory allocator
interface, where we take the program BTF's type ID as an argument. Each
distinct type ID may result in the returned pointer obtaining a
different size, hence precision tracking is needed, and pruning cannot
just happen when the old value is within the current value. It must only
happen when the type ID is equal. The type ID will always correspond to
prog->aux->btf hence actual type match is not required.

Finally, change mark_chain_precision precision argument to EXACT for
kfuncs constant non-size scalar arguments (tagged with __k suffix).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Introduce a type safe memory allocator bpf_kptr_new for BPF programs.
The kernel side kfunc is named bpf_kptr_new_impl, as passing hidden
arguments to kfuncs still requires having them in the prototype, unlike
BPF helpers, which always take 5 arguments and have them checked using
bpf_func_proto in the verifier, ignoring unset argument types.

Introduce __ign suffix to ignore a specific kfunc argument during type
checks, then use this to introduce support for passing type metadata to
the bpf_kptr_new_impl kfunc.

The user passes the BTF ID of the type it wants to allocate in program
BTF; the verifier then rewrites the first argument as the size of this
type, after performing some sanity checks (to ensure that it exists and
is a struct type).

The second argument, flags, is reserved and must be 0 for now.

The third argument is also fixed up and passed by the verifier. This is
the btf_struct_meta for the type being allocated. It would be needed
mostly for the offset array, which is required for zero initializing
special fields while leaving the rest of the storage in an uninitialized
state.

It would also be needed in the next patch to perform proper destruction
of the object's special fields.

A convenience macro is included in the bpf_experimental.h header to hide
the ugly details of the implementation, leading to user code that looks
similar to a language level extension which allocates and constructs
fields of a user type.

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
};

void prog(void) {
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	...
}
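
The macro itself is expected to be roughly the following (a sketch; the
real definition lives in bpf_experimental.h):

#define bpf_kptr_new(type) \
	((type *)bpf_kptr_new_impl(bpf_core_type_id_local(type), 0, NULL))

The verifier rewrites the type ID into the allocation size and fills in
the hidden btf_struct_meta argument.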

A key piece of this story is still missing, i.e. the free function,
which will come in the next patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
kkdwivedi and others added 6 commits October 13, 2022 08:14
Introduce bpf_kptr_drop, which is the kfunc used to free local kptrs
allocated using bpf_kptr_new. Similar to bpf_kptr_new, it implicitly
destructs the special fields of the local kptr automatically, without
user intervention.

Just like the previous patch, btf_struct_meta that is needed to free up
the special fields is passed as a hidden argument to the kfunc.

For the user, a convenience macro hides the kernel side kfunc, which is
named bpf_kptr_drop_impl.

Continuing the previous example:

void prog(void) {
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	bpf_kptr_drop(f);
}

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Pointer arithmetic on a register with PTR_MAYBE_NULL is already
protected against; hence, make an exception for local kptrs while still
keeping the warning for other unintended cases that might creep in.

bpf_list_del{,_tail} helpers return a local kptr with an incremented
offset, pointing to the bpf_list_node field. The user is supposed to
then obtain the pointer to the entry using container_of after NULL
checking it, but the current restrictions trigger a warning when doing
the NULL check. Revisiting the reason, the warning is meant as an
assertion, which does seem to actually work and catch the bad case, as
under no other circumstances can reg->off be non-zero for a register
that has the PTR_MAYBE_NULL type flag set.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add a linked list API for use in BPF programs, where it expects
protection from the bpf_spin_lock in the same allocation as the
bpf_list_head. Future patches will extend the same infrastructure to
have different flavors with varying protection domains and visibility
(e.g. percpu variant with local_t protection, usable in NMI progs).

The following functions are added to kick things off:

bpf_list_add
bpf_list_add_tail
bpf_list_del
bpf_list_del_tail

The lock protecting the bpf_list_head needs to be taken for all
operations.

Once a node has been added to the list, its pointer changes to
PTR_UNTRUSTED. However, the reference is only released once the lock
protecting the list is unlocked. For such local kptrs with PTR_UNTRUSTED
set but an active ref_obj_id, it is still permitted to read from and
write to them as long as the lock is held.

bpf_list_del and bpf_list_del_tail delete the first or last item of the
list respectively, and return a pointer to the element at the list_node
offset. The user can then use a container_of style macro to get the
actual entry type. The verifier, however, statically knows the actual
type, so the safety properties are still preserved.

With these additions, programs can now manage their own linked lists and
store their objects in them.
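
A hedged end-to-end sketch using the foo/bar definitions from earlier
patches (the argument order of the list kfuncs is an assumption):

void prog(void)
{
	struct bpf_list_node *n;
	struct foo *f;
	struct bar *b;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	b = bpf_kptr_new(typeof(*b));
	if (!b) {
		bpf_kptr_drop(f);
		return;
	}
	bpf_spin_lock(&f->lock);
	bpf_list_add(&b->node, &f->head); /* the list now owns b */
	n = bpf_list_del(&f->head);
	bpf_spin_unlock(&f->lock);
	if (n) {
		b = container_of(n, struct bar, node);
		bpf_kptr_drop(b); /* ownership returned by bpf_list_del */
	}
	bpf_kptr_drop(f);
}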

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Currently libbpf does not allow declaration of a struct bpf_spin_lock in
global scope. Attempting to do so results in a "failed to re-mmap"
error, as the .bss arraymap containing the spinlock is not allowed to be
mmap'd.

This patch adds support for a .bss.private section. The maps contained
in this section will not be mmap'd into userspace by libbpf, nor will
they be exposed via the bpftool-generated skeleton.

The intent here is to allow a more natural programming pattern for
global-scope spinlocks, which will be used by the rbtree locking
mechanism in further patches in this series.

Notes:

  * Initially I called the section .bss.no_mmap, but the broader
    'private' term better indicates that skeleton shouldn't expose these
    maps at all, IMO.

  * bpftool/gen.c's is_internal_mmapable_map function checks whether the
    map flags have BPF_F_MMAPABLE, so no bpftool changes were necessary
    to remove .bss.private maps from skeleton
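
A sketch of the intended usage (declaration syntax is an assumption):

/* Lives in a map that libbpf neither mmaps nor exposes in the skeleton. */
struct bpf_spin_lock glock SEC(".bss.private");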

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add user facing __contains macro which provides a convenient wrapper
over the verbose kernel specific BTF declaration tag required to
annotate BPF list head structs in user types.
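
A sketch of what the macro is expected to look like, given the
"contains:kind:name:node" tag format described earlier in the series:

#define __contains(name, node) \
	__attribute__((btf_decl_tag("contains:struct:" #name ":" #node)))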

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Include various tests covering the success and failure cases. Also, run
the success cases at runtime to verify correctness of linked list
manipulation routines, in addition to ensuring successful verification.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@kernel-patches-bot
Author

Master branch: de9c8d8
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=684982
version: 2

@kernel-patches-bot
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=684982 expired. Closing PR.
