
Local kptrs, BPF linked lists #3794

Closed
wants to merge 26 commits into from

Conversation

kernel-patches-bot

Pull request for series with
subject: Local kptrs, BPF linked lists
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=684373

@kernel-patches-bot
Author

Master branch: f6ac03e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=684373
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/netdevbpf/list/?series=684373
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am -3
  stdout: 'Applying: bpf: Document UAPI details for special BPF types
Applying: bpf: Allow specifying volatile type modifier for kptrs
Applying: bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID
Applying: bpf: Fix slot type check in check_stack_write_var_off
Applying: bpf: Drop reg_type_may_be_refcounted_or_null
Applying: bpf: Refactor kptr_off_tab into fields_tab
Applying: bpf: Consolidate spin_lock, timer management into fields_tab
Applying: bpf: Refactor map->off_arr handling
Applying: bpf: Support bpf_list_head in map values
Applying: bpf: Introduce local kptrs
Applying: bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs
Applying: bpf: Verify ownership relationships for owning types
Applying: bpf: Support locking bpf_spin_lock in local kptr
Applying: bpf: Allow locking bpf_spin_lock global variables
Applying: bpf: Rewrite kfunc argument handling
Applying: bpf: Drop kfunc bits from btf_check_func_arg_match
Applying: bpf: Support constant scalar arguments for kfuncs
Applying: bpf: Teach verifier about non-size constant arguments
Applying: bpf: Introduce bpf_kptr_new
Applying: bpf: Introduce bpf_kptr_drop
Applying: bpf: Permit NULL checking pointer with non-zero fixed offset
Applying: bpf: Introduce single ownership BPF linked list API
Applying: libbpf: Add support for private BSS map section
Applying: selftests/bpf: Add __contains macro to bpf_experimental.h
Applying: selftests/bpf: Add BPF linked list API tests
Using index info to reconstruct a base tree...
M	tools/testing/selftests/bpf/DENYLIST.s390x
Falling back to patching base and 3-way merge...
Auto-merging tools/testing/selftests/bpf/DENYLIST.s390x
CONFLICT (content): Merge conflict in tools/testing/selftests/bpf/DENYLIST.s390x
Patch failed at 0025 selftests/bpf: Add BPF linked list API tests
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
  stderr: 'error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch'

conflict:

diff --cc tools/testing/selftests/bpf/DENYLIST.s390x
index beef1232a47a,1a59524516df..000000000000
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@@ -76,4 -76,4 +76,8 @@@ lookup_ke
  verify_pkcs7_sig                         # JIT does not support calling kernel function                                (kfunc)
  kfunc_dynptr_param                       # JIT does not support calling kernel function                                (kfunc)
  deny_namespace                           # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
++<<<<<<< HEAD
 +libbpf_get_fd_by_id_opts                 # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
++=======
+ linked_list				 # JIT does not support calling kernel function                                (kfunc)
++>>>>>>> selftests/bpf: Add BPF linked list API tests

@kernel-patches-bot
Author

Master branch: d31ada3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=684982
version: 2

Kernel Patches Daemon and others added 20 commits October 13, 2022 08:14
The kernel recognizes some special BPF types in map values or local
kptrs. Document that only bpf_spin_lock and bpf_timer will preserve
backwards compatibility, and that kptr will preserve backwards
compatibility only for the operations on the pointer, not for the types
supported for such kptrs.

For local kptrs, document that there are no stability guarantees at all.

Finally, document that the 'bpf_' namespace is reserved for adding
future special fields; hence, BPF programs must not declare types with
such names and still expect backwards compatibility.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
This is useful in particular to mark the pointer as volatile, so that
the compiler treats each load and store to the field as a volatile
access.
The alternative is having to define and use READ_ONCE and WRITE_ONCE in
the BPF program.
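
For illustration, a hedged sketch of such a declaration (the value type
and field name here are made up, not taken from the patch):

struct map_value {
	/* Sketch only: the pointer variable itself is volatile, so each
	 * access compiles to a volatile load or store. */
	struct prog_test_ref_kfunc __kptr *volatile ptr;
};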

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
When support was added for spilled PTR_TO_BTF_ID to be accessed by
helper memory access, the stack slot was not overwritten to STACK_MISC
(and that too is only safe when env->allow_ptr_leaks is true).

This means that helpers that take ARG_PTR_TO_MEM and write to it may
essentially overwrite the value while the verifier continues to track
the slot as a spilled register.

This can cause issues when PTR_TO_BTF_ID is spilled to stack, and then
overwritten by helper write access, which can then be passed to BPF
helpers or kfuncs.

Handle this by falling back to the case introduced in a later commit,
which will also handle PTR_TO_BTF_ID along with other pointer types,
i.e. cd17d38 ("bpf: Permits pointers on stack for helper calls").

Finally, include a comment on why REG_LIVE_WRITTEN is not being set when
clobber is set to true. In short: when clobber is unset, we know that we
won't be writing; when it is true, we *may* write to any of the stack
slots in that range. The write may be partial or complete, to just one
or many stack slots.

We cannot be sure, hence to be conservative, we leave things as is and
never set REG_LIVE_WRITTEN for any stack slot. However, clobber still
needs to reset the slots to STACK_MISC, assuming writes happened. Read
marks still need to be propagated upwards from a liveness point of view,
as the parent stack slot's contents may still continue to matter to
child states.

Cc: Yonghong Song <yhs@meta.com>
Fixes: 1d68f22 ("bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundary")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
For the case where allow_ptr_leaks is false, the code checks whether the
slot type is STACK_INVALID or STACK_SPILL and rejects other cases. This
is a consequence of incorrectly checking for the register types
(NOT_INIT and SCALAR_VALUE respectively) instead of the slot types. Fix
the check.

Fixes: 01f810a ("bpf: Allow variable-offset stack access")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
It is not scalable to maintain a list of types that can have non-zero
ref_obj_id. It is never set for scalars anyway, so just remove the
conditional on register types and print it whenever it is non-zero.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To prepare the BPF verifier to handle special fields in both map values
and program allocated types coming from program BTF, we need to refactor
the kptr_off_tab handling code into something more generic and reusable
across both cases to avoid code duplication.

Later patches also require passing this data to helpers at runtime, so
that they can work on user defined types, initialize them, destruct
them, etc.

The main observation is that both map values and such allocated types
point to a type in program BTF, hence they can be handled similarly. We
can prepare a field metadata table for both cases and store them in
struct bpf_map or struct btf depending on the use case.

Hence, refactor the code into generic btf_type_fields and btf_field
member structs. The btf_type_fields represents the fields of a specific
btf_type in user BTF. The cnt indicates the number of special fields we
successfully recognized, and field_mask is a bitmask of fields that were
found, to enable quick determination of availability of a certain field.
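
For orientation, a minimal sketch of the shape of these two structs,
going only by the description above (member names and layout are
assumptions):

struct btf_field {
	u32 offset;               /* offset of the special field in the type */
	enum btf_field_type type; /* which special field was recognized */
};

struct btf_type_fields {
	u32 cnt;                  /* number of special fields recognized */
	u32 field_mask;           /* bitmask of field types that were found */
	struct btf_field fields[];
};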

Subsequently, refactor the rest of the code to work with these generic
types, remove assumptions about kptr and kptr_off_tab, rename variables
to more meaningful names, etc.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Now that kptr_off_tab has been refactored into fields_tab and can hold
more than one specific field type, accommodate bpf_spin_lock and
bpf_timer as well.

While they don't require any more metadata than offset, having all
special fields in one place allows us to share the same code for
allocated user defined types and handle both map values and these
allocated objects in a similar fashion.

As an optimization, we still keep spin_lock_off and timer_off offsets in
the btf_type_fields structure, just to avoid having to find the
btf_field struct each time their offset is needed. This is mostly needed
to manipulate such objects in a map value at runtime. It's ok to
hardcode just one offset as more than one field is disallowed.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Refactor map->off_arr handling into generic functions that can work on
their own without hardcoding map specific code. The btf_type_fields_off
structure is now returned from btf_parse_fields_off, which can be reused
later for types in program BTF.

All functions like copy_map_value, zero_map_value call generic
underlying functions so that they can also be reused later for copying
to values allocated in programs which encode specific fields.

Later, some helper functions will also require access to this off_arr
structure to be able to skip over special fields at runtime.
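
As a sketch of the idea (function and member names are assumptions), the
generic copy walks the sorted offset array and copies only the gaps
between special fields:

static void bpf_obj_memcpy(struct btf_type_fields_off *off_arr,
			   void *dst, void *src, u32 size)
{
	u32 cur = 0;
	int i;

	for (i = 0; i < off_arr->cnt; i++) {
		u32 next_off = off_arr->field_off[i];

		/* Copy up to the special field, then skip over it. */
		memcpy(dst + cur, src + cur, next_off - cur);
		cur = next_off + off_arr->field_sz[i];
	}
	memcpy(dst + cur, src + cur, size - cur);
}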

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add the basic support on the map side to parse, recognize, verify, and
build metadata table for a new special field of the type struct
bpf_list_head. To parameterize the bpf_list_head for a certain value
type and the list_node member it will accept in that value type, we use
BTF declaration tags.

The definition of bpf_list_head in a map value will be done as follows:

struct foo {
	struct bpf_list_node node;
	int data;
};

struct map_value {
	struct bpf_list_head head __contains(foo, node);
};

Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.

The 'contains' annotation is a BTF declaration tag composed of four
parts, "contains:kind:name:node", where the kind and name are then used
to look up the type in the map BTF. The node part names the member in
this type that has the type struct bpf_list_node, which is actually used
for linking into the linked list. For now, the 'kind' part is hardcoded
as struct.
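
Concretely, the __contains(foo, node) annotation above is expected to
expand to the raw declaration tag (a sketch; the convenience macro
hiding this comes later in the series):

struct map_value {
	struct bpf_list_head head __attribute__((btf_decl_tag("contains:struct:foo:node")));
};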

This allows building intrusive linked lists in BPF, using container_of
to obtain a pointer to the entry, while being completely type safe from the
perspective of the verifier. The verifier knows exactly the type of the
nodes, and knows that list helpers return that type at some fixed offset
where the bpf_list_node member used for this list exists. The verifier
also uses this information to disallow adding types that are not
accepted by a certain list.

For now, no elements can be added to such lists. Support for that is
coming in future patches; hence, draining and freeing items is left as
a TODO that will be resolved in a future patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
type tag in reg->type to avoid having to check btf_is_kernel when trying
to match argument types in helpers.

For now, these local kptrs will always be referenced in verifier
context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
to such objects, as long as the special fields are not touched (support
for which will be added in subsequent patches).

Hence, no PROBE_MEM handling is done, since they can never be in an
undefined state and their lifetime will always be valid.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Allow specifying bpf_spin_lock, bpf_list_head, bpf_list_node fields in a
local kptr.

A bpf_list_head allows implementing map-in-map style use cases, where a
local kptr with a bpf_list_head is linked into a list in a map value.
This requires embedding a bpf_list_node, support for which is also
included.

Lastly, while we strictly don't require holding a bpf_spin_lock while
manipulating the bpf_list_head of a local kptr (since when we have
access to it, we have complete ownership of the object), the locking
constraint is still kept and may be conditionally lifted in the future.

Note that the specification of such types can be done just like map
values, e.g.:

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
	struct bpf_list_node node;
};

struct map_value {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(foo, node);
};

To recognize such types in user BTF, we build a btf_struct_metas array
of metadata items corresponding to each BTF ID. This is done once during
the btf_parse stage to avoid having to recompute it each time the
verification process needs to inspect the metadata.

Moreover, the computed metadata needs to be passed to some helpers in
future patches, which requires allocating it and storing it in the BTF
that is pinned by the program itself, so that access to such data can be
assumed valid during program runtime.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Ensure that there can be no ownership cycles among different types by
way of having owning objects that can hold some other type as their
element. For instance, a map value can only hold local kptrs, but these
are allowed to have another bpf_list_head. To prevent unbounded
recursion while freeing resources, elements of a bpf_list_head in local
kptrs that are themselves part of a list in a map value can never have a
bpf_list_head.

Also, to make runtime destruction easier, once btf_struct_metas is fully
populated, we can stash the metadata of the value type directly in the
metadata of the list_head fields, as that allows easier access to the
value type's layout to destruct it at runtime from the btf_field entry
of the list head itself.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Allow locking a bpf_spin_lock embedded in a local kptr, in addition to
the already supported map value pointers. The handling is similar to
that of map values: we preserve the reg->id of local kptrs as well, and
adjust process_spin_lock to work with non-PTR_TO_MAP_VALUE registers,
remembering the id in verifier state.
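
A hedged usage sketch, reusing struct foo from the previous patch
(bpf_kptr_new and bpf_kptr_drop are only introduced later in this
series):

void prog(void)
{
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	/* The lock lives in the same allocation as the object. */
	bpf_spin_lock(&f->lock);
	/* ... manipulate f->head ... */
	bpf_spin_unlock(&f->lock);
	bpf_kptr_drop(f);
}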

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Global variables reside in maps accessible via direct_value_addr
callbacks, so giving each load instruction's rewrite a unique reg->id
prevents us from holding locks that are global.

This is not great, so refactor the active_spin_lock into two separate
fields, active_spin_lock_ptr and active_spin_lock_id, which is generic
enough to allow it for global variables, map lookups, and local kptr
registers at the same time.

Held vs non-held is indicated by active_spin_lock_ptr, which stores the
reg->map_ptr or reg->btf pointer of the register used for locking the
spin lock. The active_spin_lock_id also needs to be compared, to ensure
that bpf_spin_unlock is for the same register.

Next, pseudo load instructions are not given a unique reg->id, as they
perform a lookup of the same map value (max_entries is never greater
than 1).

Essentially, we consider that the tuple of (active_spin_lock_ptr,
active_spin_lock_id) will always be unique for any kind of argument to
bpf_spin_{lock,unlock}.

Note that this can be extended in the future to also remember offset
used for locking, so that we can introduce multiple bpf_spin_lock fields
in the same allocation.
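
With this change, a global lock becomes usable (a sketch; making libbpf
accept the global declaration relies on the .bss.private support added
later in this series):

struct bpf_spin_lock glock;

void prog(void)
{
	bpf_spin_lock(&glock);
	/* ... critical section ... */
	bpf_spin_unlock(&glock);
}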

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
As we continue to add more features, argument types, kfunc flags, and
different extensions to kfuncs, the code to verify the correctness of
the kfunc prototype wrt the passed in registers has become ad-hoc and
ugly to read.

To make life easier, and make a very clear split between different
stages of argument processing, move all the code into verifier.c and
refactor into easier to read helpers and functions.

This also makes it easier for kfunc argument processing to share code
within the verifier. This will be more and more useful in later patches,
as we move to implement very core BPF helpers as kfuncs, to keep them
experimental before baking them into UAPI.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Remove all kfunc related bits now from btf_check_func_arg_match, as
users have been converted away to refactored kfunc argument handling.

This is split into a separate commit to aid review, so that what has
been preserved can easily be compared against the removed bits, instead
of mixing removed hunks with the previous patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Allow passing known constant scalars as arguments to kfuncs that do not
represent a size parameter. This makes the search pruning optimization
of the verifier more conservative for such kfunc calls: any two
arguments that are not identical constants are considered
non-equivalent.

We will use this support to then expose a global bpf_kptr_alloc
function, which takes the local type ID in program BTF and returns a
PTR_TO_BTF_ID to the local type. These will be called local kptrs, and
allow programs to allocate their own objects.
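
As a sketch of what this enables, the allocator kfunc introduced later
in the series declares its type ID parameter as a constant scalar via
the __k naming convention (the exact prototype here is an assumption):

/* local_type_id must be a known constant; states are pruned only when
 * two calls pass the exact same value. */
void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign);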

However, this is still not completely safe, as the mark_chain_precision
logic is buggy without more work when the constant argument is not a
size but still needs precise marker propagation for pruning checks. The
next patch will fix this problem.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Currently, the verifier has support for various arguments that either
describe the size of the memory being passed in to a helper or describe
the size of the memory being returned. When a constant is passed in like
this, it is assumed for the purposes of precision tracking that if the
value in the already explored safe state is within the value in the
current state, it is fine to prune the search.

While this holds well for size arguments, arguments where each value may
denote a distinct meaning, and hence must be verified separately, need
more work: the search can only be pruned if both are constant values and
both are equal. In all other cases, it would be incorrect to treat those
two precise registers as equivalent just because the new value satisfies
the old one (i.e. old <= cur).

Hence, make the register precision marker tri-state. There are now three
values that reg->precise takes: NOT_PRECISE, PRECISE, EXACT.

Both PRECISE and EXACT are 'true' values. EXACT affects how regsafe
decides whether two registers are equivalent for the purposes of
verifier state equivalence. When it sees that one register has
reg->precise == EXACT, it returns true only when both registers are
known constants with the same value, and false otherwise. For the
PRECISE case, it falls back to the default check that is present now
(i.e. treating the values as sizes).
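
In sketch form (the enum name is an assumption):

enum bpf_reg_precise {
	NOT_PRECISE, /* no precision requirement */
	PRECISE,     /* range-based check suffices (e.g. sizes) */
	EXACT,       /* only identical constants are equivalent */
};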

This is required as a future patch introduces a BPF memory allocator
interface, where we take the program BTF's type ID as an argument. Each
distinct type ID may result in the returned pointer obtaining a
different size, hence precision tracking is needed, and pruning cannot
just happen when the old value is within the current value. It must only
happen when the type ID is equal. The type ID will always correspond to
prog->aux->btf hence actual type match is not required.

Finally, change mark_chain_precision precision argument to EXACT for
kfuncs constant non-size scalar arguments (tagged with __k suffix).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Introduce a type safe memory allocator bpf_kptr_new for BPF programs.
The kernel side kfunc is named bpf_kptr_new_impl, as passing hidden
arguments to kfuncs still requires having them in the prototype, unlike
BPF helpers, which always take 5 arguments and have them checked using
bpf_func_proto in the verifier, ignoring unset argument types.

Introduce __ign suffix to ignore a specific kfunc argument during type
checks, then use this to introduce support for passing type metadata to
the bpf_kptr_new_impl kfunc.

The user passes the BTF ID of the type it wants to allocate in program
BTF; the verifier then rewrites the first argument as the size of this
type, after performing some sanity checks (to ensure that it exists and
is a struct type).

The second argument, flags, is reserved and must be 0 for now.

The third argument is also fixed up and passed by the verifier. This is
the btf_struct_meta for the type being allocated. It would be needed
mostly for the offset array, which is required for zero initializing
special fields while leaving the rest of the storage in an uninitialized
state.

It would also be needed in the next patch to perform proper destruction
of the object's special fields.

A convenience macro is included in the bpf_experimental.h header to hide
the ugly details of the implementation, leading to user code that looks
similar to a language level extension which allocates and constructs
fields of a user type.

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
};

void prog(void) {
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	...
}
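
The macro itself is expected to be roughly the following (a sketch; the
real definition lives in bpf_experimental.h):

#define bpf_kptr_new(type) \
	((type *)bpf_kptr_new_impl(bpf_core_type_id_local(type), 0, NULL))

The verifier rewrites the type ID into the allocation size and fills in
the hidden btf_struct_meta argument.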

A key piece of this story is still missing, i.e. the free function,
which will come in the next patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
kkdwivedi and others added 6 commits October 13, 2022 08:14
Introduce bpf_kptr_drop, which is the kfunc used to free local kptrs
allocated using bpf_kptr_new. Similar to bpf_kptr_new, it implicitly
destructs the special fields of the local kptr automatically, without
user intervention.

Just like the previous patch, btf_struct_meta that is needed to free up
the special fields is passed as a hidden argument to the kfunc.

For the user, a convenience macro hides the kernel side kfunc, which is
named bpf_kptr_drop_impl.

Continuing the previous example:

void prog(void) {
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	bpf_kptr_drop(f);
}

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Pointer arithmetic on a register with PTR_MAYBE_NULL is already
protected against; hence, make an exception for local kptrs while still
keeping the warning for other unintended cases that might creep in.

bpf_list_del{,_tail} helpers return a local kptr with an incremented
offset, pointing to the bpf_list_node field. The user is supposed to
then obtain the pointer to the entry using container_of after NULL
checking it, but the current restrictions trigger a warning when doing
the NULL check. Revisiting the reason, the warning is meant as an
assertion, which does seem to actually work and catch the bad case, as
under no other circumstances can reg->off be non-zero for a register
that has the PTR_MAYBE_NULL type flag set.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add a linked list API for use in BPF programs, where it expects
protection from the bpf_spin_lock in the same allocation as the
bpf_list_head. Future patches will extend the same infrastructure to
have different flavors with varying protection domains and visibility
(e.g. percpu variant with local_t protection, usable in NMI progs).

The following functions are added to kick things off:

bpf_list_add
bpf_list_add_tail
bpf_list_del
bpf_list_del_tail

The lock protecting the bpf_list_head needs to be taken for all
operations.

Once a node has been added to the list, its pointer changes to
PTR_UNTRUSTED. However, the reference is only released once the lock
protecting the list is unlocked. For such local kptrs with PTR_UNTRUSTED
set but an active ref_obj_id, it is still permitted to read from and
write to them as long as the lock is held.

bpf_list_del and bpf_list_del_tail delete the first or last item of the
list respectively, and return a pointer to the element at the list_node
offset. The user can then use a container_of style macro to get the
actual entry type. The verifier, however, statically knows the actual
type, so the safety properties are still preserved.

With these additions, programs can now manage their own linked lists and
store their objects in them.
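
A hedged end-to-end sketch using the foo/bar definitions from earlier
patches (the argument order of the list kfuncs is an assumption):

void prog(void)
{
	struct bpf_list_node *n;
	struct foo *f;
	struct bar *b;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	b = bpf_kptr_new(typeof(*b));
	if (!b) {
		bpf_kptr_drop(f);
		return;
	}
	bpf_spin_lock(&f->lock);
	bpf_list_add(&b->node, &f->head); /* the list now owns b */
	n = bpf_list_del(&f->head);
	bpf_spin_unlock(&f->lock);
	if (n) {
		b = container_of(n, struct bar, node);
		bpf_kptr_drop(b); /* ownership returned by bpf_list_del */
	}
	bpf_kptr_drop(f);
}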

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Currently libbpf does not allow declaration of a struct bpf_spin_lock in
global scope. Attempting to do so results in a "failed to re-mmap"
error, as the .bss arraymap containing the spinlock is not allowed to be
mmap'd.

This patch adds support for a .bss.private section. The maps contained
in this section will not be mmap'd into userspace by libbpf, nor will
they be exposed via the bpftool-generated skeleton.

The intent here is to allow a more natural programming pattern for
global-scope spinlocks, which will be used by the rbtree locking
mechanism in further patches in this series.

Notes:

  * Initially I called the section .bss.no_mmap, but the broader
    'private' term better indicates that skeleton shouldn't expose these
    maps at all, IMO.

  * bpftool/gen.c's is_internal_mmapable_map function checks whether the
    map flags have BPF_F_MMAPABLE, so no bpftool changes were necessary
    to remove .bss.private maps from skeleton
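
A sketch of the intended usage (declaration syntax is an assumption):

/* Lives in a map that libbpf neither mmaps nor exposes in the skeleton. */
struct bpf_spin_lock glock SEC(".bss.private");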

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add user facing __contains macro which provides a convenient wrapper
over the verbose kernel specific BTF declaration tag required to
annotate BPF list head structs in user types.
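
A sketch of what the macro is expected to look like, given the
"contains:kind:name:node" tag format described earlier in the series:

#define __contains(name, node) \
	__attribute__((btf_decl_tag("contains:struct:" #name ":" #node)))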

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Include various tests covering the success and failure cases. Also, run
the success cases at runtime to verify correctness of linked list
manipulation routines, in addition to ensuring successful verification.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@kernel-patches-bot
Author

Master branch: de9c8d8
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=684982
version: 2

@kernel-patches-bot
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=684982 expired. Closing PR.
