
Exceptions - Resource Cleanup #6370

Closed

Conversation

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: Exceptions - Resource Cleanup
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=821956

@kernel-patches-daemon-bpf

Upstream branch: 77326a4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=821956
version: 1

The motivation of this patch is to figure out which subprogs participate
in exception propagation; in other words, which subprogs' execution can
lead to an exception being thrown, either directly or indirectly (by way
of calling other subprogs).

With the current exceptions support, the runtime performs stack
unwinding when bpf_throw is called. For now, any resources acquired by
the program cannot be released, therefore bpf_throw calls made with
non-zero acquired references must be rejected during verification.

However, there currently exists a loophole in this restriction due to
the way the verification procedure is structured. The verifier will
first walk over the main subprog's instructions, but not descend into
calls to subprogs with global linkage; these global subprogs are instead
verified independently. Therefore, in a situation where a global subprog
ends up throwing an exception (either directly by calling bpf_throw, or
indirectly by way of calling another subprog that does so), the verifier
will fail to notice this fact and may permit throwing BPF exceptions
with non-zero acquired references.

Therefore, to fix this, we add a summarization pass before the do_check
stage which walks all call chains of the program and marks all of the
subprogs from which a bpf_throw call (which unwinds the program stack)
is reachable.

We only do so if we actually see a bpf_throw call in the program,
though, since we do not want to walk all instructions unless we need to.
Once we have analyzed all possible call chains of the program, we can
mark the affected subprogs as 'is_throw_reachable' in their subprog_info.
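
To make this concrete, here is a minimal, self-contained C sketch of the
reachability propagation, assuming a simplified subprog representation;
the struct, field, and helper names below are illustrative, not the
kernel's actual code:

#include <stdbool.h>

#define MAX_SUBPROGS 256

struct subprog_info_sketch {
	bool calls_bpf_throw;      /* contains a direct bpf_throw call */
	bool is_throw_reachable;   /* summarized result */
	int callees[MAX_SUBPROGS]; /* indices of directly called subprogs */
	int nr_callees;
};

/* Returns true if executing subprog 'i' can lead to an exception being
 * thrown. Plain DFS terminates because BPF forbids recursive calls. */
static bool mark_throw_reachable(struct subprog_info_sketch *sp, int i)
{
	if (sp[i].is_throw_reachable)
		return true;
	if (sp[i].calls_bpf_throw) {
		sp[i].is_throw_reachable = true;
		return true;
	}
	for (int j = 0; j < sp[i].nr_callees; j++) {
		if (mark_throw_reachable(sp, sp[i].callees[j])) {
			sp[i].is_throw_reachable = true;
			return true;
		}
	}
	return false;
}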

After performing this step, we need to change how subprog call
verification occurs. In the case of a global subprog call, we need to
explore an alternate program path where processing the call instruction
immediately results in an exception being thrown. We thus simulate a
normal path without any exceptions, and one where the exception is
thrown and the program proceeds no further. In this way, the verifier
will be able to detect whether any acquired references or locks exist in
the verifier state, and reject the program if needed.

Fixes: f18b03f ("bpf: Implement BPF exceptions")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Global subprogs are not descended into during symbolic execution, but
whether they can throw an exception (directly, or by reaching another
exception-throwing subprog) was summarized in
mark_exception_reachable_subprogs, added by the previous patch.

We must now ensure that we explore the path of the program where
invoking the call instruction leads to an exception being thrown, so
that we can correctly reject programs where it is not permissible to
throw an exception.  For instance, it might be permissible to throw from
a global subprog, but its caller may hold references. Without this
patch, the verifier will accept such programs.

To do this, we use push_stack to push a separate branch onto the branch
stack of the verifier, with the same current and previous insn_idx.
Then, we set a bit in the verifier state of the branch to indicate that
the next instruction it processes is a call to a global subprog which
will throw an exception. When we encounter this instruction, this bit
is cleared.

Special care must be taken to update the state pruning logic: without
any changes, it is possible that we end up pruning the exception
throwing state when it is popped for exploration. While we can never
have the 'global_subprog_call_exception' bit set in the verifier state
of an explored state, we will see it in the current state, and we use
this to reject pruning requests and continue its exploration.

Note that we process the exception after processing the call
instruction, similar to how we do a process_bpf_exit_full jump in case
of bpf_throw kfuncs.
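
As a rough illustration, here is a hedged sketch of the two-path
exploration, modeled on the verifier's push_stack machinery;
subprog_is_throw_reachable(), may_prune(), and the placement of the
global_subprog_call_exception bit are assumptions based on the
description above, not the actual patch:

static int explore_global_call_paths(struct bpf_verifier_env *env,
				     int insn_idx, int prev_insn_idx)
{
	struct bpf_verifier_state *branch;

	if (subprog_is_throw_reachable(env, insn_idx)) {
		/* Queue a branch that replays this call as throwing. */
		branch = push_stack(env, insn_idx, prev_insn_idx, false);
		if (!branch)
			return -ENOMEM;
		branch->global_subprog_call_exception = true;
	}
	/* Fall through: keep verifying the non-throwing path. */
	return 0;
}

/* In the pruning logic, a state that is about to simulate the
 * exceptional path must never be pruned against an explored state: */
static bool may_prune(const struct bpf_verifier_state *cur)
{
	return !cur->global_subprog_call_exception;
}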

Fixes: f18b03f ("bpf: Implement BPF exceptions")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add a test case to exercise the verifier logic where a global function
that may potentially throw an exception is invoked from the main
subprog, such that during exploration, the reference state is not
visible when the bpf_throw instruction is explored. Without the fixes in
the prior commits, the verifier will not complain about unreleased
resources lingering in the program when a possible exception may be
thrown.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Refactor check_pseudo_btf_id's code which adds a new BTF reference to
the used_btfs into a separate helper function called add_used_btfs. This
will later be useful in exception frame generation for taking BTF
references along with their modules, so that we can keep alive the
modules whose functions may be required to unwind a given BPF program
when it eventually throws an exception.

While typically module references should already be held in such a case,
since the program will have used a kfunc to acquire a reference that it
did not clean up before throwing an exception, there are corner cases
where this may not be true (e.g. one program producing the object, and
another simply using bpf_kptr_xchg, and not having a kfunc call into the
module). Therefore, it is more prudent to simply bump the reference
whenever we encounter such cases for exception frame generation.

The behaviour of add_used_btfs is to take an input BTF object with its
reference count already raised, and to consume that reference upon
successful insertion. In case of an error, the caller is responsible for
releasing the reference.
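
The contract can be sketched as follows; env->used_btfs and
btf_try_get_module do exist in the kernel, but the body here is a
simplified illustration rather than the patch's actual code:

static int add_used_btfs(struct bpf_verifier_env *env, struct btf *btf)
{
	struct btf_mod_pair *btf_mod;

	/* Caller passes 'btf' with its refcount already raised. */
	if (env->used_btf_cnt >= MAX_USED_BTFS)
		return -E2BIG;	/* on error, the caller releases the ref */

	/* (The real helper would also detect duplicates and release the
	 * extra reference in that case.) */
	btf_mod = &env->used_btfs[env->used_btf_cnt++];
	btf_mod->btf = btf;	/* successful insertion consumes the ref */
	btf_mod->module = btf_try_get_module(btf); /* keep module alive */
	return 0;
}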

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Introduce support in the verifier to generate a set of descriptors for
each BPF frame, describing the register and stack state. These can be
used to reason about the resources acquired by the program at a
particular program point; subsequent patches introduce support for
unwinding a given program when it throws an exception while holding
ownership of such resources.

Descriptors generated for each frame are then tied back to the subprog
they belong to, and attached to the bpf_prog instance during the JIT
phase, with the program counter serving as a key to find the descriptor
of interest for a given subprog.

Logically, during the unwinding phase, for each frame, we will use the
program counter and bpf_prog object to figure out how we should release
acquired resources if any in the frame.

Let's study how the frame descriptor generation algorithm works.
Whenever an exception throwing instruction is encountered, i.e. a call
to a throw-reachable global subprog or a bpf_throw kfunc call, we call
gen_exception_frame_descs.

This function starts with the current frame and explores the registers
and other objects on the current stack. We scan at an 8-byte
granularity, since all registers spilled on the stack and objects like
dynptrs and iterators are 8-byte aligned. For each such stack entry, we
inspect the slot_type to figure out whether it is a spilled register or
a dynptr/iter object.

For any acquired resources found on the stack, we insert entries
representing them into a frame descriptor table for the current subprog
at the current instruction index.

The same steps are repeated for callee saved registers, as these may be
spilled onto the stack of one of the frames in the call chain and would
have to be located in order to be freed.
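
A simplified sketch of the per-frame scan follows; the record_* helpers
and the reg_holds_resource predicate are hypothetical, while the
slot_type inspection mirrors the verifier's usual conventions:

static int gen_frame_desc(struct bpf_verifier_env *env,
			  struct bpf_func_state *frame, int insn_idx)
{
	for (int off = 0; off < frame->allocated_stack; off += BPF_REG_SIZE) {
		struct bpf_stack_state *slot = &frame->stack[off / BPF_REG_SIZE];

		switch (slot->slot_type[BPF_REG_SIZE - 1]) {
		case STACK_SPILL:	/* spilled register */
			if (reg_holds_resource(&slot->spilled_ptr))
				record_stack_entry(env, insn_idx, off,
						   &slot->spilled_ptr);
			break;
		case STACK_DYNPTR:	/* dynptr object */
		case STACK_ITER:	/* iterator object */
			record_stack_object(env, insn_idx, off, slot);
			break;
		}
	}
	/* Callee saved registers R6-R9 get the same treatment. */
	for (int regno = BPF_REG_6; regno <= BPF_REG_9; regno++)
		if (reg_holds_resource(&frame->regs[regno]))
			record_reg_entry(env, insn_idx, regno,
					 &frame->regs[regno]);
	return 0;
}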

In case of registers (spilled or callee saved), we make a special
provision for register_is_null scalar values, to increase the chances of
merging frame descriptors where the only divergence is NULL in one state
being replaced with a valid pointer in another.

The next important step is the logic to merge the frame descriptors. It
is possible that the verifier reaches the same instruction index in a
program from multiple paths, and has to generate frame descriptors for
them at that program counter. In such a case, we always ensure that
after generating the frame descriptor, we attempt to "merge" it with an
existing one.

The merging rules are fairly simple, except for a few caveats. First, if
the layout and types of objects on the stack and in registers are the
same, we have a successful merge. Next, in the case of registers
(spilled or callee saved), we have a special case: if the old entry is
NULL, the new (non-NULL) type replaces it, and if the new entry is NULL,
it satisfies the merge rules with the old entry (which can be of any
type).

This helps in cases where we have an optional value held in a register
or stack slot in one program path, which is replaced by the actual value
in another program path. This can also happen with conditionals, where
the verifier may see acquired references in the verifier state only when
a condition is true (and therefore not in all program paths traversing
the same instruction).

To illustrate with an example, in the following program:

struct foo *p = NULL;
if (x)
	p = bpf_obj_new(typeof(*p));
if (y)
	bpf_throw(0);
if (p)
	bpf_obj_drop(p);

In such a case, bpf_throw may be reached for x == 0, y == 1 and x == 1,
y == 1, with two possible values of p. As long as both can be passed
into the release function (i.e. NULL or a valid pointer value), we can
satisfy the merge.
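
The NULL-tolerant rule for a single register or stack entry might look
like this sketch; struct frame_desc_entry and the ENTRY_NULL tag are
hypothetical:

static bool merge_desc_entry(struct frame_desc_entry *old,
			     const struct frame_desc_entry *new)
{
	if (old->type == new->type)
		return true;		/* exact match */
	if (old->type == ENTRY_NULL) {
		old->type = new->type;	/* non-NULL replaces NULL */
		return true;
	}
	if (new->type == ENTRY_NULL)
		return true;		/* NULL merges with any old type */
	return false;			/* layouts diverge: no merge */
}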

TODO: We need to reserve a slot for STACK_ZERO as well.
TODO: Improve the error message in case we have pointer and misc instead of zero.

Currently, we only consider resources which are modelled as acquired
references in verifier state. In particular, this excludes resources
like held spinlocks and RCU read sections. For now, both of these will
not be handled, and the verifier will continue to complain when
exceptions are thrown in their presence.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
When instruction patching (addition or removal) occurs, the fdtab
attached to each subprog and the program counter in its descriptors go
out of sync with the instructions' relative positions in the program. To
fix this, we adjust the pc, free any fdtabs and descriptors that are no
longer needed, and ensure the entries correspond to the correct
instruction offsets.
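
A sketch of the pc adjustment when one instruction at 'off' is patched
into 'len' instructions; the fdtab types are hypothetical, and the
removal case (decrementing) is analogous:

static void adjust_frame_desc_pcs(struct bpf_subprog_info *subprog,
				  u32 off, u32 len)
{
	struct frame_desc_table *fdtab = subprog->fdtab;

	if (!fdtab || len <= 1)
		return;
	/* Instructions after 'off' shift forward by len - 1 slots. */
	for (u32 i = 0; i < fdtab->nr_entries; i++)
		if (fdtab->entries[i].pc > off)
			fdtab->entries[i].pc += len - 1;
}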

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
When we perform a bpf_throw kfunc call, callee saved registers in BPF
calling convention (R6-R9) may end up getting saved and clobbered by
bpf_throw. Typically, the kernel will restore the registers before
returning back to the BPF program, but in case of bpf_throw, the
function will never return. Therefore, any acquired resources sitting in
these registers will end up getting destroyed if not saved on the
stack, without any cleanup happening for them.

Also, when in a BPF call chain, caller frames may have held acquired
resources in R6-R9 and called their subprogs, which may have spilled
these on their stack frame to reuse these registers before entering the
bpf_throw kfunc. Thus, we also need to locate and free these callee
saved registers for each frame.

It is thus necessary to save these registers somewhere before we call
into the bpf_throw kfunc. Instead of adding spills everywhere bpf_throw
is called, we can use a new hidden subprog that saves R6-R9 on the stack
and then calls into bpf_throw. This way, all of the bpf_throw call sites
can be turned into call instructions for this subprog, and the hidden
subprog in turn will save the callee-saved registers before calling into
the bpf_throw kfunc.

In this way, when unwinding the stack, we can locate the callee saved
registers on the hidden subprog stack frame and perform their cleanup.
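
Conceptually, the hidden subprog could look like the following, written
with the kernel's BPF instruction macros; the exact spill offsets are
illustrative and BPF_EMIT_CALL() is a stand-in for the real kfunc-call
encoding:

#include <linux/filter.h>

static const struct bpf_insn throw_tramp_sketch[] = {
	/* Spill callee-saved R6-R9 to known offsets in this frame, so
	 * the unwinder can always locate them. R1 (bpf_throw's cookie
	 * argument) passes through untouched. */
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_6, -8),
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_7, -16),
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_8, -24),
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_9, -32),
	/* Call into bpf_throw; it never returns. */
	BPF_EMIT_CALL(bpf_throw),
	BPF_EXIT_INSN(),	/* unreachable */
};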

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
During runtime unwinding and cleanup, we will need to figure out where
the callee saved registers are stored on the stack, so that when a
bpf_throw call is made, all frames can release their callee saved
registers by finding their saved copies on the stack of callee frames.

While the previous patch ensured that any BPF callee saved registers are
saved on a hidden subprog stack frame before entry into the kernel
(where we would not know their location if spilled), there are cases
where a subprog's R6-R9 are not spilled into its immediate callee's
stack frame, but much later in the call chain, in some later callee's
stack frame. As such, we need to figure out, while walking down the
stack, which frames have spilled their incoming callee saved regs, and
thus keep track of where the latest spill happened with respect to a
given frame in the stack trace.

To perform this, we need to know, during the unwinding phase, which
callee saved registers are saved by a given subprog at runtime. Right
now, the x86 JIT conveniently figures this out in detect_reg_usage. Lift
such logic into the verifier core, and copy this information into the
bpf_prog_aux struct before the JIT step, so that it is preserved at
runtime.
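
The scan itself is simple; this sketch mirrors the x86 JIT's
detect_reg_usage idea, with callee_regs_used as a bpf_prog_aux field
name assumed from the description:

static void detect_callee_saved_usage(struct bpf_prog *prog)
{
	const struct bpf_insn *insn = prog->insnsi;

	for (unsigned int i = 0; i < prog->len; i++, insn++) {
		/* Any use of R6-R9 as source or destination means the
		 * subprog saves (and may spill) that register. */
		for (int r = BPF_REG_6; r <= BPF_REG_9; r++)
			if (insn->dst_reg == r || insn->src_reg == r)
				prog->aux->callee_regs_used[r - BPF_REG_6] = true;
	}
}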

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Until now, the program counter value stored in frame descriptor entries
was the instruction index of the BPF program's throwing instruction, and
of the callsites when going down the frames in a call chain. However, at
runtime, the program counter is a pointer to the next instruction, and
thus needs to be computed in a position independent way so that it can
be matched at runtime to find the frame descriptor when unwinding.

To do this, we first convert the global instruction index into an
instruction index relative to the start of a subprog, and add 1 to it
(to reflect that at runtime, the program counter points to the next
instruction). Then, we modify the JIT (for now, x86) to convert them
to instruction offsets relative to the start of the JIT image, which is
the prog->bpf_func of the subprog in question at runtime.

Later, subtracting the prog->bpf_func pointer from runtime program
counter will yield the same offset, and allow us to figure out the
corresponding frame descriptor entry.
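
At runtime, the lookup implied by this scheme reduces to simple pointer
arithmetic; the fdtab layout below is hypothetical:

/* Verification time: pc = (insn_idx - subprog_start) + 1. The JIT then
 * rewrites pc into a byte offset from the subprog's prog->bpf_func. */
static struct frame_desc *find_frame_desc(const struct bpf_prog *prog,
					  unsigned long ip)
{
	unsigned long off = ip - (unsigned long)prog->bpf_func;
	struct frame_desc_table *fdtab = prog->aux->fdtab;

	for (unsigned int i = 0; i < fdtab->nr_entries; i++)
		if (fdtab->entries[i].pc == off)
			return &fdtab->entries[i];
	return NULL;
}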

Note that we have to mark a frame descriptor entry as 'final', because
bpf_int_jit_compile can be called multiple times, and we would otherwise
try to convert already converted pc values again; once the conversion is
done, remember it and do not repeat it.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Finally, tie all ends together and implement functionality to process a
frame descriptor at runtime for each frame when bpf_throw is called, and
release resources present on the program's stack frames.

For each frame, we call bpf_cleanup_frame_resource, which uses the
runtime instruction pointer to find the right frame descriptor entry.
After this, we walk all stack entries and registers and call their
respective cleanup procedures.

Next, if the frame corresponds to a subprog, we also record the
locations where it has spilled its caller's R6-R9 registers, and save
their values in the unwinding context. Doing this only when a frame has
scratched the register in question lets us arrive at the set of values
actually needed during the freeing step, regardless of how many callees
existed and of the varying locations of the spilled callee saved
registers. These registers can also lie across different frames, but
will be collected top down when arriving at a frame.

Finally, after doing the cleanup, we go on to execute the exception
callback and finish unwinding the stack.
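
Putting it together, the unwinding loop can be sketched as follows; the
unwind context and all helpers other than bpf_cleanup_frame_resource are
hypothetical stand-ins for the patch's machinery:

struct unwind_ctx;	/* tracks frames, saved R6-R9 values, etc. */

static void bpf_throw_unwind(struct unwind_ctx *ctx)
{
	struct bpf_prog *prog;

	/* Walk the stack trace top down. */
	while ((prog = next_frame_prog(ctx))) {
		/* Release stack/register resources recorded in the
		 * frame descriptor entry for this frame's pc. */
		bpf_cleanup_frame_resource(prog, ctx);
		/* If this frame spilled its caller's R6-R9, record the
		 * spilled values for cleaning up caller frames later. */
		record_spilled_callee_saved_regs(prog, ctx);
	}
	/* The exception callback runs after all frames are cleaned up. */
}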

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Reflect in the verifier state that references would be released whenever
we throw a BPF exception. Now that we support generating frame
descriptors, and performing the runtime cleanup, whenever processing an
entry corresponding to an acquired reference, make sure we release its
reference state. Note that we only release this state for the current
frame, as the acquired refs are only checked against that when
processing an exceptional exit.

This ensures that, for acquired resources apart from locks and RCU read
sections, BPF programs no longer fail verification due to lingering
resources at a throw site.

While at it, we can tweak check_reference_leak to drop the
exception_exit parameter, and fix selftests that will fail due to the
changed behaviour.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@kernel-patches-daemon-bpf

Upstream branch: b3d3e29
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=821956
version: 1

Reuse the existing BTF dtor infrastructure to also include dtor kfuncs
that can be used to release PTR_TO_BTF_ID pointers and other BTF objects
(iterators). For this purpose, we extend the btf_id_dtor_kfunc object
with a flags field, and ensure that entries that cannot work as kptrs
are not allowed to be embedded in map values.

Prior to this change, btf_id_dtor_kfunc served the dual role of an
allowlist for kptrs and of locating their dtors. To separate these
roles, we must now explicitly pass only BPF_DTOR_KPTR to ensure we don't
look up other cleanup kfuncs in the dtor table.
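
The extended registration described here might look like this sketch;
BPF_DTOR_KPTR comes from the text above, while BPF_DTOR_CLEANUP and the
exact field layout are assumptions:

enum btf_dtor_kfunc_flags {
	BPF_DTOR_KPTR    = (1 << 0),	/* usable as a map-value kptr dtor */
	BPF_DTOR_CLEANUP = (1 << 1),	/* only for exception cleanup */
};

struct btf_id_dtor_kfunc {
	u32 btf_id;		/* BTF id of the type the dtor releases */
	u32 kfunc_btf_id;	/* BTF id of the dtor kfunc itself */
	u32 flags;		/* new: restricts where it may be used */
};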

Finally, set up iterator and other objects that can be acquired to be
released by adding their cleanup kfunc dtor entries and registering them
with the BTF.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Move the bpf_throw kfunc to common_kfunc_set so that any program type
can utilize it to throw exceptions. This will also be useful for
exercising a wider variety of programs to test the cleanup logic
properly.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add tests for the runtime cleanup support for exceptions, ensuring that
resources are correctly identified and released when an exception is
thrown. Also, we add negative tests to exercise corner cases the
verifier should reject.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=821956 expired. Closing PR.

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot deleted the series/821956=>bpf-next branch February 3, 2024 13:29