
Exceptions - Resource Cleanup #6370

Closed

Conversation

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: Exceptions - Resource Cleanup
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=821956

@kernel-patches-daemon-bpf

Upstream branch: 77326a4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=821956
version: 1

The motivation of this patch is to figure out which subprogs participate
in exception propagation; in other words, which subprogs' execution can
lead to an exception being thrown, either directly or indirectly (by way
of calling other subprogs).

With the current exceptions support, the runtime performs stack
unwinding when bpf_throw is called. For now, any resources acquired by
the program cannot be released, therefore bpf_throw calls made with
non-zero acquired references must be rejected during verification.

However, there currently exists a loophole in this restriction due to
the way the verification procedure is structured. The verifier will
first walk over the main subprog's instructions, but not descend into
calls to subprogs with global linkage; these global subprogs are instead
verified independently. Therefore, in a situation where a global subprog
ends up throwing an exception (either directly by calling bpf_throw, or
indirectly by way of calling another subprog that does so), the verifier
will fail to notice this fact and may permit throwing BPF exceptions
with non-zero acquired references.

Therefore, to fix this, we add a summarization pass before the do_check
stage which walks all call chains of the program and marks all of the
subprogs from which a bpf_throw call (which unwinds the program stack)
is reachable.

We only do so if we actually see a bpf_throw call in the program,
though, since we do not want to walk all instructions unless we need to.
Once we have analyzed all possible call chains of the program, we can
mark the affected subprogs as 'is_throw_reachable' in their subprog_info.
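
To make this concrete, here is a minimal, self-contained C sketch of the
reachability propagation, assuming a simplified subprog representation;
the struct, field, and helper names below are illustrative, not the
kernel's actual code:

#include <stdbool.h>

#define MAX_SUBPROGS 256

struct subprog_info_sketch {
	bool calls_bpf_throw;      /* contains a direct bpf_throw call */
	bool is_throw_reachable;   /* summarized result */
	int callees[MAX_SUBPROGS]; /* indices of directly called subprogs */
	int nr_callees;
};

/* Returns true if executing subprog 'i' can lead to an exception being
 * thrown. Plain DFS terminates because BPF forbids recursive calls. */
static bool mark_throw_reachable(struct subprog_info_sketch *sp, int i)
{
	if (sp[i].is_throw_reachable)
		return true;
	if (sp[i].calls_bpf_throw) {
		sp[i].is_throw_reachable = true;
		return true;
	}
	for (int j = 0; j < sp[i].nr_callees; j++) {
		if (mark_throw_reachable(sp, sp[i].callees[j])) {
			sp[i].is_throw_reachable = true;
			return true;
		}
	}
	return false;
}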

After performing this step, we need to change how subprog call
verification occurs. In the case of a global subprog call, we need to
explore an alternate program path where processing the call instruction
immediately results in an exception being thrown. We thus simulate a
normal path without any exceptions, and one where the exception is
thrown and the program proceeds no further. In this way, the verifier
will be able to detect whether any acquired references or locks exist in
the verifier state, and reject the program if needed.

Fixes: f18b03f ("bpf: Implement BPF exceptions")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Global subprogs are not descended into during symbolic execution, but
whether they can throw an exception (directly, or by reaching another
exception-throwing subprog) was summarized in
mark_exception_reachable_subprogs, added by the previous patch.

We must now ensure that we explore the path of the program where
invoking the call instruction leads to an exception being thrown, so
that we can correctly reject programs where it is not permissible to
throw an exception.  For instance, it might be permissible to throw from
a global subprog, but its caller may hold references. Without this
patch, the verifier will accept such programs.

To do this, we use push_stack to push a separate branch onto the branch
stack of the verifier, with the same current and previous insn_idx.
Then, we set a bit in the verifier state of the branch to indicate that
the next instruction it processes is a call to a global subprog which
will throw an exception. When we encounter this instruction, this bit
is cleared.

Special care must be taken to update the state pruning logic: without
any changes, it is possible that we end up pruning the exception
throwing state when it is popped for exploration. While we can never
have the 'global_subprog_call_exception' bit set in the verifier state
of an explored state, we will see it in the current state, and we use
this to reject pruning requests and continue its exploration.

Note that we process the exception after processing the call
instruction, similar to how we do a process_bpf_exit_full jump in case
of bpf_throw kfuncs.
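
As a rough illustration, here is a hedged sketch of the two-path
exploration, modeled on the verifier's push_stack machinery;
subprog_is_throw_reachable(), may_prune(), and the placement of the
global_subprog_call_exception bit are assumptions based on the
description above, not the actual patch:

static int explore_global_call_paths(struct bpf_verifier_env *env,
				     int insn_idx, int prev_insn_idx)
{
	struct bpf_verifier_state *branch;

	if (subprog_is_throw_reachable(env, insn_idx)) {
		/* Queue a branch that replays this call as throwing. */
		branch = push_stack(env, insn_idx, prev_insn_idx, false);
		if (!branch)
			return -ENOMEM;
		branch->global_subprog_call_exception = true;
	}
	/* Fall through: keep verifying the non-throwing path. */
	return 0;
}

/* In the pruning logic, a state that is about to simulate the
 * exceptional path must never be pruned against an explored state: */
static bool may_prune(const struct bpf_verifier_state *cur)
{
	return !cur->global_subprog_call_exception;
}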

Fixes: f18b03f ("bpf: Implement BPF exceptions")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add a test case to exercise the verifier logic where a global function
that may potentially throw an exception is invoked from the main
subprog, such that during exploration, the reference state is not
visible when the bpf_throw instruction is explored. Without the fixes in
the prior commits, the verifier will not complain about unreleased
resources lingering in the program when a possible exception may be
thrown.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Refactor check_pseudo_btf_id's code which adds a new BTF reference to
the used_btfs into a separate helper function called add_used_btfs. This
will later be useful in exception frame generation for taking BTF
references along with their modules, so that we can keep alive the
modules whose functions may be required to unwind a given BPF program
when it eventually throws an exception.

While typically module references should already be held in such a case,
since the program will have used a kfunc to acquire a reference that it
did not clean up before throwing an exception, there are corner cases
where this may not be true (e.g. one program producing the object, and
another simply using bpf_kptr_xchg, and not having a kfunc call into the
module). Therefore, it is more prudent to simply bump the reference
whenever we encounter such cases for exception frame generation.

The behaviour of add_used_btfs is to take an input BTF object with its
reference count already raised, and to consume that reference upon
successful insertion. In case of an error, the caller is responsible for
releasing the reference.
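
The contract can be sketched as follows; env->used_btfs and
btf_try_get_module do exist in the kernel, but the body here is a
simplified illustration rather than the patch's actual code:

static int add_used_btfs(struct bpf_verifier_env *env, struct btf *btf)
{
	struct btf_mod_pair *btf_mod;

	/* Caller passes 'btf' with its refcount already raised. */
	if (env->used_btf_cnt >= MAX_USED_BTFS)
		return -E2BIG;	/* on error, the caller releases the ref */

	/* (The real helper would also detect duplicates and release the
	 * extra reference in that case.) */
	btf_mod = &env->used_btfs[env->used_btf_cnt++];
	btf_mod->btf = btf;	/* successful insertion consumes the ref */
	btf_mod->module = btf_try_get_module(btf); /* keep module alive */
	return 0;
}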

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Introduce support in the verifier to generate a set of descriptors for
each BPF frame, describing the register and stack state. These can be
used to reason about the resources acquired by the program at a
particular program point; subsequent patches introduce support for
unwinding a given program when it throws an exception while holding
ownership of such resources.

Descriptors generated for each frame are then tied back to the subprog
they belong to, and attached to the bpf_prog instance during the JIT
phase, with the program counter serving as a key to find the descriptor
of interest for a given subprog.

Logically, during the unwinding phase, for each frame, we will use the
program counter and bpf_prog object to figure out how we should release
acquired resources if any in the frame.

Let's study how the frame descriptor generation algorithm works.
Whenever an exception throwing instruction is encountered, i.e. a call
to a throw-reachable global subprog or a bpf_throw kfunc call, we call
gen_exception_frame_descs.

This function starts with the current frame and explores the registers
and other objects on the current stack. We scan at an 8-byte
granularity, since all registers spilled on the stack and objects like
dynptrs and iterators are 8-byte aligned. For each such stack entry, we
inspect the slot_type to figure out whether it is a spilled register or
a dynptr/iter object.

For any acquired resources found on the stack, we insert entries
representing them into a frame descriptor table for the current subprog
at the current instruction index.

The same steps are repeated for callee saved registers, as these may be
spilled onto the stack of one of the frames in the call chain and would
have to be located in order to be freed.
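
A simplified sketch of the per-frame scan follows; the record_* helpers
and the reg_holds_resource predicate are hypothetical, while the
slot_type inspection mirrors the verifier's usual conventions:

static int gen_frame_desc(struct bpf_verifier_env *env,
			  struct bpf_func_state *frame, int insn_idx)
{
	for (int off = 0; off < frame->allocated_stack; off += BPF_REG_SIZE) {
		struct bpf_stack_state *slot = &frame->stack[off / BPF_REG_SIZE];

		switch (slot->slot_type[BPF_REG_SIZE - 1]) {
		case STACK_SPILL:	/* spilled register */
			if (reg_holds_resource(&slot->spilled_ptr))
				record_stack_entry(env, insn_idx, off,
						   &slot->spilled_ptr);
			break;
		case STACK_DYNPTR:	/* dynptr object */
		case STACK_ITER:	/* iterator object */
			record_stack_object(env, insn_idx, off, slot);
			break;
		}
	}
	/* Callee saved registers R6-R9 get the same treatment. */
	for (int regno = BPF_REG_6; regno <= BPF_REG_9; regno++)
		if (reg_holds_resource(&frame->regs[regno]))
			record_reg_entry(env, insn_idx, regno,
					 &frame->regs[regno]);
	return 0;
}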

In case of registers (spilled or callee saved), we make a special
provision for register_is_null scalar values, to increase the chances of
merging frame descriptors where the only divergence is NULL in one state
being replaced with a valid pointer in another.

The next important step is the logic to merge the frame descriptors. It
is possible that the verifier reaches the same instruction index in a
program from multiple paths, and has to generate frame descriptors for
them at that program counter. In such a case, we always ensure that
after generating the frame descriptor, we attempt to "merge" it with an
existing one.

The merging rules are fairly simple, except for a few caveats. First, if
the layout and types of objects on the stack and in registers are the
same, we have a successful merge. Next, in the case of registers
(spilled or callee saved), we have a special case: if the old entry is
NULL, the new (non-NULL) type replaces it, and if the new entry is NULL,
it satisfies the merge rules with the old entry (which can be of any
type).

This helps in cases where we have an optional value held in a register
or stack slot in one program path, which is replaced by the actual value
in another program path. This can also happen with conditionals, where
the verifier may see acquired references in the verifier state only when
a condition is true (and therefore not in all program paths traversing
the same instruction).

To illustrate with an example, in the following program:

struct foo *p = NULL;
if (x)
	p = bpf_obj_new(typeof(*p));
if (y)
	bpf_throw(0);
if (p)
	bpf_obj_drop(p);

In such a case, bpf_throw may be reached for x == 0, y == 1 and x == 1,
y == 1, with two possible values of p. As long as both can be passed
into the release function (i.e. NULL or a valid pointer value), we can
satisfy the merge.
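
The NULL-tolerant rule for a single register or stack entry might look
like this sketch; struct frame_desc_entry and the ENTRY_NULL tag are
hypothetical:

static bool merge_desc_entry(struct frame_desc_entry *old,
			     const struct frame_desc_entry *new)
{
	if (old->type == new->type)
		return true;		/* exact match */
	if (old->type == ENTRY_NULL) {
		old->type = new->type;	/* non-NULL replaces NULL */
		return true;
	}
	if (new->type == ENTRY_NULL)
		return true;		/* NULL merges with any old type */
	return false;			/* layouts diverge: no merge */
}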

TODO: We need to reserve a slot for STACK_ZERO as well.
TODO: Improve the error message in case we have pointer and misc instead of zero.

Currently, we only consider resources which are modelled as acquired
references in verifier state. In particular, this excludes resources
like held spinlocks and RCU read sections. For now, both of these will
not be handled, and the verifier will continue to complain when
exceptions are thrown in their presence.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
When instruction patching (addition or removal) occurs, the fdtab
attached to each subprog and the program counter in its descriptors go
out of sync with the instructions' relative positions in the program. To
fix this, we adjust the pc, free any fdtabs and descriptors that are no
longer needed, and ensure the entries correspond to the correct
instruction offsets.
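
A sketch of the pc adjustment when one instruction at 'off' is patched
into 'len' instructions; the fdtab types are hypothetical, and the
removal case (decrementing) is analogous:

static void adjust_frame_desc_pcs(struct bpf_subprog_info *subprog,
				  u32 off, u32 len)
{
	struct frame_desc_table *fdtab = subprog->fdtab;

	if (!fdtab || len <= 1)
		return;
	/* Instructions after 'off' shift forward by len - 1 slots. */
	for (u32 i = 0; i < fdtab->nr_entries; i++)
		if (fdtab->entries[i].pc > off)
			fdtab->entries[i].pc += len - 1;
}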

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
When we perform a bpf_throw kfunc call, callee saved registers in BPF
calling convention (R6-R9) may end up getting saved and clobbered by
bpf_throw. Typically, the kernel will restore the registers before
returning back to the BPF program, but in case of bpf_throw, the
function will never return. Therefore, any acquired resources sitting in
these registers will end up getting destroyed if not saved on the
stack, without any cleanup happening for them.

Also, when in a BPF call chain, caller frames may have held acquired
resources in R6-R9 and called their subprogs, which may have spilled
these on their stack frame to reuse these registers before entering the
bpf_throw kfunc. Thus, we also need to locate and free these callee
saved registers for each frame.

It is thus necessary to save these registers somewhere before we call
into the bpf_throw kfunc. Instead of adding spills everywhere bpf_throw
is called, we can use a new hidden subprog that saves R6-R9 on the stack
and then calls into bpf_throw. This way, all of the bpf_throw call sites
can be turned into call instructions for this subprog, and the hidden
subprog in turn will save the callee-saved registers before calling into
the bpf_throw kfunc.

In this way, when unwinding the stack, we can locate the callee saved
registers on the hidden subprog stack frame and perform their cleanup.
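
Conceptually, the hidden subprog could look like the following, written
with the kernel's BPF instruction macros; the exact spill offsets are
illustrative and BPF_EMIT_CALL() is a stand-in for the real kfunc-call
encoding:

#include <linux/filter.h>

static const struct bpf_insn throw_tramp_sketch[] = {
	/* Spill callee-saved R6-R9 to known offsets in this frame, so
	 * the unwinder can always locate them. R1 (bpf_throw's cookie
	 * argument) passes through untouched. */
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_6, -8),
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_7, -16),
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_8, -24),
	BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_9, -32),
	/* Call into bpf_throw; it never returns. */
	BPF_EMIT_CALL(bpf_throw),
	BPF_EXIT_INSN(),	/* unreachable */
};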

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
During runtime unwinding and cleanup, we will need to figure out where
the callee saved registers are stored on the stack, so that when a
bpf_throw call is made, all frames can release their callee saved
registers by finding their saved copies on the stack of callee frames.

While the previous patch ensured that any BPF callee saved registers are
saved on a hidden subprog stack frame before entry into the kernel
(where we would not know their location if spilled), there are cases
where a subprog's R6-R9 are not spilled into its immediate callee's
stack frame, but much later in the call chain, in some later callee's
stack frame. As such, we need to figure out, while walking down the
stack, which frames have spilled their incoming callee saved regs, and
thus keep track of where the latest spill happened with respect to a
given frame in the stack trace.

To perform this, we need to know, during the unwinding phase, which
callee saved registers are saved by a given subprog at runtime. Right
now, the x86 JIT conveniently figures this out in detect_reg_usage. Lift
such logic into the verifier core, and copy this information into the
bpf_prog_aux struct before the JIT step, so that it is preserved at
runtime.
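
The scan itself is simple; this sketch mirrors the x86 JIT's
detect_reg_usage idea, with callee_regs_used as a bpf_prog_aux field
name assumed from the description:

static void detect_callee_saved_usage(struct bpf_prog *prog)
{
	const struct bpf_insn *insn = prog->insnsi;

	for (unsigned int i = 0; i < prog->len; i++, insn++) {
		/* Any use of R6-R9 as source or destination means the
		 * subprog saves (and may spill) that register. */
		for (int r = BPF_REG_6; r <= BPF_REG_9; r++)
			if (insn->dst_reg == r || insn->src_reg == r)
				prog->aux->callee_regs_used[r - BPF_REG_6] = true;
	}
}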

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Until now, the program counter value stored in frame descriptor entries
was the instruction index of the BPF program's throwing instruction, and
of the callsites when going down the frames in a call chain. However, at
runtime, the program counter is a pointer to the next instruction, and
thus needs to be computed in a position independent way so that it can
be matched at runtime to find the frame descriptor when unwinding.

To do this, we first convert the global instruction index into an
instruction index relative to the start of a subprog, and add 1 to it
(to reflect that at runtime, the program counter points to the next
instruction). Then, we modify the JIT (for now, x86) to convert them
to instruction offsets relative to the start of the JIT image, which is
the prog->bpf_func of the subprog in question at runtime.

Later, subtracting the prog->bpf_func pointer from runtime program
counter will yield the same offset, and allow us to figure out the
corresponding frame descriptor entry.
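
At runtime, the lookup implied by this scheme reduces to simple pointer
arithmetic; the fdtab layout below is hypothetical:

/* Verification time: pc = (insn_idx - subprog_start) + 1. The JIT then
 * rewrites pc into a byte offset from the subprog's prog->bpf_func. */
static struct frame_desc *find_frame_desc(const struct bpf_prog *prog,
					  unsigned long ip)
{
	unsigned long off = ip - (unsigned long)prog->bpf_func;
	struct frame_desc_table *fdtab = prog->aux->fdtab;

	for (unsigned int i = 0; i < fdtab->nr_entries; i++)
		if (fdtab->entries[i].pc == off)
			return &fdtab->entries[i];
	return NULL;
}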

Note that we have to mark a frame descriptor entry as 'final', because
bpf_int_jit_compile can be called multiple times, and we would otherwise
try to convert already converted pc values again; once the conversion is
done, remember it and do not repeat it.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Finally, tie all ends together and implement functionality to process a
frame descriptor at runtime for each frame when bpf_throw is called, and
release resources present on the program's stack frames.

For each frame, we call bpf_cleanup_frame_resource, which uses the
runtime instruction pointer to find the right frame descriptor entry.
After this, we walk all stack entries and registers and call their
respective cleanup procedures.

Next, if the frame corresponds to a subprog, we also record the
locations where it has spilled its caller's R6-R9 registers, and save
their values in the unwinding context. Doing this only when a frame has
scratched the register in question lets us arrive at the set of values
actually needed during the freeing step, regardless of how many callees
existed and of the varying locations of the spilled callee saved
registers. These registers can also lie across different frames, but
will be collected top down when arriving at a frame.

Finally, after doing the cleanup, we go on to execute the exception
callback and finish unwinding the stack.
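
Putting it together, the unwinding loop can be sketched as follows; the
unwind context and all helpers other than bpf_cleanup_frame_resource are
hypothetical stand-ins for the patch's machinery:

struct unwind_ctx;	/* tracks frames, saved R6-R9 values, etc. */

static void bpf_throw_unwind(struct unwind_ctx *ctx)
{
	struct bpf_prog *prog;

	/* Walk the stack trace top down. */
	while ((prog = next_frame_prog(ctx))) {
		/* Release stack/register resources recorded in the
		 * frame descriptor entry for this frame's pc. */
		bpf_cleanup_frame_resource(prog, ctx);
		/* If this frame spilled its caller's R6-R9, record the
		 * spilled values for cleaning up caller frames later. */
		record_spilled_callee_saved_regs(prog, ctx);
	}
	/* The exception callback runs after all frames are cleaned up. */
}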

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Reflect in the verifier state that references would be released whenever
we throw a BPF exception. Now that we support generating frame
descriptors, and performing the runtime cleanup, whenever processing an
entry corresponding to an acquired reference, make sure we release its
reference state. Note that we only release this state for the current
frame, as the acquired refs are only checked against that when
processing an exceptional exit.

This ensures that, for acquired resources apart from locks and RCU read
sections, BPF programs no longer fail verification due to lingering
resources at a throw site.

While at it, we can tweak check_reference_leak to drop the
exception_exit parameter, and fix selftests that will fail due to the
changed behaviour.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@kernel-patches-daemon-bpf

Upstream branch: b3d3e29
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=821956
version: 1

Reuse the existing BTF dtor infrastructure to also include dtor kfuncs
that can be used to release PTR_TO_BTF_ID pointers and other BTF objects
(iterators). For this purpose, we extend the btf_id_dtor_kfunc object
with a flags field, and ensure that entries that cannot work as kptrs
are not allowed to be embedded in map values.

Prior to this change, btf_id_dtor_kfunc served the dual role of an
allowlist for kptrs and of locating their dtors. To separate these
roles, we must now explicitly pass only BPF_DTOR_KPTR to ensure we don't
look up other cleanup kfuncs in the dtor table.
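
The extended registration described here might look like this sketch;
BPF_DTOR_KPTR comes from the text above, while BPF_DTOR_CLEANUP and the
exact field layout are assumptions:

enum btf_dtor_kfunc_flags {
	BPF_DTOR_KPTR    = (1 << 0),	/* usable as a map-value kptr dtor */
	BPF_DTOR_CLEANUP = (1 << 1),	/* only for exception cleanup */
};

struct btf_id_dtor_kfunc {
	u32 btf_id;		/* BTF id of the type the dtor releases */
	u32 kfunc_btf_id;	/* BTF id of the dtor kfunc itself */
	u32 flags;		/* new: restricts where it may be used */
};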

Finally, set up iterator and other objects that can be acquired to be
released by adding their cleanup kfunc dtor entries and registering them
with the BTF.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Move the bpf_throw kfunc to common_kfunc_set so that any program type
can utilize it to throw exceptions. This will also be useful for
exercising a wider variety of programs to test the cleanup logic
properly.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Add tests for the runtime cleanup support for exceptions, ensuring that
resources are correctly identified and released when an exception is
thrown. Also, we add negative tests to exercise corner cases the
verifier should reject.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=821956 expired. Closing PR.

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot deleted the series/821956=>bpf-next branch February 3, 2024 13:29