Skip to content

Commit

Permalink
bpf: add support for open-coded iterator loops
Browse files Browse the repository at this point in the history
Teach verifier about the concept of open-coded (or inline) iterators.

This patch adds generic iterator loop verification logic, new STACK_ITER
stack slot type to contain iterator state, and necessary kfunc plumbing
for iterator's constructor, destructor and next "methods". Next patch
implements first specific version of iterator (number iterator for
implementing for loop). Such split allows to have more focused commits
for verifier logic and separate commit that we could point later to what
it takes to add new kind of iterator.

Each kind of iterators has its own associated struct bpf_iter_<type>,
where <type> specifies a specific type of iterator. struct
bpf_iter_<type> state is supposed to be on BPF program's stack, so there
will be no way to change its size later on without breaking backwards
compatibility, so choose wisely! But given it's specific to a given
<type> of iterator, this allows a lot of flexibility: simple iterators
could be fine with just one stack slot (8 bytes), like numbers iterator
in the next patch. While some other more complicated iterators would
need way more to keep their iterator state. Either way, such design
allows to avoid runtime memory allocations, which otherwise would be
necessary if fixed on-the-stack size turns out too small.

The way BPF verifier logic is implemented, there are no artificial
restrictions on a number of active iterators, it should work correctly
using multiple at the same time. This also means you can have multiple
nested iteration loops. struct bpf_iter_<type> reference can be safely
passed to subprograms as well.

General flow is easiest to demonstrate with a simple example using
number iterator implemented in next patch. Here's the simplest possible
loop:

  struct bpf_iter_num it;
  int *v;

  bpf_iter_num_new(&it, 2, 5);
  while ((v = bpf_iter_num_next(&it))) {
      bpf_printk("X = %d", *v);
  }
  bpf_iter_num_destroy(&it);

Above snippet should output "X = 2", "X = 3", "X = 4". Note that 5 is
exclusive and is not returned. This matches similar APIs (e.g., slices
in Go or Rust) that implement a range of elements, where end index is
non-inclusive.

In the above example, we see a trio of function:
  - constructor, bpf_iter_num_new(), which initializes iterator state
  (struct bpf_iter_num it) on the stack. If any on the input arguments
  are invalid, constructor should make sure to still initialize it such
  that subsequent bpf_iter_num_next() calls will return NULL. I.e., on
  error, return error and construct empty iterator.
  - next method, bpf_iter_num_next(), which accepts pointer to iterator
  state and produces an element. Next method should always return
  a pointer. The contract between BPF verifier is that next method will
  always eventually return NULL when elements are exhausted. One NULL is
  returned, subsequent next calls should keep returning NULL. In the
  case of numbers iterator, bpf_iter_num_next() returns a pointer to int
  (where current integer value itself is stored inside iterator state),
  which can be dereferenced after according NULL check.
  - once done with the iterator, it's mandated that user cleans up its
  state with destructor, bpf_iter_num_destroy() in this case. Destructor
  frees up any resources and marks stack space used by struct
  bpf_iter_num as usable for something else.

Any other iterator implementation will have to implement at least these
three methods. It is enforced that for any given type of iterator only
applicable constructor/destructor/next are callable. I.e., verifier
ensures you can't pass number iterator state into, say, cgroup
iterator's next method.

It is important to keep naming consistent to be able to create generic
helpers/macros to help with bpf_iter usability. E.g., one of the follow
up patches adds generic bpf_for_each() macro to bpf_misc.h in selftests,
which allows to utilize iterator "trio" nicely without having to code
the above somewhat tedious loop explicitly every time. This is enforced
at kfunc registration point by one of the previous patches in this series.

At the implementation level, iterator state tracking for verification
purposes is very similar to dynptr. We add STACK_ITER stack slot type,
reserve necessary number of slots, depending on sizeof(struct
bpf_iter_<type>), and keep track of necessary extra state in the "main"
slot, which is marked with non-zero ref_obj_id.  Other slots are also
marked as STACK_ITER, but have zero ref_obj_id.  This is simpler than
having a separate "is_first_slot" flag.

Another big distinction is that STACK_ITER is *always refcounted*, which
simplifies implementation without sacrificing usability. So no need for
extra "iter_id", no need to anticipate reuse of STACK_ITER slots for new
constructors, etc. Keeping it simpler here.

As far as verification logic, there are two extensive comments, in
process_iter_next_call() and iter_active_depths_differ(), please refer
to them for details.

But from 10,000-foot point of view, next methods are the points of
forking, which are conceptually similar to what verifier is doing when
validating conditional jump. We branch out at call bpf_iter_<type>_next
instruction and simulate two situations: NULL (iteration is done) and
non-NULL (new element returned). NULL is simulated first and is supposed
to reach exit without looping. After that non-NULL case is validated and
it either reaches exit (for trivial examples with no real loop), or
reaches another call bpf_iter_<type>_next instruction with the state
equivalent to already (partially) validated one. State equivalency at
that point means we technically are going to be looping forever without
"breaking out" out of established "state envelope" (i.e., subsequent
iterations don't add any new knowledge or constraints to verifier state,
so running 1, 2, 10, or a million doesn't matter). But taking into
account the contract stating that iterator next method *has to* return
NULL eventually, we can conclude that loop body is safe and will
eventually terminate. Given we validated logic outside of the loop (NULL
case), and concluded that loop body is safe (though potentially looping
many times), verifier can claim safety of the overall program logic.

The rest of the patch is necessary plumbing for state tracking, marking,
validation, and necessary kfunc infrastructure to allow implementing
iterator constructor, destructor, and next methods.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
  • Loading branch information
anakryiko authored and intel-lab-lkp committed Mar 8, 2023
1 parent be28a7e commit 8f263e1
Show file tree
Hide file tree
Showing 2 changed files with 608 additions and 9 deletions.
25 changes: 24 additions & 1 deletion include/linux/bpf_verifier.h
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,12 @@ struct bpf_active_lock {

#define ITER_PREFIX "bpf_iter_"

enum bpf_iter_state {
BPF_ITER_STATE_INVALID, /* for non-first slot */
BPF_ITER_STATE_ACTIVE,
BPF_ITER_STATE_DRAINED,
};

struct bpf_reg_state {
/* Ordering of fields matters. See states_equal() */
enum bpf_reg_type type;
Expand Down Expand Up @@ -105,6 +111,18 @@ struct bpf_reg_state {
bool first_slot;
} dynptr;

/* For bpf_iter stack slots */
struct {
/* BTF container and BTF type ID describing
* struct bpf_iter_<type> of an iterator state
*/
struct btf *btf;
u32 btf_id;
/* packing following two fields to fit iter state into 16 bytes */
enum bpf_iter_state state:2;
int depth:30;
} iter;

/* Max size from any of the above. */
struct {
unsigned long raw1;
Expand Down Expand Up @@ -143,6 +161,8 @@ struct bpf_reg_state {
* same reference to the socket, to determine proper reference freeing.
* For stack slots that are dynptrs, this is used to track references to
* the dynptr to determine proper reference freeing.
* Similarly to dynptrs, we use ID to track "belonging" of a reference
* to a specific instance of bpf_iter.
*/
u32 id;
/* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned
Expand Down Expand Up @@ -213,11 +233,13 @@ enum bpf_stack_slot_type {
* is stored in bpf_stack_state->spilled_ptr.dynptr.type
*/
STACK_DYNPTR,
STACK_ITER,
};

#define BPF_REG_SIZE 8 /* size of eBPF register in bytes */

#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern)
#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE)
#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE)

struct bpf_stack_state {
struct bpf_reg_state spilled_ptr;
Expand Down Expand Up @@ -450,6 +472,7 @@ struct bpf_insn_aux_data {
bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
bool zext_dst; /* this insn zero extends dst reg */
bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
u8 alu_state; /* used in combination with alu_limit */

/* below fields are initialized once */
Expand Down

0 comments on commit 8f263e1

Please sign in to comment.