lib: add bf_vector and use it in cgen #487

Open
pzmarzly wants to merge 2 commits into facebook:main from pzmarzly:push-ukrnssyykoqn

@pzmarzly
Contributor

@pzmarzly pzmarzly commented Mar 24, 2026

I initially planned to use bf_vector in bf_hashset, but I got convinced by @yaakov-stein that it's not a good idea. However, we could use bf_vector in other places, especially where we use bf_dynbuf or where we handroll vector-like behavior.

Commits:

  • lib: core: add bf_vector - implementation. Vector doubles in size when it hits the limits. Has a helper to remove elements as well, though shrinking is not implemented.
  • cgen: use bf_vector for img - Instead of managing img, img_size, and img_cap by hand, use bf_vector.

@meta-cla meta-cla bot added the cla signed label Mar 24, 2026
@github-actions

github-actions bot commented Mar 24, 2026

Claude review of PR #487 (dd1fe11)

Well-structured PR that introduces bf_vector as a generic dynamically-sized array and uses it to replace the hand-rolled img/img_size/img_cap management in program.c. The migration is clean, error handling is consistent, and the new vector implementation has thorough unit tests (12 test cases covering add, remove, set, reserve, foreach, take, etc.). Bounds checking via bf_vector_get and bf_vector_set is a safety improvement over direct array indexing.

Must fix

  • Duplicate include in program.h - src/libbpfilter/cgen/program.h:16 - #include <bpfilter/core/list.h> appears twice (lines 11 and 16). The first was correctly added by commit 1 (move bf_list into subfolder), the second was erroneously re-added in commit 3 when inserting vector.h. Remove the duplicate.
  • Invalid commit component prefix - commit 309f17f - The third commit message "daemon: cgen: use bf_vector for img" uses daemon: as the component, but per doc/developers/style.rst, valid components are lib, cli, tests, build, tools, doc. Since cgen is part of libbpfilter, use "lib: cgen: use bf_vector for img".

Suggestions

  • bf_vector_reserve guard checks vec->cap instead of cap - src/libbpfilter/core/vector.c:184 - The guard if (vec->cap > _BF_VECTOR_MAX_CAP) checks the current capacity, not the requested cap. Since we already passed the cap > vec->cap early return, a very large cap (e.g. from network-influenced bf_vector_reserve in _bf_recv_in_buff) bypasses this guard and reaches _bf_vector_realloc uncapped. For elem_size == 1 the overflow check doesn't help: realloc is called with a huge size that fails gracefully, but the intent of _BF_VECTOR_MAX_CAP is to prevent this. Fix: if (cap > _BF_VECTOR_MAX_CAP). (Previously flagged in an earlier review round, not yet addressed.)
  • bf_vector_add_many memcpy aliasing hazard - src/libbpfilter/core/vector.c:171 - If a caller passes a pointer from bf_vector_get() on the same vector as data, and the append triggers a realloc, the source pointer dangles. Even without realloc, overlapping memcpy is UB. Consider memmove or documenting the restriction.
  • Fixup null check before type filter - src/libbpfilter/cgen/program.c:215 - The bf_vector_get + null check fired before if (type != fixup->type) continue;. Fixed: the type filter is now correctly placed before the lookup.
  • bf_vector_foreach void arithmetic in public header - src/libbpfilter/include/bpfilter/core/vector.h:59 - The macro performs pointer arithmetic on void *, which is a GCC extension not valid in standard C. Since this is a public header, downstream consumers compiling with -pedantic -std=c17 will get errors when using the macro. Cast through char * instead.
  • Initial capacity reduced from 64 to 8 - src/libbpfilter/cgen/program.c:109 - The old _BF_PROGRAM_DEFAULT_IMG_SIZE was 64; the new default _BF_VECTOR_INIT_CAP is 8. BPF programs routinely contain hundreds of instructions, so this adds ~4 extra reallocations per program generation. Consider an initial bf_vector_reserve in bf_program_new.
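
The first suggestion above (validate the requested capacity rather than the current one) can be illustrated with a minimal stand-alone sketch. Everything named demo_* or DEMO_* is a hypothetical stand-in invented for this example, not the bpfilter API: the real bf_vector layout and the value of _BF_VECTOR_MAX_CAP are not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limit, chosen only for this sketch. */
#define DEMO_MAX_CAP ((size_t)1 << 20)

/* Hypothetical stand-in for a generic vector. */
struct demo_vec
{
    void *data;
    size_t size;      /* number of elements in use */
    size_t cap;       /* number of elements allocated */
    size_t elem_size; /* size of one element, in bytes */
};

/* Ensure there is room for at least `cap` elements. */
static int demo_vec_reserve(struct demo_vec *vec, size_t cap)
{
    void *data;

    assert(vec);

    /* Nothing to do if the buffer is already large enough. */
    if (cap <= vec->cap)
        return 0;

    /* Check the *requested* capacity: checking vec->cap here would let
     * an oversized request through, since vec->cap is still small. */
    if (cap > DEMO_MAX_CAP)
        return -ENOMEM;

    /* Reject zero-sized elements and guard the size multiplication. */
    if (vec->elem_size == 0 || cap > SIZE_MAX / vec->elem_size)
        return -EINVAL;

    data = realloc(vec->data, cap * vec->elem_size);
    if (!data)
        return -ENOMEM;

    vec->data = data;
    vec->cap = cap;

    return 0;
}
```

With this ordering, an oversized request is rejected before any allocation is attempted, and the vector is left unchanged on failure.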

Nits

  • Line exceeds 80-char limit in bf_vector_add_many - src/libbpfilter/core/vector.c:161 - 82 characters vs the 80-char ColumnLimit in .clang-format. Can be split by extracting the ternary.
  • _bf_vector_realloc missing assert(vec) - src/libbpfilter/core/vector.c:99 - Every other comparable static helper asserts its pointer parameter (_bf_list_node_new, _bf_list_node_free). Adding assert(vec) here would be consistent.
  • Include ordering in program.h - src/libbpfilter/cgen/program.h:17 - #include <bpfilter/core/vector.h> is out of alphabetical order. It should be placed after <bpfilter/core/list.h> and before <bpfilter/dump.h>. make -C build fixstyle should fix this automatically.
  • bf_vector_add growth fallback - src/libbpfilter/core/vector.c:132 - When cap * 2 > _BF_VECTOR_MAX_CAP, returns -ENOMEM instead of capping at _BF_VECTOR_MAX_CAP. Purely theoretical on 64-bit but inconsistent with bf_vector_add_many's fallback.
  • Error message "bytes" vs element count - src/libbpfilter/cgen/program.c:798 - Fixed: format now correctly says "insns" matching the element count.
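
The growth-fallback nit above can be sketched as a small helper that clamps to the limit instead of failing as soon as doubling would exceed it. The names demo_next_cap, DEMO_INIT_CAP and DEMO_MAX_CAP are hypothetical and the limit is deliberately tiny; this is one possible policy, not the code under review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values, chosen only to keep the example small. */
#define DEMO_INIT_CAP ((size_t)8)
#define DEMO_MAX_CAP ((size_t)1024)

/* Return the capacity to grow to, or 0 if the vector is already at the
 * limit and cannot grow any further. */
static size_t demo_next_cap(size_t cap)
{
    if (cap >= DEMO_MAX_CAP)
        return 0;

    if (cap == 0)
        return DEMO_INIT_CAP;

    /* Doubling would exceed the limit: clamp to it instead of failing. */
    if (cap > DEMO_MAX_CAP / 2)
        return DEMO_MAX_CAP;

    return cap * 2;
}
```

A caller would treat a return value of 0 as the only out-of-capacity case, so a vector sitting between half the limit and the limit can still use the remaining room.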

Workflow run

@pzmarzly pzmarzly marked this pull request as draft March 24, 2026 13:30
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 2bbe579 to 419e434 Compare March 24, 2026 14:33
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 419e434 to e77f462 Compare March 24, 2026 15:18
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from e77f462 to 305dd19 Compare March 24, 2026 16:07
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 305dd19 to f5a94cd Compare March 24, 2026 16:57
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from f5a94cd to e9d06a6 Compare March 24, 2026 17:37
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from e9d06a6 to 4cd032f Compare March 24, 2026 17:53
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch 2 times, most recently from 703e225 to a8759b5 Compare March 24, 2026 18:32
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from a8759b5 to c828488 Compare March 24, 2026 19:26
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from c828488 to 309f17f Compare March 25, 2026 18:44
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 309f17f to fba6566 Compare March 25, 2026 19:08
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from fba6566 to c37916b Compare March 25, 2026 19:18
@pzmarzly pzmarzly changed the title from "lib: add bf_vector and use it to replace bf_dynbuf" to "lib: add bf_vector and use it in cgen" Mar 25, 2026
@pzmarzly pzmarzly marked this pull request as ready for review March 25, 2026 19:29
@pzmarzly pzmarzly requested a review from qdeslandes as a code owner March 25, 2026 19:29
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from c37916b to 975b503 Compare March 26, 2026 14:23
@pzmarzly pzmarzly force-pushed the push-ukrnssyykoqn branch from 975b503 to dd1fe11 Compare March 26, 2026 15:24
(elem) && (elem) < (vec)->data + ((vec)->size * (vec)->elem_size); \
(elem) = (elem) + (vec)->elem_size)

/**

Claude: suggestion: bf_vector_foreach performs pointer arithmetic on void * ((elem) + (vec)->elem_size and (vec)->data + ...), which is a GCC extension not valid in standard C.

Since vector.h is a public header under include/bpfilter/core/, any downstream consumer using this macro with -pedantic -std=c17 will get a compile error. The pedantic integration test (tests/integration/pedantic_c.sh) includes all public headers with -pedantic-errors -std=c17 -Werror — the macro definition alone won't trigger an error (unexpanded macros aren't compiled), but any call site under those flags will fail.

For comparison, bf_list_foreach in list.h avoids this by iterating over typed pointers (bf_list_node *). Consider casting through char *:

Suggested change
/**
#define bf_vector_foreach(vec, elem) \
for (void *(elem) = (vec)->data; \
(elem) && (char *)(elem) < (char *)(vec)->data + ((vec)->size * (vec)->elem_size); \
(elem) = (void *)((char *)(elem) + (vec)->elem_size))
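
To see the char * approach in use, here is a self-contained sketch that should build under -std=c17 -pedantic-errors. The demo_vec structure, the demo_vec_foreach macro and demo_vec_sum are hypothetical stand-ins written for this example; they mirror the shape of the suggestion but are not the bpfilter definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a generic vector; not the bpfilter layout. */
struct demo_vec
{
    void *data;
    size_t size;      /* number of elements */
    size_t elem_size; /* bytes per element */
};

/* Walk the buffer in elem_size strides. All pointer arithmetic is done on
 * char *, so no arithmetic on void * is required. */
#define demo_vec_foreach(vec, elem)                                      \
    for (void *elem = (vec)->data;                                       \
         elem && (char *)elem < (char *)(vec)->data +                    \
                                    ((vec)->size * (vec)->elem_size);    \
         elem = (char *)elem + (vec)->elem_size)

/* Sum the elements of a vector that stores int values. */
static int demo_vec_sum(const struct demo_vec *vec)
{
    int sum = 0;

    assert(vec);
    assert(vec->elem_size == sizeof(int));

    demo_vec_foreach (vec, elem)
        sum += *(int *)elem;

    return sum;
}
```

The loop body still receives a void * and casts it to the element type, so call sites look the same as with the original macro.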

_program->flavor = bf_hook_to_flavor(chain->hook);
_program->runtime.ops = bf_flavor_ops_get(_program->flavor);
_program->runtime.chain = chain;
_program->img = bf_vector_default(sizeof(struct bpf_insn));

Claude: suggestion: The old _BF_PROGRAM_DEFAULT_IMG_SIZE was 64 (1 << 6); the new generic _BF_VECTOR_INIT_CAP is 8. Since BPF instruction images routinely contain hundreds of instructions, starting at capacity 8 means ~4 extra reallocations (8 -> 16 -> 32 -> 64 -> 128) per program generation compared to the old code.

Consider adding an initial reserve in bf_program_new after creating the vector:

_program->img = bf_vector_default(sizeof(struct bpf_insn));
bf_vector_reserve(&_program->img, 64);

This caller-side reserve preserves the generic _BF_VECTOR_INIT_CAP for other use cases while optimizing the hot path.
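
As a rough check of the estimate above, the number of growth steps under a plain doubling policy can be counted directly. The helper demo_growth_steps is hypothetical and assumes the buffer doubles each time it fills; the actual bf_vector growth policy may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many times a doubling buffer must grow, starting from
 * `init_cap` elements, before it can hold `n` elements. */
static unsigned int demo_growth_steps(size_t init_cap, size_t n)
{
    unsigned int steps = 0;
    size_t cap = init_cap;

    assert(init_cap > 0);
    assert(n <= SIZE_MAX / 2); /* keeps cap * 2 from overflowing */

    while (cap < n) {
        cap *= 2;
        ++steps;
    }

    return steps;
}
```

Under these assumptions, a hypothetical 200-instruction program needs five growth steps when starting from 8 (8 -> 16 -> 32 -> 64 -> 128 -> 256), two when starting from 64, and none when 512 elements are reserved up front. The difference between the first two is three steps, in the same range as the estimate.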

Contributor

+1 with Claude, programs are at least 200 instructions, so we can reserve 512 elements.

* @brief Iterate over every element of a `bf_vector`.
*
* `elem` is declared as a pointer to the element type and will point to each
* element in turn. Do not add or remove elements during iteration.
Contributor

"Do not add": This should not be an issue.

"remove elements during iteration": Same here, as we don't shrink the memory buffer.

*
* @param vec Initialised vector. Must be non-NULL.
* @param index Index of the element.
* @return Pointer to the element, or NULL if @p index is out of bounds.
Contributor

@p index -> `index`

As a general rule, use backticks for parameters, variables, functions...

void *bf_vector_get(const bf_vector *vec, size_t index);

/**
* @brief Overwrite the n-th element.
Contributor

"Replace the n-th element" would be clearer.

* @brief Take ownership of the backing buffer.
*
* Returns the raw data pointer and resets the vector so it will
* re-allocate on the next add.
Contributor

"on the next bf_vector_add"


assert(vec);

if (!vec->elem_size)
Contributor

This seems off, as we never check vec->elem_size anywhere else. Either we assume the user might modify vec->elem_size (because they have access to the bf_vector structure), in which case we need to validate it in every function, or bf_vector should be moved into vector.c (I would suggest this).
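
The second option (hiding the structure in the .c file) is the usual opaque-type pattern in C. The sketch below shows it in a single file, with comments marking what would go in the header and what would stay private. All demo_* names are hypothetical; this is not the proposed bf_vector API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- What a public header could expose (hypothetical) ---- */

/* Incomplete type: callers can hold a pointer but cannot touch fields. */
struct demo_vec;

int demo_vec_new(struct demo_vec **vec, size_t elem_size);
void demo_vec_free(struct demo_vec **vec);
size_t demo_vec_elem_size(const struct demo_vec *vec);

/* ---- What the .c file would contain ---- */

struct demo_vec
{
    void *data;
    size_t size;
    size_t cap;
    size_t elem_size;
};

/* elem_size is validated once here; callers cannot change it later. */
int demo_vec_new(struct demo_vec **vec, size_t elem_size)
{
    struct demo_vec *new_vec;

    assert(vec);

    if (elem_size == 0)
        return -EINVAL;

    new_vec = calloc(1, sizeof(*new_vec));
    if (!new_vec)
        return -ENOMEM;

    new_vec->elem_size = elem_size;
    *vec = new_vec;

    return 0;
}

/* Release the vector and reset the caller's pointer to NULL. */
void demo_vec_free(struct demo_vec **vec)
{
    assert(vec);

    if (!*vec)
        return;

    free((*vec)->data);
    free(*vec);
    *vec = NULL;
}

size_t demo_vec_elem_size(const struct demo_vec *vec)
{
    assert(vec);

    return vec->elem_size;
}
```

One trade-off to keep in mind: an opaque type has to be allocated by the library, so a by-value initializer such as the bf_vector_default() call used in bf_program_new would probably need to become a constructor that returns a pointer.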

ctx->insn_idx);
}

size_t off = ctx->program->img.size - ctx->insn_idx - 1U;
Contributor

This variable should be defined at the beginning of the scope.

Comment on lines 471 to 476
 int bf_program_emit(struct bf_program *program, struct bpf_insn insn)
 {
-    int r;
-
     assert(program);

-    if (program->img_size == program->img_cap) {
-        r = bf_program_grow_img(program);
-        if (r)
-            return r;
-    }
-
-    program->img[program->img_size++] = insn;
-
-    return 0;
+    return bf_vector_add(&program->img, &insn);
 }
Contributor

This could be a static inline function in program.h.

Comment on lines 506 to 524
@@ -535,8 +519,8 @@

     /* This call could fail and return an error, in which case it is not
      * properly handled. However, this shouldn't be an issue as we previously
-     * test whether enough room is available in cgen.img, which is currently
-     * the only reason for EMIT() to fail. */
+     * reserved enough room in program->img, which is currently the only
+     * reason for EMIT() to fail. */
     EMIT(program, insn);
Contributor

The original code is weird in order to ensure the state of bf_program is always valid (i.e. we don't insert a fixup for an instruction that doesn't exist).

We don't need to maintain a valid bf_program state anymore, as it doesn't survive past the generation step. Hence, you can push the fixup to the list, then use bf_program_emit(), and return an error on failure. It will be properly taken care of by bpfilter.
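
The ordering described here (record the fixup, then emit, then propagate any error) can be sketched with a fixed-capacity stand-in. The demo_program structure and the demo_* functions are hypothetical simplifications invented for this example; they are not bf_program, bf_fixup_new or bf_list_add_tail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_CAP 4

/* Hypothetical, fixed-capacity stand-in for a program under generation. */
struct demo_program
{
    int img[DEMO_CAP];       /* emitted instructions */
    size_t img_size;
    size_t fixups[DEMO_CAP]; /* instruction indexes to patch later */
    size_t n_fixups;
};

/* Append one instruction, or fail if the image is full. */
static int demo_emit(struct demo_program *prog, int insn)
{
    assert(prog);

    if (prog->img_size == DEMO_CAP)
        return -ENOMEM;

    prog->img[prog->img_size++] = insn;

    return 0;
}

/* Remember that the instruction at `insn_idx` must be patched later. */
static int demo_add_fixup(struct demo_program *prog, size_t insn_idx)
{
    assert(prog);

    if (prog->n_fixups == DEMO_CAP)
        return -ENOMEM;

    prog->fixups[prog->n_fixups++] = insn_idx;

    return 0;
}

/* Record the fixup first, then emit. No up-front reserve is needed: if
 * the emit fails, the error is returned and the caller is expected to
 * discard the whole program. */
static int demo_emit_fixup(struct demo_program *prog, int insn)
{
    int r;

    assert(prog);

    r = demo_add_fixup(prog, prog->img_size);
    if (r)
        return r;

    return demo_emit(prog, insn);
}
```

On failure the program may hold a fixup that points at an instruction that was never emitted, which is acceptable only under the assumption stated in the comment above: the program is thrown away after a failed generation.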

Comment on lines 529 to 556
 int bf_program_emit_fixup_elfstub(struct bf_program *program,
                                   enum bf_elfstub_id id)
 {
     _free_bf_fixup_ struct bf_fixup *fixup = NULL;
     int r;

     assert(program);

-    if (program->img_size == program->img_cap) {
-        r = bf_program_grow_img(program);
-        if (r)
-            return r;
-    }
+    r = bf_vector_reserve(&program->img, program->img.size + 1);
+    if (r)
+        return r;

-    r = bf_fixup_new(&fixup, BF_FIXUP_ELFSTUB_CALL, program->img_size, NULL);
+    r = bf_fixup_new(&fixup, BF_FIXUP_ELFSTUB_CALL, program->img.size, NULL);
     if (r)
         return r;

     fixup->attr.elfstub_id = id;

     r = bf_list_add_tail(&program->fixups, fixup);
     if (r)
         return r;

     TAKE_PTR(fixup);

     EMIT(program, BPF_CALL_REL(0));

     return 0;
 }
Contributor

Same as bf_program_emit_fixup.
