lib: cgen: group same-key maps using bitmasks#520
lib: cgen: group same-key maps using bitmasks#520pzmarzly wants to merge 4 commits intofacebook:mainfrom
Conversation
| /** | ||
| * @brief Emit bitmask check instructions after a map lookup. | ||
| * | ||
| * Set map values are bitmasks, sized to fit one bit per set in the | ||
| * group. After map_lookup_elem returns a non-NULL pointer in BPF_REG_0, | ||
| * this function emits instructions to: | ||
| * 1. Load the byte containing this set's bit from the pointer. | ||
| * 2. AND it with the set's bit. | ||
| * 3. Jump to the next rule if the bit is not set. | ||
| */ | ||
| static int _bf_emit_bitmask_check(struct bf_program *program, size_t bit_index) | ||
| { | ||
| assert(program); | ||
|
|
||
| /* r0 points to the map value. Load the byte containing our bit. */ | ||
| EMIT(program, BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, | ||
| (int16_t)(bit_index / CHAR_BIT))); | ||
|
|
||
| /* AND with the set's bit within that byte. */ | ||
| EMIT(program, | ||
| BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1U << (bit_index % CHAR_BIT))); | ||
|
|
||
| /* Jump to next rule if the result is 0 (bit not set). */ | ||
| EMIT_FIXUP_JMP_NEXT_RULE(program, BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 0)); | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
I don't really know BPF bytecode, this function is vibecoded. Can you check if it looks right?
|
@claude review |
Claude review of PR #520 (7f30d9d)Must fix
Suggestions
Nits
|
qdeslandes
left a comment
There was a problem hiding this comment.
Thanks for working on this, I think's it's a good first implementation, but there are a few things that could be improved to simplify it.
When grouping sets, using a dedicated structure would be helpful:
bf_set_group {
key;
sets;
};
When looping over sets, you can ignore the empty ones, then for non-empty ones find if a set group exists already and add it there. You don't necessary need a "leader".
Fixups can hold a pointer to a set directly, it will be easier to resolve them, instead of relying on a set index. This is possible as the fixup (and bf_program) aren't serialized, they're only temporary objects (which was not the case before).
f15cbd1 to
a49eca0
Compare
a49eca0 to
7f30d9d
Compare
That's amazing. Trying this. @claude review |
|
|
||
| #define _free_bf_set_group_ __attribute__((__cleanup__(_bf_set_group_free))) | ||
|
|
||
| static void _bf_set_group_free(struct bf_set_group **group) |
There was a problem hiding this comment.
Claude: must-fix: Missing assert(group) at the top of _bf_set_group_free. Every other cleanup function in the codebase (bf_fixup_free, bf_set_free, bf_map_free, _bf_btf_free, etc.) validates its double-pointer argument with assert before dereferencing. Without it, a NULL group pointer causes a silent crash rather than a diagnostic assertion failure.
static void _bf_set_group_free(struct bf_set_group **group)
{
assert(group);
if (!*group)
return;| if (r) | ||
| return bf_err_r(r, "failed to add set elements to the map"); | ||
| memset(value, 0, value_size); | ||
| (void)bf_bpf_map_lookup_elem(new_map->fd, elem, value); |
There was a problem hiding this comment.
Claude: must-fix: For LPM trie maps, bf_bpf_map_lookup_elem performs longest-prefix matching, not exact-key matching. When inserting an element from set B (e.g. 192.168.1.0/24), the lookup may return the value for a shorter prefix from set A (e.g. 192.168.0.0/16) that was already inserted. The subsequent OR would incorrectly propagate set A's bit into set B's entry.
This means a packet matching 192.168.1.0/24 would appear to belong to both set A and set B, producing false positives. Consider either:
- Skipping the lookup for LPM trie groups and always writing only the current set's bit, or
- Preventing LPM trie sets from being grouped together when their prefixes can overlap.
| if (r) | ||
| return bf_err_r(r, "set '%s' not assigned to any group", set->name); | ||
|
|
||
| r = bf_stub_rule_check_protocol(program, meta); |
There was a problem hiding this comment.
Claude: suggestion: set->name can be NULL for anonymous sets (documented in set.h:72). Passing NULL to %s in bf_err_r is undefined behavior. Use set->name ?: "<anonymous>" to match the convention in bf_set_dump (set.c:364). Same issue at line 143.
| case BF_FIXUP_TYPE_SET_MAP_FD: | ||
| DUMP(prefix, "set_index: %lu", fixup->attr.set_index); | ||
| DUMP(prefix, "set: %s", | ||
| fixup->attr.set_ptr ? fixup->attr.set_ptr->name : "(null)"); |
There was a problem hiding this comment.
Claude: suggestion: fixup->attr.set_ptr->name can be NULL for anonymous sets. The ternary guards against set_ptr being NULL but not name being NULL. Use fixup->attr.set_ptr->name ?: "<anonymous>" to match the convention in bf_set_dump (set.c:364).
| return bf_err_r( | ||
| -ENOENT, "set map fixup: set '%s' not in any loaded group", | ||
| fixup->attr.set_ptr ? fixup->attr.set_ptr->name : "(null)"); | ||
| } |
There was a problem hiding this comment.
Claude: suggestion: Same NULL name issue: fixup->attr.set_ptr->name can be NULL for anonymous sets. Use fixup->attr.set_ptr->name ?: "<anonymous>" in the non-NULL branch.
| bf_list_node_get_data(bf_list_get_head(&group->sets)); | ||
| size_t n_sets = bf_list_size(&group->sets); | ||
| size_t value_size = (n_sets + CHAR_BIT - 1) / CHAR_BIT; | ||
| size_t total_elems = 0; |
There was a problem hiding this comment.
Claude: suggestion: total_elems sums element counts across all sets in the group, but duplicate keys across sets are merged in the map (via the lookup-then-update pattern). The BPF map's max_entries is set to total_elems, which over-provisions when keys overlap. This is safe (kernel accepts over-sized maps) but worth a brief comment noting it is an upper bound, since it is not immediately obvious.
| values = malloc(nelems); | ||
| if (!values) | ||
| return bf_err_r(-errno, "failed to allocate map values"); | ||
| group->map = new_map; |
There was a problem hiding this comment.
Claude: suggestion: group->map = new_map stores a non-owning alias before bf_list_push on line 928 transfers ownership (NULLs new_map). If a future change inserts an early return between these lines, _free_bf_map_ would destroy the map while group->map still points to it. Consider moving the group->map assignment to after bf_list_push succeeds:
struct bf_map *map_ref = new_map;
r = bf_list_push(&new_prog->handle->sets, (void **)&new_map);
if (r)
return r;
group->map = map_ref;| set a (ip4.saddr) in { 192.0.2.40 } | ||
| set b (ip4.saddr) in { ${HOST_IP_ADDR} } | ||
| rule (ip4.saddr) in a counter DROP" | ||
| ping -c 1 -W 1 ${NS_IP_ADDR} |
There was a problem hiding this comment.
Claude: suggestion: The isolation test pings ${NS_IP_ADDR} but does not check the exit code. If ping fails for an unrelated reason, the test passes silently. Consider:
ping -c 1 -W 1 ${NS_IP_ADDR} || { echo "ERROR: ping should have succeeded"; exit 1; }| assert(program); | ||
|
|
||
| bf_list_clean(&program->set_groups); | ||
| program->set_groups = |
There was a problem hiding this comment.
Claude: nit: The explicit cast (bf_list_ops_free)_bf_set_group_free is unnecessary. The rest of the codebase passes free callbacks without casting (e.g., bf_list_default(bf_fixup_free, NULL) on line 116, bf_list_default(bf_set_free, bf_set_pack) in chain.c). Same at line 260.
| * also as filename when pinning the map to the system. Can't be NULL or | ||
| * empty. | ||
| * @param set Set to create the map from. Can't be NULL. | ||
| * @param set_size Number of elements to reserve room for in the map. |
There was a problem hiding this comment.
Claude: nit: The set_size parameter name could be misread as "number of sets" rather than "number of elements to reserve". Consider n_elems to match the naming convention in the underlying _bf_map_new call.
Closes #381.
We're grouping all sets in a chain by key format. Then, sets of the same format will turn into a single BPF map, with bitmasks in its values that indicate which sets the element appears in. With the 64 maps per program limit, we should now be able to have 64 unique key formats per program, which is much less likely to hit than just 64 sets.