@kernel-patches-daemon-bpf-rc

Pull request for series with
subject: bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1011529

@kernel-patches-daemon-bpf-rc
Author

Upstream branch: 7f9ee5f
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1011529
version: 1

@kernel-patches-review-bot

commit 2489a3c1ea54d1954b1d7a21b47078455f3a9bbf
Author: Alexei Starovoitov <ast@kernel.org>

bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate
bpf_async_cb structures.

This patch fixes a kmemleak imbalance by replacing bpf_map_kmalloc_node()
with kmalloc_nolock() for allocations and kfree_rcu() with call_rcu() +
kfree_nolock() for deallocations to ensure symmetric allocation/free pairs.

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a98c83346..d808253f2 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2499,6 +2499,8 @@ int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
>  #ifdef CONFIG_MEMCG
>  void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
>  			   int node);
> +void *bpf_map_kmalloc_nolock(const struct bpf_map *map, size_t size, gfp_t flags,
> +			     int node);
>  void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags);
>  void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
>  		       gfp_t flags);
> @@ -2511,6 +2513,8 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
>   */
>  #define bpf_map_kmalloc_node(_map, _size, _flags, _node)	\
>  		kmalloc_node(_size, _flags, _node)
> +#define bpf_map_kmalloc_nolock(_map, _size, _flags, _node)	\
> +		kmalloc_nolock(_size, _flags, _node)
>  #define bpf_map_kzalloc(_map, _size, _flags)			\
>  		kzalloc(_size, _flags)
>  #define bpf_map_kvcalloc(_map, _n, _size, _flags)		\

[ ... ]

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index c9fab9a35..c5f63f268 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c

[ ... ]

> @@ -1270,11 +1277,7 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
>  		goto out;
>  	}
>
> -	/* Allocate via bpf_map_kmalloc_node() for memcg accounting. Until
> -	 * kmalloc_nolock() is available, avoid locking issues by using
> -	 * __GFP_HIGH (GFP_ATOMIC & ~__GFP_RECLAIM).
> -	 */
> -	cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);
> +	cb = bpf_map_kmalloc_nolock(map, size, 0, map->numa_node);
>  	if (!cb) {
>  		ret = -ENOMEM;
>  		goto out;

[ ... ]

>  	WRITE_ONCE(async->cb, cb);
>  	/* Guarantee the order between async->cb and map->usercnt. So
>  	 * when there are concurrent uref release and bpf timer init, either
>  	 * bpf_timer_cancel_and_free() called by uref release reads a no-NULL
>  	 * timer or atomic64_read() below returns a zero usercnt.
>  	 */
>  	smp_mb();
>  	if (!atomic64_read(&map->usercnt)) {
>  		/* maps with timers must be either held by user space
>  		 * or pinned in bpffs.
>  		 */
>  		WRITE_ONCE(async->cb, NULL);
>  		kfree(cb);
>  		ret = -EPERM;
>  	}
>  out:

Should this kfree(cb) in the error path use kfree_nolock(cb) instead?

The cb was allocated with bpf_map_kmalloc_nolock() and the commit
message states: "make sure that the objects allocated with
kmalloc_nolock() are freed with kfree_nolock()".  All other free sites
were converted to use kfree_nolock() via call_rcu(), but this error path
in __bpf_async_init() still uses regular kfree().
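
If the reviewer's reading is correct, the fix would mirror the other converted free sites. A minimal sketch of the error path (hypothetical; kfree_nolock() is the counterpart of kmalloc_nolock()):

```c
	if (!atomic64_read(&map->usercnt)) {
		/* maps with timers must be either held by user space
		 * or pinned in bpffs.
		 */
		WRITE_ONCE(async->cb, NULL);
		/* cb was allocated with bpf_map_kmalloc_nolock(), so free it
		 * with the matching kfree_nolock() rather than plain kfree().
		 */
		kfree_nolock(cb);
		ret = -EPERM;
	}
```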

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 2a9456a3e..8a129746b 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -520,6 +520,21 @@ void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
>  	return ptr;
>  }
>
> +void *bpf_map_kmalloc_nolock(const struct bpf_map *map, size_t size, gfp_t flags,
> +			     int node)
> +{
> +	struct mem_cgroup *memcg, *old_memcg;
> +	void *ptr;
> +
> +	memcg = bpf_map_get_memcg(map);
> +	old_memcg = set_active_memcg(memcg);
> +	ptr = kmalloc_nolock(size, flags | __GFP_ACCOUNT, node);
> +	set_active_memcg(old_memcg);
> +	mem_cgroup_put(memcg);
> +
> +	return ptr;
> +}
> +
>  void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
>  {
>  	struct mem_cgroup *memcg, *old_memcg;



AI reviewed your patch. Please fix the bug, or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
CI run summary: https://github.com/kernel-patches/bpf-rc/actions/runs/18511153831


bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.

The following kmemleak splat:
[    8.105530] kmemleak: Trying to color unknown object at 0xff11000100e918c0 as Black
[    8.106521] Call Trace:
[    8.106521]  <TASK>
[    8.106521]  dump_stack_lvl+0x4b/0x70
[    8.106521]  kvfree_call_rcu+0xcb/0x3b0
[    8.106521]  ? hrtimer_cancel+0x21/0x40
[    8.106521]  bpf_obj_free_fields+0x193/0x200
[    8.106521]  htab_map_update_elem+0x29c/0x410
[    8.106521]  bpf_prog_cfc8cd0f42c04044_overwrite_cb+0x47/0x4b
[    8.106521]  bpf_prog_8c30cd7c4db2e963_overwrite_timer+0x65/0x86
[    8.106521]  bpf_prog_test_run_syscall+0xe1/0x2a0

happens due to a combination of features and fixes, but mainly due to
commit 6d78b44 ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()").
That commit uses __GFP_HIGH, which instructs the slub/kmemleak internals to skip
kmemleak_alloc_recursive() on allocation, so the subsequent kfree_rcu()->
kvfree_call_rcu()->kmemleak_ignore() complains with the above splat.

To fix this imbalance, replace bpf_map_kmalloc_node() with
kmalloc_nolock() and kfree_rcu() with call_rcu() + kfree_nolock() to
make sure that the objects allocated with kmalloc_nolock() are freed
with kfree_nolock() rather than the implicit kfree() that kfree_rcu()
uses internally.
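
The conversion follows the standard pattern of pairing call_rcu() with a custom free callback instead of relying on kfree_rcu()'s implicit kfree(). A hedged sketch (callback name hypothetical; assumes bpf_async_cb embeds a struct rcu_head named rcu):

```c
static void bpf_async_cb_free_rcu(struct rcu_head *rcu)
{
	struct bpf_async_cb *cb = container_of(rcu, struct bpf_async_cb, rcu);

	/* Matches the kmalloc_nolock() allocation in __bpf_async_init() */
	kfree_nolock(cb);
}

/* Instead of kfree_rcu(cb, rcu), which would free with plain kfree(): */
call_rcu(&cb->rcu, bpf_async_cb_free_rcu);
```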

Note, the kmalloc_nolock() call happens under bpf_spin_lock_irqsave(), so
it will always fail on PREEMPT_RT. This is not an issue at the moment,
since bpf_timers are disabled on PREEMPT_RT. In the future the
bpf_spin_lock will be replaced with a state machine similar to
bpf_task_work.

Fixes: 6d78b44 ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
@kernel-patches-daemon-bpf-rc
Author

Upstream branch: e603a34
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1011529
version: 1
