
bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage #4807

Closed
wants to merge 6 commits

Conversation

kernel-patches-bot

Pull request for series with
subject: bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913

@kernel-patches-bot

Upstream branch: d9d93f3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

@kernel-patches-bot

Upstream branch: 02adf9e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

@kernel-patches-bot

Upstream branch: 1a3148f
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

@kernel-patches-bot

Upstream branch: b63cbc4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

@kernel-patches-bot

Upstream branch: 226bc6a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

@kernel-patches-bot

Upstream branch: 55fbae0
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

@kernel-patches-bot

Upstream branch: 496f4f1
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

This patch adds a few bpf mem allocator functions that will
be used by bpf_local_storage in a later patch.

bpf_mem_cache_alloc_flags(..., gfp_t flags) is added. When
flags == GFP_KERNEL, it falls back to __alloc(..., GFP_KERNEL).
bpf_local_storage knows when its running context is sleepable
(GFP_KERNEL), which provides a better guarantee on memory allocation.
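The fallback described above can be sketched in userspace; gfp_t, GFP_KERNEL, and the two allocation paths below are simplified stand-ins for this sketch, not the kernel implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch of the GFP_KERNEL fallback in
 * bpf_mem_cache_alloc_flags(). All names here are illustrative. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0x1u

static void *unit_alloc_stub(size_t size)
{
	/* The real cache pops from a per-cpu free llist; plain malloc
	 * here just keeps the sketch runnable. */
	return malloc(size);
}

static void *alloc_flags_sketch(size_t size, gfp_t flags)
{
	if (flags == GFP_KERNEL)
		/* Sleepable caller: go straight to the underlying
		 * allocator (stands in for __alloc(..., GFP_KERNEL)),
		 * which gives a stronger allocation guarantee. */
		return malloc(size);
	return unit_alloc_stub(size);	/* non-sleepable: cache only */
}
```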

bpf_local_storage has some uncommon cases in which its selem
cannot be reused immediately. It then handles its own
rcu_head, goes through an rcu_trace gp, and frees the selem
afterwards. bpf_mem_cache_raw_free() is added for this direct
free purpose without leaking the internal LLIST_NODE_SZ knowledge.
At free time, the 'struct bpf_mem_alloc *ma' is no longer
available. However, the caller knows whether the memory is
percpu or not and can call the matching raw_free function.
bpf_local_storage does not support percpu values, so only
the non-percpu bpf_mem_cache_raw_free() is added in
this patch.
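The LLIST_NODE_SZ detail can be illustrated with a small userspace sketch; the helper names and the use of malloc/free below are stand-ins, not the kernel code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of why a raw free helper is needed: the allocator hands out
 * memory LLIST_NODE_SZ bytes past the raw allocation (room for the
 * freelist llist_node), so a raw free must step back by that amount. */
#define LLIST_NODE_SZ sizeof(void *)

static void *cache_alloc_sketch(size_t size)
{
	char *raw = malloc(LLIST_NODE_SZ + size);

	return raw ? raw + LLIST_NODE_SZ : NULL;
}

static void cache_raw_free_sketch(void *obj)
{
	/* Undo the header offset without needing the
	 * 'struct bpf_mem_alloc *ma', which is gone by the time the
	 * rcu callback runs. */
	free((char *)obj - LLIST_NODE_SZ);
}
```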

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
@kernel-patches-bot

Upstream branch: e993607
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

Martin KaFai Lau added 4 commits March 25, 2023 18:19
This patch uses bpf_mem_alloc for the task and cgroup local storage
because the bpf prog can easily get hold of the storage owner's
PTR_TO_BTF_ID, e.g. bpf_get_current_task_btf() can be used in some
of the kmalloc code paths, which would cause deadlock/recursion.
bpf_mem_cache_alloc is deadlock free and solves a legit use case
in [1].

For sk storage, its batch creation benchmark shows a few percent
regression when the sk create/destroy batch size is larger than 32.
sk creation/destruction happens much more often and depends on
external traffic. Considering it is hypothetical that sk storage
could cause a deadlock, it can cross that bridge to use
bpf_mem_alloc when a legit (ie. useful) use case comes up.

For inode storage, bpf_local_storage_destroy() is called before
waiting for an rcu gp, so its memory cannot be reused immediately.
inode storage stays with kmalloc/kfree after the rcu [or tasks_trace] gp.

A 'bool bpf_ma' argument is added to bpf_local_storage_map_alloc().
Only task and cgroup storage have 'bpf_ma == true', which means
they use bpf_mem_cache_alloc/free(). This patch only changes the
selem to use bpf_mem_alloc for task and cgroup. The next patch
will change the local_storage to use bpf_mem_alloc as well for
task and cgroup.

Here are some more details on the changes:

* memory allocation:
After bpf_mem_cache_alloc(), SDATA(selem)->data is zero-ed because
bpf_mem_cache_alloc() could return a reused selem. This keeps the
existing bpf_map_kzalloc() behavior. Only SDATA(selem)->data is
zero-ed; it is the part visible to the bpf prog. There is no need
to use zero_map_value() for the zeroing because
bpf_selem_free(..., reuse_now = true) ensures no bpf prog is using
the selem before returning it through bpf_mem_cache_free().
The internal fields of the selem will be initialized when linking
to the new smap and the new local_storage.
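The zeroing rule above can be modeled with a toy layout: only the prog-visible data area is cleared on a possibly-reused allocation, while the internal link fields are left for the later linking step. The struct layout and names below are simplified stand-ins, not the kernel structs:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_selem {
	void *snode;	/* internal: set when linking to the smap */
	void *map_node;	/* internal: set when linking to local_storage */
	char data[16];	/* SDATA(selem)->data: visible to the bpf prog */
};

static struct toy_selem *selem_alloc_sketch(struct toy_selem *reused)
{
	struct toy_selem *selem = reused ? reused : malloc(sizeof(*selem));

	if (!selem)
		return NULL;
	/* Keep the bpf_map_kzalloc() behavior for the visible part:
	 * a reused selem may still carry stale prog-visible data. */
	memset(selem->data, 0, sizeof(selem->data));
	return selem;
}
```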

When 'bpf_ma == false', nothing changes in this patch. It will
stay with the bpf_map_kzalloc().

* memory free:
The bpf_selem_free() and bpf_selem_free_rcu() are modified to handle
the bpf_ma == true case.

For the common selem free path where its owner is also being destroyed,
the mem is freed in bpf_local_storage_destroy(), the owner (task
and cgroup) has gone through a rcu gp. The memory can be reused
immediately, so bpf_local_storage_destroy() will call
bpf_selem_free(..., reuse_now = true) which will do
bpf_mem_cache_free() for immediate reuse consideration.

An exception is the delete elem code path, which is called from the
helper bpf_*_storage_delete() and the syscall bpf_map_delete_elem().
This path is an unusual case for local storage because the common
use case is to have the local storage stay with its owner's life
time so that the bpf prog and the user space do not have to monitor
the owner's destruction. In the delete elem path, the selem cannot
be reused immediately because a bpf prog could still be using it.
It calls bpf_selem_free(..., reuse_now = false) and waits for an
rcu tasks trace gp before freeing the elem. The rcu callback is
changed to do bpf_mem_cache_raw_free() instead of kfree().

When 'bpf_ma == false', behavior is the same as before.
__bpf_selem_free() is added to do the kfree_rcu() and
call_tasks_trace_rcu(). A few words on 'reuse_now == true': it is
still racing with bpf_local_storage_map_free(), which runs under
rcu protection, so it still needs to wait for an rcu gp instead of
calling kfree() directly. Otherwise, the selem may be reused by the
slab for a totally different struct while
bpf_local_storage_map_free() is still using it (as an rcu reader).
For the inode case, there may be other rcu readers as well.
In short: bpf_ma == false and reuse_now == true => vanilla rcu.
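The four free paths described above can be summarized as a pure decision function; the enum names here are made up for this sketch, with the mapping to the kernel calls noted in the comments:

```c
#include <assert.h>
#include <stdbool.h>

enum selem_free_path {
	VANILLA_RCU,		/* !bpf_ma &&  reuse_now: kfree_rcu() */
	TASKS_TRACE_KFREE,	/* !bpf_ma && !reuse_now: tasks trace gp,
				 * then kfree */
	MEM_CACHE_FREE,		/*  bpf_ma &&  reuse_now: bpf_mem_cache_free() */
	TASKS_TRACE_RAW_FREE,	/*  bpf_ma && !reuse_now: tasks trace gp,
				 * then bpf_mem_cache_raw_free() */
};

static enum selem_free_path selem_free_path(bool bpf_ma, bool reuse_now)
{
	if (!bpf_ma)
		return reuse_now ? VANILLA_RCU : TASKS_TRACE_KFREE;
	return reuse_now ? MEM_CACHE_FREE : TASKS_TRACE_RAW_FREE;
}
```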

[1]: https://lore.kernel.org/bpf/20221118190109.1512674-1-namhyung@kernel.org/

Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch uses bpf_mem_cache_alloc/free for allocating and freeing
bpf_local_storage for task and cgroup storage.

The changes are similar to the previous patch. A few things worth
mentioning for bpf_local_storage:

The local_storage is freed when the last selem is deleted.
Before deleting a selem from local_storage, it needs to retrieve
local_storage->smap because bpf_selem_unlink_storage_nolock()
may have set it to NULL. Note that local_storage->smap may already
have been NULL when the selem that created this local_storage was
removed. In this case, call_rcu() is used to free the
local_storage.
Also, the bpf_ma (true or false) value is needed before calling
bpf_local_storage_free(). The bpf_ma can be obtained either from
local_storage->smap (if available) or from any of its selems' smap.
A new helper check_storage_bpf_ma() is added to obtain bpf_ma for
a bpf_local_storage that is being deleted.
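The check_storage_bpf_ma() idea can be sketched as follows; all types and the selem array below are simplified stand-ins for the kernel structs, not the real layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_smap { bool bpf_ma; };
struct toy_selem { struct toy_smap *smap; };
struct toy_local_storage {
	struct toy_smap *smap;		/* may already be NULL */
	struct toy_selem *selems[4];	/* still-linked selems */
	int nr_selems;
};

/* Recover bpf_ma for a dying local_storage: prefer
 * local_storage->smap if it is still set, otherwise fall back to
 * any remaining selem's smap. */
static bool storage_bpf_ma_sketch(struct toy_local_storage *ls)
{
	int i;

	if (ls->smap)
		return ls->smap->bpf_ma;
	for (i = 0; i < ls->nr_selems; i++)
		if (ls->selems[i] && ls->selems[i]->smap)
			return ls->selems[i]->smap->bpf_ma;
	/* No smap reachable: take the call_rcu()/kfree fallback path. */
	return false;
}
```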

When bpf_local_storage_alloc() gets reused memory, all fields are
either already at the correct values or will be initialized:
'cache[]' must already be all NULLs, 'list' must be empty, and the
others will be initialized.

Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The current sk storage test ensures that the memory free path works
when local_storage->smap is NULL.

This patch adds a task storage test to ensure the memory free
code path works when local_storage->smap is NULL.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a task storage benchmark to the existing
local-storage-create benchmark.

For task storage,
./bench --storage-type task --batch-size 32:
   bpf_ma: Summary: creates   30.456 ± 0.507k/s ( 30.456k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.962 ± 0.486k/s ( 31.962k/prod), 6.13 kmallocs/create

./bench --storage-type task --batch-size 64:
   bpf_ma: Summary: creates   30.197 ± 1.476k/s ( 30.197k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.103 ± 0.297k/s ( 31.103k/prod), 6.13 kmallocs/create

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
@kernel-patches-bot

Upstream branch: 8d27596
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
version: 3

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/netdevbpf/list/?series=732913
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am --3way
  stdout: 'Applying: bpf: Add a few bpf mem allocator functions
Using index info to reconstruct a base tree...
M	include/linux/bpf_mem_alloc.h
M	kernel/bpf/memalloc.c
Falling back to patching base and 3-way merge...
No changes -- Patch already applied.
Applying: bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
Using index info to reconstruct a base tree...
M	include/linux/bpf_local_storage.h
M	kernel/bpf/bpf_cgrp_storage.c
M	kernel/bpf/bpf_inode_storage.c
M	kernel/bpf/bpf_local_storage.c
M	kernel/bpf/bpf_task_storage.c
M	net/core/bpf_sk_storage.c
Falling back to patching base and 3-way merge...
Auto-merging net/core/bpf_sk_storage.c
Auto-merging kernel/bpf/bpf_task_storage.c
Auto-merging kernel/bpf/bpf_local_storage.c
CONFLICT (content): Merge conflict in kernel/bpf/bpf_local_storage.c
Auto-merging kernel/bpf/bpf_inode_storage.c
Auto-merging kernel/bpf/bpf_cgrp_storage.c
Auto-merging include/linux/bpf_local_storage.h
CONFLICT (content): Merge conflict in include/linux/bpf_local_storage.h
Patch failed at 0002 bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
  stderr: 'error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch'

conflict:

diff --cc include/linux/bpf_local_storage.h
index 173ec7f43ed1,30efbcab2798..000000000000
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@@ -57,7 -57,6 +57,10 @@@ struct bpf_local_storage_map 
  	u16 elem_size;
  	u16 cache_idx;
  	struct bpf_mem_alloc selem_ma;
++<<<<<<< HEAD
 +	struct bpf_mem_alloc storage_ma;
++=======
++>>>>>>> bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
  	bool bpf_ma;
  };
  
diff --cc kernel/bpf/bpf_local_storage.c
index dab2ff4c99d9,309ea727a5cb..000000000000
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@@ -142,67 -130,38 +142,89 @@@ static void bpf_local_storage_free_trac
  		call_rcu(rcu, bpf_local_storage_free_rcu);
  }
  
 +/* Handle bpf_ma == false */
 +static void __bpf_local_storage_free(struct bpf_local_storage *local_storage,
 +				     bool vanilla_rcu)
 +{
 +	if (vanilla_rcu)
 +		kfree_rcu(local_storage, rcu);
 +	else
 +		call_rcu_tasks_trace(&local_storage->rcu,
 +				     __bpf_local_storage_free_trace_rcu);
 +}
 +
  static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
 -				   bool reuse_now)
 +				   struct bpf_local_storage_map *smap,
 +				   bool bpf_ma, bool reuse_now)
  {
 -	if (!reuse_now)
 +	if (!bpf_ma) {
 +		__bpf_local_storage_free(local_storage, reuse_now);
 +		return;
 +	}
 +
 +	if (!reuse_now) {
  		call_rcu_tasks_trace(&local_storage->rcu,
  				     bpf_local_storage_free_trace_rcu);
 -	else
 +		return;
 +	}
 +
 +	if (smap) {
 +		migrate_disable();
 +		bpf_mem_cache_free(&smap->storage_ma, local_storage);
 +		migrate_enable();
 +	} else {
 +		/* smap could be NULL if the selem that triggered
 +		 * this 'local_storage' creation had been long gone.
 +		 * In this case, directly do call_rcu().
 +		 */
  		call_rcu(&local_storage->rcu, bpf_local_storage_free_rcu);
 +	}
 +}
 +
 +/* rcu tasks trace callback for bpf_ma == false */
 +static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
 +{
 +	struct bpf_local_storage_elem *selem;
 +
 +	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
 +	if (rcu_trace_implies_rcu_gp())
 +		kfree(selem);
 +	else
 +		kfree_rcu(selem, rcu);
 +}
 +
 +/* Handle bpf_ma == false */
 +static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
 +			     bool vanilla_rcu)
 +{
 +	if (vanilla_rcu)
 +		kfree_rcu(selem, rcu);
 +	else
 +		call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu);
  }
  
+ /* rcu tasks trace callback for bpf_ma == false */
+ static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
+ {
+ 	struct bpf_local_storage_elem *selem;
+ 
+ 	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
+ 	if (rcu_trace_implies_rcu_gp())
+ 		kfree(selem);
+ 	else
+ 		kfree_rcu(selem, rcu);
+ }
+ 
+ /* Handle bpf_ma == false */
+ static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
+ 			     bool vanilla_rcu)
+ {
+ 	if (vanilla_rcu)
+ 		kfree_rcu(selem, rcu);
+ 	else
+ 		call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu);
+ }
+ 
  static void bpf_selem_free_rcu(struct rcu_head *rcu)
  {
  	struct bpf_local_storage_elem *selem;
@@@ -847,12 -755,6 +869,15 @@@ bpf_local_storage_map_alloc(union bpf_a
  		err = bpf_mem_alloc_init(&smap->selem_ma, smap->elem_size, false);
  		if (err)
  			goto free_smap;
++<<<<<<< HEAD
 +
 +		err = bpf_mem_alloc_init(&smap->storage_ma, sizeof(struct bpf_local_storage), false);
 +		if (err) {
 +			bpf_mem_alloc_destroy(&smap->selem_ma);
 +			goto free_smap;
 +		}
++=======
++>>>>>>> bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
  	}
  
  	smap->cache_idx = bpf_local_storage_cache_idx_get(cache);
@@@ -927,10 -829,8 +952,15 @@@ void bpf_local_storage_map_free(struct 
  	 */
  	synchronize_rcu();
  
++<<<<<<< HEAD
 +	if (smap->bpf_ma) {
 +		bpf_mem_alloc_destroy(&smap->selem_ma);
 +		bpf_mem_alloc_destroy(&smap->storage_ma);
 +	}
++=======
+ 	if (smap->bpf_ma)
+ 		bpf_mem_alloc_destroy(&smap->selem_ma);
++>>>>>>> bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
  	kvfree(smap->buckets);
  	bpf_map_area_free(smap);
  }

@kernel-patches-bot

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=732913 irrelevant now. Closing PR.
