
bpf: Introduce bpf_mem_cache_free_rcu(). #5236

Closed

Conversation

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: bpf: Introduce bpf_mem_cache_free_rcu().
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=758907

@kernel-patches-daemon-bpf

Upstream branch: 6d1dd92
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=758907
version: 1

@kernel-patches-daemon-bpf

Upstream branch: 970308a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=758907
version: 1

@kernel-patches-daemon-bpf

Upstream branch: 3d5786e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=758907
version: 1

@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=758907 expired. Closing PR.

@kernel-patches-daemon-bpf

Upstream branch: fbc5669
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=759954
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 9ae440b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=759954
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 771ca3d
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=759954
version: 2

@kernel-patches-daemon-bpf

Upstream branch: 771ca3d
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=759954
version: 2

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/netdevbpf/list/?series=759954
error message:

Cmd('git') failed due to: exit code(-15)
  cmdline: git am --3way
  stdout: 'Applying: bpf: Rename few bpf_mem_alloc fields.
Applying: bpf: Simplify code of destroy_mem_alloc() with kmemdup().
Applying: bpf: Let free_all() return the number of freed elements.
Applying: bpf: Refactor alloc_bulk().
Applying: bpf: Further refactor alloc_bulk().
Applying: bpf: Optimize moving objects from free_by_rcu_ttrace to waiting_for_gp_ttrace.
Applying: bpf: Change bpf_mem_cache draining process.
Applying: bpf: Add a hint to allocated objects.
Applying: bpf: Allow reuse from waiting_for_gp_ttrace list.
Applying: rcu: Export rcu_request_urgent_qs_task()
Applying: selftests/bpf: Improve test coverage of bpf_mem_alloc.
Applying: bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
Applying: bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.'
  stderr: 'Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.'

conflict:


@kernel-patches-daemon-bpf

Upstream branch: da1a055
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=760789
version: 3

@kernel-patches-daemon-bpf

Upstream branch: 44b8bfa
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=760789
version: 3

@kernel-patches-daemon-bpf

Upstream branch: 2597a25
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=760789
version: 3

Alexei Starovoitov and others added 12 commits June 30, 2023 07:13
Rename:
-       struct rcu_head rcu;
-       struct llist_head free_by_rcu;
-       struct llist_head waiting_for_gp;
-       atomic_t call_rcu_in_progress;
+       struct llist_head free_by_rcu_ttrace;
+       struct llist_head waiting_for_gp_ttrace;
+       struct rcu_head rcu_ttrace;
+       atomic_t call_rcu_ttrace_in_progress;
...
-	static void do_call_rcu(struct bpf_mem_cache *c)
+	static void do_call_rcu_ttrace(struct bpf_mem_cache *c)

to better indicate intended use.

The 'tasks trace' is shortened to 'ttrace' to reduce verbosity.
No functional changes.

Later patches will add free_by_rcu/waiting_for_gp fields to be used with normal RCU.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use kmemdup() to simplify the code.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Let the free_all() helper return the number of freed elements.
It's not used in this patch, but helps in debugging and development of bpf_mem_alloc.

For example this diff for __free_rcu():
-       free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
+       printk("cpu %d freed %d objs after tasks trace\n", raw_smp_processor_id(),
+       	free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size));

would show how busy RCU tasks trace is.
In an artificial benchmark where one cpu is allocating and a different cpu is freeing,
RCU tasks trace won't be able to keep up and the list of objects
would keep growing from thousands to millions, eventually OOMing.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Factor out the inner body of alloc_bulk() into a separate helper.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Factor out local_inc/dec_return(&c->active) into helpers.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In certain scenarios alloc_bulk() might be taking free objects mainly from the
free_by_rcu_ttrace list. In such cases get_memcg() and set_active_memcg() are
redundant, but they show up in perf profile. Split the loop and only set memcg
when allocating from slab. No performance difference in this patch alone, but
it helps in combination with further patches.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The next patch will introduce cross-cpu llist access, and the existing
irq_work_sync() + drain_mem_cache() + rcu_barrier_tasks_trace() mechanism will
not be enough, since irq_work_sync() + drain_mem_cache() on cpu A won't
guarantee that the llists on cpu A are empty. free_bulk() on cpu B might add
objects back to a llist of cpu A. Add a 'bool draining' flag.
The modified sequence looks like:
for_each_cpu:
  WRITE_ONCE(c->draining, true); // do_call_rcu_ttrace() won't be doing call_rcu() any more
  irq_work_sync(); // wait for irq_work callback (free_bulk) to finish
  drain_mem_cache(); // free all objects
rcu_barrier_tasks_trace(); // wait for RCU callbacks to execute

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
To address the OOM issue when one cpu is allocating and another cpu is freeing, add
a target bpf_mem_cache hint to allocated objects, and when the local cpu free_llist
overflows, free to that bpf_mem_cache. The hint addresses the OOM while
maintaining the same performance for the common case when alloc/free are done on the
same cpu.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
alloc_bulk() can reuse elements from free_by_rcu_ttrace.
Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If a CPU is executing a long series of non-sleeping system calls,
RCU grace periods can be delayed for on the order of a couple hundred
milliseconds.  This is normally not a problem, but if each system call
does a call_rcu(), those callbacks can stack up.  RCU will eventually
notice this callback storm, but use of rcu_request_urgent_qs_task()
allows the code invoking call_rcu() to give RCU a heads up.

This function is not for general use, not yet, anyway.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_obj_new() calls bpf_mem_alloc(), but doing alloc/free of 8 elements
does not trigger watermark conditions in bpf_mem_alloc.
Increase to 200 elements to make sure alloc_bulk/free_bulk is exercised.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
Unlike bpf_mem_[cache_]free(), which links objects for immediate reuse into a
per-cpu free list, the _rcu() flavor waits for an RCU grace period and then moves
objects onto the free_by_rcu_ttrace list, where they wait for an RCU
tasks trace grace period to be freed into slab.

The life cycle of objects:
alloc: dequeue free_llist
free: enqueue free_llist
free_rcu: enqueue free_by_rcu -> waiting_for_gp
free_llist above high watermark -> free_by_rcu_ttrace
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
@kernel-patches-daemon-bpf

Upstream branch: 539c7e6
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=760789
version: 3

Convert bpf_cpumask to bpf_mem_cache_free_rcu.
Note that migrate_disable() in bpf_cpumask_release() is still necessary, since
bpf_cpumask_release() is a dtor. bpf_obj_free_fields() can be converted to do
migrate_disable() there in a follow up.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Vernet <void@manifault.com>
@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=760789 expired. Closing PR.
