bpf: Introduce bpf_mem_cache_free_rcu(). #5236
Conversation
Upstream branch: 6d1dd92 | force-pushed a91273e to a951bb1
Upstream branch: 970308a | force-pushed 3e6d1d5 to 53dd4cf
force-pushed a951bb1 to 0d1a06b
Upstream branch: 3d5786e | force-pushed 53dd4cf to 423c3a7
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=758907 expired. Closing PR.
Upstream branch: fbc5669 | force-pushed 423c3a7 to f3f63a2
force-pushed 58f98c2 to fbc1674
Upstream branch: 9ae440b | force-pushed f3f63a2 to 0e2db83
force-pushed fbc1674 to 615c86e
Upstream branch: 771ca3d | force-pushed 0e2db83 to 2cf54b4
Upstream branch: 771ca3d | Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/netdevbpf/list/?series=759954, conflict:
force-pushed 4441a16 to 890e3cd
Upstream branch: da1a055 | force-pushed 7a8787b to d49ba75
force-pushed 890e3cd to 761437a
Upstream branch: 44b8bfa | force-pushed d49ba75 to bbabad8
force-pushed 761437a to 53e5945
Upstream branch: 2597a25 | force-pushed bbabad8 to 1b54852
force-pushed 53e5945 to 3393350
Rename:

    - struct rcu_head rcu;
    - struct llist_head free_by_rcu;
    - struct llist_head waiting_for_gp;
    - atomic_t call_rcu_in_progress;
    + struct llist_head free_by_rcu_ttrace;
    + struct llist_head waiting_for_gp_ttrace;
    + struct rcu_head rcu_ttrace;
    + atomic_t call_rcu_ttrace_in_progress;
    ...
    - static void do_call_rcu(struct bpf_mem_cache *c)
    + static void do_call_rcu_ttrace(struct bpf_mem_cache *c)

to better indicate intended use. The 'tasks trace' is shortened to 'ttrace' to reduce verbosity. No functional changes. Later patches will add free_by_rcu/waiting_for_gp fields to be used with normal RCU.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use kmemdup() to simplify the code. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
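An illustrative before/after of the kind of simplification kmemdup() enables (the variable names and error handling here are placeholders, not the exact upstream diff):

    - copy = kmalloc(sizeof(*ma), GFP_KERNEL);
    - if (!copy)
    -         return;
    - memcpy(copy, ma, sizeof(*ma));
    + copy = kmemdup(ma, sizeof(*ma), GFP_KERNEL);
    + if (!copy)
    +         return;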
Let free_all() helper return the number of freed elements. It's not used in this patch, but helps in debug/development of bpf_mem_alloc. For example this diff for __free_rcu(): - free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size); + printk("cpu %d freed %d objs after tasks trace\n", raw_smp_processor_id(), + free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size)); would show how busy RCU tasks trace is. In artificial benchmark where one cpu is allocating and different cpu is freeing the RCU tasks trace won't be able to keep up and the list of objects would keep growing from thousands to millions and eventually OOMing. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
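A minimal sketch of the counting version of free_all() (the names free_all()/free_one() follow the existing bpf_mem_alloc helpers; the body here is illustrative, not the exact upstream change):

    static int free_all(struct llist_node *llnode, bool percpu)
    {
            struct llist_node *pos, *t;
            int cnt = 0;

            llist_for_each_safe(pos, t, llnode) {
                    free_one(pos, percpu);
                    cnt++;
            }
            return cnt;
    }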
Factor out the inner body of alloc_bulk() into a separate helper. No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
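A rough sketch of the shape of that refactor (the helper name add_obj_to_free_list() and the details of the reentrancy protection are assumptions for illustration):

    /* hypothetical helper: push one freshly obtained object onto the
     * per-cpu free list under the c->active reentrancy counter */
    static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj)
    {
            unsigned long flags;

            local_irq_save(flags);
            /* alloc_bulk() runs from irq_work, so bpf progs on this cpu are
             * not touching c->active concurrently; the counter only guards
             * against NMI-level reentrancy */
            WARN_ON_ONCE(local_inc_return(&c->active) != 1);
            __llist_add(obj, &c->free_llist);
            c->free_cnt++;
            local_dec(&c->active);
            local_irq_restore(flags);
    }

    /* alloc_bulk() then reduces to a loop: obtain an object (from a free
     * list or from the slab) and call add_obj_to_free_list(c, obj). */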
Factor out local_inc/dec_return(&c->active) into helpers. No functional changes. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
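A minimal sketch of such helpers, assuming the names inc_active()/dec_active() (the exact upstream signatures may differ):

    static void inc_active(struct bpf_mem_cache *c, unsigned long *flags)
    {
            if (IS_ENABLED(CONFIG_PREEMPT_RT))
                    /* irq_work runs in a kthread on RT, so irqs must be
                     * disabled explicitly around the free_llist update */
                    local_irq_save(*flags);

            WARN_ON_ONCE(local_inc_return(&c->active) != 1);
    }

    static void dec_active(struct bpf_mem_cache *c, unsigned long *flags)
    {
            local_dec(&c->active);
            if (IS_ENABLED(CONFIG_PREEMPT_RT))
                    local_irq_restore(*flags);
    }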
In certain scenarios alloc_bulk() might be taking free objects mainly from the free_by_rcu_ttrace list. In that case get_memcg() and set_active_memcg() are redundant, but they show up in the perf profile. Split the loop and only set the memcg when allocating from slab. There is no performance difference from this patch alone, but it helps in combination with further patches.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
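Roughly, the split could look like this (a sketch: reentrancy protection and error handling are elided, and add_obj_to_free_list()/__alloc() stand in for the existing internal helpers):

    static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
    {
            struct mem_cgroup *memcg, *old_memcg;
            struct llist_node *obj;
            int i = 0;

            /* First refill from free_by_rcu_ttrace: these objects were
             * already charged when they were first allocated, so no memcg
             * juggling is needed here. */
            for (; i < cnt; i++) {
                    obj = __llist_del_first(&c->free_by_rcu_ttrace);
                    if (!obj)
                            break;
                    add_obj_to_free_list(c, obj);
            }
            if (i >= cnt)
                    return;

            /* Only when falling back to the slab, set the active memcg. */
            memcg = get_memcg(c);
            old_memcg = set_active_memcg(memcg);
            for (; i < cnt; i++) {
                    obj = __alloc(c, node);   /* kmalloc_node() wrapper */
                    if (!obj)
                            break;
                    add_obj_to_free_list(c, obj);
            }
            set_active_memcg(old_memcg);
            mem_cgroup_put(memcg);
    }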
The next patch will introduce cross-cpu llist access, and the existing irq_work_sync() + drain_mem_cache() + rcu_barrier_tasks_trace() mechanism will not be enough, since irq_work_sync() + drain_mem_cache() on cpu A won't guarantee that the llists on cpu A are empty. free_bulk() on cpu B might add objects back to an llist of cpu A. Add a 'bool draining' flag. The modified sequence looks like:

    for_each_cpu:
        WRITE_ONCE(c->draining, true); // do_call_rcu_ttrace() won't be doing call_rcu() any more
        irq_work_sync();               // wait for irq_work callback (free_bulk) to finish
        drain_mem_cache();             // free all objects
    rcu_barrier_tasks_trace();         // wait for RCU callbacks to execute

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
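For example, do_call_rcu_ttrace() can consult the flag before arming another callback (a sketch using the field names from this series; the details differ from the exact upstream code):

    static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
    {
            struct llist_node *llnode, *t;

            if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1))
                    return;

            WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
            llist_for_each_safe(llnode, t, llist_del_all(&c->free_by_rcu_ttrace))
                    llist_add(llnode, &c->waiting_for_gp_ttrace);

            if (unlikely(READ_ONCE(c->draining))) {
                    /* destroy path already did irq_work_sync(); free the
                     * objects right away instead of via call_rcu() */
                    __free_rcu(&c->rcu_ttrace);
                    return;
            }
            call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace);
    }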
To address the OOM issue when one cpu is allocating and another cpu is freeing, add a target bpf_mem_cache hint to allocated objects, and when the local cpu free_llist overflows, free to that bpf_mem_cache. The hint addresses the OOM while maintaining the same performance for the common case when alloc/free are done on the same cpu.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
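Conceptually, the hint can live in the llist_node header that every unit already carries in front of the memory handed to the caller (a simplified model: the field name c->tgt and the exact layout are assumptions, and the c->active protection is omitted):

    static void *unit_alloc(struct bpf_mem_cache *c)
    {
            struct llist_node *llnode = __llist_del_first(&c->free_llist);

            if (!llnode)
                    return NULL;
            c->free_cnt--;
            *(struct bpf_mem_cache **)llnode = c;   /* stamp the owner */
            return llnode;   /* caller uses memory past the header */
    }

    static void unit_free(struct bpf_mem_cache *c, struct llist_node *llnode)
    {
            /* remember the allocating cache; when free_llist overflows,
             * free_bulk() pushes objects to tgt instead of growing c */
            c->tgt = *(struct bpf_mem_cache **)llnode;
            llist_add(llnode, &c->free_llist);
            if (++c->free_cnt > c->high_watermark)
                    irq_work_queue(&c->refill_work);
    }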
alloc_bulk() can reuse elements from free_by_rcu_ttrace. Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc(). Signed-off-by: Alexei Starovoitov <ast@kernel.org>
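In alloc_bulk() terms, that is one more refill source tried before hitting the slab; a hypothetical helper could look like this (name and placement are assumptions):

    /* hypothetical: refill from objects still waiting for the tasks trace
     * GP; they only wait so the slab does not reuse the memory, the bpf
     * allocator itself may hand them out again */
    static int refill_from_waiting_ttrace(struct bpf_mem_cache *c, int cnt)
    {
            struct llist_node *obj;
            int i;

            for (i = 0; i < cnt; i++) {
                    obj = llist_del_first(&c->waiting_for_gp_ttrace);
                    if (!obj)
                            break;
                    add_obj_to_free_list(c, obj);
            }
            return i;
    }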
If a CPU is executing a long series of non-sleeping system calls, RCU grace periods can be delayed for on the order of a couple hundred milliseconds. This is normally not a problem, but if each system call does a call_rcu(), those callbacks can stack up. RCU will eventually notice this callback storm, but use of rcu_request_urgent_qs_task() allows the code invoking call_rcu() to give RCU a heads up. This function is not for general use, not yet, anyway. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
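On the bpf_mem_alloc side this could be used roughly as follows (a hypothetical helper; the exact call site and condition in the series may differ):

    /* hypothetical: when the caller keeps queueing call_rcu() callbacks
     * faster than grace periods complete, ask RCU to hurry up */
    static void maybe_nudge_rcu(struct bpf_mem_cache *c)
    {
            if (atomic_read(&c->call_rcu_in_progress) &&
                !llist_empty(&c->free_by_rcu))
                    rcu_request_urgent_qs_task(current);
    }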
bpf_obj_new() calls bpf_mem_alloc(), but doing alloc/free of 8 elements does not trigger the watermark conditions in bpf_mem_alloc. Increase to 200 elements to make sure alloc_bulk/free_bulk is exercised.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu(). Unlike bpf_mem_[cache_]free(), which links objects for immediate reuse into a per-cpu free list, the _rcu() flavor waits for an RCU grace period and then moves objects into the free_by_rcu_ttrace list, where they wait for an RCU tasks trace grace period to be freed into slab.

The life cycle of objects:

    alloc: dequeue free_llist
    free: enqueue free_llist
    free_rcu: enqueue free_by_rcu -> waiting_for_gp
    free_llist above high watermark -> free_by_rcu_ttrace
    after RCU GP waiting_for_gp -> free_by_rcu_ttrace
    free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
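Usage mirrors the existing bpf_mem_cache_alloc()/bpf_mem_cache_free() pair; a minimal sketch (struct my_elem is a placeholder type):

    struct bpf_mem_alloc ma;
    struct my_elem *e;

    /* fixed-size cache; 'false' = not a per-cpu allocator */
    bpf_mem_alloc_init(&ma, sizeof(struct my_elem), false);

    e = bpf_mem_cache_alloc(&ma);
    if (e) {
            /* ... use e under RCU protection ... */

            /* Do not recycle e into the per-cpu free list right away:
             * wait for an RCU GP first, so concurrent RCU readers cannot
             * observe the object being reused under them. */
            bpf_mem_cache_free_rcu(&ma, e);
    }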
Upstream branch: 539c7e6
Convert bpf_cpumask to bpf_mem_cache_free_rcu. Note that migrate_disable() in bpf_cpumask_release() is still necessary, since bpf_cpumask_release() is a dtor. bpf_obj_free_fields() can be converted to do migrate_disable() there in a follow up. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Vernet <void@manifault.com>
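The release path then looks roughly like this (modeled on the existing bpf_cpumask_release(); treat it as a sketch rather than the exact upstream function):

    void bpf_cpumask_release(struct bpf_cpumask *cpumask)
    {
            if (!refcount_dec_and_test(&cpumask->usage))
                    return;

            /* release() is a dtor that can run without migration disabled,
             * so disable it around the per-cpu allocator call */
            migrate_disable();
            bpf_mem_cache_free_rcu(&bpf_cpumask_ma, cpumask);
            migrate_enable();
    }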
force-pushed 1b54852 to 3d22fea
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=760789 expired. Closing PR.
Pull request for series with
subject: bpf: Introduce bpf_mem_cache_free_rcu().
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=758907