kprobes: Introduce kprobe cache to reduce cache misses
Introduce a kprobe cache to reduce cache misses when a very large number of kprobes is active.

Stress testing kprobes requires activating as many kprobes as possible, which causes a storm of cache misses on the kprobe hash list. The hash list has already been enlarged to 512 entries, but that is still too small for 40k kprobes. For example, after registering 40k probes on the hlist and enabling 20k of them, perf still shows most of the cache misses in get_kprobe.

----
Samples: 633 of event 'cache-misses', Event count (approx.): 3414776
+  68.13%  [k] get_kprobe
+   4.38%  [k] ftrace_lookup_ip
+   2.54%  [k] kprobe_ftrace_handler
----

I also found that most of the kprobes are never hit. In that case, we can cut the random memory accesses, and with them the cache misses, by introducing a per-cpu cache that holds the addresses of frequently used kprobe data structures together with their probe addresses.

With kpcache enabled, get_kprobe_cached accounts for only around 4-5% of the cache misses with 20k probes.

----
Samples: 729 of event 'cache-misses', Event count (approx.): 690125
+  14.49%  [k] ftrace_lookup_ip
+   5.61%  [k] kprobe_trace_func
+   5.17%  [k] kprobe_ftrace_handler
+   4.62%  [k] get_kprobe_cached
----

Of course, this reduces the enabling time too.

Without this fix (hash table enlarged only):
(2934 sec, 1 min intervals for each 2000 probes enabled)

----
Enabling trace events: start at 1393921862
0 1393921864 a2mp_chan_alloc_skb_cb_38581
...
19999 1393924928 nfs4_open_confirm_done_11785
----

With this fix:
(2025 sec, 1 min intervals for each 2000 probes enabled)

----
Enabling trace events: start at 1393912623
0 1393912625 a2mp_chan_alloc_skb_cb_38800
....
19999 1393914648 nfs2_xdr_dec_readlinkres_11628
----

This patch implements a simple per-cpu, 4-way, 512-entry cache for the kprobes hlist. Every get_kprobe call on the hot path goes through the cache; on a cache miss, it searches the hlist and inserts the kprobe it finds into the cache entry.
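
The lookup path described above can be sketched in user-space C roughly as follows. This is only an illustration, not the code from this patch: struct kprobe_stub, the kpc_* names, the hash in kpc_index(), the round-robin replacement policy and the demo slow path are all assumptions made for the example, and "512entry" is read here as 512 sets of 4 ways. A real implementation would keep one such cache per CPU; the sketch uses a single instance. On a 64-bit build the address and pointer slots alone take 512 * 4 * 16 bytes = 32 KiB, which is roughly in line with the per-cpu figure quoted in this message.

```c
#include <stddef.h>

#define KPC_SETS 512	/* assumption: "512entry" means 512 sets */
#define KPC_WAYS 4	/* 4-way set associative */

/* Hypothetical stand-in for struct kprobe; only the probe address matters here. */
struct kprobe_stub {
	unsigned long addr;
};

/* One set: up to KPC_WAYS (address, kprobe) pairs plus a replacement cursor. */
struct kpc_set {
	unsigned long addr[KPC_WAYS];
	struct kprobe_stub *kp[KPC_WAYS];
	unsigned int next;	/* next way to overwrite (round-robin, assumed) */
};

struct kpc {
	struct kpc_set set[KPC_SETS];
};

/* Slow-path lookup, standing in for the hash-list search. */
typedef struct kprobe_stub *(*kpc_slow_fn)(unsigned long addr);

/* Hypothetical hash: drop the low bits and fold into the set count. */
static unsigned int kpc_index(unsigned long addr)
{
	return (unsigned int)((addr >> 2) % KPC_SETS);
}

/* Try the cache first; on a miss, use the slow path and cache what it finds. */
struct kprobe_stub *kpc_lookup(struct kpc *c, unsigned long addr, kpc_slow_fn slow)
{
	struct kpc_set *s = &c->set[kpc_index(addr)];
	struct kprobe_stub *kp;
	int i;

	for (i = 0; i < KPC_WAYS; i++)
		if (s->kp[i] && s->addr[i] == addr)
			return s->kp[i];	/* cache hit */

	kp = slow(addr);			/* cache miss */
	if (kp) {
		s->addr[s->next] = addr;
		s->kp[s->next] = kp;
		s->next = (s->next + 1) % KPC_WAYS;
	}
	return kp;
}

/* Drop every cached reference to kp, as must happen before kp goes away. */
void kpc_invalidate(struct kpc *c, const struct kprobe_stub *kp)
{
	struct kpc_set *s = &c->set[kpc_index(kp->addr)];
	int i;

	for (i = 0; i < KPC_WAYS; i++)
		if (s->kp[i] == kp) {
			s->kp[i] = NULL;
			s->addr[i] = 0;
		}
}

/* Demo slow path: a linear scan over a tiny table, counting how often it runs. */
static struct kprobe_stub demo_probes[] = { { 0x1000 }, { 0x2000 }, { 0x3000 } };
unsigned int slow_calls;

struct kprobe_stub *demo_slow_lookup(unsigned long addr)
{
	size_t i;

	slow_calls++;
	for (i = 0; i < sizeof(demo_probes) / sizeof(demo_probes[0]); i++)
		if (demo_probes[i].addr == addr)
			return &demo_probes[i];
	return NULL;
}
```

Calling kpc_lookup() twice with the same address on a zeroed struct kpc runs demo_slow_lookup() only the first time; after kpc_invalidate() the next lookup falls back to the slow path again. Addresses that have no probe are never cached in this sketch, so they always take the slow path.
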
When a kprobe is removed, the cache entries are cleared via IPI, because the cache is per-cpu.

Note that this uses more memory for kprobes alone (34KB per cpu) than the previous implementation (4KB total).

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Showing 4 changed files with 117 additions and 4 deletions.