Permalink
Cannot retrieve contributors at this time
Fetching contributors…
| Demonstrations of stackcount, the Linux eBPF/bcc version. | |
| This program traces functions and frequency counts them with their entire | |
| stack trace, summarized in-kernel for efficiency. For example, counting | |
| stack traces that led to submit_bio(), which creates block device I/O: | |
| # ./stackcount submit_bio | |
| Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. | |
| ^C | |
| submit_bio | |
| submit_bh | |
| journal_submit_commit_record.isra.13 | |
| jbd2_journal_commit_transaction | |
| kjournald2 | |
| kthread | |
| ret_from_fork | |
| mb_cache_list | |
| 1 | |
| submit_bio | |
| __block_write_full_page.constprop.39 | |
| block_write_full_page | |
| blkdev_writepage | |
| __writepage | |
| write_cache_pages | |
| generic_writepages | |
| do_writepages | |
| __writeback_single_inode | |
| writeback_sb_inodes | |
| __writeback_inodes_wb | |
| 2 | |
| submit_bio | |
| __block_write_full_page.constprop.39 | |
| block_write_full_page | |
| blkdev_writepage | |
| __writepage | |
| write_cache_pages | |
| generic_writepages | |
| do_writepages | |
| __filemap_fdatawrite_range | |
| filemap_fdatawrite | |
| fdatawrite_one_bdev | |
| 36 | |
| submit_bio | |
| submit_bh | |
| jbd2_journal_commit_transaction | |
| kjournald2 | |
| kthread | |
| ret_from_fork | |
| mb_cache_list | |
| 38 | |
| submit_bio | |
| ext4_writepages | |
| do_writepages | |
| __filemap_fdatawrite_range | |
| filemap_flush | |
| ext4_alloc_da_blocks | |
| ext4_rename | |
| ext4_rename2 | |
| vfs_rename | |
| sys_rename | |
| entry_SYSCALL_64_fastpath | |
| 79 | |
| Detaching... | |
| The output shows unique stack traces, in order from leaf (on-CPU) to root, | |
| followed by their occurrence count. The last stack trace in the above output | |
| shows syscall handling, ext4_rename(), and filemap_flush(): looks like an | |
| application issued file rename has caused back end disk I/O due to ext4 | |
| block allocation and a filemap_flush(). I'd have to browse the code to those | |
| functions to confirm! | |
| The order of printed stack traces is from least to most frequent. The most | |
| frequent in this case, the ext4_rename() stack, was taken 79 times during | |
| tracing. | |
| It can be useful to trace the path to submit_bio to explain unusual rates of | |
| disk IOPS. These could have in-kernel origins (eg, background scrub). | |
| As another example, here are the code paths that led to ip_output(), which | |
| sends a packet at the IP level: | |
| # ./stackcount ip_output | |
| Tracing 1 functions for "ip_output"... Hit Ctrl-C to end. | |
| ^C | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| tcp_tsq_handler.part.32 | |
| tcp_tasklet_func | |
| tasklet_action | |
| __do_softirq | |
| do_softirq_own_stack | |
| do_softirq | |
| __local_bh_enable_ip | |
| 1 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_send_ack | |
| tcp_send_delayed_ack | |
| __tcp_ack_snd_check | |
| tcp_rcv_established | |
| tcp_v4_do_rcv | |
| tcp_v4_rcv | |
| ip_local_deliver_finish | |
| ip_local_deliver | |
| 1 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_send_ack | |
| tcp_send_delayed_ack | |
| __tcp_ack_snd_check | |
| tcp_rcv_established | |
| tcp_v4_do_rcv | |
| tcp_v4_rcv | |
| ip_local_deliver_finish | |
| ip_local_deliver | |
| 1 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_send_ack | |
| tcp_delack_timer_handler | |
| tcp_delack_timer | |
| call_timer_fn | |
| run_timer_softirq | |
| __do_softirq | |
| irq_exit | |
| xen_evtchn_do_upcall | |
| 1 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| __tcp_push_pending_frames | |
| tcp_rcv_established | |
| tcp_v4_do_rcv | |
| release_sock | |
| tcp_sendmsg | |
| inet_sendmsg | |
| sock_sendmsg | |
| 3 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| tcp_tsq_handler.part.32 | |
| tcp_tasklet_func | |
| tasklet_action | |
| __do_softirq | |
| run_ksoftirqd | |
| smpboot_thread_fn | |
| kthread | |
| 3 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| __tcp_push_pending_frames | |
| tcp_rcv_established | |
| tcp_v4_do_rcv | |
| tcp_v4_rcv | |
| ip_local_deliver_finish | |
| ip_local_deliver | |
| ip_rcv_finish | |
| 4 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_send_ack | |
| tcp_cleanup_rbuf | |
| tcp_recvmsg | |
| inet_recvmsg | |
| sock_recvmsg | |
| sock_read_iter | |
| __vfs_read | |
| vfs_read | |
| 5 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| __tcp_push_pending_frames | |
| tcp_rcv_established | |
| tcp_v4_do_rcv | |
| release_sock | |
| tcp_sendmsg | |
| inet_sendmsg | |
| sock_sendmsg | |
| 9 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| __tcp_push_pending_frames | |
| tcp_push | |
| tcp_sendmsg | |
| inet_sendmsg | |
| sock_sendmsg | |
| sock_write_iter | |
| __vfs_write | |
| 51 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| __tcp_push_pending_frames | |
| tcp_rcv_established | |
| tcp_v4_do_rcv | |
| tcp_v4_rcv | |
| ip_local_deliver_finish | |
| ip_local_deliver | |
| ip_rcv_finish | |
| 171 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| __tcp_push_pending_frames | |
| tcp_rcv_established | |
| tcp_v4_do_rcv | |
| tcp_v4_rcv | |
| ip_local_deliver_finish | |
| ip_local_deliver | |
| ip_rcv_finish | |
| 994 | |
| ip_output | |
| ip_queue_xmit | |
| tcp_transmit_skb | |
| tcp_write_xmit | |
| tcp_tsq_handler.part.32 | |
| tcp_tasklet_func | |
| tasklet_action | |
| __do_softirq | |
| irq_exit | |
| xen_evtchn_do_upcall | |
| xen_do_hypervisor_callback | |
| 1002 | |
| Detaching... | |
| The last two most frequent stack traces are via tcp_transmit_skb(). Note the | |
| send last begins with ip_rcv_finish(), for a local receive, which then becomes | |
| a transmit: likely pushing more frames when processing the newly received ones. | |
| As may be obvious, this is a great tool for quickly understanding kernel code | |
| flow. | |
| User-space functions can also be traced if a library name is provided. For | |
| example, to quickly identify code locations that allocate heap memory: | |
| # ./stackcount -l c -p 4902 malloc | |
| Tracing 1 functions for "malloc"... Hit Ctrl-C to end. | |
| ^C | |
| malloc | |
| rbtree_new | |
| main | |
| [unknown] | |
| 12 | |
| malloc | |
| _rbtree_node_new_internal | |
| _rbtree_node_insert | |
| rbtree_insert | |
| main | |
| [unknown] | |
| 1189 | |
| Detaching... | |
| Note that user-space uses of stackcount can be somewhat more limited because | |
| a lot of user-space libraries and binaries are compiled without debuginfo, or | |
| with frame-pointer omission (-fomit-frame-pointer), which makes it impossible | |
| to reliably obtain the stack trace. | |
| In addition to kernel and user-space functions, kernel tracepoints and USDT | |
| tracepoints are also supported. | |
| For example, to determine where threads are being created in a particular | |
| process, use the pthread_create USDT tracepoint: | |
| # ./stackcount -p $(pidof parprimes) u:pthread:pthread_create | |
| Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end. | |
| ^C | |
| parprimes [11923] | |
| pthread_create@@GLIBC_2.2.5 | |
| main | |
| __libc_start_main | |
| [unknown] | |
| 7 | |
| Similarly, to determine where context switching is happening in the kernel, | |
| use the sched:sched_switch kernel tracepoint: | |
| # ./stackcount t:sched:sched_switch | |
| ... (omitted for brevity) | |
| __schedule | |
| schedule | |
| schedule_hrtimeout_range_clock | |
| schedule_hrtimeout_range | |
| poll_schedule_timeout | |
| do_select | |
| core_sys_select | |
| SyS_select | |
| entry_SYSCALL_64_fastpath | |
| 40 | |
| __schedule | |
| schedule | |
| schedule_preempt_disabled | |
| cpu_startup_entry | |
| start_secondary | |
| 85 | |
| A -i option can be used to set an output interval, and -T to include a | |
| timestamp. For example: | |
| # ./stackcount -Ti 1 submit_bio | |
| Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. | |
| 01:11:37 | |
| submit_bio | |
| submit_bh | |
| journal_submit_commit_record.isra.13 | |
| jbd2_journal_commit_transaction | |
| kjournald2 | |
| kthread | |
| ret_from_fork | |
| mb_cache_list | |
| 1 | |
| submit_bio | |
| ext4_writepages | |
| do_writepages | |
| __filemap_fdatawrite_range | |
| filemap_flush | |
| ext4_alloc_da_blocks | |
| ext4_rename | |
| ext4_rename2 | |
| vfs_rename | |
| sys_rename | |
| entry_SYSCALL_64_fastpath | |
| 20 | |
| submit_bio | |
| submit_bh | |
| jbd2_journal_commit_transaction | |
| kjournald2 | |
| kthread | |
| ret_from_fork | |
| mb_cache_list | |
| 39 | |
| 01:11:38 | |
| submit_bio | |
| ext4_writepages | |
| do_writepages | |
| __filemap_fdatawrite_range | |
| filemap_flush | |
| ext4_alloc_da_blocks | |
| ext4_rename | |
| ext4_rename2 | |
| vfs_rename | |
| sys_rename | |
| entry_SYSCALL_64_fastpath | |
| 20 | |
| 01:11:39 | |
| submit_bio | |
| ext4_writepages | |
| do_writepages | |
| __filemap_fdatawrite_range | |
| filemap_flush | |
| ext4_alloc_da_blocks | |
| ext4_rename | |
| ext4_rename2 | |
| vfs_rename | |
| sys_rename | |
| entry_SYSCALL_64_fastpath | |
| 20 | |
| ^C | |
| 01:11:39 | |
| submit_bio | |
| ext4_writepages | |
| do_writepages | |
| __filemap_fdatawrite_range | |
| filemap_flush | |
| ext4_alloc_da_blocks | |
| ext4_rename | |
| ext4_rename2 | |
| vfs_rename | |
| sys_rename | |
| entry_SYSCALL_64_fastpath | |
| 20 | |
| Detaching... | |
| The -s output prints the return instruction offset for each function (aka | |
| symbol offset). Eg: | |
| # ./stackcount -s tcp_sendmsg | |
| Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end. | |
| ^C | |
| tcp_sendmsg0x1 | |
| sock_sendmsg0x38 | |
| sock_write_iter0x78 | |
| __vfs_write0xaa | |
| vfs_write0xa9 | |
| sys_write0x46 | |
| entry_SYSCALL_64_fastpath0x16 | |
| 29 | |
| Detaching... | |
| If it wasn't clear how one function called another, knowing the instruction | |
| offset can help you locate the lines of code from a disassembly dump. | |
| A wildcard can also be used. Eg, all functions beginning with "tcp_send": | |
| # ./stackcount -s 'tcp_send*' | |
| Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end. | |
| ^C | |
| tcp_send_delayed_ack0x1 | |
| tcp_rcv_established0x3b1 | |
| tcp_v4_do_rcv0x130 | |
| tcp_v4_rcv0x8e0 | |
| ip_local_deliver_finish0x9f | |
| ip_local_deliver0x51 | |
| ip_rcv_finish0x8a | |
| ip_rcv0x29d | |
| __netif_receive_skb_core0x637 | |
| __netif_receive_skb0x18 | |
| netif_receive_skb_internal0x23 | |
| 1 | |
| tcp_send_delayed_ack0x1 | |
| tcp_rcv_established0x222 | |
| tcp_v4_do_rcv0x130 | |
| tcp_v4_rcv0x8e0 | |
| ip_local_deliver_finish0x9f | |
| ip_local_deliver0x51 | |
| ip_rcv_finish0x8a | |
| ip_rcv0x29d | |
| __netif_receive_skb_core0x637 | |
| __netif_receive_skb0x18 | |
| netif_receive_skb_internal0x23 | |
| 4 | |
| tcp_send_mss0x1 | |
| inet_sendmsg0x67 | |
| sock_sendmsg0x38 | |
| sock_write_iter0x78 | |
| __vfs_write0xaa | |
| vfs_write0xa9 | |
| sys_write0x46 | |
| entry_SYSCALL_64_fastpath0x16 | |
| 7 | |
| tcp_sendmsg0x1 | |
| sock_sendmsg0x38 | |
| sock_write_iter0x78 | |
| __vfs_write0xaa | |
| vfs_write0xa9 | |
| sys_write0x46 | |
| entry_SYSCALL_64_fastpath0x16 | |
| 7 | |
| Detaching... | |
| Use -r to allow regular expressions. | |
| USAGE message: | |
| # ./stackcount -h | |
| usage: stackcount [-h] [-p PID] [-i INTERVAL] [-T] [-r] [-s] | |
| [-l LIBRARY] [-v] [-d] pattern | |
| Count function calls and their stack traces | |
| positional arguments: | |
| pattern search expression for functions | |
| optional arguments: | |
| -h, --help show this help message and exit | |
| -p PID, --pid PID trace this PID only | |
| -i INTERVAL, --interval INTERVAL | |
| summary interval, seconds | |
| -T, --timestamp include timestamp on output | |
| -r, --regexp use regular expressions. Default is "*" wildcards | |
| only. | |
| -s, --offset show address offsets | |
| -l, --library trace user-space functions from this library or executable | |
| -v, --verbose show raw addresses | |
| -d, --debug print BPF program before starting (for debugging purposes) | |
| examples: | |
| ./stackcount submit_bio # count kernel stack traces for submit_bio | |
| ./stackcount ip_output # count kernel stack traces for ip_output | |
| ./stackcount -s ip_output # show symbol offsets | |
| ./stackcount -sv ip_output # show offsets and raw addresses (verbose) | |
| ./stackcount 'tcp_send*' # count stacks for funcs matching tcp_send* | |
| ./stackcount -r '^tcp_send.*' # same as above, using regular expressions | |
| ./stackcount -Ti 5 ip_output # output every 5 seconds, with timestamps | |
| ./stackcount -p 185 ip_output # count ip_output stacks for PID 185 only | |
| ./stackcount -p 185 -l c malloc # count stacks for malloc in PID 185 | |
| ./stackcount t:sched:sched_fork # count stacks for the sched_fork tracepoint | |
| ./stackcount -p 185 u:node:* # count stacks for all USDT probes in node |