perf is a tool to analyze the performance of applications and of the kernel on Linux-based systems. It relies on the perf_event_open syscall (http://man7.org/linux/man-pages/man2/perf_event_open.2.html) to access the performance monitoring facilities provided by the kernel. These facilities consist of:
- tracepoints (probes) in the kernel, the C library (glibc), some interpreters, etc.
- processor counters from the PMU (Performance Monitoring Unit), like Intel PMC (Performance Monitoring Counter), replaced by Intel PCM (Processor Counter Monitor), or PIC (Performance Instrumentation Counter)
- hardware-assisted tracing, like Intel PT (Processor Trace)
Access to the performance events system by unprivileged users is configured through the sysctl kernel.perf_event_paranoid (file /proc/sys/kernel/perf_event_paranoid). The values of this setting are documented on https://www.kernel.org/doc/Documentation/sysctl/kernel.txt:
- -1: Allow use of (almost) all events by all users. Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
- >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN. Disallow raw tracepoint access by users without CAP_SYS_ADMIN
- >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
- >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
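Since the restrictions are cumulative, it can help to print what a given level implies. The following is a minimal sketch, not part of perf: the function name describe_paranoid and its messages are hypothetical, and the wording simply paraphrases the levels listed above.

```shell
#!/bin/sh
# describe_paranoid LEVEL (hypothetical helper): print the restrictions that
# apply to users without CAP_SYS_ADMIN at the given perf_event_paranoid level.
describe_paranoid() {
    level=$1
    if [ "$level" -lt 0 ]; then
        echo "allowed: (almost) all events"
        return 0
    fi
    # Restrictions accumulate as the level increases.
    echo "denied: ftrace function tracepoint, raw tracepoint access"
    [ "$level" -ge 1 ] && echo "denied: CPU event access"
    [ "$level" -ge 2 ] && echo "denied: kernel profiling"
    return 0
}

# Describe the level of the running system, when the file is readable.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    describe_paranoid "$(cat "$paranoid_file")"
fi
```

The level can be changed by root with sysctl, for example `sysctl kernel.perf_event_paranoid=1`.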
The tool named perf works with subcommands (stat, record, report...).
# Enumerate all symbolic event types
perf list
# Look for events related to KVM hypervisor
perf list 'kvm:*'
In order to collect several statistics about a command:
perf stat $COMMAND
Example with uname:
# perf stat uname
Linux
Performance counter stats for 'uname':
0.50 msec task-clock # 0.551 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
67 page-faults # 0.133 M/sec
1,837,945 cycles # 3.656 GHz
1,266,497 instructions # 0.69 insn per cycle
284,608 branches # 566.071 M/sec
8,956 branch-misses # 3.15% of all branches
0.000911814 seconds time elapsed
0.001001000 seconds user
0.000000000 seconds sys
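For scripting, perf stat can also emit machine-readable output with -x SEP (CSV-style), where the counter value is the first field and the event name the third when neither -I nor -A is used. As a rough sketch, the "insn per cycle" figure can be recomputed from such output. The helper name ipc_from_csv is hypothetical, and the sample lines below only reuse the counts from the uname run above, with trailing CSV fields omitted.

```shell
#!/bin/sh
# ipc_from_csv (hypothetical helper): read `perf stat -x,` output on stdin
# and print instructions per cycle with two decimals.
# Assumes the layout value,unit,event,... (no -I or -A options).
ipc_from_csv() {
    LC_ALL=C awk -F, '
        $3 ~ /^cycles/       { cycles = $1 }
        $3 ~ /^instructions/ { insns = $1 }
        END {
            # "+ 0" forces a numeric comparison, e.g. for "<not supported>"
            if (cycles + 0 > 0) printf "%.2f\n", insns / cycles
            else exit 1
        }'
}

# Typical use (requires perf):
#   perf stat -x, -e cycles,instructions -o stat.csv uname
#   ipc_from_csv < stat.csv

# Demonstration with the counts of the uname example:
ipc_from_csv <<'EOF'
1837945,,cycles
1266497,,instructions
EOF
```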
In order to record a trace of a command:
perf record $COMMAND
# --branch-any: enable taken branch stack sampling
# --call-graph=dwarf: enable call-graph (stack chain/backtrace) recording with DWARF information
perf record --branch-any --call-graph=dwarf $COMMAND
# Record a running process during 30 seconds
# -a = --all-cpus: system-wide collection from all CPUs
# -g (like --call-graph=fp): enable call-graph (stack chain/backtrace) recording
# -p = --pid: record events on existing process ID (comma separated list)
# pidof prints space-separated PIDs, so convert them to a comma-separated list
timeout 30s perf record -a -g -p "$(pidof "$MYPROCESS" | tr ' ' ',')"
This creates a file named perf.data, which can be analyzed with other subcommands.
# Show perf.data in an ncurses browser (TUI) if possible
perf report
# Dump the raw trace in ASCII
perf report -D
perf report --dump-raw-trace
# Display the trace output
perf script
# Show perf.data as:
# * a text report
# * with a column for sample count
# * with call stacks
# * with data coalesced and percentages
perf report --stdio -n -g folded
# Display the trace with the perf.data header (--header) and a custom set of fields (-F)
perf script --header -F comm,pid,tid,cpu,time,event,ip,sym,dso
The trace can also be analyzed with a GUI such as https://github.com/KDAB/hotspot.
When Intel PT (Processor Trace) is available on the CPU, the following commands can be used to trace a program (from https://lkml.org/lkml/2019/11/27/160):
perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' $COMMAND
perf script -F +brstackinsn --xed --itrace=i1usl100
More recent versions of perf introduced an equivalent of strace that does not use the ptrace syscall:
perf trace --call-graph=dwarf $COMMAND
# Or, with perf record:
perf record -e 'raw_syscalls:*' $COMMAND
# Trace with "augmented syscalls" (in order to see string parameters, for example)
perf trace -e /usr/lib/perf/examples/bpf/augmented_raw_syscalls.c $COMMAND
Using https://github.com/brendangregg/FlameGraph, it is simple to produce a flame graph from a trace. This is useful, for example, to find which functions in a program take the most time and would benefit from optimization.
# Record stack samples at 99 Hertz during 60 seconds
# (both userspace and kernel-space stacks, all processes)
perf record -F 99 -a -g -- sleep 60
# Fold the stacks into a text file
perf script | ./stackcollapse-perf.pl --all > out.folded
# Filter on names of processes, functions... and create a flamegraph
grep my_application < out.folded | ./flamegraph.pl --color=java > graph.svg
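The folded file holds one line per distinct stack: frames separated by semicolons, starting with the process name, followed by a space and the sample count. Because it is plain text, it can be summarized without rendering a graph. The sketch below sums the samples per leaf frame (the function that was running when the sample was taken). The helper name top_leaves and the sample stacks are hypothetical.

```shell
#!/bin/sh
# top_leaves (hypothetical helper): read folded stacks on stdin, in the form
# "comm;frame1;...;leaf COUNT", and print "COUNT leaf" lines, highest first.
top_leaves() {
    awk '{
        count = $NF              # last field is the sample count
        sub(/ [0-9]+$/, "")      # drop the count, keep only the stack
        n = split($0, frames, ";")
        leaf[frames[n]] += count # accumulate samples per leaf frame
    }
    END { for (f in leaf) print leaf[f], f }' | sort -rn
}

# Typical use: top_leaves < out.folded | head

# Demonstration with made-up stacks:
top_leaves <<'EOF'
my_application;main;parse 30
my_application;main;compute;hash 50
my_application;main;render;hash 25
EOF
```

Summing by leaf frame approximates the time spent in each function itself, which complements the inclusive view given by the flame graph.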
Another project enables producing flamegraphs for Rust projects: https://github.com/ferrous-systems/flamegraph
- https://perf.wiki.kernel.org/index.php/Tutorial perf Wiki - Tutorial
- http://www.brendangregg.com/perf.html Linux perf Examples, documentation, links, and more!
- http://www.brendangregg.com/flamegraphs.html Flame Graphs
- https://github.com/brendangregg/perf-tools perf-tools GitHub project
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-record.txt perf-record man page
- https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/ Transparent Hugepages: measuring the performance impact
- https://twitter.com/b0rk/status/945900285460926464 perf cheat sheet by Julia Evans