Merge pull request #353 from brendangregg/master
3 tools: fileslower, execsnoop, runqlat
drzaeus77 committed Feb 8, 2016
2 parents 919045a + c4fa80e commit 59d6c35
Showing 12 changed files with 1,520 additions and 2 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -71,6 +71,8 @@ Tools:
- tools/[biosnoop](tools/biosnoop.py): Trace block device I/O with PID and latency. [Examples](tools/biosnoop_example.txt).
- tools/[bitesize](tools/bitesize.py): Show per process I/O size histogram. [Examples](tools/bitesize_example.txt).
- tools/[cachestat](tools/cachestat.py): Trace page cache hit/miss ratio. [Examples](tools/cachestat_example.txt).
- tools/[execsnoop](tools/execsnoop.py): Trace new processes via exec() syscalls. [Examples](tools/execsnoop_example.txt).
- tools/[fileslower](tools/fileslower.py): Trace slow synchronous file reads and writes. [Examples](tools/fileslower_example.txt).
- tools/[funccount](tools/funccount.py): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[funclatency](tools/funclatency.py): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt).
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
@@ -80,6 +82,7 @@ Tools:
- tools/[offwaketime](tools/offwaketime.py): Summarize blocked time by kernel off-CPU stack and waker stack. [Examples](tools/offwaketime_example.txt).
- tools/[opensnoop](tools/opensnoop.py): Trace open() syscalls. [Examples](tools/opensnoop_example.txt).
- tools/[pidpersec](tools/pidpersec.py): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[runqlat](tools/runqlat.py): Run queue (scheduler) latency as a histogram. [Examples](tools/runqlat_example.txt).
- tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
- tools/[stackcount](tools/stackcount.py): Count kernel function calls and their stack traces. [Examples](tools/stackcount_example.txt).
- tools/[stacksnoop](tools/stacksnoop.py): Trace a kernel function and print all kernel stack traces. [Examples](tools/stacksnoop_example.txt).
2 changes: 1 addition & 1 deletion man/man8/biolatency.8
@@ -65,7 +65,7 @@ Show a latency histogram for each disk device separately:
usecs
Microsecond range
.TP
mecs
msecs
Millisecond range
.TP
count
88 changes: 88 additions & 0 deletions man/man8/execsnoop.8
@@ -0,0 +1,88 @@
.TH execsnoop 8 "2016-02-07" "USER COMMANDS"
.SH NAME
execsnoop \- Trace new processes via exec() syscalls. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B execsnoop [\-h] [\-t] [\-X] [\-n NAME]
.SH DESCRIPTION
execsnoop traces new processes, showing the filename executed, argument
list, and return value (0 for success).

It works by tracing the execve() system call (the commonly used exec() variant).
This catches new processes that follow the fork->exec sequence, as well as
processes that re-exec() themselves. Some applications fork() but do not
exec(), eg, for worker processes, which won't be included in the execsnoop
output.

This works by tracing the kernel sys_execve() function using dynamic tracing,
and will need updating to match any changes to this function.

Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-t
Include a timestamp column.
.TP
\-X
Exclude failed exec()s.
.TP
\-n NAME
Only print command lines matching this name (regex), matched anywhere.
.SH EXAMPLES
.TP
Trace all exec() syscalls:
#
.B execsnoop
.TP
Trace all exec() syscalls, and include timestamps:
#
.B execsnoop \-t
.TP
Only trace successful exec()s:
#
.B execsnoop \-X
.TP
Only trace exec()s where the filename or arguments contain "mount":
#
.B execsnoop \-n mount
.SH FIELDS
.TP
TIME(s)
Time of exec() return, in seconds.
.TP
PCOMM
Parent process/command name.
.TP
PID
Process ID.
.TP
RET
Return value of exec(). 0 == success.
.TP
ARGS
Filename for the exec(), followed by up to 19 arguments. An ellipsis "..." is
shown if the argument list is known to be truncated.
.SH OVERHEAD
This traces the kernel execve function and prints output for each event. As the
rate of this is generally expected to be low (< 1000/s), the overhead is also
expected to be negligible. If you have an application that is calling a high
rate of exec()s, then test and understand overhead before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
opensnoop(8)
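
[Editor's note: a minimal bcc sketch of the dynamic-tracing approach the
DESCRIPTION names: a kprobe on the execve handler that prints each filename.
This is not the shipped execsnoop; the probe name sys_execve matches kernels
of this era (newer kernels rename the handler), and the real tool's argument
parsing is omitted.]

from bcc import BPF

prog = """
#include <uapi/linux/ptrace.h>

// kprobe on the execve syscall handler: read the filename argument
int trace_exec(struct pt_regs *ctx, const char __user *filename)
{
    char fname[64] = {};
    bpf_probe_read(&fname, sizeof(fname), (void *)filename);
    bpf_trace_printk("exec: %s\\n", fname);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="sys_execve", fn_name="trace_exec")
b.trace_print()    # stream events from trace_pipe; requires root
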
111 changes: 111 additions & 0 deletions man/man8/fileslower.8
@@ -0,0 +1,111 @@
.TH fileslower 8 "2016-02-07" "USER COMMANDS"
.SH NAME
fileslower \- Trace slow synchronous file reads and writes.
.SH SYNOPSIS
.B fileslower [\-h] [\-p PID] [min_ms]
.SH DESCRIPTION
This script uses kernel dynamic tracing of synchronous reads and writes
at the VFS interface, to identify slow file reads and writes for any file
system.

This version traces __vfs_read() and __vfs_write(), showing only
synchronous I/O (the path to new_sync_read() and new_sync_write()) and
only I/O with filenames. This approach provides a view of just two file
system request types: file reads and writes. There are typically many others:
asynchronous I/O, directory operations, file handle operations, file open()s,
fflush(), etc.

WARNING: See the OVERHEAD section.

By default, a minimum millisecond threshold of 10 is used.

Since this works by tracing various kernel __vfs_*() functions using dynamic
tracing, it will need updating to match any changes to these functions. A
future version should switch to using FS tracepoints instead.

Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-p PID
Trace this PID only.
.TP
min_ms
Minimum I/O latency (duration) to trace, in milliseconds. Default is 10 ms.
.SH EXAMPLES
.TP
Trace synchronous file reads and writes slower than 10 ms:
#
.B fileslower
.TP
Trace slower than 1 ms:
#
.B fileslower 1
.TP
Trace slower than 1 ms, for PID 181 only:
#
.B fileslower \-p 181 1
.SH FIELDS
.TP
TIME(s)
Time of I/O completion since the first I/O seen, in seconds.
.TP
COMM
Process name.
.TP
PID
Process ID.
.TP
D
Direction of I/O. R == read, W == write.
.TP
BYTES
Size of I/O, in bytes.
.TP
LAT(ms)
Latency (duration) of I/O, measured from when the application issued it to VFS
to when it completed. This time is inclusive of block device I/O, file system
CPU cycles, file system locks, run queue latency, etc. This is a more accurate
measure of the latency suffered by applications performing file system I/O
than one taken at the block device interface.
.TP
FILENAME
A cached kernel file name (comes from dentry->d_iname).
.SH OVERHEAD
Depending on the frequency of application reads and writes, overhead can become
severe, in the worst case slowing applications by 2x. In the best case, the
overhead is negligible. Hopefully for real world workloads the overhead is
often at the lower end of the spectrum -- test before use. The reason for
high overhead is that this traces VFS reads and writes, which includes FS
cache reads and writes, and can exceed one million events per second if the
application is I/O heavy. While the instrumentation is extremely lightweight
and uses in-kernel eBPF maps for efficient timing and filtering, even a tiny
per-event cost multiplied by one million events per second becomes
significant. You can get an idea of the possible cost by counting the
instrumented events using the bcc funccount tool, eg:
.PP
# ./funccount.py -i 1 -r '^__vfs_(read|write)$'
.PP
This also costs overhead, but is somewhat less than fileslower.
.PP
If the overhead is prohibitive for your workload, I'd recommend moving
down-stack a little from VFS into the file system functions (ext4, xfs, etc).
Look for updates to bcc for specific file system tools that do this. The
advantage of a per-file system approach is that we can trace post-cache,
greatly reducing events and overhead. The disadvantage is needing custom
tracing approaches for each different file system (whereas VFS is generic).
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
biosnoop(8), funccount(8)
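
[Editor's note: a minimal bcc sketch of the timing pattern the DESCRIPTION
describes: a kprobe records a start timestamp in a BPF map, and a kretprobe
computes the latency and applies the threshold. This is not the shipped
fileslower: it traces only __vfs_read(), hard-codes the 10 ms default, skips
the filename and synchronous-I/O checks, and the function name assumes a
kernel of this era.]

from bcc import BPF

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);    // thread ID -> entry timestamp (ns)

int trace_entry(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&pid);
    if (tsp == 0)
        return 0;             // missed the entry probe
    u64 delta = bpf_ktime_get_ns() - *tsp;
    start.delete(&pid);
    if (delta >= 10 * 1000000)    // the 10 ms default threshold
        bpf_trace_printk("slow read: %llu ms\\n", delta / 1000000);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="__vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="__vfs_read", fn_name="trace_return")
b.trace_print()
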
2 changes: 1 addition & 1 deletion man/man8/funclatency.8
@@ -88,7 +88,7 @@ Nanosecond range
usecs
Microsecond range
.TP
mecs
msecs
Millisecond range
.TP
count
105 changes: 105 additions & 0 deletions man/man8/runqlat.8
@@ -0,0 +1,105 @@
.TH runqlat 8 "2016-02-07" "USER COMMANDS"
.SH NAME
runqlat \- Run queue (scheduler) latency as a histogram.
.SH SYNOPSIS
.B runqlat [\-h] [\-T] [\-m] [\-P] [\-p PID] [interval] [count]
.SH DESCRIPTION
This measures the time a task spends waiting on a run queue (or equivalent
scheduler data structure) for a turn on-CPU, and shows this time as a
histogram. This time should be small, but a task may need to wait its turn due
to CPU load. The higher the CPU load, the longer a task will generally need to
wait its turn.

This tool measures two types of run queue latency:

1. The time from a task being enqueued on a run queue to its context switch
and execution. This traces enqueue_task_*() -> finish_task_switch(),
and instruments the run queue latency after a voluntary context switch.

2. The time from when a task was involuntarily context switched and still
in the runnable state, to when it next executed. This is instrumented
from finish_task_switch() alone.

This tool uses in-kernel eBPF maps for storing timestamps and the histogram,
for efficiency. Despite this, the overhead of this tool may become significant
for some workloads: see the OVERHEAD section.

This works by tracing various kernel scheduler functions using dynamic tracing,
and will need updating to match any changes to these functions.

Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-T
Include timestamps on output.
.TP
\-m
Output histogram in milliseconds.
.TP
\-P
Print a histogram for each PID.
.TP
\-p PID
Only show this PID (filtered in kernel for efficiency).
.TP
interval
Output interval, in seconds.
.TP
count
Number of outputs.
.SH EXAMPLES
.TP
Summarize run queue latency as a histogram:
#
.B runqlat
.TP
Print 1 second summaries, 10 times:
#
.B runqlat 1 10
.TP
Print 1 second summaries, using milliseconds as units for the histogram, and include timestamps on output:
#
.B runqlat \-mT 1
.TP
Trace PID 186 only, 1 second summaries:
#
.B runqlat \-p 186 1
.SH FIELDS
.TP
usecs
Microsecond range
.TP
msecs
Millisecond range
.TP
count
How many times a task event fell into this range
.TP
distribution
An ASCII bar chart to visualize the distribution (count column)
.SH OVERHEAD
This traces scheduler functions, which can become very frequent. While eBPF
has very low overhead, and this tool uses in-kernel maps for efficiency, the
frequency of scheduler events for some workloads may be high enough that the
overhead of this tool becomes significant. Measure in a lab environment
to quantify the overhead before use.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
pidstat(1)
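
[Editor's note: a minimal bcc sketch of the in-kernel histogram pattern the
DESCRIPTION and OVERHEAD sections describe: eBPF fills a log2 histogram,
which Python reads and prints. This is a simplification of the shipped
runqlat: it handles only the wakeup path (type 1 above), and the probe
points ttwu_do_wakeup() and finish_task_switch() assume kernels of this era.]

from bcc import BPF
from time import sleep

prog = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct rq;                    // opaque; only used as a pointer type

BPF_HASH(start, u32, u64);    // thread ID -> runnable timestamp (ns)
BPF_HISTOGRAM(dist);          // log2 histogram of run queue latency

// task woken up: record when it became runnable
int trace_wakeup(struct pt_regs *ctx, struct rq *rq, struct task_struct *p)
{
    u32 pid = p->pid;
    u64 ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);
    return 0;
}

// a task is now on-CPU: measure how long it waited
int trace_switch(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&pid);
    if (tsp == 0)
        return 0;
    u64 delta = bpf_ktime_get_ns() - *tsp;
    start.delete(&pid);
    dist.increment(bpf_log2l(delta / 1000));    // microseconds
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="ttwu_do_wakeup", fn_name="trace_wakeup")
b.attach_kprobe(event="finish_task_switch", fn_name="trace_switch")
sleep(10)
b["dist"].print_log2_hist("usecs")
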
