uniprof is a stack tracer/profiler for Xen domains, primarily, but not exclusively, for unikernels. It allows producing stack traces of a running domU, at specified sampling rates, from outside (typically dom0). These traces can then be aggregated to produce profiling information and visualization, for example, via flame graphs.
If you already have Xen and its tools installed (and why wouldn't you if you want to use this tool), then you can simply do the following:
./configure
make
Afterwards, run ./uniprof --help for an overview of the command line.
uniprof supports both x86 and ARM.
The configure script has several options that you can set to influence the build.
These two options control which library uniprof is built against. There are two options: either libxc, the classic, but less performant approach, and libxencall/libxenforeignmemory, introduced with Xen 4.7, that are faster, lower-level libraries to issue hypercalls and map memory between domains. By default, libxencall/libxenforeignmemory are used if they are available. You can enforce using libxc by giving the --with-libxc option to configure. Note that this will not work when building on ARM, because libxc does not provide the necessary memory address translation functionality for ARM. (This also means that you have to use Xen 4.7 or newer on ARM.)
Instead of using the system headers and libraries installed by Xen, you can also build against a Xen source tree. Make sure you at the very least ran a "make tools" inside that source tree, so that the required libraries are available.
If the configure script finds the libunwind-xen library, it will by default add support for it to uniprof. If, for some reason, you do not want this behavior, you can disable building against libunwind.
As a first test, start a unikernel domain, note its domid, and run
./uniprof -F 1 -T 1 - [domid]
You should see an output of a stack trace. If you don't, then you probably compiled your unikernel with -fomit-frame-pointer. Either recompile your unikernel with -fno-omit-frame-pointer, or see below on how to use a specially patched libunwind to unwind the stack.
However, even then, you will notice that you only get a bunch of memory addresses as output, similar to this:
0x422c3c
0x413930
0x41953c
This isn't exactly helpful. Indeed, you need a symbol table to translate
addresses into function names. To create that, locate your unikernel binary
and run nm -n [image] > [image].syms
on it. For this, your binary must NOT
be stripped! You are, however, welcome to strip it after running nm
, if
you prefer.
You now have two options: online resolution or offline resolution. Online
resolution means to do the symbol resolution during the stack trace. This is
helpful for rapid checking of functionality and the occasional single stack
trace. To do so, add the -s [symbolfile]
command line option to
uniprof. You will now see more helpful output:
block_domain+0x88
schedule+0x230
xenbus_thread_func+0x128
However, this incurs a slight performance overhead, since each
address must be resolved to a symbol during the stack walk. For performance
profiling, you might therefore prefer to only record the addresses and resolve
them offline after finishing the profiling run. To do so, use the
symbolize
tool provided.
If you cannot or do not want to use the frame pointer register to unwind the
stack, you can use a specially patched version of libunwind (available at
https://github.com/sysml/libunwind) to do the unwinding. This requires
you to have the ELF binary available, and that binary to contain an .eh_frame
section (which generally is there by default). uniprof then compares the
current IP register to information in the .eh_frame
to assess the size of the
currently running function's stack frame. It then restores the return address
and iterates the lookup process until the end of the stack is reached.
Note that this is currently significantly slower than the frame pointer method, due to overhead introduced by the libunwind core functions. However, it allows you to create stack traces for VMs that don't use the frame pointer.
To use this feature, use the -e
or -E
option when starting uniprof,
providing the ELF binary of the kernel as parameter. If the binary is
unstripped, the -E
option will also give you symbol resolution.
If your Xen domain is not a unikernel, but rather a standard kernel with
user-space tools running concurrently, you can still use uniprof, albeit with
some limitations. First of all, you will need to acquire a symbol table, which
you will need to resolve the stack addresses to function names. Especially in
case of dynamically-loaded parts of the kernel (e.g., Linux kernel modules),
you cannot simply rely on a statically-created symbol table via the nm
method. For Linux, you will instead need the dynamic symbol table, which you
can retrieve from /proc/kallsyms if the kernel was compiled with
CONFIG_KALLSYMS. Note that this only allows you to resolve kernel symbols;
userspace symbols are still non-resolved, so you will need to filter those out
from your traces to create useful profiling. When using the Linux kernel, you
will probably have more success and more meaningful results if you set up
perf
profiling inside your domain, but uniprof can still help traving from
the outside in case tracing from the inside is impossible (e.g., tracking down
insidious kernel bugs) or when using operating systems with less-developed
profiling tools.
I presented uniprof at the Xen Developer and Design Summit 2017. Here's a video of the talk, and the slide set.
You can also get a more general overview and problem statement from the two-page summary of uniprof written as a poster abstract for SIGCOMM 2017.