Krun is a framework for running high-quality software benchmarking experiments. Krun experiments consist of a configuration file, a carefully configured benchmarking machine, and the benchmarks themselves.
Krun currently only runs on Debian Linux and OpenBSD. Porting it to other Unix variants is unlikely to be difficult, and we welcome patches.
You need to have the following programs installed:
- sudo
- Python-2.7
- Python dependencies (see requirements.txt)
- GNU make, a C compiler and libc (
build-essential
package in Debian) - cpufrequtils (Linux only.
cpufrequtils
package in Debian) - cset (for pinning on Linux only.
cpuset
package in Debian) - virt-what (Linux only.
virt-what
package in Debian) - Optionally, our custom Linux kernel (Linux only, see below)
- Linux kernel headers (Linux only.
linux-headers-X.Y
package in Debian) - taskset (Linux only.
util-linux
package in Debian) - msr-tools (Linux only.
msr-tools
package in Debian) - policykit (Linux only, only if you want to use
systemctl start/stop
inPRE/POST_EXECUTION_CMDS
. See Benchmarking for Reliable Results below)
If you choose to use pip
for the Python dependencies, you can install them
using pip -r requirements.txt
.
If you want to benchmark Java, you will also need:
- A Java SDK (
openjdk-*-jdk
package in Debian).
Benchmarking is at its most accurate when Intel p-states are disabled in the kernel. If you are using Grub, this can be achieved as follows:
- Edit
/etc/default/grub
so that theGRUB_CMDLINE_LINUX_DEFAULT
variable includesintel_pstate=disable
. - Run
sudo update-grub
If you are unable to do this, you can disable Krun's P-state check with
--disable-pstate-check
, but be aware that this degrades the quality of the
resulting benchmarking numbers.
If you want low latency access to the following counters:
IA32_PERF_FIXED_CTR1
(the core cycle counter)IA32_APERF
countsIA32_MPERF
counts
you will need to use the (now rather out of date) custom Linux kernel at:
https://github.com/softdevteam/krun-linux-kernel
If you do use this kernel you will need to set MSRS=1
in your Unix
environment when building Krun (see later).
Unless --no-tickless-check
is passed to Krun, all cores but the boot core
are expected to be in full adaptive ticks mode (tickless mode).
On older kernels which still have the CONFIG_NO_HZ_FULL_ALL
config option,
build the kernel with this set to "y".
On newer kernels, where CONFIG_NO_HZ_FULL_ALL
is absent, you will have to add
a nohz_full=1-N
kernel command line option (where N is the number of cores
your system has, minus one).
If your system has only one core then tickless mode is not checked, as the boot core (i.e. your only core) cannot be placed into adaptive ticks mode.
First fetch Krun:
$ git clone --recursive https://github.com/softdevteam/krun.git
$ cd krun
Then run make
(or gmake
on OpenBSD). The Krun Makefile honours the standard
variables: CC
, CPPFLAGS
, CFLAGS
and LDFLAGS
.
If you want to benchmark Java programs, you need to set the
JAVA_HOME
environment variable to point to your JDK installation, and
set several other flags:
$ make clean
$ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ make \
JAVA_CPPFLAGS='"-I${JAVA_HOME}/include -I${JAVA_HOME}/include/linux"' \
JAVA_LDFLAGS=-L${JAVA_HOME}/lib ENABLE_JAVA=1
Background services (e.g. cron
or sendmail
) can interfere with benchmarking.
The more services that you are able to switch off, the less interference is
likely to occur. Some services are best disabled at boot and/or permanently
(depending on your OS) and must be done manually. However, you may wish
to disable some services only during benchmarking (e.g. you may wish to have a mail server
running before and after benchmarking to inform you of benchmarking progress),
which can be specified in the PRE_EXECUTION_CMDS
and POST_EXECUTION_CMDS
settings in your Krun config file.
Commands in each list are run, in order, using the krun
user's shell (e.g.
/bin/sh
). If a command fails, Krun stops execution immediately without running
subsequent commands. If you wish execution to continue even if a command fails
you can use standard shell idioms: e.g. cmd || true
guarantees that the
overall command succeeds even if cmd
fails.
For example on a systemd Linux you may turn daemons off before execution with:
PRE_EXECUTION_CMDS = [
"sudo systemctl stop cron",
"sudo systemctl stop atd",
...
]
and turn them back on with:
POST_EXECUTION_CMDS = [
"sudo systemctl start cron || true",
"sudo systemctl start atd || true",
...
]
In general it is best practise to turn things back on explicitly, because after
Krun runs the final benchmark it will not reboot the machine. If, for example,
you put network interfaces down in PRE_EXECUTION_CMDS
, you should put them
back up in POST_EXECUTION_CMDS
so that you can login to the machine after the
final benchmark has been run. We urge you to check such commands carefully:
small oversights can easily lead to you locking yourself out of the system.
Krun can also copy intermediate results to a remote host and query that host to
see whether it should suspend benchmarking. See
https://github.com/softdevteam/warmup_experiment/blob/master/warmup.krun
for
more advanced options.
Note that Debian has, from a benchmarking perspective, the unfortunate habit of automatically starting daemons which get pulled in by dependencies.
On Linux, list services with:
# systemctl | grep running
Permanently disable services (including after system reset) with:
# systemctl stop <service>
# systemctl disable <service>
Commonly enabled services that you may wish to disable:
- apache2
- memcached
- nfs-common
On OpenBSD, list services with:
# doas rcctl ls started
Permanently disable services (including after system reset) with:
# rcctl stop <service>
# rcctl disable <service>
Commonly enabled services that you may wish to disable:
- pflogd
- sndiod
Note that Krun is unable to check whether turbo mode is enabled on OpenBSD or not, and is also unable to use APERF/MPERF ratios to indirectly check whether turbo mode was used. You should therefore be particularly careful to check that turbo mode is disabled when benchmarking on OpenBSD.
The examples
directory contains the example.krun
experiment. This contains
two benchmark programs (nbody and dummy), both of which are run on C and
PyPy. Each benchmark is run for 5 in-process iterations (where the
benchmark is repeated 5 times within a for loop within a single process) across
2 process executions (where the entire VM is restarted).
First build the examples:
$ cd examples/benchmarks
$ pwd
.../krun/examples/benchmarks
$ make
If you also want to try the example Java benchmarks, you must build them as a separate step. Additionally, be sure to have compiled Krun with Java support as described in Step 3
$ pwd
.../krun/examples/benchmarks
$ make java-bench
Then run the example:
$ cd krun/examples
$ ../krun.py example.krun
You should see a log scroll past, and results will be stored in the file:
../krun/examples/example_results.json.bz2
.
If you want to try the example Java benchmarks, there is a separate
configuration file called java.krun
, which contains configuration for the Java
and Python examples:
$ ../krun.py java.krun
Note, this will only work if you have followed the earlier steps to compile Krun with Java support.
Krun runs benchmarks under a new Unix user krun
, which is wiped and re-added
before every experiment. Your Krun build and your experiment must both be
readable by the krun
user for the experiment to run.
It is easiest to use examples/example.krun
as a template for your own,
new, experiments. Note that: the benchmarks referenced in the config file
must be in a benchmarks
subdirectory; and that each benchmark in
the config must be in a subsubdirectory with a matching name. A typical
directory structure is therefore as follows:
experiment/
experiment.krun
benchmarks/
Makefile
benchmark_1/
language_1/
benchmark_file.lang1
...
The top-level Makefile
should build the VMs and benchmarks needed for the
experiment.
Each benchmark should expose a function (or method) called run_iter
which is
the entry point to the benchmark. To preserve source code history, it can
be easiest to put this function in a new file, which then imports the benchmark.
With regards to VM/compiler support, there are two ways Krun can invoke a benchmark:
- Via a dedicated "VM definition" (e.g.
JavaVMDef
). - Via an external benchmark suite (
ExternalSuiteVMDef
).
The former option is best, as it supports Krun's core-cycle counting and APERF/MPERF ratio features. The following compilers/VMs are currently supported for this mode:
- Native code languages (
NativeCodeVMDef
). - OpenJDK. (i.e. Hotspot) (
JavaVMDef
). - GraalVM (
GraalVMDef
). - cPython (
PythonVMDef
). - Lua (
LuaVMDef
). - PHP (
PHPVMDef
). - Ruby (
RubyVMDef
). - TruffleRuby (
TruffleRubyVMDef
). - Javascript V8 (
V8VMDef
).
If your VM isn't listed, you can either add it to Krun, or use the external
suite definition (see below). To add a new VM definition, add a new class to
krun/vm_defs.py
and a new iteration runner to the iterations_runners
directory.
The latter option -- ExternalSuiteVMDef
-- is useful if you want to quickly
wrap an existing benchmark suite. For an example see examples/ext.krun
and
examples/ext_script.py
.
To add a new platform definition, add a new class to krun/platform.py
.
Before doing a full run of an experiment, you should perform a quick(ish) test of your configuration. This can be achieved with:
$ /path/to/krun/krun.py --dry-run --quick --debug=INFO config.krun
See the "Development and Debug Switches" section for a description of these switches.
Achieving the highest possible benchmarking quality requires more care. First,
none of Krun's debug or development switches must be used. Second, Krun needs to
run in "reboot mode" where each process execution will be run after the machine
has (automatically) rebooted. The simplest way to do this is to have your init
system invoke scripts/run_krun_at_boot
via an rc.local
script.
Your /etc/rc.local
should look like this:
#!/bin/sh
/usr/bin/sudo -u someuser /path/to/krun/scripts/run_krun_at_boot /path/to/your/config.krun
exit 0
Make sure you replace the paths as appropriate and substitute someuser
with
the name of a normal unprivileged user that you wish to use to kick off Krun.
Make sure /etc/rc.local
is executable and that it only contains absolute
paths. Note that sudo(8)
is installed in different places on different
operating systems (for OpenBSD, it's /usr/local/bin/sudo
).
Any arguments supplied after the config file path are passed to Krun unchanged.
You can then start the experiment by manually running the command from your new
rc.local
(i.e. sudo -u ...
).
Whilst benchmarking is occuring, you must not log in to the machine (indeed,
hopefully sshd
, or equivalent, has been disabled!). To monitor progress, and
be informed of errors, you you should add a MAIL_TO
list of emails to your
Krun config file:
MAIL_TO = ["me@mydomain.com", "other_person@herdomain.com"]
Krun uses sendmail(8)
to send email, so you will need to make sure that
you have a functional SMTP server installed (and don't forget to switch it off
during benchmarking!).
For each platform, Krun has a default built-in dmesg whitelist. The whitelist is a collection of regex patterns which are used to decide if a line in the dmesg buffer is a cause for concern. If a new line appears in the dmesg during benchmarking, and the line is not matched by at least one whitelist pattern, then Krun will flag the process execution as ``errored'' and email the user.
From time to time you may find that you need to customise the whitelist. This
is achieved by adding a callback named custom_dmesg_whitelist
into your
config file. The callback is passed the default list of patterns for your
platform and must return a new list of patterns. In the implementation of your
callback you have the choice to base your custom whitelist on the defaults or
to define your own patterns from scratch.
For example, to add a pattern, you would add a callback like:
def custom_dmesg_whitelist(defaults):
return defaults + ["^.*your+regex.*pattern$"]
Krun uses Python's re
module to compile regex patterns. Consult the Python
docs for more information on the regex syntax.
Bear in mind that Linux dmesg lines start with a time code which custom dmesg lines will need to match.
If you have added custom patterns which you think would be useful for other users of Krun, please raise an issue (or pull request) to have the patterns added to the defaults.
If you are making changes to Krun itself (for example, to add a new platform or virtual machine definition), there are a few switches which can make your life easier.
-
--debug=<level>
: Sets the verbosity of Krun. Valid debug levels are:DEBUG
,INFO
,WARN
,DEBUG
,CRITICAL
andERROR
. The default isWARN
. Production quality benchmarking should use the default. -
--quick
: There are several places where Krun pauses to allow the system to stabilise. In testing these pauses can be burdensome and can thus be skipped with--quick
. -
--no-user-change
: Without this flag, For each process execution, Krun will use a fresh user account called 'krun'. This involves deleting any existing user account (withuserdel -r
) and creating a new user account (withuseradd -m
). This switch disables the use of a fresh user account, meaning thatuserdel
anduseradd
are not invoked, nor does Krun switch user; the user Krun was invoked with is used for benchmarking. -
--dry-run
: Fakes actual benchmark processes, making them finish instantaneously. -
--no-tickless-check
: Do not crash out if the Linux kernel is not tickless. -
--no-pstate-check
: Do not crash out if Intel P-states are not disabled.
Krun generates a bzipped JSON file containing results of all process executions. The structure of the JSON results is as follows:
{
'audit': '', # A dict containing platform information
'config': '', # A unicode object containing your Krun configuration
'wallclock_times': { # A dict containing timing results
'bmark:VM:variant': [ # A list of lists of in-process iteration times
[ ... ], ... # One list per process execution
]
},
'core_cycle_counts': { # Per-core core cycle counter deltas
'bmark:VM:variant': [
[ # One list per process execution
[...], ... # One list per core
]
},
'aperf_counts': {...} # Per-core APERF deltas
# (structure same as 'core_cycle_counts')
'mperf_counts': {...} # Per-core MPERF deltas
# (structure same as 'core_cycle_counts')
'pexec_flags': {...} # A flag for each process execution:
# 'C' completed OK.
# 'E' benchmark crashed.
# 'T' benchmark timed out.
'eta_estimates': {u"bmark:VM:variant": [t_0, t_1, ...], ...} # A dict mapping
# benchmark keys to rough process execution times. Used internally:
# users can ignore this.
}
Some options exist to help inspect the results file:
--dump-reboots
--dump-etas
--dump-config
--dump-audits
--dump-temps
--dump-data
$ python krun.py --dump-config examples/example_results.json.bz2
INFO:root:Krun starting...
[2015-11-02 14:23:31: INFO] Krun starting...
import os
from krun.vm_defs import (PythonVMDef, NativeVMDef)
from krun import EntryPoint
# Who to mail
MAIL_TO = []
...
$ python krun.py --dump-audit examples/example_results.json.bz2
{
"cpuinfo": "processor\t: 0\nvendor_id\t: GenuineIntel\ncpu family\t:
...
$ python krun.py --dump-reboots examples/example_results.json.bz2
[2015-11-06 13:14:35: INFO] Krun starting...
8
The following error in the log file is indicative that Krun has not been compiled with Java support:
Exception in thread "main" java.lang.UnsatisfiedLinkError: IterationsRunner.JNI_krun_init()V
at IterationsRunner.JNI_krun_init(Native Method)
at IterationsRunner.main(iterations_runner.java:248)
See Step 3 on how to build with Java support.
Krun has a pytest suite which can be run by executing py.test
in the
top-level source directory.
Krun is not intended to be run on a secure multi-user system, as it uses sudo
to elevate privileges. It also uses files with fixed names in /tmp/
which means
that only one instance of Krun should be run at any one time (running more
than one leads to undefined behaviour).
sudo
is used to:
- Add and remove a fresh benchmarking user for each process execution.
- Switch users.
- Change the CPU speed.
- Set the perf sample rate (Linux only)
- Automatically reboot the system (
--hardware-reboots
only). - Set process priorities.
- Create cgroup shields (Linux only, off by default)
- Detect virtualised hosts.
- Unrestrict the dmesg buffer (Linux Kernel 4.8+)
- Turn off "turbo boost" (Linux only)
- Turn off memory over-commit (Linux only).
Please make sure you understand the implications of this.
Krun is licensed under the UPL license.
The nbody benchmark is licensed under a revised BSD license: http://shootout.alioth.debian.org/license.php