schoebel edited this page Jul 6, 2012 · 5 revisions

Full documentation:

Database of loads:

blkreplay - a Testing and Benchmarking Toolkit

with special focus on servers and data centers

Example (natural load, run on a commercial storage box)

Sonar Diagram:

Example: Sonar Diagram


blkreplay is a GPL’ed userspace toolkit, driving the block layer of Linux (or other Unix-like OSes) while measuring latency and throughput of IO operations for later visualization (so-called “sonar diagrams” and others).

In addition to artificial loads like random read-write sweeps and various kinds of overload tests, blkreplay can replay natural loads which have been recorded by blktrace (or Windows DiskMon) on heavily loaded production servers in big data centers.

blkreplay comes with a large collection of natural loads from a wide spectrum of applications (such as web servers, databases, dedicated servers, etc.) which have been released to the public by 1&1 under GPL. See the database of loads listed on this page. Some of these natural loads record the real-life disk access behaviour of servers serving thousands of customers in parallel.

Static analysis of the workingset behaviour of such natural loads is also implemented.
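To give a rough idea of what a workingset analysis computes, here is a minimal Python sketch. It is not blkreplay's implementation. It assumes a trace reduced to a plain sequence of accessed block numbers and measures the window in number of accesses rather than in seconds; the function name `working_set_sizes` and the sample data are hypothetical.

```python
def working_set_sizes(accesses, window):
    """Return the working set size after each access.

    The working set at position t is the set of distinct blocks
    referenced by the last `window` accesses ending at t.
    """
    counts = {}  # block number -> occurrences inside the current window
    sizes = []
    for t, block in enumerate(accesses):
        counts[block] = counts.get(block, 0) + 1
        if t >= window:
            # The access that just fell out of the window.
            old = accesses[t - window]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        sizes.append(len(counts))
    return sizes


# Hypothetical block numbers, not taken from a real trace.
trace = [1, 2, 1, 3, 1, 2, 7, 8, 9, 7]
print(working_set_sizes(trace, window=4))
```

A load whose working set stays small relative to the available cache is likely to be served mostly from cache; a working set that keeps growing with the window size suggests that caching will help less.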

blkreplay comes with a modular and extensible test suite, automating large projects for testing and/or benchmarking.

blkreplay can be used to test physical hardware, e.g. to compare different brands of hard disks or RAID controllers (including their settings and the performance degradation during a RAID rebuild), to evaluate the effect of SSD caching, or to compare different block-level transports like iSCSI vs Fibre Channel (over different kinds of storage networks).

It can compare virtual hardware (like VMware or XenServer block devices, or any type of block-level storage virtualization) to each other or to physical hardware, provided the test setup is handled very carefully.

blkreplay can compare commercial storage boxes from vendors like EMC, NetApp, IBM, Hitachi, etc. to each other or to cheap off-the-shelf hardware (in order to determine the price/performance ratio), again provided the test setup is handled very carefully.

Furthermore, it can be used for tests of the Linux kernel, e.g. for testing device drivers, comparing different IO schedulers under different load patterns, determining the overhead of Linux dm targets, determining the impact of network problems on DRBD, and much more.

At 1&1, blkreplay has even been used as a tool for root cause analysis of incidents: for example, high load peaks presumably stemming from traffic jams (or other sources of overload) were recorded at production sites in real time by blktrace and later replayed in the laboratory (without causing customer impact), either to find the cause of the trouble or to improve the safety margins by choosing better hardware.

For experts in IO subsystems, visualization techniques like “sonar diagrams” can reveal (parts of) the internal structure of complex IO systems, such as cache hierarchies or other hierarchical storage systems.

As a community project under GPL, blkreplay is open to contributions from hardware vendors, other data centers, the kernel hacker community, etc.


Example (continued)

Latency Histogram:

Example: Latency Histogram
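As a minimal sketch of how a latency histogram of this kind can be built from raw measurements, the following Python snippet counts latencies in logarithmically spaced buckets. The bucket boundaries, the function name `latency_histogram` and the sample values are assumptions for illustration and are not taken from blkreplay.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries in milliseconds (one bucket per decade).
EDGES_MS = [1, 10, 100, 1000]


def latency_histogram(latencies_ms, edges=EDGES_MS):
    """Count latencies per bucket.

    counts[0] holds values below edges[0], counts[i] holds values in
    [edges[i-1], edges[i]), and the last entry holds values >= edges[-1].
    """
    counts = [0] * (len(edges) + 1)
    for lat in latencies_ms:
        counts[bisect_right(edges, lat)] += 1
    return counts


# Hypothetical latencies in milliseconds.
samples = [0.4, 0.7, 2.5, 3.1, 8.0, 12.0, 250.0, 1500.0]
print(latency_histogram(samples))
```

Logarithmic buckets are used here because IO latencies typically span several orders of magnitude, from cache hits to requests delayed by overload.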

Throughput Diagram:

Example: Throughput Diagram

Remark: here you can see why it is better to use diagrams showing the behaviour of throughput over time, instead of delivering just a single IOPS number!
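The point of this remark can be illustrated with a small Python sketch that counts completed requests per interval and compares the result with the overall average. The function name `iops_per_interval` and the completion timestamps are hypothetical and do not reflect blkreplay's output format.

```python
def iops_per_interval(completion_times, interval=1.0):
    """Count completed requests in consecutive intervals of `interval` seconds."""
    if not completion_times:
        return []
    start = min(completion_times)
    n = int((max(completion_times) - start) // interval) + 1
    counts = [0] * n
    for t in completion_times:
        counts[int((t - start) // interval)] += 1
    return counts


# Hypothetical completion timestamps in seconds: a burst, then a stall.
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 2.9, 3.5]
per_second = iops_per_interval(times)
average = sum(per_second) / len(per_second)
print(per_second, average)
```

In this made-up example the average is 2.5 requests per second, which describes neither the initial burst of 8 requests nor the second in which nothing completed at all.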

Number of Requests on the Fly during Replay:

Example: Flying Requests = Actual Request Queue Depth
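The number of requests on the fly can be derived from submission and completion timestamps alone. The Python sketch below shows one way to do this, assuming each request is available as a hypothetical `(submit_time, completion_time)` pair; it is an illustration, not blkreplay's own code.

```python
def in_flight_timeline(requests):
    """Return (time, depth) after every submission or completion event.

    `requests` is a list of (submit_time, completion_time) pairs.
    """
    events = []
    for submit, complete in requests:
        events.append((submit, 1))     # request enters the queue
        events.append((complete, -1))  # request leaves the queue
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so completions are processed before new submissions.
    events.sort()
    depth = 0
    timeline = []
    for t, delta in events:
        depth += delta
        timeline.append((t, depth))
    return timeline


# Hypothetical requests as (submit, completion) times in seconds.
reqs = [(0.0, 0.5), (0.1, 0.3), (0.2, 0.9), (0.6, 0.7)]
timeline = in_flight_timeline(reqs)
print(max(depth for _, depth in timeline))
```

Plotting the depth over time gives a curve comparable to the diagram above; the maximum is the deepest actual request queue observed during the run.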

Static Analysis: Workingset Behaviour

Example: Workingset Behaviour

In order to understand the implications of the static workingset behaviour and its theory (proposed by Denning), please read the documentation at


Full documentation:

Database of loads: