Tiny benchmarking library for Zig.
- CPU Counters: Measures CPU cycles, instructions, IPC, and cache misses directly from the kernel (Linux only).
- Argument Support: Pass pre-calculated data to your functions to separate setup overhead from the benchmark loop.
- Baseline Comparison: Easily compare multiple implementations against a reference function to see relative speedups or regressions.
- Flexible Reporting: Access raw metric data programmatically to generate custom reports (JSON, CSV) or assert performance limits in CI.
- Easy Throughput Metrics: Automatically calculates operations per second and data throughput (MB/s, GB/s) when payload size is provided.
- Robust Statistics: Uses median and standard deviation to provide reliable metrics despite system noise.
- Zero Dependencies: Implemented in pure Zig using only the standard library.
Fetch the latest version:

```sh
zig fetch --save=bench https://github.com/pyk/bench/archive/main.tar.gz
```

Add `bench` as a dependency in your `build.zig.zon`. If you are using it only for tests/benchmarks, it is recommended to mark it as lazy (a `build.zig` sketch follows the snippet below):
```zig
.dependencies = .{
    .bench = .{
        .url = "...",
        .hash = "...",
        .lazy = true, // here
    },
}
```
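Then wire the module up in `build.zig`. A minimal sketch, assuming it sits inside your existing `pub fn build` where `b`, `target`, `optimize`, and a test compile step `unit_tests` are already defined, and that the package exposes its module under the name `bench`:

```zig
// lazyDependency returns null if the package has not been fetched yet;
// the build runner then fetches it and re-runs the build script.
if (b.lazyDependency("bench", .{
    .target = target,
    .optimize = optimize,
})) |dep| {
    // Makes @import("bench") available inside the test binary.
    unit_tests.root_module.addImport("bench", dep.module("bench"));
}
```

With that in place, `@import("bench")` resolves inside your test code.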
To benchmark a single function, pass the allocator, a name, and the function pointer to `run`:
```zig
const res = try bench.run(allocator, "My Function", myFn, .{});
try bench.report(.{ .metrics = &.{res} });
```

You can generate test data before the benchmark starts and pass it via a tuple. This ensures the setup cost doesn't pollute your measurements.
```zig
// Setup data outside the benchmark
const input = try generateLargeString(allocator, 10_000);

// Pass input as a tuple
const res = try bench.run(allocator, "Parser", parseFn, .{input}, .{});
```

You can run multiple benchmarks and compare them against a baseline. The `baseline_index` determines which result is used as the reference (1.00x).
```zig
const a = try bench.run(allocator, "Implementation A", implA, .{});
const b = try bench.run(allocator, "Implementation B", implB, .{});

try bench.report(.{
    .metrics = &.{ a, b },
    // Use the first metric (Implementation A) as the baseline
    .baseline_index = 0,
});
```

If your function processes data (like copying memory or parsing strings), provide `bytes_per_op` to get throughput metrics (MB/s or GB/s).
```zig
const size = 1024 * 1024;
const res = try bench.run(allocator, "Memcpy 1MB", copyFn, .{
    .bytes_per_op = size,
});

// Report will now show GB/s instead of just Ops/s
try bench.report(.{ .metrics = &.{res} });
```

You can tune the benchmark behavior by modifying the `Options` struct.
```zig
const res = try bench.run(allocator, "Heavy Task", heavyFn, .{
    .warmup_iters = 10, // Default: 100
    .sample_size = 50, // Default: 1000
});
```

The default `bench.report` prints a human-readable table to stdout. It handles units (ns, us, ms, s) and coloring automatically:
```
Benchmark Summary: 3 benchmarks run
├─ NoOp   60ns     16.80M/s  [baseline]
│  └─ cycles: 14     instructions: 36     ipc: 2.51  miss: 0
├─ Sleep  1.06ms   944/s     17648.20x slower
│  └─ cycles: 4.1k   instructions: 2.9k   ipc: 0.72  miss: 17
└─ Busy   32.38us  30.78K/s  539.68x slower
   └─ cycles: 150.1k instructions: 700.1k ipc: 4.67  miss: 0
```

The `run` function returns a `Metrics` struct containing all raw data (min, max, median, variance, cycles, etc.). You can use this to generate JSON, CSV, or assert performance limits in CI.
```zig
const metrics = try bench.run(allocator, "MyFn", myFn, .{});
// Access raw fields directly
std.debug.print("Median: {d}ns, Max: {d}ns\n", .{
    metrics.median_ns,
    metrics.max_ns,
});
```
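To fail CI on a regression, here is a minimal sketch built on `std.testing`; the 1 ms budget, the test name, reusing `myFn` from the earlier example, and running with `std.testing.allocator` are all illustrative assumptions:

```zig
const std = @import("std");
const bench = @import("bench");

test "myFn stays under its latency budget" {
    const metrics = try bench.run(std.testing.allocator, "MyFn", myFn, .{});
    // Fail the test (and the CI job) if the median run exceeds 1 ms.
    try std.testing.expect(metrics.median_ns < 1 * std.time.ns_per_ms);
}
```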
This tiny benchmark library supports (✅) the following metrics:

| Category | Metric | Description |
|---|---|---|
| Time | ✅ Mean / Average | Arithmetic average of all runs |
| Time | ✅ Median | The middle value (less sensitive to outliers) |
| Time | ✅ Min / Max | The absolute fastest and slowest runs |
| Time | CPU vs Wall Time | CPU time (active processing) vs Wall time (real world) |
| Throughput | ✅ Ops/sec | Operations per second |
| Throughput | ✅ Bytes/sec | Data throughput (MB/s, GB/s) |
| Throughput | Items/sec | Discrete items processed per second |
| Latency | Percentiles | p75, p99, p99.9 (e.g. "99% of requests were faster than X") |
| Latency | ✅ Std Dev / Variance | How much the results deviate from the average |
| Latency | Outliers | Detecting and reporting anomaly runs |
| Latency | Confidence / Margin of Error | e.g. "± 2.5%" |
| Latency | Histogram | Visual distribution of all runs |
| Memory | Bytes Allocated | Total heap memory requested per iteration |
| Memory | Allocation Count | Number of allocation calls |
| CPU | ✅ Cycles | CPU clock cycles used |
| CPU | ✅ Instructions | Total CPU instructions executed |
| CPU | ✅ IPC | Instructions Per Cycle (Efficiency) |
| CPU | ✅ Cache Misses | L1/L2 Cache misses |
| Comparative | ✅ Speedup (x) | "12.5x faster" (Current / Baseline). |
| Comparative | Relative Diff (%) | "+ 50%" or "- 10%". |
| Comparative | Big O | Complexity Analysis (O(n), O(log n)). |
| Comparative | R² (Goodness of Fit) | How well the data fits a linear model. |
Other metrics will be added as needed. Feel free to send a pull request.
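Until then, some of the missing metrics can be derived from the raw `Metrics` fields. A minimal sketch for a relative difference in percent, assuming `median_ns` is an integer nanosecond count:

```zig
/// Relative difference of `candidate` vs `baseline` in percent
/// (positive = slower than baseline, negative = faster).
fn relativeDiffPercent(baseline_median_ns: u64, candidate_median_ns: u64) f64 {
    const base: f64 = @floatFromInt(baseline_median_ns);
    const cand: f64 = @floatFromInt(candidate_median_ns);
    return (cand - base) / base * 100.0;
}
```

For example, `relativeDiffPercent(a.median_ns, b.median_ns)` with the `a`/`b` results from the baseline example above.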
- This library is designed to show you "what", not "why". I recommend using a proper profiling tool such as `perf` on Linux + Firefox Profiler to answer "why".
- `doNotOptimizeAway` is your friend. For example, if you are benchmarking some scanner/tokenizer:

  ```zig
  while (true) {
      const token = try scanner.next();
      if (token == .end) break;
      total_ops += 1;
      std.mem.doNotOptimizeAway(token); // CRITICAL
  }
  ```

- To get `cycles`, `instructions`, `ipc` (instructions per cycle) and `cache_misses` metrics on Linux, you may need to lower the `kernel.perf_event_paranoid` sysctl (see the commands below).
Install the Zig toolchain via mise (optional):

```sh
mise trust
mise install
```

Run tests:

```sh
zig build test --summary all
```

Build the library:

```sh
zig build
```

Toggle `kernel.perf_event_paranoid` to enable or disable CPU counter access while debugging:
```sh
# Disable access to CPU counters
sudo sysctl -w kernel.perf_event_paranoid=3

# Enable access to CPU counters
sudo sysctl -w kernel.perf_event_paranoid=-1
```

MIT. Use it for whatever.