Benchmarks

Performance measurements for Conduit v1.1.2 — the default = [] ("minimal") build and the --features full build. See the note below for how these relate to the published "standard" binaries (--features standard).

⚠️ "standard" naming has changed. These numbers predate the standard Cargo feature bundle (jwt + consumers + forward-auth + cache + acme, see cli.md — Build features). The binaries and Docker images now published as "standard" are built with --features standard, not default = [] — expect somewhat larger binary size, memory, and per-request overhead than the default = [] figures below. Re-running this suite against --features standard is tracked in the CLAUDE.md backlog.

Methodology: raw wrk output is measured data; cells marked ¹ are extrapolated or estimated from first principles. Reproduce with the commands in Running Benchmarks Yourself.

Environment
Build Sizes
Minimal vs Full — overhead per feature
Static File Serving
Reverse Proxy Passthrough
Proxy with JWT Authentication
Proxy with Rate Limiting
Proxy with Response Caching
Proxy with Rhai Middleware
Proxy with WASM Middleware
Comparison — Conduit vs nginx vs Traefik
Performance Targets vs Actual Results
Running Benchmarks Yourself

Environment

All numbers below were measured on the same machine:

OS:    Ubuntu 24.04 LTS (WSL2 on Windows 11)
CPU:   AMD Ryzen 9 5950X (16 cores / 32 threads)
RAM:   64 GB DDR4-3600
Disk:  NVMe SSD (Samsung 980 Pro)

Conduit: release build, lto = true, codegen-units = 1, strip = true

Load generator: wrk — wrk -t8 -c200 -d30s unless stated otherwise.

Upstream (proxy benchmarks): Go net/http echo server on port 4000 — returns a fixed 200-byte JSON body with minimal processing overhead.

Build Sizes

Measured after cargo build --release [--features full] with strip = true (symbols stripped, debug info excluded) on a Linux x86-64 musl target — the production deployment target used by the Docker images.

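The Windows MSVC binaries in the table below are roughly 19–40% larger than their Linux musl counterparts (17.0 vs 14.3 MB for the minimal build, 40.0 vs 28.6 MB for the full build). Part of the gap is that the Windows binaries are unstripped; the rest is attributed to the PE format not supporting the same level of dead-code elimination as ELF + strip, and to Cranelift (wasmtime JIT) emitting larger Windows unwind tables.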

Build	Linux musl (stripped)	Windows MSVC (unstripped)	Features included
`default` (minimal)	14.3 MB	17.0 MB	Core proxy, routing, static files, TLS, auth (basic + API-key), rate limiting, compression, redirect, health, metrics, hot-reload
`--features standard`	~17.8 MB ¹	21.2 MB	Core (above) + JWT, consumers, forward-auth, response cache, ACME — matches published "standard" binaries/images
`--features full`	28.6 MB	40.0 MB	All of the above + JWT, consumers, forward-auth, Rhai, WASM (wasmtime ~11 MB), TCP proxy, upload, Redis, disk-cache, ACME, fault-injection, OTLP, Kubernetes

Windows binaries are unstripped (PE format; strip is less effective than ELF strip). Linux musl numbers are from the production Docker image target with strip = true.

wasmtime dominates the size delta. The full build without --features wasm is ~17 MB (Linux musl) / ~20 MB (Windows). If you need JWT, scripting, or OTLP but not WASM plugins, a selective build is significantly leaner.

# Selective build — JWT + Rhai + OTLP only (~16.8 MB musl)
cargo build --release --features "jwt,rhai,otlp"

# Full minus WASM (~17.1 MB musl / ~20 MB Windows)
cargo build --release --features "jwt,consumers,forward-auth,rhai,tcp,upload,redis,cache,disk-cache,acme,fault-injection,otlp,kubernetes"
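The claim that wasmtime dominates the size delta can be checked with the figures quoted in this section. The sketch below only redoes that arithmetic; the dictionary keys are illustrative labels, and the 17.1 MB figure is the approximate "full minus WASM" size from the build comment above.

```python
# Linux musl sizes in MB, copied from this section.
sizes_mb = {"minimal": 14.3, "full_without_wasm": 17.1, "full": 28.6}

# Total growth from the minimal build to the full build.
total_delta = sizes_mb["full"] - sizes_mb["minimal"]
# Growth attributable to enabling the wasm feature on top of everything else.
wasm_delta = sizes_mb["full"] - sizes_mb["full_without_wasm"]
# Fraction of the total growth that comes from wasm.
wasm_share = wasm_delta / total_delta

print(f"full - minimal: {total_delta:.1f} MB")
print(f"wasm feature:   {wasm_delta:.1f} MB ({wasm_share:.0%} of the delta)")
```

On these numbers the wasm feature accounts for about 11.5 MB of the 14.3 MB difference, which is in line with the "wasmtime ~11 MB" note in the table.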

Minimal vs Full — Overhead per Feature

All measurements: wrk -t8 -c200 -d30s, Go echo upstream, 200-byte JSON body. Baseline is the minimal (default = []) build with no optional features active in config.

Scenario	Req/s	P50	P99	Notes
Baseline (minimal build, passthrough)	84,200	1.9 ms	4.1 ms	—
Full build, no optional config	84,100	1.9 ms	4.1 ms	Feature flags are compile-time; unused features add ~0% overhead
+ `rateLimit` (in-memory, DashMap)	82,600	1.9 ms	4.3 ms	DashMap lookup ~1.5 µs per request
+ `jwtAuth` HS256 (shared secret)	78,400	2.1 ms	5.2 ms	HMAC-SHA256 ~5 µs per request
+ `jwtAuth` RS256 (JWKS, cached key)	71,800	2.4 ms	6.1 ms	RSA-2048 verify ~18 µs; key already in JWKS cache
+ `jwtAuth` ES256 (JWKS, cached key)	75,900	2.2 ms	5.6 ms	ECDSA-P256 verify ~12 µs
+ Rhai `type: "script"` (trivial script)	73,200	2.3 ms	6.8 ms	Rhai VM init ~20 µs per request; script: `response.set_header("X-Via", "conduit")`
+ WASM `type: "wasm"` (trivial plugin)	68,500	2.5 ms	7.4 ms	Wasmtime call overhead ~35 µs; plugin: read one header
+ `compression` (gzip, 200 B body)	61,300	2.8 ms	9.1 ms	Small bodies compress poorly, so the CPU cost buys little size reduction
+ `compression` (gzip, 10 KB body)	38,900	4.4 ms	14 ms	CPU-bound; use `minBytes: 2048` to skip small responses
+ `mirror` (fire-and-forget)	83,400	1.9 ms	4.2 ms	Mirroring is async; ~1% overhead from tokio::spawn

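Key takeaway: optional features that are compiled in but not configured in YAML/JSON add no measurable throughput or latency overhead (84,100 vs 84,200 req/s, a 0.1% difference). The full binary costs more disk space and, per the static-serving results below, about 10 MB more idle memory, but request handling is otherwise identical until a feature is actively configured.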
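The overhead percentages quoted throughout this page follow directly from the Req/s column. A minimal sketch of that arithmetic, using figures copied from the table above (the function and variable names are illustrative, not part of Conduit):

```python
# Req/s figures copied from the table above.
BASELINE = 84_200
scenarios = {
    "full build, no optional config": 84_100,
    "rateLimit": 82_600,
    "jwtAuth HS256": 78_400,
    "jwtAuth RS256": 71_800,
    "jwtAuth ES256": 75_900,
    "Rhai script": 73_200,
    "WASM plugin": 68_500,
    "mirror": 83_400,
}

def overhead_pct(req_per_sec, baseline=BASELINE):
    """Throughput lost relative to the baseline, as a percentage."""
    return (baseline - req_per_sec) / baseline * 100.0

for name, rps in scenarios.items():
    print(f"{name:32s} {overhead_pct(rps):5.1f}%")
```

The same function can be pointed at your own wrk results by swapping in a different baseline.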

Static File Serving (1 KB response)

Config

port: 8080
static: ./bench/static
staticOptions:
  etag: false
  lastModified: false

Results

Metric	express-reverse-proxy	express-reverse-proxy + PM2 ¹	Conduit minimal	Conduit full ¹
Requests/sec	~8,200	~82,000 ¹	~142,000	~141,800 ¹
Latency P50	~22 ms	~5 ms ¹	~1.1 ms	~1.1 ms ¹
Latency P99	~48 ms	~32 ms ¹	~2.3 ms	~2.3 ms ¹
Memory (idle)	~58 MB	~960 MB ¹	~8 MB	~18 MB ¹
Binary size	~82 MB (node_modules)	~82 MB (node_modules)	14.3 MB	28.6 MB
Startup time	~420 ms	~2,500 ms ¹	~28 ms	~31 ms ¹

¹ Minimal vs Full — static serving: the full build routes requests through the same Pingora static-file handler. Performance is identical; the memory delta (~10 MB) comes from wasmtime's JIT and OTLP runtime being initialised at startup even when no WASM plugins or OTLP endpoint are configured.

# Conduit minimal — wrk raw output
wrk -t8 -c200 -d30s http://localhost:8080/index.html

Running 30s test @ http://localhost:8080/index.html
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.12ms    0.84ms   18.4ms   87.23%
    Req/Sec    17.83k     2.11k   24.19k    68.25%
  4,268,214 requests in 30.09s, 6.23 GB read
Requests/sec: 141,851.23
Transfer/sec:    212.12 MB
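When comparing several runs it can help to pull the headline figures out of wrk's summary programmatically. The sketch below is one way to do that with regular expressions; `parse_wrk_summary` is a hypothetical helper, it only handles durations printed in seconds, and it tolerates the thousands separators used in the sample above.

```python
import re

# Sample copied from the run above.
SAMPLE = """\
Running 30s test @ http://localhost:8080/index.html
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.12ms    0.84ms   18.4ms   87.23%
    Req/Sec    17.83k     2.11k   24.19k    68.25%
  4,268,214 requests in 30.09s, 6.23 GB read
Requests/sec: 141,851.23
Transfer/sec:    212.12 MB
"""

# Multipliers that convert a latency unit suffix to milliseconds.
_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}

def parse_wrk_summary(text):
    """Extract throughput, request count, duration and average latency."""
    out = {}
    m = re.search(r"Requests/sec:\s*([\d,.]+)", text)
    if m:
        out["requests_per_sec"] = float(m.group(1).replace(",", ""))
    m = re.search(r"([\d,]+) requests in ([\d.]+)s", text)
    if m:
        out["requests"] = int(m.group(1).replace(",", ""))
        out["duration_secs"] = float(m.group(2))
    m = re.search(r"Latency\s+([\d.]+)(us|ms|s)", text)
    if m:
        out["avg_latency_ms"] = float(m.group(1)) * _TO_MS[m.group(2)]
    return out

print(parse_wrk_summary(SAMPLE))
```

Dividing `requests` by `duration_secs` should land within a few req/s of the reported `Requests/sec`, which is a quick consistency check on pasted output.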

Reverse Proxy Passthrough

Go echo upstream: 200 OK, fixed 200-byte JSON body, keep-alive.

Config

port: 8080
proxy:
  targets: ["http://localhost:4000"]

Results

Metric	express-reverse-proxy	express-reverse-proxy + PM2 ¹	Conduit minimal	Conduit full ¹
Requests/sec	~6,100	~61,000 ¹	~84,200	~84,100 ¹
Latency P50	~28 ms	~8 ms ¹	~1.9 ms	~1.9 ms ¹
Latency P99	~62 ms	~42 ms ¹	~4.1 ms	~4.1 ms ¹

# Conduit minimal — wrk raw output
Requests/sec:  84,217.18
Transfer/sec:   12.83 MB
Latency P50:    1.91 ms
Latency P99:    4.12 ms
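As a sanity check on throughput and latency figures like these, Little's law gives a useful bound. wrk drives a closed loop in which each connection has at most one request in flight, so mean latency cannot exceed connections divided by throughput. The sketch below applies that bound to the numbers on this page; it is a back-of-the-envelope check, not part of the benchmark suite.

```python
def max_mean_latency_ms(connections, req_per_sec):
    """Upper bound on mean latency (ms) for a closed-loop load test.

    Little's law: in-flight requests = throughput * mean latency, and
    in-flight requests can never exceed the number of connections.
    """
    return connections / req_per_sec * 1000.0

# Figures from the proxy passthrough and static file runs (wrk -c200).
print(f"proxy:  <= {max_mean_latency_ms(200, 84_217.18):.2f} ms")
print(f"static: <= {max_mean_latency_ms(200, 141_851.23):.2f} ms")
```

The proxy bound of about 2.37 ms sits between the reported P50 (1.91 ms) and P99 (4.12 ms), and the static bound of about 1.41 ms is above the 1.12 ms average in the raw output, so both runs are internally consistent.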

Proxy with JWT Authentication (RS256 + JWKS)

JWKS endpoint served locally (no network round-trip for key refresh — keys are cached in memory after the first fetch). Tokens pre-generated; each request carries a valid RS256 Bearer token.

Config

port: 8080
proxy:
  targets: ["http://localhost:4000"]
jwtAuth:
  jwksUrl: "http://localhost:9999/.well-known/jwks.json"
  issuer: "https://auth.example.com/"
  audience: ["my-api"]

Results

Metric	No auth (baseline)	+ JWT HS256	+ JWT RS256	+ JWT ES256
Requests/sec	84,200	78,400	71,800	75,900
Latency P50	1.9 ms	2.1 ms	2.4 ms	2.2 ms
Latency P99	4.1 ms	5.2 ms	6.1 ms	5.6 ms
Overhead	—	−7%	−15%	−10%

Throughput drop is dominated by cryptographic verification, not by Conduit's guard pipeline overhead (~0.5 µs). ES256 (P-256) is the best balance of security and speed for JWT workloads. All three algorithms are far below the threshold where JWT auth would become a bottleneck in practice — at 70–80 k req/s, the upstream itself is the bottleneck in almost every real deployment.

Proxy with Rate Limiting

Token-bucket, in-memory DashMap, keyed by client IP.

rateLimit:
  windowSecs: 60
  limit: 10000 # high limit — nearly all benchmark requests pass
  burst: 2000

Metric	No rate limit	+ rate limit (all pass)	+ rate limit (50% rejected) ¹
Requests/sec	84,200	82,600	~91,000 ¹
Latency P50	1.9 ms	1.9 ms	~1.0 ms ¹
Latency P99	4.1 ms	4.3 ms	~2.1 ms ¹

¹ Rate-limited requests return 429 before upstream I/O — they complete faster than proxied requests, so heavy rejection actually raises overall throughput and lowers P99 measured by wrk (which counts 429s as success). In practice, rate limiting overhead is ~2% at typical allowable rates.
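For readers unfamiliar with the technique, the sketch below shows a generic token bucket keyed by client IP. It is not Conduit's implementation (which is described above only as a DashMap-backed token bucket), and how `windowSecs`, `limit` and `burst` map onto a refill rate and bucket capacity is an assumption here: the example takes the rate and capacity directly.

```python
import time

class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0      # spend one token for this request
            return True
        return False                # caller would answer 429 without proxying

class IpRateLimiter:
    """One bucket per client IP, held in a plain dict."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the example is testable
        self.buckets = {}

    def allow(self, client_ip):
        now = self.clock()
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = self.buckets[client_ip] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

The rejection path is a dictionary lookup and a few arithmetic operations with no upstream I/O, which is why rejected requests complete faster than proxied ones in the table above.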

Proxy with Response Caching

In-memory cache (store: "memory"), 60-second TTL, single upstream URL. The cache is warm at benchmark start (first request primes it).

proxy:
  targets: ["http://localhost:4000"]
  cache:
    store: memory
    ttlSecs: 60
    staleWhileRevalidateSecs: 300

Metric	No cache (live upstream)	Cache HIT	Cache HIT + stale-while-revalidate
Requests/sec	84,200	198,400	196,100
Latency P50	1.9 ms	0.38 ms	0.39 ms
Latency P99	4.1 ms	0.91 ms	0.93 ms
Upstream load	84,200 req/s	~0 req/s (1 req/60 s)	~0 req/s (background refresh)

Cache hit path removes upstream I/O entirely. The remaining latency (~0.38 ms P50) is Conduit's own routing + response-pipeline overhead. Stale-while-revalidate adds negligible overhead: one background fetch per TTL window with zero client-visible latency penalty.
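The HIT and stale-while-revalidate paths described above can be illustrated with a toy cache. This is a sketch of the general technique, not Conduit's code: `SwrCache`, `fetch` and `refresh_pending` are hypothetical names, and the explicit `refresh_pending()` call stands in for the background refresh task a real proxy would spawn.

```python
import time

class SwrCache:
    """Toy in-memory cache with a TTL and a stale-while-revalidate window."""

    def __init__(self, fetch, ttl_secs, swr_secs, clock=time.monotonic):
        self.fetch = fetch          # callable(key) -> value; stands in for the upstream
        self.ttl = ttl_secs
        self.swr = swr_secs
        self.clock = clock          # injectable so the example is testable
        self.entries = {}           # key -> (value, stored_at)
        self.pending = set()        # keys awaiting a background refresh

    def _store(self, key):
        value = self.fetch(key)
        self.entries[key] = (value, self.clock())
        return value

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = self.clock() - stored_at
            if age <= self.ttl:
                return "HIT", value             # fresh: no upstream I/O
            if age <= self.ttl + self.swr:
                self.pending.add(key)           # serve stale, refresh later
                return "STALE", value
        return "MISS", self._store(key)         # absent or too old: fetch now

    def refresh_pending(self):
        """Stand-in for the background refresh; re-fetches every stale key."""
        for key in list(self.pending):
            self._store(key)
        self.pending.clear()
```

In both the HIT and STALE branches the caller gets a response without waiting on the upstream, which matches the near-identical latency of the two cache columns in the table.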

Proxy with Rhai Middleware

Trivial Rhai script that sets one response header. Measures Rhai engine + VM overhead independent of script complexity.

middleware:
  - type: script
    path: ./bench/set-header.rhai # response.set_header("X-Via", "conduit")
    phase: response
proxy:
  targets: ["http://localhost:4000"]

Script complexity	Req/s	P50	P99	vs baseline
Baseline (no script)	84,200	1.9 ms	4.1 ms	—
Set one header	73,200	2.3 ms	6.8 ms	−13%
Read 5 headers + set 2	68,900	2.5 ms	7.9 ms	−18%
Complex logic (50 ops)	61,400	2.9 ms	9.4 ms	−27%

Rhai overhead is dominated by VM initialisation per request (~20 µs). Script execution time is proportional to operation count but small relative to VM init. For workloads where scripting overhead matters, consider moving logic to a WASM plugin compiled to native code (see next section).

Proxy with WASM Middleware

WAT plugin compiled to WASM, loaded once and cached. Wasmtime JIT-compiles the module at startup; per-request cost is function call + host-function I/O.

middleware:
  - type: wasm
    path: ./bench/set-header.wasm # calls conduit_set_response_header once
proxy:
  targets: ["http://localhost:4000"]

Plugin complexity	Req/s	P50	P99	vs Rhai (same task)
Baseline (no plugin)	84,200	1.9 ms	4.1 ms	—
Set one header	68,500	2.5 ms	7.4 ms	−6% vs Rhai
Read 5 headers + set 2	64,100	2.6 ms	8.1 ms	−7% vs Rhai
Compiled Rust plugin ¹	71,200	2.3 ms	6.9 ms	−3% vs Rhai

¹ A plugin written in Rust and compiled to wasm32-wasip1 outperforms an equivalent WAT plugin because the Rust compiler generates better Wasm bytecode for loops and struct access patterns. For compute-heavy plugins (JSON parsing, regex), Rust WASM is significantly faster than equivalent Rhai scripts.

For simple tasks (a single host-function call), WASM is slower than Rhai in these runs: 68,500 vs 73,200 req/s for setting one header. WASM is expected to scale better for complex logic because Wasmtime JIT-compiles plugins to native code.

Comparison — Conduit vs nginx vs Traefik

¹ nginx and Traefik numbers are estimated from published benchmarks (cloudflare.com/learning/performance/reverse-proxy, traefik.io/benchmarks, and various community wrk runs) normalised to similar hardware. They are provided as a sanity-check reference, not as a head-to-head competitive claim. Run your own benchmarks on representative workloads.

Static file serving (1 KB, keep-alive)

Proxy	Req/s	P50	P99	Memory
Conduit minimal	~142,000	~1.1 ms	~2.3 ms	~8 MB
nginx 1.26 (worker_processes auto)	~185,000 ¹	~0.9 ms ¹	~1.8 ms ¹	~5 MB ¹
Traefik v3.1	~68,000 ¹	~2.4 ms ¹	~6.1 ms ¹	~28 MB ¹

Reverse proxy passthrough (200-byte JSON, keep-alive)

Proxy	Req/s	P50	P99	Auth overhead
Conduit minimal	~84,000	~1.9 ms	~4.1 ms	built-in JWT ~15%
nginx (+ lua-resty-jwt)	~71,000 ¹	~2.3 ms ¹	~5.8 ms ¹	OpenResty plugin ¹
Traefik (forward-auth)	~42,000 ¹	~3.8 ms ¹	~11 ms ¹	external subrequest

Context: nginx leads on static files because it uses sendfile(2) / OS page-cache with no userspace copy. Conduit uses Pingora's async I/O path which adds one userspace copy. For proxy workloads the gap narrows because both tools are network I/O bound, not disk I/O bound.

Traefik's higher latency for auth reflects its forward-auth architecture (external HTTP call per request). Conduit's JWT guard runs in-process with no network round-trip.

Performance Targets vs Actual Results

Metric	Target	Minimal build	Full build ¹	Status
Static file req/s	≥ 150,000	~142,000	~141,800 ¹	⚠️ ~5% below target
Proxy passthrough req/s	≥ 80,000	~84,200	~84,100 ¹	✅ exceeds target
Cache hit req/s	≥ 180,000	~198,400	~197,900 ¹	✅ exceeds target
P99 proxy latency	≤ 5 ms	~4.1 ms	~4.1 ms ¹	✅
P99 JWT RS256 latency	≤ 8 ms	n/a	~6.1 ms	✅
Memory (idle, 1 site)	≤ 10 MB	~8 MB	~18 MB ¹	✅ / ⚠️ full build
Binary size (stripped)	≤ 15 MB	14.3 MB	28.6 MB	✅ / ℹ️ wasmtime
Cold start time	≤ 50 ms	~28 ms	~31 ms	✅

Full build memory note: ~18 MB idle is still lower than the alternatives listed here (Traefik ~28 MB, nginx with Lua ~45 MB, Node.js proxy ~60 MB). The delta vs the minimal build comes almost entirely from wasmtime's JIT allocating its code-gen arena at startup even when no WASM plugins are configured. If memory is a constraint, build without --features wasm.

Running Benchmarks Yourself

Prerequisites

# Ubuntu / Debian
sudo apt install wrk

# macOS
brew install wrk

# Build both Conduit variants (copy each binary before the next build overwrites it)
cargo build --release                          # minimal (default = [])
strip target/release/conduit
cp target/release/conduit /tmp/conduit-std     # minimal binary
cargo build --release --features full          # full
strip target/release/conduit
cp target/release/conduit /tmp/conduit-full

# Check sizes
ls -lh /tmp/conduit-std /tmp/conduit-full

Minimal Go upstream

// bench/upstream/main.go
package main

import (
    "log"
    "net/http"
    "time"
)

func main() {
    body := []byte(`{"status":"ok","ts":0}`)
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.Write(body)
    })
    srv := &http.Server{Addr: ":4000", ReadTimeout: 5 * time.Second}
    // ListenAndServe only returns on failure (for example, port 4000 already in use).
    log.Fatal(srv.ListenAndServe())
}

go run bench/upstream/main.go &

Static file benchmark

mkdir -p bench/static
dd if=/dev/urandom bs=1024 count=1 | base64 > bench/static/index.html

cat > /tmp/bench-static.yaml <<'EOF'
port: 8080
static: ./bench/static
staticOptions: { etag: false, lastModified: false }
EOF

/tmp/conduit-std -c /tmp/bench-static.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/index.html
kill $!  # $! is the Conduit PID; %1 may refer to the Go upstream started earlier

Proxy passthrough benchmark

cat > /tmp/bench-proxy.yaml <<'EOF'
port: 8080
proxy:
  targets: ["http://localhost:4000"]
EOF

/tmp/conduit-std -c /tmp/bench-proxy.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!  # $! is the Conduit PID; %1 may refer to the Go upstream started earlier

Full build — proxy passthrough (verify parity)

/tmp/conduit-full -c /tmp/bench-proxy.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!  # $! is the Conduit PID; %1 may refer to the Go upstream started earlier

Cache hit benchmark

cat > /tmp/bench-cache.yaml <<'EOF'
port: 8080
proxy:
  targets: ["http://localhost:4000"]
  cache:
    store: memory
    ttlSecs: 300
EOF

/tmp/conduit-full -c /tmp/bench-cache.yaml &
# Prime the cache
curl -s http://localhost:8080/ > /dev/null
# Benchmark (all hits)
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!  # $! is the Conduit PID; %1 may refer to the Go upstream started earlier

JWT RS256 overhead

# Requires: --features jwt (included in full build)
# 1. Generate a key pair
openssl genrsa -out /tmp/bench-key.pem 2048
openssl rsa -in /tmp/bench-key.pem -pubout -out /tmp/bench-pub.pem

# 2. Serve a local JWKS endpoint (inline Python script; requires the cryptography package)
python3 -c "
import json, base64, http.server, socketserver
from cryptography.hazmat.primitives.serialization import load_pem_public_key

pub = load_pem_public_key(open('/tmp/bench-pub.pem','rb').read())
nums = pub.public_numbers()  # load_pem_public_key already returns the public key
def b64url(n): return base64.urlsafe_b64encode(n.to_bytes((n.bit_length()+7)//8,'big')).rstrip(b'=').decode()
jwks = {'keys':[{'kty':'RSA','kid':'bench','use':'sig','alg':'RS256','n':b64url(nums.n),'e':b64url(nums.e)}]}
class H(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(jwks).encode())
socketserver.TCPServer(('',9999),H).serve_forever()
" &

# 3. Generate a valid token (node.js / python jwt library / any tool)
# 4. Configure conduit and benchmark with Authorization header
wrk -t8 -c200 -d30s -H "Authorization: Bearer <token>" http://localhost:8080/
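Step 3 above is left open. One way to mint a token is sketched below, assuming the key pair from step 1 and the `cryptography` package already used by the JWKS script. The claim set (`iss`, `aud`, `sub`, `iat`, `exp`) is an assumption based on the `jwtAuth` config shown earlier; which claims Conduit actually requires is not stated on this page. The helper names are hypothetical.

```python
import base64
import json
import time

def b64url(data):
    """Base64url-encode bytes without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def signing_input(claims, kid="bench"):
    """Build the 'header.payload' part of the token that gets signed."""
    header = {"alg": "RS256", "typ": "JWT", "kid": kid}  # kid matches the JWKS above
    parts = [b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, claims)]
    return ".".join(parts).encode("ascii")

def sign_rs256(message, pem_path):
    """Sign with RSASSA-PKCS1-v1_5 + SHA-256, which is what RS256 means."""
    # Imported lazily so the helpers above work without the package installed.
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding
    with open(pem_path, "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=None)
    return b64url(key.sign(message, padding.PKCS1v15(), hashes.SHA256()))

def make_token(pem_path="/tmp/bench-key.pem", ttl_secs=3600):
    """Return a compact RS256 JWT valid for ttl_secs seconds."""
    now = int(time.time())
    claims = {
        "iss": "https://auth.example.com/",  # assumed to match jwtAuth.issuer
        "aud": "my-api",                     # assumed to match jwtAuth.audience
        "sub": "bench",                      # illustrative subject
        "iat": now,
        "exp": now + ttl_secs,
    }
    msg = signing_input(claims)
    return msg.decode("ascii") + "." + sign_rs256(msg, pem_path)
```

Calling `print(make_token())` after step 1 prints a token that can be pasted in place of `<token>` in the wrk command above. Keep `ttl_secs` longer than the benchmark duration so the token does not expire mid-run.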

Measure memory usage

/tmp/conduit-std -c /tmp/bench-proxy.yaml &
sleep 2
# VmRSS is reported in kB; print it in MB. $! is the PID of the Conduit process just started.
awk '/VmRSS/{print $2/1024 " MB"}' /proc/$!/status
kill $!

/tmp/conduit-full -c /tmp/bench-proxy.yaml &
sleep 2
awk '/VmRSS/{print $2/1024 " MB"}' /proc/$!/status
kill $!

Micro-benchmarks (no external tool required)

cargo bench

Runs the criterion-based benchmarks in benches/.

Historical comparison — express-reverse-proxy

Conduit was originally designed as a faster drop-in replacement for express-reverse-proxy. The comparison below is retained for historical context.

Metric	express-reverse-proxy	express-reverse-proxy + PM2 ¹	Conduit minimal
Req/s (static)	~8,200	~82,000 ¹	~142,000
Latency P50	~22 ms	~5 ms ¹	~1.1 ms
Latency P99	~48 ms	~32 ms ¹	~2.3 ms
Memory (idle)	~58 MB	~960 MB ¹ (16 × ~60 MB)	~8 MB
Startup	~420 ms	~2,500 ms ¹	~28 ms

¹ PM2 cluster numbers are estimated (see original note in methodology).

Submitting Results

If you run Conduit on different hardware and get reproducible numbers, please open a PR editing this file. Include:

OS, CPU model, RAM
wrk version and exact flags used
Conduit version (conduit --version) and feature flags
Config file used
Upstream server used
