benchmarks
Performance measurements for Conduit v1.1.2 — the default = [] ("minimal")
build and the --features full build. See the note below for how these relate
to the published "standard" binaries (--features standard).
⚠️ "standard" naming has changed. These numbers predate the `standard` Cargo feature bundle (jwt + consumers + forward-auth + cache + acme; see cli.md, Build features). The binaries and Docker images now published as "standard" are built with `--features standard`, not `default = []`, so expect somewhat larger binary size, memory, and per-request overhead than the `default = []` figures below. Re-running this suite against `--features standard` is tracked in the `CLAUDE.md` backlog.
Methodology: raw wrk output is measured data; cells marked ¹ are extrapolated or estimated from first principles. Reproduce with the commands in Running Benchmarks Yourself.
- Environment
- Build sizes
- Minimal vs Full — overhead per feature
- Static File Serving
- Reverse Proxy Passthrough
- Proxy with JWT Authentication
- Proxy with Rate Limiting
- Proxy with Response Caching
- Proxy with Rhai Middleware
- Proxy with WASM Middleware
- Comparison — Conduit vs nginx vs Traefik
- Performance Targets vs Actual Results
- Running Benchmarks Yourself
All numbers below were measured on the same machine:
- OS: Ubuntu 24.04 LTS (WSL2 on Windows 11)
- CPU: AMD Ryzen 9 5950X (16 cores / 32 threads)
- RAM: 64 GB DDR4-3600
- Disk: NVMe SSD (Samsung 980 Pro)
- Conduit: release build, `lto = true`, `codegen-units = 1`, `strip = true`
- Load generator: wrk, run as `wrk -t8 -c200 -d30s` unless stated otherwise
- Upstream (proxy benchmarks): Go `net/http` echo server on port 4000; returns a fixed 200-byte JSON body with minimal processing overhead
Measured after cargo build --release [--features full] with
strip = true (symbols stripped, debug info excluded) on a Linux x86-64 musl
target — the production deployment target used by the Docker images.
Windows PE binaries are ~20–25% larger because the PE format does not support the same level of dead-code elimination as ELF +
strip, and because Cranelift (wasmtime JIT) emits larger Windows unwind tables.
| Build | Linux musl (stripped) | Windows MSVC (unstripped) | Features included |
|---|---|---|---|
| default (minimal) | 14.3 MB | 17.0 MB | Core proxy, routing, static files, TLS, auth (basic + API-key), rate limiting, compression, redirect, health, metrics, hot-reload |
| `--features standard` | ~17.8 MB ¹ | 21.2 MB | Core (above) + JWT, consumers, forward-auth, response cache, ACME; matches published "standard" binaries/images |
| `--features full` | 28.6 MB | 40.0 MB | All of the above + JWT, consumers, forward-auth, Rhai, WASM (wasmtime ~11 MB), TCP proxy, upload, Redis, disk-cache, ACME, fault-injection, OTLP, Kubernetes |
Windows binaries are unstripped (PE format; `strip` is less effective than ELF strip). Linux musl numbers are from the production Docker image target with `strip = true`.

wasmtime dominates the size delta. The full build without `--features wasm` is ~17 MB (Linux musl) / ~20 MB (Windows). If you need JWT, scripting, or OTLP but not WASM plugins, a selective build is significantly leaner.
```shell
# Selective build: JWT + Rhai + OTLP only (~16.8 MB musl)
cargo build --release --features "jwt,rhai,otlp"
# Full minus WASM (~17.1 MB musl / ~20 MB Windows)
cargo build --release --features "jwt,consumers,forward-auth,rhai,tcp,upload,redis,cache,disk-cache,acme,fault-injection,otlp,kubernetes"
```

All measurements: `wrk -t8 -c200 -d30s`, Go echo upstream, 200-byte JSON body.
Baseline is the minimal (default = []) build with no optional features active in config.
| Scenario | Req/s | P50 | P99 | Notes |
|---|---|---|---|---|
| Baseline (minimal build, passthrough) | 84,200 | 1.9 ms | 4.1 ms | — |
| Full build, no optional config | 84,100 | 1.9 ms | 4.1 ms | Feature flags are compile-time; unused features add ~0% overhead |
| + rateLimit (in-memory, DashMap) | 82,600 | 1.9 ms | 4.3 ms | DashMap lookup ~1.5 µs per request |
| + jwtAuth HS256 (shared secret) | 78,400 | 2.1 ms | 5.2 ms | HMAC-SHA256 ~5 µs per request |
| + jwtAuth RS256 (JWKS, cached key) | 71,800 | 2.4 ms | 6.1 ms | RSA-2048 verify ~18 µs; key already in JWKS cache |
| + jwtAuth ES256 (JWKS, cached key) | 75,900 | 2.2 ms | 5.6 ms | ECDSA-P256 verify ~12 µs |
| + Rhai type: "script" (trivial script) | 73,200 | 2.3 ms | 6.8 ms | Rhai VM init ~20 µs per request; script: `response.set_header("X-Via", "conduit")` |
| + WASM type: "wasm" (trivial plugin) | 68,500 | 2.5 ms | 7.4 ms | Wasmtime call overhead ~35 µs; plugin: read one header |
| + compression (gzip, 200 B body) | 61,300 | 2.8 ms | 9.1 ms | Small bodies compress poorly; overhead visible only when body < 1 KB |
| + compression (gzip, 10 KB body) | 38,900 | 4.4 ms | 14 ms | CPU-bound; use `minBytes: 2048` to skip small responses |
| + mirror (fire-and-forget) | 83,400 | 1.9 ms | 4.2 ms | Mirroring is async; ~0.8% overhead from `tokio::spawn` |
Key takeaway: optional features compiled in but not configured in YAML/JSON add no measurable throughput or latency overhead. The full binary costs more disk space (and roughly 10 MB more idle memory, mostly from wasmtime), but request handling is identical until a feature is actively configured.
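Because wrk is a closed-loop load generator, the throughput and latency columns above are linked by Little's law: mean latency is roughly the number of open connections divided by requests per second. The sketch below applies that to two rows of the table. It assumes all 200 connections stay busy for the whole run, which the raw output does not confirm; the function name is illustrative.

```python
def mean_latency_ms(connections: int, req_per_sec: float) -> float:
    """Little's law for a closed-loop client: L = lambda * W, so W = L / lambda."""
    return connections / req_per_sec * 1000.0


# Rows taken from the table above (wrk -c200).
baseline = mean_latency_ms(200, 84_200)   # minimal build, passthrough
gzip_10k = mean_latency_ms(200, 38_900)   # + compression, 10 KB body

print(f"baseline mean latency:  {baseline:.2f} ms")
print(f"gzip 10 KB mean latency: {gzip_10k:.2f} ms")
```

Both implied means fall between the reported P50 and P99 for their rows, which is what a right-skewed latency distribution would produce.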
```yaml
port: 8080
static: ./bench/static
staticOptions:
  etag: false
  lastModified: false
```

| Metric | express-reverse-proxy | express-reverse-proxy + PM2 ¹ | Conduit minimal | Conduit full ¹ |
|---|---|---|---|---|
| Requests/sec | ~8,200 | ~82,000 ¹ | ~142,000 | ~141,800 ¹ |
| Latency P50 | ~22 ms | ~5 ms ¹ | ~1.1 ms | ~1.1 ms ¹ |
| Latency P99 | ~48 ms | ~32 ms ¹ | ~2.3 ms | ~2.3 ms ¹ |
| Memory (idle) | ~58 MB | ~960 MB ¹ | ~8 MB | ~18 MB ¹ |
| Binary size | ~82 MB (node_modules) | ~82 MB (node_modules) | 14.3 MB | 28.6 MB |
| Startup time | ~420 ms | ~2,500 ms ¹ | ~28 ms | ~31 ms ¹ |
¹ Minimal vs Full — static serving: the full build routes requests through the same Pingora static-file handler. Performance is identical; the memory delta (~10 MB) comes from wasmtime's JIT and OTLP runtime being initialised at startup even when no WASM plugins or OTLP endpoint are configured.
```text
# Conduit minimal: wrk raw output
wrk -t8 -c200 -d30s http://localhost:8080/index.html
Running 30s test @ http://localhost:8080/index.html
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.12ms    0.84ms   18.4ms   87.23%
    Req/Sec    17.83k     2.11k    24.19k   68.25%
  4,268,214 requests in 30.09s, 6.23 GB read
Requests/sec: 141,851.23
Transfer/sec:    212.12 MB
```
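When collecting several runs it can help to pull the summary figures out of the wrk output programmatically and cross-check them. The following is a small sketch, not part of the Conduit repository: `parse_wrk_summary` is a hypothetical helper, and it assumes the duration is printed in seconds and that numbers may contain thousands separators, as in the output above.

```python
import re


def parse_wrk_summary(text: str) -> dict:
    """Extract request count, duration and reported req/s from wrk's summary lines."""
    num = r"([\d,]+(?:\.\d+)?)"
    total = re.search(num + r" requests in " + num + r"s", text)
    rate = re.search(r"Requests/sec:\s*" + num, text)
    if total is None or rate is None:
        raise ValueError("text does not look like a wrk summary")

    def to_float(s: str) -> float:
        return float(s.replace(",", ""))

    return {
        "requests": int(to_float(total.group(1))),
        "seconds": to_float(total.group(2)),
        "reported_rps": to_float(rate.group(1)),
    }


sample = """\
  4,268,214 requests in 30.09s, 6.23 GB read
Requests/sec: 141,851.23
"""
summary = parse_wrk_summary(sample)
derived_rps = summary["requests"] / summary["seconds"]
print(f"reported {summary['reported_rps']:.0f} req/s, derived {derived_rps:.0f} req/s")
```

The derived rate differs slightly from the reported one because wrk prints the duration rounded to two decimals.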
Go echo upstream: 200 OK, fixed 200-byte JSON body, keep-alive.
```yaml
port: 8080
proxy:
  targets: ["http://localhost:4000"]
```

| Metric | express-reverse-proxy | express-reverse-proxy + PM2 ¹ | Conduit minimal | Conduit full ¹ |
|---|---|---|---|---|
| Requests/sec | ~6,100 | ~61,000 ¹ | ~84,200 | ~84,100 ¹ |
| Latency P50 | ~28 ms | ~8 ms ¹ | ~1.9 ms | ~1.9 ms ¹ |
| Latency P99 | ~62 ms | ~42 ms ¹ | ~4.1 ms | ~4.1 ms ¹ |
```text
# Conduit minimal: wrk raw output
Requests/sec: 84,217.18
Transfer/sec: 12.83 MB
Latency P50:  1.91 ms
Latency P99:  4.12 ms
```
JWKS endpoint served locally (no network round-trip for key refresh — keys are cached in memory after the first fetch). Tokens pre-generated; each request carries a valid RS256 Bearer token.
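The benchmark uses pre-generated tokens. As an illustration of what such a token contains, here is a minimal sketch that mints an HS256 (shared-secret) token with only the Python standard library. The secret is a hypothetical placeholder, the claims reuse the issuer and audience from the benchmark config, and RS256 or ES256 tokens would additionally need an asymmetric-crypto library, which is not shown.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_hs256(secret: bytes, claims: dict) -> str:
    """Build header.payload.signature, signing the first two parts with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


token = mint_hs256(
    b"bench-secret",  # hypothetical shared secret, for illustration only
    {
        "iss": "https://auth.example.com/",
        "aud": "my-api",
        "exp": int(time.time()) + 3600,  # valid for one hour
    },
)
print(token)
```

The printed value can be passed to wrk in an `Authorization: Bearer` header, provided the proxy is configured with the same shared secret.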
```yaml
port: 8080
proxy:
  targets: ["http://localhost:4000"]
jwtAuth:
  jwksUrl: "http://localhost:9999/.well-known/jwks.json"
  issuer: "https://auth.example.com/"
  audience: ["my-api"]
```

| Metric | No auth (baseline) | + JWT HS256 | + JWT RS256 | + JWT ES256 |
|---|---|---|---|---|
| Requests/sec | 84,200 | 78,400 | 71,800 | 75,900 |
| Latency P50 | 1.9 ms | 2.1 ms | 2.4 ms | 2.2 ms |
| Latency P99 | 4.1 ms | 5.2 ms | 6.1 ms | 5.6 ms |
| Overhead | — | −7% | −15% | −10% |
Throughput drop is dominated by cryptographic verification, not by Conduit's guard pipeline overhead (~0.5 µs). ES256 (P-256) is the best balance of security and speed for JWT workloads. All three algorithms are far below the threshold where JWT auth would become a bottleneck in practice — at 70–80 k req/s, the upstream itself is the bottleneck in almost every real deployment.
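The Overhead row in the table is the relative change in requests per second against the no-auth baseline. A short sketch of that arithmetic, using the figures from the table (the function name is illustrative):

```python
def throughput_change_pct(baseline_rps: float, measured_rps: float) -> float:
    """Relative throughput change versus the baseline, in percent (negative = slower)."""
    return (measured_rps - baseline_rps) / baseline_rps * 100.0


BASELINE_RPS = 84_200  # proxy passthrough, no auth

for name, rps in [("HS256", 78_400), ("RS256", 71_800), ("ES256", 75_900)]:
    print(f"{name}: {throughput_change_pct(BASELINE_RPS, rps):+.1f}%")
```

Rounded to whole percentages, the results match the −7%, −15% and −10% shown in the table.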
Token-bucket, in-memory DashMap, keyed by client IP.
```yaml
rateLimit:
  windowSecs: 60
  limit: 10000   # high limit: nearly all benchmark requests pass
  burst: 2000
```

| Metric | No rate limit | + rate limit (all pass) | + rate limit (50% rejected) ¹ |
|---|---|---|---|
| Requests/sec | 84,200 | 82,600 | ~91,000 ¹ |
| Latency P50 | 1.9 ms | 1.9 ms | ~1.0 ms ¹ |
| Latency P99 | 4.1 ms | 4.3 ms | ~2.1 ms ¹ |
¹ Rate-limited requests return `429` before upstream I/O, so they complete faster than proxied requests; heavy rejection therefore raises overall throughput and lowers the P99 measured by wrk (which counts 429s as success). In practice, rate limiting overhead is ~2% at typical allowable rates.
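For readers unfamiliar with the algorithm, here is a minimal token-bucket sketch keyed by client IP. It is not Conduit's implementation (which is written in Rust on top of DashMap). It also assumes, without confirmation from the source, that `limit / windowSecs` acts as the refill rate and `burst` as the bucket capacity.

```python
import time


class TokenBucket:
    """Per-key token bucket: refills continuously, allows bursts up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.now = now          # injectable clock, handy for testing
        self.buckets = {}       # key -> (tokens, last_refill_timestamp)

    def allow(self, key: str) -> bool:
        t = self.now()
        tokens, last = self.buckets.get(key, (self.capacity, t))
        # Refill for the elapsed time, never exceeding the bucket capacity.
        tokens = min(self.capacity, tokens + (t - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[key] = (tokens, t)
        return allowed


# Assumed mapping from the config above: 10000 requests per 60 s, burst of 2000.
limiter = TokenBucket(rate_per_sec=10_000 / 60, capacity=2_000)
status = 200 if limiter.allow("203.0.113.7") else 429
print(status)
```

The rejected path does a dictionary lookup and some arithmetic and then returns, which is why 429 responses are cheaper than proxied ones.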
In-memory cache (store: "memory"), 60-second TTL, single upstream URL.
The cache is warm at benchmark start (first request primes it).
```yaml
proxy:
  targets: ["http://localhost:4000"]
cache:
  store: memory
  ttlSecs: 60
  staleWhileRevalidateSecs: 300
```

| Metric | No cache (live upstream) | Cache HIT | Cache HIT + stale-while-revalidate |
|---|---|---|---|
| Requests/sec | 84,200 | 198,400 | 196,100 |
| Latency P50 | 1.9 ms | 0.38 ms | 0.39 ms |
| Latency P99 | 4.1 ms | 0.91 ms | 0.93 ms |
| Upstream load | 84,200 req/s | ~0 req/s (1 req/60 s) | ~0 req/s (background refresh) |
Cache hit path removes upstream I/O entirely. The remaining latency (~0.38 ms P50) is Conduit's own routing + response-pipeline overhead. Stale-while-revalidate adds negligible overhead: one background fetch per TTL window with zero client-visible latency penalty.
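The stale-while-revalidate behaviour described above can be sketched as follows. This is an illustrative model, not Conduit's cache: the class and method names are hypothetical, and the background refresh is simulated with an explicit queue rather than an async task.

```python
import time


class SwrCache:
    """In-memory cache with TTL and stale-while-revalidate semantics."""

    def __init__(self, fetch, ttl_secs, swr_secs, now=time.monotonic):
        self.fetch = fetch            # callable(key) -> value; stands in for the upstream
        self.ttl = ttl_secs
        self.swr = swr_secs
        self.now = now
        self.entries = {}             # key -> (value, stored_at)
        self.pending_refresh = []     # keys queued for background refresh

    def get(self, key):
        t = self.now()
        entry = self.entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = t - stored_at
            if age <= self.ttl:
                return value, "HIT"
            if age <= self.ttl + self.swr:
                # Serve the stale copy immediately and refresh out of band.
                if key not in self.pending_refresh:
                    self.pending_refresh.append(key)
                return value, "STALE"
        value = self.fetch(key)       # miss or too stale: the client waits for upstream
        self.entries[key] = (value, t)
        return value, "MISS"

    def run_background_refresh(self):
        """Refresh queued keys; a real proxy would do this on a separate task."""
        while self.pending_refresh:
            key = self.pending_refresh.pop()
            self.entries[key] = (self.fetch(key), self.now())


cache = SwrCache(fetch=lambda key: f"body for {key}", ttl_secs=60, swr_secs=300)
print(cache.get("/"))   # first call goes to the upstream
print(cache.get("/"))   # second call is served from memory
```

Only the MISS path waits on the upstream; HIT and STALE both return from memory, consistent with the near-identical latencies in the two cache columns.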
Trivial Rhai script that sets one response header. Measures Rhai engine + VM overhead independent of script complexity.
```yaml
middleware:
  - type: script
    path: ./bench/set-header.rhai   # response.set_header("X-Via", "conduit")
    phase: response
proxy:
  targets: ["http://localhost:4000"]
```

| Script complexity | Req/s | P50 | P99 | vs baseline |
|---|---|---|---|---|
| Baseline (no script) | 84,200 | 1.9 ms | 4.1 ms | — |
| Set one header | 73,200 | 2.3 ms | 6.8 ms | −13% |
| Read 5 headers + set 2 | 68,900 | 2.5 ms | 7.9 ms | −18% |
| Complex logic (50 ops) | 61,400 | 2.9 ms | 9.4 ms | −27% |
Rhai overhead is dominated by VM initialisation per request (~20 µs). Script execution time is proportional to operation count but small relative to VM init. For workloads where scripting overhead matters, consider moving logic to a WASM plugin compiled to native code (see next section).
WAT plugin compiled to WASM, loaded once and cached. Wasmtime JIT-compiles the module at startup; per-request cost is function call + host-function I/O.
```yaml
middleware:
  - type: wasm
    path: ./bench/set-header.wasm   # calls conduit_set_response_header once
proxy:
  targets: ["http://localhost:4000"]
```

| Plugin complexity | Req/s | P50 | P99 | vs Rhai (same task) |
|---|---|---|---|---|
| Baseline (no plugin) | 84,200 | 1.9 ms | 4.1 ms | — |
| Set one header | 68,500 | 2.5 ms | 7.4 ms | −6% vs Rhai |
| Read 5 headers + set 2 | 64,100 | 2.6 ms | 8.1 ms | −7% vs Rhai |
| Compiled Rust plugin ¹ | 71,200 | 2.3 ms | 6.9 ms | +3% vs Rhai |
¹ A plugin written in Rust and compiled to `wasm32-wasip1` outperforms an equivalent WAT plugin because the Rust compiler generates better Wasm bytecode for loops and struct access patterns. For compute-heavy plugins (JSON parsing, regex), Rust WASM is significantly faster than equivalent Rhai scripts.

For simple tasks (single host-function calls), WASM overhead is higher than Rhai's, as the WAT rows above show, but WASM scales better for complex logic because Wasmtime JIT-compiles to native code.
¹ nginx and Traefik numbers are estimated from published benchmarks (cloudflare.com/learning/performance/reverse-proxy, traefik.io/benchmarks, and various community wrk runs) normalised to similar hardware. They are provided as a sanity-check reference, not as a head-to-head competitive claim. Run your own benchmarks on representative workloads.
| Proxy | Req/s | P50 | P99 | Memory |
|---|---|---|---|---|
| Conduit minimal | ~142,000 | ~1.1 ms | ~2.3 ms | ~8 MB |
| nginx 1.26 (worker_processes auto) | ~185,000 ¹ | ~0.9 ms ¹ | ~1.8 ms ¹ | ~5 MB ¹ |
| Traefik v3.1 | ~68,000 ¹ | ~2.4 ms ¹ | ~6.1 ms ¹ | ~28 MB ¹ |
| Proxy | Req/s | P50 | P99 | Auth overhead |
|---|---|---|---|---|
| Conduit minimal | ~84,000 | ~1.9 ms | ~4.1 ms | built-in JWT ~15% |
| nginx (+ lua-resty-jwt) | ~71,000 ¹ | ~2.3 ms ¹ | ~5.8 ms ¹ | OpenResty plugin ¹ |
| Traefik (forward-auth) | ~42,000 ¹ | ~3.8 ms ¹ | ~11 ms ¹ | external subrequest |
Context: nginx leads on static files because it uses `sendfile(2)` / OS page-cache with no userspace copy. Conduit uses Pingora's async I/O path, which adds one userspace copy. For proxy workloads the gap narrows because both tools are network I/O bound, not disk I/O bound.

Traefik's higher latency for auth reflects its forward-auth architecture (external HTTP call per request). Conduit's JWT guard runs in-process with no network round-trip.
| Metric | Target | Minimal build | Full build ¹ | Status |
|---|---|---|---|---|
| Static file req/s | ≥ 150,000 | ~142,000 | ~141,800 ¹ | ⚠️ ~5% below target |
| Proxy passthrough req/s | ≥ 80,000 | ~84,200 | ~84,100 ¹ | ✅ exceeds target |
| Cache hit req/s | ≥ 180,000 | ~198,400 | ~197,900 ¹ | ✅ exceeds target |
| P99 proxy latency | ≤ 5 ms | ~4.1 ms | ~4.1 ms ¹ | ✅ |
| P99 JWT RS256 latency | ≤ 8 ms | n/a | ~6.1 ms | ✅ |
| Memory (idle, 1 site) | ≤ 10 MB | ~8 MB | ~18 MB ¹ | ✅ / |
| Binary size (stripped) | ≤ 15 MB | 14.3 MB | 28.6 MB | ✅ / ℹ️ wasmtime |
| Cold start time | ≤ 50 ms | ~28 ms | ~31 ms | ✅ |
Full build memory note: ~18 MB idle is still dramatically lower than alternatives (Traefik ~28 MB, nginx with Lua ~45 MB, Node.js proxy ~60 MB). The delta vs minimal build comes almost entirely from wasmtime's JIT allocating its code-gen arena at startup even when no WASM plugins are configured. If memory is a constraint, build without `--features wasm`.
```shell
# Ubuntu / Debian
sudo apt install wrk
# macOS
brew install wrk
# Build and stage the minimal variant
cargo build --release
strip target/release/conduit
cp target/release/conduit /tmp/conduit-std
# Build and stage the full variant (overwrites target/release/conduit)
cargo build --release --features full
strip target/release/conduit
cp target/release/conduit /tmp/conduit-full
# Check sizes
ls -lh /tmp/conduit-std /tmp/conduit-full
```

```go
// bench/upstream/main.go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	body := []byte(`{"status":"ok","ts":0}`)
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	})
	srv := &http.Server{Addr: ":4000", ReadTimeout: 5 * time.Second}
	// ListenAndServe only returns on failure (for example, port 4000 already in use).
	log.Fatal(srv.ListenAndServe())
}
```

```shell
go run bench/upstream/main.go &
```

```shell
mkdir -p bench/static
dd if=/dev/urandom bs=1024 count=1 | base64 > bench/static/index.html
cat > /tmp/bench-static.yaml <<'EOF'
port: 8080
static: ./bench/static
staticOptions: { etag: false, lastModified: false }
EOF
/tmp/conduit-std -c /tmp/bench-static.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/index.html
# $! is the Conduit PID; %1 may be the Go upstream if it was started from this shell
kill $!
```

```shell
cat > /tmp/bench-proxy.yaml <<'EOF'
port: 8080
proxy:
  targets: ["http://localhost:4000"]
EOF
/tmp/conduit-std -c /tmp/bench-proxy.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!
```

```shell
/tmp/conduit-full -c /tmp/bench-proxy.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!
```

```shell
cat > /tmp/bench-cache.yaml <<'EOF'
port: 8080
proxy:
  targets: ["http://localhost:4000"]
cache:
  store: memory
  ttlSecs: 300
EOF
/tmp/conduit-full -c /tmp/bench-cache.yaml &
# Prime the cache
curl -s http://localhost:8080/ > /dev/null
# Benchmark (all hits)
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!
```

Requires `--features jwt` (included in the full build).
```shell
# 1. Generate a key pair
openssl genrsa -out /tmp/bench-key.pem 2048
openssl rsa -in /tmp/bench-key.pem -pubout -out /tmp/bench-pub.pem
# 2. Serve a local JWKS endpoint (inline Python; needs the cryptography package)
python3 -c "
import json, base64, http.server, socketserver
from cryptography.hazmat.primitives.serialization import load_pem_public_key
pub = load_pem_public_key(open('/tmp/bench-pub.pem','rb').read())
nums = pub.public_numbers()
def b64url(n): return base64.urlsafe_b64encode(n.to_bytes((n.bit_length()+7)//8,'big')).rstrip(b'=').decode()
jwks = {'keys':[{'kty':'RSA','kid':'bench','use':'sig','alg':'RS256','n':b64url(nums.n),'e':b64url(nums.e)}]}
class H(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(jwks).encode())
socketserver.TCPServer(('',9999),H).serve_forever()
" &
# 3. Generate a valid token (node.js / python jwt library / any tool)
# 4. Configure conduit and benchmark with Authorization header
wrk -t8 -c200 -d30s -H "Authorization: Bearer <token>" http://localhost:8080/
```

```shell
/tmp/conduit-std -c /tmp/bench-proxy.yaml &
sleep 2
# Idle RSS in MB (VmRSS is reported in kB)
awk '/VmRSS/{print $2/1024 " MB"}' /proc/$!/status
kill $!
/tmp/conduit-full -c /tmp/bench-proxy.yaml &
sleep 2
awk '/VmRSS/{print $2/1024 " MB"}' /proc/$!/status
kill $!
```

```shell
cargo bench
```

Runs the criterion-based benchmarks in `benches/`.
Conduit was originally designed as a faster drop-in replacement for express-reverse-proxy. The comparison below is retained for historical context.
| Metric | express-reverse-proxy | express-reverse-proxy + PM2 ¹ | Conduit minimal |
|---|---|---|---|
| Req/s (static) | ~8,200 | ~82,000 ¹ | ~142,000 |
| Latency P50 | ~22 ms | ~5 ms ¹ | ~1.1 ms |
| Latency P99 | ~48 ms | ~32 ms ¹ | ~2.3 ms |
| Memory (idle) | ~58 MB | ~960 MB ¹ (16 × ~60 MB) | ~8 MB |
| Startup | ~420 ms | ~2,500 ms ¹ | ~28 ms |
¹ PM2 cluster numbers are estimated (see original note in methodology).
If you run Conduit on different hardware and get reproducible numbers, please open a PR editing this file. Include:
- OS, CPU model, RAM
- `wrk` version and exact flags used
- Conduit version (`conduit --version`) and feature flags
- Config file used
- Upstream server used
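To make the hardware part of such a report easy to reproduce, a small helper along these lines could gather the basics. It is a hypothetical sketch, not a script shipped with Conduit, and it assumes a Linux host for the CPU model and RAM fields (they stay `None` elsewhere).

```python
import os
import platform


def environment_report() -> dict:
    """Collect OS, CPU and RAM details for a benchmark report."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "cpu_model": None,   # filled from /proc/cpuinfo when available
        "ram_gb": None,      # filled from /proc/meminfo when available
    }
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.lower().startswith("model name"):
                    report["cpu_model"] = line.split(":", 1)[1].strip()
                    break
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    # MemTotal is reported in kB.
                    report["ram_gb"] = round(int(line.split()[1]) / 1024 / 1024, 1)
                    break
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return report


for field, value in environment_report().items():
    print(f"{field}: {value}")
```

The wrk version, Conduit version and config file still need to be added by hand.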