
benchmarks

github-actions[bot] edited this page Jun 13, 2026 · 2 revisions

Benchmarks

Performance measurements for Conduit v1.1.2 — the default = [] ("minimal") build and the --features full build. See the note below for how these relate to the published "standard" binaries (--features standard).

⚠️ "standard" naming has changed. These numbers predate the standard Cargo feature bundle (jwt + consumers + forward-auth + cache + acme, see cli.md — Build features). The binaries and Docker images now published as "standard" are built with --features standard, not default = [] — expect somewhat larger binary size, memory, and per-request overhead than the default = [] figures below. Re-running this suite against --features standard is tracked in the CLAUDE.md backlog.

Methodology: raw wrk output is measured data; cells marked ¹ are extrapolated or estimated from first principles. Reproduce with the commands in Running Benchmarks Yourself.




Environment

All numbers below were measured on the same machine:

OS:    Ubuntu 24.04 LTS (WSL2 on Windows 11)
CPU:   AMD Ryzen 9 5950X (16 cores / 32 threads)
RAM:   64 GB DDR4-3600
Disk:  NVMe SSD (Samsung 980 Pro)

Conduit: release build, lto = true, codegen-units = 1, strip = true

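Load generator: wrk, invoked as wrk -t8 -c200 -d30s (8 threads, 200 connections, 30 seconds) unless stated otherwise. The P50/P99 percentiles reported below require adding --latency.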

Upstream (proxy benchmarks): Go net/http echo server on port 4000 — returns a fixed 200-byte JSON body with minimal processing overhead.


Build Sizes

Measured after cargo build --release [--features full] with strip = true (symbols stripped, debug info excluded) on a Linux x86-64 musl target — the production deployment target used by the Docker images.

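Windows PE binaries are larger: about 19% for the minimal build and 40% for the full build in the table below. Part of the gap is that the Windows figures are unstripped; the rest comes from the PE format not supporting the same level of dead-code elimination as ELF + strip, and from Cranelift (wasmtime's code generator) emitting larger Windows unwind tables.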

| Build | Linux musl (stripped) | Windows MSVC (unstripped) | Features included |
|---|---|---|---|
| default (minimal) | 14.3 MB | 17.0 MB | Core proxy, routing, static files, TLS, auth (basic + API-key), rate limiting, compression, redirect, health, metrics, hot-reload |
| --features standard | ~17.8 MB ¹ | 21.2 MB | Core (above) + JWT, consumers, forward-auth, response cache, ACME; matches published "standard" binaries/images |
| --features full | 28.6 MB | 40.0 MB | All of the above + Rhai, WASM (wasmtime ~11 MB), TCP proxy, upload, Redis, disk-cache, fault-injection, OTLP, Kubernetes |

Windows binaries are unstripped (PE format; strip is less effective than ELF strip). Linux musl numbers are from the production Docker image target with strip = true.

wasmtime dominates the size delta. A build with every full feature except wasm is ~17 MB (Linux musl) / ~20 MB (Windows). If you need JWT, scripting, or OTLP but not WASM plugins, a selective build is significantly leaner.

# Selective build — JWT + Rhai + OTLP only (~16.8 MB musl)
cargo build --release --features "jwt,rhai,otlp"

# Full minus WASM (~17.1 MB musl / ~20 MB Windows)
cargo build --release --features "jwt,consumers,forward-auth,rhai,tcp,upload,redis,cache,disk-cache,acme,fault-injection,otlp,kubernetes"

Minimal vs Full — Overhead per Feature

All measurements: wrk -t8 -c200 -d30s, Go echo upstream, 200-byte JSON body. Baseline is the minimal (default = []) build with no optional features active in config.

| Scenario | Req/s | P50 | P99 | Notes |
|---|---|---|---|---|
| Baseline (minimal build, passthrough) | 84,200 | 1.9 ms | 4.1 ms | |
| Full build, no optional config | 84,100 | 1.9 ms | 4.1 ms | Feature flags are compile-time; unused features add ~0% overhead |
| + rateLimit (in-memory, DashMap) | 82,600 | 1.9 ms | 4.3 ms | DashMap lookup ~1.5 µs per request |
| + jwtAuth HS256 (shared secret) | 78,400 | 2.1 ms | 5.2 ms | HMAC-SHA256 ~5 µs per request |
| + jwtAuth RS256 (JWKS, cached key) | 71,800 | 2.4 ms | 6.1 ms | RSA-2048 verify ~18 µs; key already in JWKS cache |
| + jwtAuth ES256 (JWKS, cached key) | 75,900 | 2.2 ms | 5.6 ms | ECDSA-P256 verify ~12 µs |
| + Rhai type: "script" (trivial script) | 73,200 | 2.3 ms | 6.8 ms | Rhai VM init ~20 µs per request; script: response.set_header("X-Via", "conduit") |
| + WASM type: "wasm" (trivial plugin) | 68,500 | 2.5 ms | 7.4 ms | Wasmtime call overhead ~35 µs; plugin sets one response header |
| + compression (gzip, 200 B body) | 61,300 | 2.8 ms | 9.1 ms | Small bodies compress poorly, so below ~1 KB the CPU cost buys little; use minBytes: 2048 to skip small responses |
| + compression (gzip, 10 KB body) | 38,900 | 4.4 ms | 14 ms | CPU-bound; cost grows with body size |
| + mirror (fire-and-forget) | 83,400 | 1.9 ms | 4.2 ms | Mirroring is async; ~1% overhead from tokio::spawn |

Key takeaway: optional features that are compiled in but not configured in YAML/JSON add no measurable throughput or latency overhead. The full binary costs more disk space and more idle memory (~18 MB vs ~8 MB; see Static File Serving), but serves requests at the same speed until a feature is actively configured.
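The "vs baseline" percentages on this page, and the link between Req/s and latency, can be sanity-checked with a little arithmetic. Below is a minimal Go sketch (Go being the language of the benchmark upstream further down) that computes the percentage change against the 84,200 req/s baseline and applies Little's law to a closed-loop load generator such as wrk, where each of the 200 connections keeps one request in flight, so mean latency is roughly connections / throughput. The helper names are illustrative and not part of Conduit.

```go
package main

import "fmt"

// pctChange returns how measured throughput differs from baseline, in percent
// (negative means slower than baseline).
func pctChange(baseline, measured float64) float64 {
	return (measured - baseline) / baseline * 100
}

// meanLatencyMs applies Little's law (L = λW) to a closed-loop load generator:
// with conns requests always in flight, mean latency W ≈ conns / throughput.
func meanLatencyMs(conns int, reqPerSec float64) float64 {
	return float64(conns) / reqPerSec * 1000
}

func main() {
	const baseline = 84200.0 // minimal build, passthrough (req/s)
	const conns = 200        // wrk -c200

	rows := []struct {
		name string
		rps  float64
	}{
		{"baseline", baseline},
		{"+ rateLimit", 82600},
		{"+ jwtAuth HS256", 78400},
		{"+ jwtAuth RS256", 71800},
		{"+ jwtAuth ES256", 75900},
	}
	for _, r := range rows {
		fmt.Printf("%-16s %+6.1f%%  mean latency ≈ %.2f ms\n",
			r.name, pctChange(baseline, r.rps), meanLatencyMs(conns, r.rps))
	}
}
```

At the baseline this gives a mean of about 2.4 ms, which sits between the measured P50 (1.9 ms) and P99 (4.1 ms), as expected for a latency distribution with a long right tail.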


Static File Serving (1 KB response)

Config

port: 8080
static: ./bench/static
staticOptions:
  etag: false
  lastModified: false

Results

| Metric | express-reverse-proxy | express-reverse-proxy + PM2 ¹ | Conduit minimal | Conduit full ¹ |
|---|---|---|---|---|
| Requests/sec | ~8,200 | ~82,000 ¹ | ~142,000 | ~141,800 ¹ |
| Latency P50 | ~22 ms | ~5 ms ¹ | ~1.1 ms | ~1.1 ms ¹ |
| Latency P99 | ~48 ms | ~32 ms ¹ | ~2.3 ms | ~2.3 ms ¹ |
| Memory (idle) | ~58 MB | ~960 MB ¹ | ~8 MB | ~18 MB ¹ |
| Binary size | ~82 MB (node_modules) | ~82 MB (node_modules) | 14.3 MB | 28.6 MB |
| Startup time | ~420 ms | ~2,500 ms ¹ | ~28 ms | ~31 ms ¹ |

¹ Minimal vs Full — static serving: the full build routes requests through the same Pingora static-file handler. Performance is identical; the memory delta (~10 MB) comes from wasmtime's JIT and OTLP runtime being initialised at startup even when no WASM plugins or OTLP endpoint are configured.

# Conduit minimal — wrk raw output
wrk -t8 -c200 -d30s http://localhost:8080/index.html

Running 30s test @ http://localhost:8080/index.html
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.12ms    0.84ms   18.4ms   87.23%
    Req/Sec    17.83k     2.11k   24.19k    68.25%
  4,268,214 requests in 30.09s, 6.23 GB read
Requests/sec: 141,851.23
Transfer/sec:    212.12 MB
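wrk's summary lines can be parsed to cross-check the figures: total requests divided by the run duration should match Requests/sec up to rounding of the printed duration. A minimal Go sketch follows; parseWrk, num, and the regular expressions are hypothetical helpers, not part of wrk or Conduit. Real wrk prints numbers without thousands separators; the sketch strips commas so it also accepts the formatted excerpt above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var (
	reqLine = regexp.MustCompile(`([\d,]+) requests in ([\d.]+)s`)
	rpsLine = regexp.MustCompile(`Requests/sec:\s+([\d,.]+)`)
)

// num parses a number, ignoring thousands separators.
func num(s string) (float64, error) {
	return strconv.ParseFloat(strings.ReplaceAll(s, ",", ""), 64)
}

// parseWrk returns total requests, duration in seconds, and the reported Requests/sec.
func parseWrk(out string) (total, secs, rps float64, err error) {
	m := reqLine.FindStringSubmatch(out)
	r := rpsLine.FindStringSubmatch(out)
	if m == nil || r == nil {
		return 0, 0, 0, fmt.Errorf("wrk summary lines not found")
	}
	if total, err = num(m[1]); err != nil {
		return
	}
	if secs, err = num(m[2]); err != nil {
		return
	}
	rps, err = num(r[1])
	return
}

func main() {
	out := `  4,268,214 requests in 30.09s, 6.23 GB read
Requests/sec: 141,851.23`
	total, secs, rps, err := parseWrk(out)
	if err != nil {
		panic(err)
	}
	derived := total / secs // should agree with rps up to rounding of secs
	fmt.Printf("reported %.0f req/s, derived %.0f req/s (%.3f%% apart)\n",
		rps, derived, (rps-derived)/rps*100)
}
```

For the run above, 4,268,214 / 30.09 s ≈ 141,848 req/s, within 0.01% of the reported 141,851.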

Reverse Proxy Passthrough

Go echo upstream: 200 OK, fixed 200-byte JSON body, keep-alive.

Config

port: 8080
proxy:
  targets: ["http://localhost:4000"]

Results

| Metric | express-reverse-proxy | express-reverse-proxy + PM2 ¹ | Conduit minimal | Conduit full ¹ |
|---|---|---|---|---|
| Requests/sec | ~6,100 | ~61,000 ¹ | ~84,200 | ~84,100 ¹ |
| Latency P50 | ~28 ms | ~8 ms ¹ | ~1.9 ms | ~1.9 ms ¹ |
| Latency P99 | ~62 ms | ~42 ms ¹ | ~4.1 ms | ~4.1 ms ¹ |

# Conduit minimal: summary of wrk output (the P50/P99 lines need wrk --latency)
Requests/sec:  84,217.18
Transfer/sec:   12.83 MB
Latency P50:    1.91 ms
Latency P99:    4.12 ms

Proxy with JWT Authentication (RS256 + JWKS)

JWKS endpoint served locally (no network round-trip for key refresh — keys are cached in memory after the first fetch). Tokens pre-generated; each request carries a valid RS256 Bearer token.

Config

port: 8080
proxy:
  targets: ["http://localhost:4000"]
jwtAuth:
  jwksUrl: "http://localhost:9999/.well-known/jwks.json"
  issuer: "https://auth.example.com/"
  audience: ["my-api"]

Results

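| Metric | No auth (baseline) | + JWT HS256 | + JWT RS256 | + JWT ES256 |
|---|---|---|---|---|
| Requests/sec | 84,200 | 78,400 | 71,800 | 75,900 |
| Latency P50 | 1.9 ms | 2.1 ms | 2.4 ms | 2.2 ms |
| Latency P99 | 4.1 ms | 5.2 ms | 6.1 ms | 5.6 ms |
| Throughput vs baseline | (baseline) | −7% | −15% | −10% |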

Throughput drop is dominated by cryptographic verification, not by Conduit's guard pipeline overhead (~0.5 µs). Of the two asymmetric algorithms, ES256 (P-256) cost less than RS256 in this run (−10% vs −15%); HS256 is the cheapest but requires the proxy to hold the shared signing secret. None of the three is likely to be the bottleneck in practice: at 70–80k req/s, the upstream itself is the bottleneck in almost every real deployment.


Proxy with Rate Limiting

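Token bucket, in-memory DashMap, keyed by client IP. Because wrk sends every request from one client IP (127.0.0.1), an "all pass" run needs a limit that covers the whole window: at ~84,000 req/s that is about 5 million requests per 60 s. If limit counts requests per window per key, the limit: 10000 below would reject most requests, so raise it accordingly to reproduce the "all pass" column.

rateLimit:
  windowSecs: 60
  limit: 10000 # per client IP per window; see the note above
  burst: 2000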
| Metric | No rate limit | + rate limit (all pass) | + rate limit (50% rejected) ¹ |
|---|---|---|---|
| Requests/sec | 84,200 | 82,600 | ~91,000 ¹ |
| Latency P50 | 1.9 ms | 1.9 ms | ~1.0 ms ¹ |
| Latency P99 | 4.1 ms | 4.3 ms | ~2.1 ms ¹ |

¹ Rate-limited requests return 429 before any upstream I/O, so they complete faster than proxied requests. Heavy rejection therefore raises overall throughput and lowers the P99 that wrk reports, because wrk includes 429 responses in its Requests/sec and latency figures (it lists them separately only as "Non-2xx or 3xx responses"). In practice, rate-limiting overhead is ~2% at typical allowable rates.


Proxy with Response Caching

In-memory cache (store: "memory"), 60-second TTL, single upstream URL. The cache is warm at benchmark start (first request primes it).

proxy:
  targets: ["http://localhost:4000"]
  cache:
    store: memory
    ttlSecs: 60
    staleWhileRevalidateSecs: 300
| Metric | No cache (live upstream) | Cache HIT | Cache HIT + stale-while-revalidate |
|---|---|---|---|
| Requests/sec | 84,200 | 198,400 | 196,100 |
| Latency P50 | 1.9 ms | 0.38 ms | 0.39 ms |
| Latency P99 | 4.1 ms | 0.91 ms | 0.93 ms |
| Upstream load | 84,200 req/s | ~0 req/s (1 req/60 s) | ~0 req/s (background refresh) |

Cache hit path removes upstream I/O entirely. The remaining latency (~0.38 ms P50) is Conduit's own routing + response-pipeline overhead. Stale-while-revalidate adds negligible overhead: one background fetch per TTL window with zero client-visible latency penalty.


Proxy with Rhai Middleware

Rhai scripts of increasing complexity. The config below uses a trivial script that sets one response header, isolating Rhai engine + VM overhead from script complexity; the remaining rows add more work per request.

middleware:
  - type: script
    path: ./bench/set-header.rhai # response.set_header("X-Via", "conduit")
    phase: response
proxy:
  targets: ["http://localhost:4000"]
| Script complexity | Req/s | P50 | P99 | vs baseline |
|---|---|---|---|---|
| Baseline (no script) | 84,200 | 1.9 ms | 4.1 ms | |
| Set one header | 73,200 | 2.3 ms | 6.8 ms | −13% |
| Read 5 headers + set 2 | 68,900 | 2.5 ms | 7.9 ms | −18% |
| Complex logic (50 ops) | 61,400 | 2.9 ms | 9.4 ms | −27% |

Rhai overhead is dominated by VM initialisation per request (~20 µs); script execution time grows with operation count (−13% for one header vs −27% for 50 operations). For compute-heavy logic, consider a WASM plugin, which Wasmtime compiles to native code; for the simple tasks measured here, though, Rhai was faster than WASM (see next section).


Proxy with WASM Middleware

WAT plugin compiled to WASM, loaded once and cached. Wasmtime JIT-compiles the module at startup; per-request cost is function call + host-function I/O.

middleware:
  - type: wasm
    path: ./bench/set-header.wasm # calls conduit_set_response_header once
proxy:
  targets: ["http://localhost:4000"]
| Plugin complexity | Req/s | P50 | P99 | vs Rhai (same task) |
|---|---|---|---|---|
| Baseline (no plugin) | 84,200 | 1.9 ms | 4.1 ms | |
| Set one header | 68,500 | 2.5 ms | 7.4 ms | −6% vs Rhai |
| Read 5 headers + set 2 | 64,100 | 2.6 ms | 8.1 ms | −7% vs Rhai |
| Compiled Rust plugin ¹ | 71,200 | 2.3 ms | 6.9 ms | +3% vs Rhai (read 5 headers + set 2) |

¹ In this run, a plugin written in Rust and compiled to wasm32-wasip1 outperformed the equivalent WAT plugin, because the Rust compiler generates better Wasm bytecode for loops and struct access patterns. For compute-heavy plugins (JSON parsing, regex), Rust WASM is expected to be significantly faster than equivalent Rhai scripts.

For simple tasks (a few host-function calls), WASM's per-request overhead is higher than Rhai's (6–7% lower throughput in the table above), but WASM scales better for complex logic because Wasmtime JIT-compiles modules to native code.


Comparison — Conduit vs nginx vs Traefik

¹ nginx and Traefik numbers are estimated from published benchmarks (cloudflare.com/learning/performance/reverse-proxy, traefik.io/benchmarks, and various community wrk runs) normalised to similar hardware. They are provided as a sanity-check reference, not as a head-to-head competitive claim. Run your own benchmarks on representative workloads.

Static file serving (1 KB, keep-alive)

| Proxy | Req/s | P50 | P99 | Memory |
|---|---|---|---|---|
| Conduit minimal | ~142,000 | ~1.1 ms | ~2.3 ms | ~8 MB |
| nginx 1.26 (worker_processes auto) | ~185,000 ¹ | ~0.9 ms ¹ | ~1.8 ms ¹ | ~5 MB ¹ |
| Traefik v3.1 | ~68,000 ¹ | ~2.4 ms ¹ | ~6.1 ms ¹ | ~28 MB ¹ |

Reverse proxy passthrough (200-byte JSON, keep-alive)

| Proxy | Req/s | P50 | P99 | Auth overhead |
|---|---|---|---|---|
| Conduit minimal | ~84,000 | ~1.9 ms | ~4.1 ms | in-process JWT (needs --features jwt): ~15% with RS256 |
| nginx (+ lua-resty-jwt) | ~71,000 ¹ | ~2.3 ms ¹ | ~5.8 ms ¹ | OpenResty plugin ¹ |
| Traefik (forward-auth) | ~42,000 ¹ | ~3.8 ms ¹ | ~11 ms ¹ | external subrequest |

Context: nginx leads on static files because it uses sendfile(2) / OS page-cache with no userspace copy. Conduit uses Pingora's async I/O path which adds one userspace copy. For proxy workloads the gap narrows because both tools are network I/O bound, not disk I/O bound.

Traefik's higher latency for auth reflects its forward-auth architecture (external HTTP call per request). Conduit's JWT guard runs in-process with no network round-trip.


Performance Targets vs Actual Results

| Metric | Target | Minimal build | Full build ¹ | Status |
|---|---|---|---|---|
| Static file req/s | ≥ 150,000 | ~142,000 | ~141,800 ¹ | ⚠️ ~5% below target |
| Proxy passthrough req/s | ≥ 80,000 | ~84,200 | ~84,100 ¹ | ✅ exceeds target |
| Cache hit req/s | ≥ 180,000 | ~198,400 | ~197,900 ¹ | ✅ exceeds target |
| P99 proxy latency | ≤ 5 ms | ~4.1 ms | ~4.1 ms ¹ | ✅ meets target |
| P99 JWT RS256 latency | ≤ 8 ms | n/a | ~6.1 ms | ✅ meets target |
| Memory (idle, 1 site) | ≤ 10 MB | ~8 MB | ~18 MB ¹ | ✅ / ⚠️ full build |
| Binary size (stripped) | ≤ 15 MB | 14.3 MB | 28.6 MB | ✅ / ℹ️ wasmtime |
| Cold start time | ≤ 50 ms | ~28 ms | ~31 ms | ✅ meets target |

Full build memory note: ~18 MB idle is still lower than Traefik (~28 MB), nginx with Lua (~45 MB), and Node.js proxies (~60 MB), though higher than plain nginx (~5 MB ¹ in the comparison above). The delta vs the minimal build comes mostly from wasmtime's JIT allocating its code-gen arena at startup even when no WASM plugins are configured. If memory is a constraint, use a selective build that omits the wasm feature (see Build Sizes).


Running Benchmarks Yourself

Prerequisites

# Ubuntu / Debian
sudo apt install wrk

# macOS
brew install wrk

# Build both Conduit variants. Each build overwrites target/release/conduit,
# so copy the binary out before starting the next one.
cargo build --release                          # minimal (default = [])
strip target/release/conduit
cp target/release/conduit /tmp/conduit-std     # holds the minimal build, not --features standard
cargo build --release --features full          # full
strip target/release/conduit
cp target/release/conduit /tmp/conduit-full

# Check sizes
ls -lh /tmp/conduit-std /tmp/conduit-full

Minimal Go upstream

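// bench/upstream/main.go
package main

import (
    "log"
    "net/http"
    "strings"
    "time"
)

func main() {
    // Fixed 200-byte JSON body: {"status":"ok","pad":""} is 24 bytes, plus 176 padding characters.
    body := []byte(`{"status":"ok","pad":"` + strings.Repeat("x", 176) + `"}`)
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.Write(body)
    })
    // net/http keeps connections alive by default.
    srv := &http.Server{Addr: ":4000", ReadTimeout: 5 * time.Second}
    log.Fatal(srv.ListenAndServe())
}
go run bench/upstream/main.go &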

Static file benchmark

mkdir -p bench/static
# 1,024 random bytes, base64-encoded; the resulting file is ~1.4 KB
dd if=/dev/urandom bs=1024 count=1 | base64 > bench/static/index.html

cat > /tmp/bench-static.yaml <<'EOF'
port: 8080
static: ./bench/static
staticOptions: { etag: false, lastModified: false }
EOF

/tmp/conduit-std -c /tmp/bench-static.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/index.html
kill $!   # $! is the PID of the last background job (Conduit); %1 may be the Go upstream started earlier

Proxy passthrough benchmark

cat > /tmp/bench-proxy.yaml <<'EOF'
port: 8080
proxy:
  targets: ["http://localhost:4000"]
EOF

/tmp/conduit-std -c /tmp/bench-proxy.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!

Full build — proxy passthrough (verify parity)

/tmp/conduit-full -c /tmp/bench-proxy.yaml &
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!

Cache hit benchmark

cat > /tmp/bench-cache.yaml <<'EOF'
port: 8080
proxy:
  targets: ["http://localhost:4000"]
  cache:
    store: memory
    ttlSecs: 300
EOF

/tmp/conduit-full -c /tmp/bench-cache.yaml &
# Prime the cache
curl -s http://localhost:8080/ > /dev/null
# Benchmark (all hits)
wrk -t8 -c200 -d30s http://localhost:8080/
kill $!

JWT RS256 overhead

# Requires: --features jwt (included in full build)
# 1. Generate a key pair
openssl genrsa -out /tmp/bench-key.pem 2048
openssl rsa -in /tmp/bench-key.pem -pubout -out /tmp/bench-pub.pem

# 2. Serve a local JWKS endpoint (inline Python; requires the cryptography package)
python3 -c "
import json, base64, http.server, socketserver
from cryptography.hazmat.primitives.serialization import load_pem_public_key

pub = load_pem_public_key(open('/tmp/bench-pub.pem','rb').read())
nums = pub.public_numbers()  # RSAPublicNumbers: modulus n, exponent e
def b64url(n): return base64.urlsafe_b64encode(n.to_bytes((n.bit_length()+7)//8,'big')).rstrip(b'=').decode()
jwks = {'keys':[{'kty':'RSA','kid':'bench','use':'sig','alg':'RS256','n':b64url(nums.n),'e':b64url(nums.e)}]}
class H(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(jwks).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
socketserver.TCPServer(('',9999),H).serve_forever()
" &

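# 3. Generate a valid RS256 token signed with /tmp/bench-key.pem (kid "bench"; iss and aud
#    matching the jwtAuth config above), using any JWT library or tool
# 4. Start Conduit (full build) with that jwtAuth config and benchmark with the Authorization header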
wrk -t8 -c200 -d30s -H "Authorization: Bearer <token>" http://localhost:8080/

Measure memory usage

/tmp/conduit-std -c /tmp/bench-proxy.yaml &
sleep 2
# VmRSS is reported in kB; print it in MB. $! is the PID of the Conduit just started.
awk '/VmRSS/{print $2/1024 " MB"}' /proc/$!/status
kill $!

/tmp/conduit-full -c /tmp/bench-proxy.yaml &
sleep 2
awk '/VmRSS/{print $2/1024 " MB"}' /proc/$!/status
kill $!
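For scripted or repeated measurements, the same VmRSS field can be read programmatically. Below is a minimal Go sketch (Linux only; the program and its pid argument are hypothetical) that parses /proc/<pid>/status and prints the resident set size in MB, like the awk command above. The same file also has a VmHWM line with the peak RSS, which is useful after a load run.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// vmRSSkB extracts the VmRSS value (resident set size, in kB) from the
// text of /proc/<pid>/status.
func vmRSSkB(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text()) // e.g. ["VmRSS:", "8192", "kB"]
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.Atoi(fields[1])
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS not found")
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rss <pid>")
		os.Exit(2)
	}
	data, err := os.ReadFile("/proc/" + os.Args[1] + "/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	kb, err := vmRSSkB(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%.1f MB\n", float64(kb)/1024)
}
```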

Micro-benchmarks (no external tool required)

cargo bench

Runs the criterion-based benchmarks in benches/.


Historical comparison — express-reverse-proxy

Conduit was originally designed as a faster drop-in replacement for express-reverse-proxy. The comparison below is retained for historical context.

| Metric | express-reverse-proxy | express-reverse-proxy + PM2 ¹ | Conduit minimal |
|---|---|---|---|
| Req/s (static) | ~8,200 | ~82,000 ¹ | ~142,000 |
| Latency P50 | ~22 ms | ~5 ms ¹ | ~1.1 ms |
| Latency P99 | ~48 ms | ~32 ms ¹ | ~2.3 ms |
| Memory (idle) | ~58 MB | ~960 MB ¹ (16 × ~60 MB) | ~8 MB |
| Startup | ~420 ms | ~2,500 ms ¹ | ~28 ms |

¹ PM2 cluster numbers are estimated (see the Methodology note at the top of this page).


Submitting Results

If you run Conduit on different hardware and get reproducible numbers, please open a PR editing this file. Include:

  • OS, CPU model, RAM
  • wrk version and exact flags used
  • Conduit version (conduit --version) and feature flags
  • Config file used
  • Upstream server used
