Scientific benchmarking of programming language performance in Docker containerized environments using instrumented measurement to isolate application performance from container overhead.
Status: Complete - Fibonacci Benchmark with NASA Optimization Analysis
Ruchy v3.209.0 achieves world-class performance:
- Within 9% of Rust performance (21.0ms vs 19.28ms compute time)
- Produces smallest compiled Docker images (312-314 KB) - 26% smaller than Rust, 55% smaller than C
- 12.2× binary size reduction via `--optimize nasa` (3.8 MB → 314 KB) - default in Dockerfiles
- 37% of C's speed on recursive compute workloads
Source Code:
- Ruchy: benchmarks/fibonacci/main.ruchy
- C: benchmarks/fibonacci/main.c
- Rust: benchmarks/fibonacci/main.rs
- Go: benchmarks/fibonacci/main.go
- Deno: benchmarks/fibonacci/main.ts
- Python: benchmarks/fibonacci/main.py
- Julia: benchmarks/fibonacci/main.jl
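Every implementation computes the same workload, recursive fib(35). The expected answer can be cross-checked with a quick iterative loop (illustrative only - the benchmark programs themselves use the recursive form):

```shell
# Cross-check the expected benchmark answer with an iterative fib(35).
# (Illustration only; the benchmarks deliberately use naive recursion.)
a=0; b=1
for i in $(seq 35); do
  t=$((a + b)); a=$b; b=$t
done
echo "fib(35) = $a"   # prints fib(35) = 9227465
```

This matches the `RESULT: 9227465` line that every container is expected to emit.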
Environment: AMD Ryzen Threadripper 7960X (24-core), 125GB RAM, Ubuntu 22.04
Ruchy Version: v3.209.0 (NASA-grade optimization presets)
Methodology: Instrumented measurement (embedded timing code) + CLI invocation benchmarks
| Language | Mean Time | Std Dev | vs C | Binary Size | Base Image | Docker Image |
|---|---|---|---|---|---|---|
| C (gcc -O3) | 10.77ms | ±0.59ms | 1.00× | 695 KB | scratch | 695 KB |
| Rust (rustc opt-3) | 21.81ms | ±0.65ms | 2.02× | 424 KB | scratch | 424 KB |
| Ruchy (compiled, NASA) | 22.47ms | ±0.56ms | 2.09× | 314 KB | scratch | 314 KB |
| Ruchy (transpiled) | 23.68ms | ±0.60ms | 2.20× | 312 KB | scratch | 312 KB |
| Go (1.23) | 38.04ms | ±0.71ms | 3.53× | 1.41 MB | scratch | 1.41 MB |
| Deno (2.1.4, TypeScript) | 70.11ms | ±3.40ms | 6.51× | 87 MB | debian:12-slim | 256 MB |
| Julia (1.10) 🧪 | 252.91ms | ±4.58ms | 23.47× | N/A | julia:1.10-bullseye | 711 MB |
| Python (3.12) | 697.49ms | ±14.94ms | 64.78× | 1.3 KB | python:3.12-slim | 119 MB |
🧪 Julia is EXPERIMENTAL: JIT warmup affects startup time
| Language | Compute Time | Slowdown vs C | Slowdown vs Rust | Binary Size |
|---|---|---|---|---|
| C (gcc -O3) | 7.83ms | 1.00× | 0.41× (2.46× faster) | 695 KB |
| Rust (rustc opt-3) | 19.28ms | 2.46× | 1.00× | 424 KB |
| Ruchy (transpiled) | 21.11ms | 2.70× | 1.09× | 312 KB |
| Ruchy (compiled, NASA) | 21.00ms | 2.68× | 1.09× | 314 KB |
| Deno (2.1.4, TypeScript) | 69.28ms | 8.85× | 3.59× | 87 MB |
Ruchy achieves within 9% of Rust performance (21.0ms vs 19.28ms compute time) with smallest compiled Docker images (312-314 KB)
Performance Visualization (Execution Time):
C █ 10.77ms
Rust ██ 21.81ms
Ruchy (C) ██ 22.47ms
Ruchy (T) ██ 23.68ms
Go ███ 38.04ms
Deno ███████ 70.11ms
Julia 🧪 █████████████████████████ 252.91ms
Python ██████████████████████████████████████████████████████████████████████ 697.49ms
|---------|---------|---------|---------|---------|---------|---------
100 200 300 400 500 600ms
Binary Size Comparison (Docker images):
Ruchy (T) █ 312 KB
Ruchy (C) █ 314 KB
Rust █ 424 KB
C █ 695 KB
Go █ 1.41 MB
Python ███████████ 119 MB
Deno █████████████████████████ 256 MB
Julia 🧪 ██████████████████████████████████████████████████████████████████████ 711 MB
----------------------------------------------------------------------
177 355 533 MB
Observations:
- Ruchy achieves within 9% of Rust performance (21.0ms vs 19.28ms compute time)
- Ruchy produces smallest compiled Docker images (312-314 KB) - 26% smaller than Rust, 55% smaller than C
- 12.2× binary size reduction with `--optimize nasa` (3.8 MB → 314 KB) - default in Dockerfiles
- Ruchy compiled achieves 37% of C's speed on recursive algorithms
- scratch base migration reduced image sizes by 60-83% (e.g., Rust: 2.51 MB → 424 KB)
- Deno bundles V8 runtime in standalone binary (87 MB), achieving 70.11ms execution time (6.5× slower than C, 3.6× slower than Rust)
- Python requires full interpreter runtime (119 MB image)
- Julia (EXPERIMENTAL) has large JIT runtime (711 MB) and slower startup due to JIT warmup
- All compiled languages achieve sub-40ms CLI invocation times; Deno achieves sub-75ms
- Low standard deviations (< ±4ms) indicate stable, reproducible measurements
- Instrumented measurement isolates pure compute time from startup overhead
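The instrumented `STARTUP_TIME_US` / `COMPUTE_TIME_US` output (shown in full in the replication steps below) is easy to post-process. This is an illustrative sketch, not part of the project tooling; in practice the input would come from `docker run --rm c:fibonacci`:

```shell
# Illustrative sketch (not project tooling): convert instrumented
# microsecond counters to milliseconds with awk.
sample_output='STARTUP_TIME_US: 20
COMPUTE_TIME_US: 7872
RESULT: 9227465'

echo "$sample_output" | awk '
  /STARTUP_TIME_US/ { printf "startup: %.2f ms\n", $2 / 1000 }
  /COMPUTE_TIME_US/ { printf "compute: %.2f ms\n", $2 / 1000 }
  /RESULT/          { print  "result:  " $2 }
'
```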
Ruchy v3.209.0 introduces NASA-grade optimization presets:
# Development/debugging (fastest compilation)
ruchy compile main.ruchy --optimize none # 3.7 MB
# Production default (balanced)
ruchy compile main.ruchy --optimize balanced # 1.9 MB (48.6% reduction)
# Lambda/Docker (aggressive)
ruchy compile main.ruchy --optimize aggressive # 327 KB (91.2% reduction)
# Maximum optimization
ruchy compile main.ruchy --optimize nasa # 327 KB (91.2% reduction)
# Show applied flags
ruchy compile main.ruchy --optimize nasa --verbose
# Export metrics for CI/CD
ruchy compile main.ruchy --optimize nasa --json metrics.json

| Level | Binary Size | Reduction | Use Case |
|---|---|---|---|
| `--optimize none` | 3.7 MB | 0% | Development/debugging |
| `--optimize balanced` | 1.9 MB | 48.6% | Production default |
| `--optimize aggressive` | 327 KB | 91.2% | Lambda/Docker |
| `--optimize nasa` | 327 KB | 91.2% | Maximum optimization |
Key Finding: NASA/aggressive optimization provides 11.3× binary size reduction (3.7 MB → 327 KB) with minimal performance trade-off
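The reduction percentages and the 11.3× ratio follow directly from the table's sizes (treating 3.7 MB as 3700 KB); a quick sanity check:

```shell
# Sanity-check the reduction figures from the table above
# (3.7 MB treated as 3700 KB, 1.9 MB as 1900 KB).
awk 'BEGIN {
  base = 3700
  printf "balanced:   %.1f%%\n", (1 - 1900 / base) * 100   # 48.6%
  printf "aggressive: %.1f%%\n", (1 - 327  / base) * 100   # 91.2%
  printf "ratio:      %.1fx\n",  base / 327                # 11.3x
}'
```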
Two complementary benchmarking modes:
- Docker Benchmarks: Instrumented measurement inside containers isolates application compute time from Docker daemon overhead
- CLI Benchmarks: Full process invocation using `bashrs bench` measures startup + compute + teardown (3 warmup runs, 10 measured iterations)
- Instrumented Timing: Embedded measurement code separates startup from compute phases
- Statistical Analysis: Multiple aggregation metrics (geometric mean, arithmetic mean, harmonic mean), MAD-based outlier detection
- Multi-Stage Docker Builds: `FROM scratch` base images for statically-linked binaries (60-83% size reduction compared to distroless)
- NASA-Grade Optimization: Ruchy v3.209.0 optimization presets achieve 91.2% binary size reduction
- Reproducibility: Fixed compiler versions via rust-toolchain.toml, locked dependencies
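For reference, the multi-stage `FROM scratch` pattern looks roughly like the sketch below. Stage names, paths, and the Rust build command are hypothetical; the project's real Dockerfiles live under docker/:

```shell
# Sketch of the multi-stage FROM scratch pattern (hypothetical paths and
# build flags; the project's actual Dockerfiles are under docker/).
cat > Dockerfile.sketch <<'EOF'
# Stage 1: build a statically linked binary
FROM rust:1.83 AS builder
WORKDIR /src
COPY main.rs .
RUN rustc -C opt-level=3 -C target-feature=+crt-static main.rs -o /main

# Stage 2: ship only the binary - no shell, no libc, no OS layers
FROM scratch
COPY --from=builder /main /main
ENTRYPOINT ["/main"]
EOF
```

Because the final stage starts from `scratch`, the image contains nothing but the binary, which is why the compiled languages land under 2 MB.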
- docker-runtime-benchmarking-spec.md (1,321 lines)
- Complete benchmarking methodology
- Instrumented measurement approach
- Statistical rigor (geometric/arithmetic/harmonic mean)
- EXTREME TDD strategy
- 10 peer-reviewed citations
- PEER_REVIEW_RESPONSE.md (452 lines)
- Critical peer review by Gemini AI (November 5, 2025)
- Toyota Way principles (Genchi Genbutsu, Kaizen, Jidoka)
- 7 major improvements implemented
- README.md (Quick reference)
Exploration-phase analysis documents that informed the design of this framework:
- RUCHY_BENCHMARK_ANALYSIS.md (24KB)
- Comprehensive technical analysis of the original ruchy-book benchmark suite
- 12 sections covering methodology, infrastructure, and results
- DLS 2016 "Are We Fast Yet?" methodology reference
-
- Quick reference for all 10 benchmarks from the original suite
- Performance categories and key results
- BENCHMARK_EXPLORATION_SUMMARY.txt
- Exploration summary of 69 benchmark program files
- Rationale for selecting Fibonacci and Prime Sieve
- README.md - Guide to analysis documents
- Docker v24.0+
- Rust v1.83+
- Go v1.23+
- Python v3.12+
- GCC v13+
- Julia v1.10+
- Deno v2.1.4+
make build-images
make bench BENCHMARK=fibonacci LANGUAGE=ruchy-transpiled
make bench-all

# Install bashrs (scientific CLI benchmarking tool)
make install-tools
# Run CLI benchmarks (measures full process invocation)
make bench-cli

Docker vs CLI Benchmarking:
- Docker benchmarks: Measure compute time inside containers (isolates application from Docker overhead)
- CLI benchmarks: Measure full process invocation (startup + compute + teardown) using bashrs bench
- bashrs: Scientific benchmarking with warmup runs, statistical analysis, memory measurement, and determinism verification
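The warmup-then-measure protocol can be approximated by hand. The sketch below is NOT bashrs - just an illustration of the methodology, using a placeholder command and GNU `date` for nanosecond timestamps:

```shell
# Hand-rolled approximation of the warmup + measure protocol.
# NOT bashrs itself; requires GNU date (%N nanoseconds).
CMD="/bin/true"   # placeholder; substitute e.g. ./bin/fibonacci_c

for i in 1 2 3; do $CMD; done           # 3 warmup runs (prime caches)

times=""
for i in $(seq 10); do                   # 10 measured iterations
  start=$(date +%s%N)
  $CMD
  end=$(date +%s%N)
  times="$times $(( (end - start) / 1000 ))"   # microseconds
done

echo "$times" | awk '{
  for (i = 1; i <= NF; i++) { sum += $i; sq += $i * $i }
  mean = sum / NF
  var = sq / NF - mean * mean
  if (var < 0) var = 0                   # guard against float rounding
  printf "Mean: %.3f ms +/- %.3f ms\n", mean / 1000, sqrt(var) / 1000
}'
```

bashrs adds statistical analysis, memory measurement, and determinism checks on top of this basic loop.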
make quality # format, lint, test, coverage, mutation, complexity

This section provides exact steps to replicate all benchmark results from scratch.
Required:
# Docker (v24.0+)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Rust toolchain (v1.83+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Go (v1.23+)
sudo apt-get install golang-go
# Python (v3.12+)
sudo apt-get install python3.12 python3.12-dev
# GCC (v13+)
sudo apt-get install build-essential gcc-13
# Deno (v2.1.4+)
curl -fsSL https://deno.land/install.sh | sh

Optional (for full language coverage):
# Julia (v1.10+) - EXPERIMENTAL
wget https://julialang-s3.julialang.org/bin/linux/x64/1.10/julia-1.10.0-linux-x86_64.tar.gz
tar zxvf julia-1.10.0-linux-x86_64.tar.gz
sudo ln -s $(pwd)/julia-1.10.0/bin/julia /usr/local/bin/julia
# bashrs (for CLI benchmarking)
cargo install bashrs

git clone https://github.com/paiml/ruchy-docker.git
cd ruchy-docker

Option A: Build all images (recommended for full comparison)
# Builds all 16 containers (2 benchmarks × 8 languages)
make build-images
# This runs:
# - docker build for each Dockerfile
# - Multi-stage builds with FROM scratch optimization
# - Estimated time: 10-15 minutes (first run)

Option B: Build specific benchmark images
# Build just Fibonacci benchmark (8 languages)
make build-fibonacci
# Or build individual images
docker build -f docker/c/fibonacci.Dockerfile -t c:fibonacci .
docker build -f docker/rust/fibonacci.Dockerfile -t rust:fibonacci .
docker build -f docker/deno/fibonacci.Dockerfile -t deno:fibonacci .
docker build -f docker/go/fibonacci.Dockerfile -t go:fibonacci .
docker build -f docker/python/fibonacci.Dockerfile -t python:fibonacci .
docker build -f docker/julia/fibonacci.Dockerfile -t julia:fibonacci .
# Ruchy images (requires ruchy repository as sibling directory)
cd .. && docker build -f ruchy-docker/docker/ruchy-transpiled/fibonacci.Dockerfile -t ruchy-transpiled:fibonacci .
cd .. && docker build -f ruchy-docker/docker/ruchy-compiled/fibonacci.Dockerfile -t ruchy-compiled:fibonacci .
cd ruchy-docker

# Check all images exist
docker images | grep -E "(fibonacci|primes)"
# Expected output:
# c fibonacci <hash> ... 695 KB
# rust fibonacci <hash> ... 424 KB
# deno fibonacci <hash> ... 256 MB
# go fibonacci <hash> ... 1.41 MB
# python fibonacci <hash> ... 119 MB
# julia fibonacci <hash> ... 711 MB
# ruchy-transpiled fibonacci <hash> ... 312 KB
# ruchy-compiled fibonacci <hash> ... 314 KB

Single benchmark:
# Run specific language/benchmark combination
make bench BENCHMARK=fibonacci LANGUAGE=c
# Output shows instrumented timing:
# STARTUP_TIME_US: 20
# COMPUTE_TIME_US: 7872
# RESULT: 9227465

Run all combinations:
# Runs all 16 containers (2 benchmarks × 8 languages)
make bench-all
# This executes:
# - docker run for each image
# - Captures instrumented output (STARTUP_TIME_US, COMPUTE_TIME_US, RESULT)
# - Saves results to results/ directory

Manual single run (for verification):
# Run any benchmark manually
docker run --rm c:fibonacci
docker run --rm rust:fibonacci
docker run --rm deno:fibonacci
# Compare results - all should output:
# RESULT: 9227465 (for fibonacci)
# RESULT: 9592 (for primes)

CLI benchmarks measure the complete process lifecycle: startup + compute + teardown.
Install bashrs (if not already installed):
cargo install bashrs
# Or use project Makefile
make install-tools

Extract binaries from Docker images:
# This extracts compiled binaries from containers to bin/ directory
make extract-binaries
# Binaries are copied from:
# - c:fibonacci → bin/fibonacci_c
# - rust:fibonacci → bin/fibonacci_rust
# - deno:fibonacci → bin/fibonacci_deno
# - etc.

Run CLI benchmarks:
# Run full CLI benchmark suite with bashrs
make bench-cli
# This executes:
# - 3 warmup iterations (disk cache, page cache)
# - 10 measured iterations
# - Statistical analysis (mean, std dev, outliers)
# - Results saved to results/cli/fibonacci-cli-bench.json
# Expected output:
# 📊 Benchmarking: bin/fibonacci_c.sh
# 🔥 Warmup (3 iterations)...
# ⏱️ Measuring (10 iterations)...
# 📈 Results: Mean: 10.77ms ± 0.59ms

Manual CLI run (for verification):
# Run extracted binary directly
./bin/fibonacci_c
# Or via wrapper scripts
./bin/fibonacci_c.sh
./bin/fibonacci_rust.sh
./bin/fibonacci_deno.sh

View JSON results:
# CLI benchmark results
cat results/cli/fibonacci-cli-bench.json
# Docker benchmark results
ls results/*.json

Generate charts:
# Preview ASCII charts (stdout)
make charts
# Update README.md with new charts (creates backup)
make update-readme

Compare your results with the published data in README.md tables:
Expected compute times (±10% tolerance for hardware differences):
- C: ~7-10ms
- Rust: ~18-22ms
- Ruchy (compiled/transpiled): ~20-24ms
- Deno: ~65-75ms
Expected CLI times (±15% tolerance):
- C: ~10-12ms
- Rust: ~20-24ms
- Ruchy: ~22-26ms
- Deno: ~65-80ms
Binary sizes (should match exactly):
- Ruchy (transpiled): 312 KB
- Ruchy (compiled): 314 KB
- Rust: 424 KB
- C: 695 KB
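A measured result can be checked against the published numbers mechanically. This illustrative helper is not part of the Makefile; the function name and sample values are made up for the example:

```shell
# Illustrative tolerance check (not project tooling): flag a measurement
# that falls outside the published value +/- tolerance percent.
check() {
  # $1 = label, $2 = measured ms, $3 = published ms, $4 = tolerance %
  awk -v m="$2" -v p="$3" -v t="$4" -v l="$1" 'BEGIN {
    d = (m - p) / p * 100
    if (d < 0) d = -d
    printf "%-6s %6.2f ms vs %6.2f ms (%.1f%% off): %s\n",
           l, m, p, d, (d <= t ? "PASS" : "CHECK HARDWARE/SETUP")
  }'
}

# Hypothetical measured values against the published CLI means:
check C    10.50 10.77 15
check Rust 21.90 21.81 15
```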
Issue: Docker build fails with "Illegal option -o pipefail"
# This should be fixed in latest Dockerfiles
# If you see this error, ensure you have latest version:
git pull origin main

Issue: Ruchy images fail to build
# Ruchy Dockerfiles expect ruchy repository as sibling directory
cd ..
git clone https://github.com/paiml/ruchy.git
cd ruchy-docker
# Build from parent directory
cd .. && docker build -f ruchy-docker/docker/ruchy-transpiled/fibonacci.Dockerfile -t ruchy-transpiled:fibonacci .

Issue: bashrs not found
# Install bashrs from crates.io
cargo install bashrs
# Or build from source
git clone https://github.com/paiml/rash.git
cd rash
cargo install --path .

Issue: Results differ significantly from README
- Hardware differences are expected (±10-15% variation)
- Ensure CPU governor is set to "performance" mode:
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- Close background applications to reduce noise
- Disable swap for consistent measurements:
sudo swapoff -a
# Set CPU governor to performance (not powersave)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Disable swap
sudo swapoff -a
# Disable turbo boost for consistent measurements
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
# Capture environment metadata
./scripts/capture-environment.sh > results/environment.json

Completed:
- ✅ Recursive Fibonacci (fib(35) = 9,227,465) - 8 languages (C, Rust, Go, Python, Julia, Deno, Ruchy compiled, Ruchy transpiled)
- ✅ Prime Sieve (100,000 primes = 9,592) - 8 languages
In Progress:
- 🔄 Array Sum (10M elements)
- 🔄 Matrix Multiply (128×128)
- 🔄 HashMap Operations (100K inserts/lookups)
- 🔄 File I/O (100MB, fio-based)
- 🔄 Startup Time ("hello world")
- 🔄 HTTP Server (wrk load testing)
Implementation Target: 8 benchmarks × 8 languages = 64 Docker containers
Current Progress: 2 benchmarks × 8 languages = 16 Docker containers
- Blackburn et al. (OOPSLA 2007) - Benchmark evaluation
- Felter et al. (USENIX ATC 2015) - Container performance
- Gregg (2020) - BPF Performance Tools
- Kalibera & Jones (PLDI 2013) - JIT benchmarking
- Fleming & Wallace (1986) - Benchmark summarization
- Eyerman & Eeckhout (IEEE CL 2018) - Geometric mean critique
- Akinshin (2021) - Performance analysis
- Chen & Patterson (1994) - I/O evaluation
- Lockwood (2016) - I/O benchmarking
- Mytkowicz et al. (ASPLOS 2009) - Benchmark pitfalls
- Genchi Genbutsu (Go and See): Instrumented measurement, perf stat
- Kaizen (Continuous Improvement): Multiple metrics, enhanced I/O
- Jidoka (Automation with Human Touch): Andon Cord quality gates
- Test Coverage: ≥85%
- Mutation Score: ≥85%
- Property-Based Testing: proptest (Rust), Hypothesis (Python)
- Fuzz Testing: cargo-fuzz, 1M executions, zero crashes
- Complexity: ≤15 (cyclomatic), ≤20 (cognitive)
make quality # Must pass before merge
├── format # cargo fmt, black
├── lint # clippy -D warnings, pylint
├── test # cargo test, pytest
├── coverage # ≥85% or fail
└── mutation # ≥85% or fail

- ✅ Specification v2.0.0 (peer-reviewed, 10 citations)
- ✅ Fibonacci benchmark (8 languages: C, Rust, Go, Python, Julia, Deno, Ruchy compiled, Ruchy transpiled)
- ✅ Prime Sieve benchmark (8 languages: C, Rust, Go, Python, Julia, Deno, Ruchy compiled, Ruchy transpiled)
- ✅ CLI benchmarking integration (bashrs bench)
- ✅ Docker multi-stage builds (distroless runtime)
- 🔄 Remaining benchmarks (6 workloads)
- ⏳ Test framework implementation
- ⏳ Statistical analysis tooling
- ruchy-lambda: AWS Lambda cold start benchmarking
- ruchy-book: Local execution benchmarking
This framework builds on peer-reviewed research:
- Blackburn et al. (OOPSLA 2007) - Benchmark evaluation methodology
- Felter et al. (USENIX ATC 2015) - Container performance analysis
- Kalibera & Jones (PLDI 2013) - Rigorous JIT benchmarking
- Gregg (2020) - Performance measurement tools
- Mytkowicz et al. (ASPLOS 2009) - Measurement pitfalls
Complete citations in docker-runtime-benchmarking-spec.md.
Understanding what you're actually shipping:
Key Takeaway: Compiled languages ship only your code as machine instructions (<1 MB). Interpreted languages ship the entire runtime + interpreter + libraries (177+ MB for Python+NumPy).
This is why our benchmarks show:
- C, Rust, Ruchy, Go: <2 MB Docker images (FROM scratch)
- Deno: 87 MB (embedded V8 runtime)
- Python: 119 MB (interpreter + stdlib + OS)
- Julia: 711 MB (JIT compiler + runtime)
MIT (see LICENSE file)
Version: 1.0.0
Last Updated: November 7, 2025
Status: Complete - 2 benchmarks × 8 languages = 16 Docker containers
