ruchy-docker

Docker Runtime Benchmarking Framework

Scientific benchmarking of programming language performance in Docker containerized environments using instrumented measurement to isolate application performance from container overhead.

Status: Complete - Fibonacci Benchmark with NASA Optimization Analysis


Key Findings

Ruchy v3.209.0 achieves world-class performance:

  • Within 9% of Rust performance (21.0ms vs 19.28ms compute time)
  • Produces the smallest compiled Docker images (312-314 KB) - 26% smaller than Rust, 55% smaller than C
  • 12.2× binary size reduction via --optimize nasa (3.8 MB → 314 KB) - default in Dockerfiles
  • 37% of C's speed on recursive compute workloads

Benchmark Results

Fibonacci (fib(35) = 9,227,465)

Source Code: the per-language implementations live in this repository; an illustrative sketch of the workload appears below the environment notes.

Environment: AMD Ryzen Threadripper 7960X (24-core), 125GB RAM, Ubuntu 22.04
Ruchy Version: v3.209.0 (NASA-grade optimization presets)
Methodology: Instrumented measurement (embedded timing code) + CLI invocation benchmarks
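
For reference, the workload is the naive doubly recursive Fibonacci. A minimal Rust sketch of the algorithm being measured (illustrative only; the actual per-language sources live in this repository):

// Naive recursive Fibonacci, the benchmark workload.
// Illustrative sketch only; not the repository's exact source.
fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    // fib(35) = 9,227,465 (the value every benchmark container must print)
    println!("RESULT: {}", fib(35));
}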

CLI Invocation (Full Process with bashrs bench)

Language                  Mean Time   Std Dev    vs C     Binary Size   Base Image           Docker Image
C (gcc -O3)               10.77ms     ±0.59ms    1.00×    695 KB        scratch              695 KB
Rust (rustc opt-3)        21.81ms     ±0.65ms    2.02×    424 KB        scratch              424 KB
Ruchy (compiled, NASA)    22.47ms     ±0.56ms    2.09×    314 KB        scratch              314 KB
Ruchy (transpiled)        23.68ms     ±0.60ms    2.20×    312 KB        scratch              312 KB
Go (1.23)                 38.04ms     ±0.71ms    3.53×    1.41 MB       scratch              1.41 MB
Deno (2.1.4, TypeScript)  70.11ms     ±3.40ms    6.51×    87 MB         debian:12-slim       256 MB
Julia (1.10) 🧪           252.91ms    ±4.58ms    23.47×   N/A           julia:1.10-bullseye  711 MB
Python (3.12)             697.49ms    ±14.94ms   64.78×   1.3 KB        python:3.12-slim     119 MB

🧪 Julia is EXPERIMENTAL: JIT warmup affects startup time

Pure Compute Time (Instrumented Measurement)

Language                  Compute Time   Slowdown vs C   Slowdown vs Rust       Binary Size
C (gcc -O3)               7.83ms         1.00×           0.41× (2.46× faster)   695 KB
Rust (rustc opt-3)        19.28ms        2.46×           1.00×                  424 KB
Ruchy (transpiled)        21.11ms        2.70×           1.09×                  312 KB
Ruchy (compiled, NASA)    21.00ms        2.68×           1.09×                  314 KB
Deno (2.1.4, TypeScript)  69.28ms        8.85×           3.59×                  87 MB

Ruchy achieves within 9% of Rust performance (21.0ms vs 19.28ms compute time) while producing the smallest compiled Docker images (312-314 KB).

Performance Visualization (Execution Time):

C           █ 10.77ms
Rust        ██ 21.81ms
Ruchy (C)   ██ 22.47ms
Ruchy (T)   ██ 23.68ms
Go          ███ 38.04ms
Deno        ███████ 70.11ms
Julia 🧪    █████████████████████████ 252.91ms
Python      ██████████████████████████████████████████████████████████████████████ 697.49ms
            |---------|---------|---------|---------|---------|---------|---------
                     100       200       300       400       500       600ms

Binary Size Comparison (Docker images):

Ruchy (T)   █ 312 KB
Ruchy (C)   █ 314 KB
Rust        █ 424 KB
C           █ 695 KB
Go          █ 1.41 MB
Python      ███████████ 119 MB
Deno        █████████████████████████ 256 MB
Julia 🧪    ██████████████████████████████████████████████████████████████████████ 711 MB
            ----------------------------------------------------------------------
                             177               355              533 MB

Observations:

  • Ruchy achieves within 9% of Rust performance (21.0ms vs 19.28ms compute time)
  • Ruchy produces the smallest compiled Docker images (312-314 KB) - 26% smaller than Rust, 55% smaller than C
  • 12.2× binary size reduction with --optimize nasa (3.8 MB → 314 KB) - default in Dockerfiles
  • Ruchy compiled achieves 37% of C's speed on recursive algorithms
  • scratch base migration reduced image sizes by 60-83% (e.g., Rust: 2.51 MB → 424 KB)
  • Deno bundles V8 runtime in standalone binary (87 MB), achieving 70.11ms execution time (6.5× slower than C, 3.6× slower than Rust)
  • Python requires full interpreter runtime (119 MB image)
  • Julia (EXPERIMENTAL) has large JIT runtime (711 MB) and slower startup due to JIT warmup
  • All compiled languages achieve sub-40ms CLI invocation times; Deno achieves sub-75ms
  • Low standard deviations (under ±4ms for the compiled languages and Deno, and a few percent of the mean for Julia and Python) indicate stable, reproducible measurements
  • Instrumented measurement isolates pure compute time from startup overhead

Ruchy Optimization Levels (v3.209.0)

Ruchy v3.209.0 introduces NASA-grade optimization presets:

# Development/debugging (fastest compilation)
ruchy compile main.ruchy --optimize none         # 3.7 MB

# Production default (balanced)
ruchy compile main.ruchy --optimize balanced     # 1.9 MB (48.6% reduction)

# Lambda/Docker (aggressive)
ruchy compile main.ruchy --optimize aggressive   # 327 KB (91.2% reduction)

# Maximum optimization
ruchy compile main.ruchy --optimize nasa         # 327 KB (91.2% reduction)

# Show applied flags
ruchy compile main.ruchy --optimize nasa --verbose

# Export metrics for CI/CD
ruchy compile main.ruchy --optimize nasa --json metrics.json

Level                  Binary Size  Reduction  Use Case
--optimize none        3.7 MB       0%         Development/debugging
--optimize balanced    1.9 MB       48.6%      Production default
--optimize aggressive  327 KB       91.2%      Lambda/Docker
--optimize nasa        327 KB       91.2%      Maximum optimization

Key Finding: NASA/aggressive optimization provides 11.3× binary size reduction (3.7 MB → 327 KB) with minimal performance trade-off
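
For CI/CD, the metrics exported via --json can be turned into a build gate. A hypothetical Rust sketch (the binary_size_bytes field name and the 400 KB threshold are assumptions, not part of the documented ruchy output):

// Hypothetical CI gate over the metrics.json produced by
// `ruchy compile main.ruchy --optimize nasa --json metrics.json`.
// NOTE: "binary_size_bytes" is an assumed field name, not a documented
// schema; inspect your metrics.json for the actual field names.
use std::fs;

fn main() {
    let raw = fs::read_to_string("metrics.json").expect("metrics.json not found");
    let metrics: serde_json::Value = serde_json::from_str(&raw).expect("invalid JSON");
    let size = metrics
        .get("binary_size_bytes") // assumed field name
        .and_then(|v| v.as_u64())
        .expect("binary size field missing");
    // Fail the pipeline if the NASA-optimized binary regresses past ~400 KB.
    assert!(size <= 400 * 1024, "binary too large: {size} bytes");
    println!("binary size OK: {size} bytes");
}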


Methodology

Measurement Approach

Two complementary benchmarking modes:

  1. Docker Benchmarks: Instrumented measurement inside containers isolates application compute time from Docker daemon overhead (see the timing sketch after this list)
  2. CLI Benchmarks: Full process invocation using bashrs bench measures startup + compute + teardown (3 warmup runs, 10 measured iterations)
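
A minimal Rust sketch of the instrumented-timing idea behind the Docker benchmarks, assuming the STARTUP_TIME_US / COMPUTE_TIME_US / RESULT output convention shown later in this README (the repository's actual harness code may differ):

use std::time::Instant;

fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    let process_start = Instant::now();
    // ... setup / argument parsing would happen here ...
    let startup_us = process_start.elapsed().as_micros();

    let compute_start = Instant::now();
    let result = fib(35);
    let compute_us = compute_start.elapsed().as_micros();

    // Machine-readable lines, matching the output format shown in the
    // "Run Docker Benchmarks" section later in this README.
    println!("STARTUP_TIME_US: {}", startup_us);
    println!("COMPUTE_TIME_US: {}", compute_us);
    println!("RESULT: {}", result);
}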

Framework Features

  • Instrumented Timing: Embedded measurement code separates startup from compute phases
  • Statistical Analysis: Multiple aggregation metrics (geometric mean, arithmetic mean, harmonic mean), MAD-based outlier detection (sketched after this list)
  • Multi-Stage Docker Builds: FROM scratch base images for statically-linked binaries (60-83% size reduction from distroless)
  • NASA-Grade Optimization: Ruchy v3.209.0 optimization presets achieve 91.2% binary size reduction
  • Reproducibility: Fixed compiler versions via rust-toolchain.toml, locked dependencies
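
As an illustration of the statistical-analysis features listed above, a small self-contained Rust sketch of geometric mean and MAD-based outlier detection (sample values are hypothetical; the framework's own implementation may differ):

// Geometric mean and MAD-based outlier flagging over a sample of timings (ms).
// Sketch only; the framework's own statistics code may differ.
fn geometric_mean(samples: &[f64]) -> f64 {
    let log_sum: f64 = samples.iter().map(|x| x.ln()).sum();
    (log_sum / samples.len() as f64).exp()
}

fn median(samples: &mut Vec<f64>) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mid = samples.len() / 2;
    if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2.0
    } else {
        samples[mid]
    }
}

// Flag samples more than `k` scaled MADs away from the median.
fn mad_outliers(samples: &[f64], k: f64) -> Vec<f64> {
    let med = median(&mut samples.to_vec());
    let mut deviations: Vec<f64> = samples.iter().map(|x| (x - med).abs()).collect();
    let mad = median(&mut deviations);
    samples
        .iter()
        .copied()
        .filter(|&x| (x - med).abs() > k * 1.4826 * mad)
        .collect()
}

fn main() {
    let timings = vec![10.7, 10.8, 10.6, 10.9, 10.7, 14.9]; // hypothetical run, in ms
    println!("geometric mean: {:.2} ms", geometric_mean(&timings));
    println!("outliers (3 MADs): {:?}", mad_outliers(&timings, 3.0));
}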

Documentation

Specifications (docs/specifications/)

  1. docker-runtime-benchmarking-spec.md (1,321 lines)

    • Complete benchmarking methodology
    • Instrumented measurement approach
    • Statistical rigor (geometric/arithmetic/harmonic mean)
    • EXTREME TDD strategy
    • 10 peer-reviewed citations
  2. PEER_REVIEW_RESPONSE.md (452 lines)

    • Critical peer review by Gemini AI (November 5, 2025)
    • Toyota Way principles (Genchi Genbutsu, Kaizen, Jidoka)
    • 7 major improvements implemented
  3. README.md (Quick reference)

Analysis & Historical Context (docs/analysis/)

Exploration-phase analysis documents that informed the design of this framework:

  1. RUCHY_BENCHMARK_ANALYSIS.md (24KB)

    • Comprehensive technical analysis of the original ruchy-book benchmark suite
    • 12 sections covering methodology, infrastructure, and results
    • DLS 2016 "Are We Fast Yet?" methodology reference
  2. BENCHMARK_QUICK_REFERENCE.md

    • Quick reference for all 10 benchmarks from the original suite
    • Performance categories and key results
  3. BENCHMARK_EXPLORATION_SUMMARY.txt

    • Exploration summary of 69 benchmark program files
    • Rationale for selecting Fibonacci and Prime Sieve
  4. README.md - Guide to analysis documents


Quick Start

Prerequisites

  • Docker v24.0+
  • Rust v1.83+
  • Go v1.23+
  • Python v3.12+
  • GCC v13+
  • Julia v1.10+
  • Deno v2.1.4+

Build All Images

make build-images

Run Single Docker Benchmark

make bench BENCHMARK=fibonacci LANGUAGE=ruchy-transpiled

Run Full Docker Suite

make bench-all

Run CLI Benchmarks (bashrs bench)

# Install bashrs (scientific CLI benchmarking tool)
make install-tools

# Run CLI benchmarks (measures full process invocation)
make bench-cli

Docker vs CLI Benchmarking:

  • Docker benchmarks: Measure compute time inside containers (isolates application from Docker overhead)
  • CLI benchmarks: Measure full process invocation (startup + compute + teardown) using bashrs bench (a conceptual sketch follows this list)
  • bashrs: Scientific benchmarking with warmup runs, statistical analysis, memory measurement, and determinism verification
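
To make the distinction concrete, a rough Rust sketch of what full-process timing measures, conceptually similar to (but far simpler than) bashrs bench; the binary path and iteration counts are illustrative:

use std::process::Command;
use std::time::Instant;

fn main() {
    // Illustrative binary path from the CLI benchmark setup described later in this README.
    let binary = "./bin/fibonacci_c";

    // Warmup runs (populate disk/page caches), then measured runs.
    for _ in 0..3 {
        Command::new(binary).output().expect("warmup run failed");
    }

    let mut times_ms = Vec::new();
    for _ in 0..10 {
        let start = Instant::now();
        let out = Command::new(binary).output().expect("measured run failed");
        // Full process lifecycle: spawn + startup + compute + teardown.
        times_ms.push(start.elapsed().as_secs_f64() * 1000.0);
        assert!(out.status.success());
    }

    let mean = times_ms.iter().sum::<f64>() / times_ms.len() as f64;
    println!("mean full-process time: {:.2} ms over {} runs", mean, times_ms.len());
}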

Quality Gates

make quality  # format, lint, test, coverage, mutation, complexity

Reproducing These Benchmarks

Complete Reproduction Steps

This section provides exact steps to replicate all benchmark results from scratch.

1. Prerequisites Installation

Required:

# Docker (v24.0+)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Rust toolchain (v1.83+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Go (v1.23+)
sudo apt-get install golang-go

# Python (v3.12+)
sudo apt-get install python3.12 python3.12-dev

# GCC (v13+)
sudo apt-get install build-essential gcc-13

# Deno (v2.1.4+)
curl -fsSL https://deno.land/install.sh | sh

Optional (for full language coverage):

# Julia (v1.10+) - EXPERIMENTAL
wget https://julialang-s3.julialang.org/bin/linux/x64/1.10/julia-1.10.0-linux-x86_64.tar.gz
tar zxvf julia-1.10.0-linux-x86_64.tar.gz
sudo ln -s $(pwd)/julia-1.10.0/bin/julia /usr/local/bin/julia

# bashrs (for CLI benchmarking)
cargo install bashrs

2. Clone and Setup Repository

git clone https://github.com/paiml/ruchy-docker.git
cd ruchy-docker

3. Build Docker Images

Option A: Build all images (recommended for full comparison)

# Builds all 16 containers (2 benchmarks × 8 languages)
make build-images

# This runs:
# - docker build for each Dockerfile
# - Multi-stage builds with FROM scratch optimization
# - Estimated time: 10-15 minutes (first run)

Option B: Build specific benchmark images

# Build just Fibonacci benchmark (8 languages)
make build-fibonacci

# Or build individual images
docker build -f docker/c/fibonacci.Dockerfile -t c:fibonacci .
docker build -f docker/rust/fibonacci.Dockerfile -t rust:fibonacci .
docker build -f docker/deno/fibonacci.Dockerfile -t deno:fibonacci .
docker build -f docker/go/fibonacci.Dockerfile -t go:fibonacci .
docker build -f docker/python/fibonacci.Dockerfile -t python:fibonacci .
docker build -f docker/julia/fibonacci.Dockerfile -t julia:fibonacci .

# Ruchy images (requires ruchy repository as sibling directory)
cd ..
docker build -f ruchy-docker/docker/ruchy-transpiled/fibonacci.Dockerfile -t ruchy-transpiled:fibonacci .
docker build -f ruchy-docker/docker/ruchy-compiled/fibonacci.Dockerfile -t ruchy-compiled:fibonacci .
cd ruchy-docker

4. Verify Images Built Successfully

# Check all images exist
docker images | grep -E "(fibonacci|primes)"

# Expected output:
# c                    fibonacci    <hash>   ...   695 KB
# rust                 fibonacci    <hash>   ...   424 KB
# deno                 fibonacci    <hash>   ...   256 MB
# go                   fibonacci    <hash>   ...   1.41 MB
# python               fibonacci    <hash>   ...   119 MB
# julia                fibonacci    <hash>   ...   711 MB
# ruchy-transpiled     fibonacci    <hash>   ...   312 KB
# ruchy-compiled       fibonacci    <hash>   ...   314 KB

5. Run Docker Benchmarks (Instrumented Measurement)

Single benchmark:

# Run specific language/benchmark combination
make bench BENCHMARK=fibonacci LANGUAGE=c

# Output shows instrumented timing:
# STARTUP_TIME_US: 20
# COMPUTE_TIME_US: 7872
# RESULT: 9227465

Run all combinations:

# Runs all 16 containers (2 benchmarks × 8 languages)
make bench-all

# This executes:
# - docker run for each image
# - Captures instrumented output (STARTUP_TIME_US, COMPUTE_TIME_US, RESULT)
# - Saves results to results/ directory
# (a sketch for parsing these instrumented lines follows below)
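
A small Rust sketch of how the captured STARTUP_TIME_US / COMPUTE_TIME_US / RESULT lines can be parsed (the repository's own result-collection code may differ):

// Parse the instrumented lines emitted by a benchmark container.
// Sketch only; the framework's own result-collection code may differ.
fn parse_metric(output: &str, key: &str) -> Option<u64> {
    output.lines().find_map(|line| {
        line.strip_prefix(key)?
            .trim_start_matches(':')
            .trim()
            .parse()
            .ok()
    })
}

fn main() {
    // Example output, matching the instrumented format shown above.
    let output = "STARTUP_TIME_US: 20\nCOMPUTE_TIME_US: 7872\nRESULT: 9227465\n";
    let compute_us = parse_metric(output, "COMPUTE_TIME_US").expect("missing compute time");
    let result = parse_metric(output, "RESULT").expect("missing result");
    assert_eq!(result, 9_227_465); // fib(35) sanity check
    println!("compute time: {:.3} ms", compute_us as f64 / 1000.0);
}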

Manual single run (for verification):

# Run any benchmark manually
docker run --rm c:fibonacci
docker run --rm rust:fibonacci
docker run --rm deno:fibonacci

# Compare results - all should output:
# RESULT: 9227465  (for fibonacci)
# RESULT: 9592     (for primes)

6. Run CLI Benchmarks (Full Process Invocation)

CLI benchmarks measure the complete process lifecycle: startup + compute + teardown.

Install bashrs (if not already installed):

cargo install bashrs

# Or use project Makefile
make install-tools

Extract binaries from Docker images:

# This extracts compiled binaries from containers to bin/ directory
make extract-binaries

# Binaries are copied from:
# - c:fibonacci → bin/fibonacci_c
# - rust:fibonacci → bin/fibonacci_rust
# - deno:fibonacci → bin/fibonacci_deno
# - etc.

Run CLI benchmarks:

# Run full CLI benchmark suite with bashrs
make bench-cli

# This executes:
# - 3 warmup iterations (disk cache, page cache)
# - 10 measured iterations
# - Statistical analysis (mean, std dev, outliers)
# - Results saved to results/cli/fibonacci-cli-bench.json

# Expected output:
# 📊 Benchmarking: bin/fibonacci_c.sh
# 🔥 Warmup (3 iterations)...
# ⏱️  Measuring (10 iterations)...
# 📈 Results: Mean: 10.77ms ± 0.59ms

Manual CLI run (for verification):

# Run extracted binary directly
./bin/fibonacci_c

# Or via wrapper scripts
./bin/fibonacci_c.sh
./bin/fibonacci_rust.sh
./bin/fibonacci_deno.sh

7. Analyze Results

View JSON results:

# CLI benchmark results
cat results/cli/fibonacci-cli-bench.json

# Docker benchmark results
ls results/*.json

Generate charts:

# Preview ASCII charts (stdout)
make charts

# Update README.md with new charts (creates backup)
make update-readme

8. Validate Results Match README

Compare your results with the published data in the README.md tables (a small tolerance-check sketch follows these lists):

Expected compute times (±10% tolerance for hardware differences):

  • C: ~7-10ms
  • Rust: ~18-22ms
  • Ruchy (compiled/transpiled): ~20-24ms
  • Deno: ~65-75ms

Expected CLI times (±15% tolerance):

  • C: ~10-12ms
  • Rust: ~20-24ms
  • Ruchy: ~22-26ms
  • Deno: ~65-80ms

Binary sizes (should match exactly):

  • Ruchy (transpiled): 312 KB
  • Ruchy (compiled): 314 KB
  • Rust: 424 KB
  • C: 695 KB
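
A tiny Rust sketch of the kind of tolerance check implied above; the published compute times come from the tables earlier in this README, while the measured values are hypothetical placeholders:

// Check measured times against published compute times within a relative tolerance.
// Sketch only; expected values are the README's published numbers,
// measured values are hypothetical placeholders for your own runs.
fn within_tolerance(measured_ms: f64, expected_ms: f64, tolerance: f64) -> bool {
    ((measured_ms - expected_ms) / expected_ms).abs() <= tolerance
}

fn main() {
    // (label, measured on your machine [hypothetical], published compute time, tolerance)
    let checks = [
        ("C", 8.1, 7.83, 0.10),
        ("Rust", 19.9, 19.28, 0.10),
        ("Ruchy (compiled)", 21.6, 21.00, 0.10),
    ];
    for &(label, measured, expected, tol) in checks.iter() {
        let ok = within_tolerance(measured, expected, tol);
        println!(
            "{}: measured {} ms vs published {} ms -> {}",
            label,
            measured,
            expected,
            if ok { "OK" } else { "OUT OF RANGE" }
        );
    }
}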

Troubleshooting

Issue: Docker build fails with "Illegal option -o pipefail"

# This should be fixed in latest Dockerfiles
# If you see this error, ensure you have latest version:
git pull origin main

Issue: Ruchy images fail to build

# Ruchy Dockerfiles expect ruchy repository as sibling directory
cd ..
git clone https://github.com/paiml/ruchy.git
cd ruchy-docker

# Build from parent directory
cd .. && docker build -f ruchy-docker/docker/ruchy-transpiled/fibonacci.Dockerfile -t ruchy-transpiled:fibonacci .

Issue: bashrs not found

# Install bashrs from crates.io
cargo install bashrs

# Or build from source
git clone https://github.com/paiml/rash.git
cd rash
cargo install --path .

Issue: Results differ significantly from README

  • Hardware differences are expected (±10-15% variation)
  • Ensure CPU governor is set to "performance" mode:
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  • Close background applications to reduce noise
  • Disable swap for consistent measurements:
    sudo swapoff -a

System Configuration (Optional, for Maximum Reproducibility)

# Set CPU governor to performance (not powersave)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Disable swap
sudo swapoff -a

# Disable turbo boost for consistent measurements
# (the intel_pstate path below applies to Intel CPUs only; AMD systems such as
#  the Threadripper used for these results expose boost control through a
#  different cpufreq interface)
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Capture environment metadata
./scripts/capture-environment.sh > results/environment.json

Planned Benchmarks

Completed:

  • ✅ Recursive Fibonacci (fib(35) = 9,227,465) - 8 languages (C, Rust, Go, Python, Julia, Deno, Ruchy compiled, Ruchy transpiled)
  • ✅ Prime Sieve (100,000 primes = 9,592) - 8 languages

In Progress:

  • 🔄 Array Sum (10M elements)
  • 🔄 Matrix Multiply (128×128)
  • 🔄 HashMap Operations (100K inserts/lookups)
  • 🔄 File I/O (100MB, fio-based)
  • 🔄 Startup Time ("hello world")
  • 🔄 HTTP Server (wrk load testing)

Implementation Target: 8 benchmarks × 8 languages = 64 Docker containers
Current Progress: 2 benchmarks × 8 languages = 16 Docker containers


Scientific Rigor

Academic Foundation (10 Citations)

  1. Blackburn et al. (OOPSLA 2007) - Benchmark evaluation
  2. Felter et al. (USENIX ATC 2015) - Container performance
  3. Gregg (2020) - BPF Performance Tools
  4. Kalibera & Jones (PLDI 2013) - JIT benchmarking
  5. Fleming & Wallace (1986) - Benchmark summarization
  6. Eyerman & Eeckhout (IEEE CL 2018) - Geometric mean critique
  7. Akinshin (2021) - Performance analysis
  8. Chen & Patterson (1994) - I/O evaluation
  9. Lockwood (2016) - I/O benchmarking
  10. Mytkowicz et al. (ASPLOS 2009) - Benchmark pitfalls

Toyota Way Principles

  • Genchi Genbutsu (Go and See): Instrumented measurement, perf stat
  • Kaizen (Continuous Improvement): Multiple metrics, enhanced I/O
  • Jidoka (Automation with Human Touch): Andon Cord quality gates

Quality Standards

Testing (EXTREME TDD)

  • Test Coverage: ≥85%
  • Mutation Score: ≥85%
  • Property-Based Testing: proptest (Rust), Hypothesis (Python) - see the sketch after this list
  • Fuzz Testing: cargo-fuzz, 1M executions, zero crashes
  • Complexity: ≤15 (cyclomatic), ≤20 (cognitive)
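
As an illustration of the property-based-testing item above, a proptest sketch for the Fibonacci workload (the property and input bounds are chosen here purely for illustration):

// Property-based test sketch (proptest): the Fibonacci recurrence must hold
// for every small n. Bounds kept small because naive fib is exponential.
use proptest::prelude::*;

fn fib(n: u64) -> u64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

proptest! {
    #[test]
    fn fib_satisfies_recurrence(n in 2u64..30) {
        prop_assert_eq!(fib(n), fib(n - 1) + fib(n - 2));
    }
}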

Andon Cord (Fail-Fast)

make quality  # Must pass before merge
  ├── format   # cargo fmt, black
  ├── lint     # clippy -D warnings, pylint
  ├── test     # cargo test, pytest
  ├── coverage # ≥85% or fail
  └── mutation # ≥85% or fail

Implementation Status

  • ✅ Specification v2.0.0 (peer-reviewed, 10 citations)
  • ✅ Fibonacci benchmark (8 languages: C, Rust, Go, Python, Julia, Deno, Ruchy compiled, Ruchy transpiled)
  • ✅ Prime Sieve benchmark (8 languages: C, Rust, Go, Python, Julia, Deno, Ruchy compiled, Ruchy transpiled)
  • ✅ CLI benchmarking integration (bashrs bench)
  • ✅ Docker multi-stage builds (distroless runtime)
  • 🔄 Remaining benchmarks (6 workloads)
  • ⏳ Test framework implementation
  • ⏳ Statistical analysis tooling

Related Projects


Citations

This framework builds on peer-reviewed research:

  1. Blackburn et al. (OOPSLA 2007) - Benchmark evaluation methodology
  2. Felter et al. (USENIX ATC 2015) - Container performance analysis
  3. Kalibera & Jones (PLDI 2013) - Rigorous JIT benchmarking
  4. Gregg (2020) - Performance measurement tools
  5. Mytkowicz et al. (ASPLOS 2009) - Measurement pitfalls

Complete citations in docker-runtime-benchmarking-spec.md.


Why Docker Image Size Matters

Understanding what you're actually shipping:

Runtime Size Comparison

Key Takeaway: Compiled languages ship only your code as machine instructions (<1 MB). Interpreted languages ship the entire runtime + interpreter + libraries (177+ MB for Python+NumPy).

This is why our benchmarks show:

  • C, Rust, Ruchy, Go: <2 MB Docker images (FROM scratch)
  • Deno: 87 MB (embedded V8 runtime)
  • Python: 119 MB (interpreter + stdlib + OS)
  • Julia: 711 MB (JIT compiler + runtime)

License

MIT (see LICENSE file)


Version: 1.0.0
Last Updated: November 7, 2025
Status: Complete - 2 benchmarks × 8 languages = 16 Docker containers
