Radon Benchmark Suite

A benchmark framework for comparing the performance of the Radon language against Python and Go. It provides regression testing, version comparison, trend tracking, and interactive HTML reports.

Features

Multi-Runtime Support - Compare Radon, Python, and Go side-by-side
Regression Testing - Detect performance regressions across versions
Version Comparison - Track Radon's improvement over time
Trend Analysis - Visualize how the gap to Python closes
Beautiful Reports - Interactive HTML reports with Chart.js visualizations
CI Ready - Exit codes for regression detection in pipelines
Configurable Thresholds - Customize what counts as a regression
Multiple Profiles - Smoke, standard, and deep benchmark modes

Quick Start

cd benchmarks

# Run a quick smoke test (fastest)
python runner/main.py --profile smoke

# Run standard benchmarks with all runtimes
python runner/main.py --profile standard --runtimes radon python go

# Save as baseline and view HTML report
python runner/main.py --profile standard --baseline radon-v0.1.0
# Open results/latest/report.html in your browser

Or use Make:

make smoke       # Quick sanity check
make standard    # Development runs
make deep        # Release benchmarks

Installation

Requirements

Python 3.10+ (for running the benchmark harness)
Radon (the language being benchmarked)
Go 1.20+ (optional, for Go comparisons)

Dependencies

pip install psutil matplotlib

Verify Setup

cd benchmarks
python runner/main.py --profile smoke --scenarios simple_sum

Usage

Running Benchmarks

Basic Usage

# Run all scenarios with all runtimes
python runner/main.py --profile standard

# Run specific runtimes
python runner/main.py --profile standard --runtimes radon python

# Run specific scenarios
python runner/main.py --profile smoke --scenarios simple_sum fib_20

# Combine options
python runner/main.py --profile standard --runtimes radon python --scenarios simple_sum

Profiles

Profile	Warmups	Repeats	Use Case
`smoke`	1	3	Quick sanity check (~30s)
`standard`	2	10	Development runs (~2-5min)
`deep`	5	30	Release benchmarks (~10-15min)

python runner/main.py --profile smoke      # Fast
python runner/main.py --profile standard   # Balanced
python runner/main.py --profile deep       # Thorough

Saving Baselines

Save benchmark results as a named baseline for future comparison:

# Save after a benchmark run
python runner/main.py --profile standard --baseline radon-v0.1.0

# List all saved baselines
python runner/main.py --list-baselines

# Delete a baseline
python runner/main.py --delete-baseline radon-v0.0.1

Baseline files are stored in baselines/ as JSON and include:

Runtime versions (Radon, Python, Go)
Git commit hash
Per-scenario timing data
Cross-language ratios

Comparing Versions

Compare Against Baseline

Run new benchmarks and compare against a saved baseline:

python runner/main.py --profile standard --compare radon-v0.0.1

Output:

📊 Version Comparison: 0.0.1 → 0.1.0
==================================================
✅ Overall: 8.5% faster

  Regressions: 0
  Warnings: 1
  Improvements: 3
  Stable: 1

Diff Two Baselines (No Benchmark Run)

Compare two saved baselines without running new benchmarks:

python runner/main.py --diff radon-v0.0.1 radon-v0.1.0

Matrix Comparison

Compare multiple versions at once:

python runner/main.py --matrix radon-v0.0.1 radon-v0.1.0 radon-v0.2.0

Trend Analysis

View performance trends across all saved baselines:

python runner/main.py --trends

Output:

📈 Performance Trends (3 versions)
============================================================

  RADON vs python:
    First: 6.0x slower
    Latest: 4.2x slower
    ✅ Gap closed by 30%!

Export trend data:

python runner/main.py --trends --export trends.json
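
The "Gap closed" figure in the sample output is consistent with a simple relative change in the slowdown ratio. Below is a minimal Python sketch of that arithmetic. It assumes the ratio is Radon's time divided by the baseline runtime's time; the function names are illustrative and are not taken from the harness.

```python
def slowdown_ratio(runtime_ms: float, baseline_ms: float) -> float:
    """How many times slower a runtime is than the baseline runtime."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return runtime_ms / baseline_ms


def gap_closed_percent(first_ratio: float, latest_ratio: float) -> float:
    """Relative reduction of the slowdown ratio between two versions, in percent."""
    if first_ratio <= 0:
        raise ValueError("first_ratio must be positive")
    return (first_ratio - latest_ratio) / first_ratio * 100.0


# Ratios taken from the sample output above: 6.0x slower, then 4.2x slower.
print(f"Gap closed by {gap_closed_percent(6.0, 4.2):.0f}%")
```

With the sample ratios this prints "Gap closed by 30%", matching the output shown. The real harness may aggregate per-scenario ratios differently.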

CI Integration

Regression Check

Check if the latest run has regressions (for CI pipelines):

# Exit code 0 = no regressions, 1 = regressions found
python runner/main.py --check-regressions --against radon-v0.0.1

# Custom threshold (default: 15%)
python runner/main.py --check-regressions --threshold 10
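
The exit-code contract can be pictured as follows. This is only a sketch of the idea, assuming a regression means a scenario became slower than the baseline by more than the threshold percentage; `percent_change` and `regression_exit_code` are hypothetical helpers, not functions from runner/main.py.

```python
def percent_change(baseline_ms: float, current_ms: float) -> float:
    """Change of the current timing relative to the baseline; positive means slower."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (current_ms - baseline_ms) / baseline_ms * 100.0


def regression_exit_code(changes_percent: dict[str, float], threshold: float = 15.0) -> int:
    """Return 1 if any scenario slowed down by more than `threshold` percent, else 0."""
    regressed = [name for name, change in changes_percent.items() if change > threshold]
    return 1 if regressed else 0


# Hypothetical timings in milliseconds: (baseline, current) per scenario.
timings = {"simple_sum": (100.0, 103.0), "fib_20": (200.0, 224.0)}
changes = {name: percent_change(base, cur) for name, (base, cur) in timings.items()}

print(regression_exit_code(changes))                  # default threshold of 15%
print(regression_exit_code(changes, threshold=10.0))  # stricter threshold
```

In this made-up data fib_20 is 12% slower, so it passes at the default threshold and fails at 10.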

GitHub Actions Example

name: Performance Regression Check

on:
  push:
    tags: ['v*']

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      
      - name: Install dependencies
        run: pip install psutil matplotlib
      
      - name: Run benchmarks
        run: |
          cd benchmarks
          python runner/main.py --profile standard --baseline current
      
      - name: Check for regressions
        run: |
          cd benchmarks
          python runner/main.py --check-regressions --against previous-release --threshold 15
      
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: benchmarks/results/latest/

Configuration

Profiles

Edit config/profiles.json:

{
  "smoke": {
    "description": "Quick sanity check",
    "warmups": 1,
    "repeats": 3,
    "timeout_ms": 300000
  },
  "standard": {
    "description": "Default developer run",
    "warmups": 2,
    "repeats": 10,
    "timeout_ms": 300000
  },
  "deep": {
    "description": "Release-quality with more samples",
    "warmups": 5,
    "repeats": 30,
    "timeout_ms": 600000
  }
}
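
To get a feel for how a profile affects run time, the following sketch counts process invocations per profile. It assumes each warmup and each repeat is one separate execution of a fixture, which the harness may or may not do; the helper names are illustrative.

```python
import json

# Same values as the profiles shown above (descriptions omitted for brevity).
PROFILES_JSON = """
{
  "smoke":    {"warmups": 1, "repeats": 3,  "timeout_ms": 300000},
  "standard": {"warmups": 2, "repeats": 10, "timeout_ms": 300000},
  "deep":     {"warmups": 5, "repeats": 30, "timeout_ms": 600000}
}
"""


def runs_per_case(profile: dict) -> int:
    """Executions for one scenario on one runtime: warmups plus measured repeats."""
    return profile["warmups"] + profile["repeats"]


def total_runs(profile: dict, n_scenarios: int, n_runtimes: int) -> int:
    """Executions for a whole benchmark run under the one-process-per-run assumption."""
    return runs_per_case(profile) * n_scenarios * n_runtimes


profiles = json.loads(PROFILES_JSON)
for name, profile in profiles.items():
    # 5 scenarios and 3 runtimes are example numbers, not the suite's actual counts.
    print(f"{name}: {total_runs(profile, n_scenarios=5, n_runtimes=3)} executions")
```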

Scenarios

Edit config/scenarios.json:

{
  "scenarios": [
    {
      "id": "simple_sum",
      "family": "algorithm",
      "description": "Sum 1 to 1000",
      "expected_output": "500500",
      "comparable": true,
      "tags": ["loop", "arithmetic"]
    }
  ],
  "runtimes": ["radon", "python", "go"],
  "baseline_runtime": "python"
}
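
The expected_output field suggests that each fixture's stdout is checked against a known value. A minimal sketch of such a check, assuming the comparison ignores surrounding whitespace (the harness may compare differently, and `output_matches` is a hypothetical name):

```python
def output_matches(stdout: str, expected_output: str) -> bool:
    """Compare captured stdout with a scenario's expected_output, ignoring surrounding whitespace."""
    return stdout.strip() == expected_output.strip()


scenario = {"id": "simple_sum", "expected_output": "500500"}

# What a Python fixture for "Sum 1 to 1000" might print, including the trailing newline.
captured = f"{sum(range(1, 1001))}\n"

print(output_matches(captured, scenario["expected_output"]))
```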

Thresholds

Edit config/thresholds.json:

{
  "improved_threshold": -10.0,
  "stable_threshold": 5.0,
  "warning_threshold": 15.0
}

Change	Status	CI Action
≤ -10%	✅ Improved	Pass
-10% to +5%	⚡ Stable	Pass
+5% to +15%	⚠️ Warning	Pass + Alert
> +15%	🔴 Regression	Fail CI
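
The table above can be read as a small classification function. The sketch below is one way to express it in Python, assuming each upper bound is inclusive (so exactly +5% counts as stable and exactly +15% as a warning); the table does not settle those boundary cases, and `classify_change` is an illustrative name.

```python
import json

# Same values as config/thresholds.json shown above.
THRESHOLDS_JSON = """
{
  "improved_threshold": -10.0,
  "stable_threshold": 5.0,
  "warning_threshold": 15.0
}
"""


def classify_change(change_percent: float, thresholds: dict[str, float]) -> str:
    """Map a percent change (positive means slower) to a status label."""
    if change_percent <= thresholds["improved_threshold"]:
        return "improved"
    if change_percent <= thresholds["stable_threshold"]:
        return "stable"
    if change_percent <= thresholds["warning_threshold"]:
        return "warning"
    return "regression"


thresholds = json.loads(THRESHOLDS_JSON)
for change in (-12.0, 2.0, 8.0, 20.0):
    print(f"{change:+.1f}% -> {classify_change(change, thresholds)}")
```

Only the last status would fail a CI job under the rules in the table.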

Output Formats

Results are saved to results/latest/:

File	Description
`results.json`	Raw benchmark data (machine-readable)
`summary.md`	Markdown report (human-readable)
`report.html`	Interactive HTML report with charts
`chart_execution_time.png`	Bar chart of execution times
`chart_performance_ratio.png`	Ratio chart vs Python
`chart_overview.png`	Overview doughnut chart

HTML Report Features

🌙 Dark/Light theme toggle
📊 Interactive Chart.js visualizations
💻 Detailed system information
📋 Per-scenario results table
📈 Version comparison section (when using --compare)

GitHub Pages

The benchmark suite includes a landing page (index.html) for browsing results on GitHub Pages.

Features:

Modern UI with Space Grotesk font
Latest benchmark run summary with runtime versions
History list auto-populated from results/history/index.json
Dark/Light theme toggle
Direct links to each run's HTML report

How History Works:

The runner/main.py script automatically:

Saves each run to results/history/<timestamp>/
Updates results/history/index.json with metadata

To enable GitHub Pages:

Go to repo Settings → Pages
Set source to "Deploy from a branch", then select the main branch and the / (root) folder
Visit https://radon-project.github.io/benchmark/

Manual History Rebuild:

If history entries are missing from index.json:

python -c "
import json
from pathlib import Path

history = Path('results/history')
entries = []
for folder in sorted(history.iterdir()):
    if not folder.is_dir(): continue
    rf = folder / 'results.json'
    if rf.exists():
        data = json.loads(rf.read_text())
        entries.append({
            'id': folder.name,
            'label': f\"{data['profile'].capitalize()} benchmark run\",
            'radon_version': data['host'].get('radon_version', '-'),
            'python_version': '.'.join(data['host'].get('python_version', '').split('.')[:2]),
            'go_version': data['host'].get('go_version') or '-',
            'scenarios': len(set(c['scenario_id'] for c in data.get('cases', [])))
        })
(history / 'index.json').write_text(json.dumps({'entries': entries}, indent=2))
print(f'Indexed {len(entries)} entries')
"

Adding New Scenarios

1. Define the Scenario

Add to config/scenarios.json:

{
  "id": "my_scenario",
  "family": "algorithm",
  "description": "Description of what it tests",
  "expected_output": "expected stdout",
  "comparable": true,
  "tags": ["cpu_bound", "loop"]
}

2. Create Fixture Files

Create matching implementations in each runtime:

fixtures/radon/my_scenario.rn

# My benchmark scenario - Radon
# Expected output: 12345

var result = 0
# ... benchmark code ...
print(result)

fixtures/python/my_scenario.py

# My benchmark scenario - Python
# Expected output: 12345

result = 0
# ... benchmark code ...
print(result)

fixtures/go/my_scenario.go

// My benchmark scenario - Go
// Expected output: 12345
package main

import "fmt"

func main() {
    result := 0
    // ... benchmark code ...
    fmt.Println(result)
}

3. Run and Verify

python runner/main.py --profile smoke --scenarios my_scenario
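
Before running a new scenario it can help to confirm that every runtime has a fixture file. The following is a small standalone sketch, not part of the harness, that assumes the fixtures/<runtime>/<id>.<ext> layout shown in step 2; `missing_fixtures` is a hypothetical helper.

```python
from pathlib import Path

# File extension per runtime, as used in the fixture paths in step 2.
FIXTURE_EXTENSIONS = {"radon": ".rn", "python": ".py", "go": ".go"}


def missing_fixtures(scenario_id: str, fixtures_root: Path) -> list[Path]:
    """Return the fixture paths for a scenario that do not exist on disk."""
    expected = [
        fixtures_root / runtime / f"{scenario_id}{ext}"
        for runtime, ext in FIXTURE_EXTENSIONS.items()
    ]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Run from the benchmarks/ directory so that "fixtures" resolves correctly.
    for path in missing_fixtures("my_scenario", Path("fixtures")):
        print(f"missing: {path}")
```

An empty result means all three implementations are in place.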

Architecture

benchmarks/
├── index.html             # GitHub Pages landing page
├── runner/                # Benchmark harness
│   ├── main.py            # CLI entry point
│   ├── orchestrator.py    # Runs benchmarks across runtimes
│   ├── reporter.py        # Generates reports (JSON, MD, HTML, PNG)
│   └── comparator.py      # Version comparison engine
├── adapters/              # Runtime adapters
│   ├── base_adapter.py    # Abstract base class
│   ├── radon_adapter.py   # Radon runtime
│   ├── python_adapter.py  # Python runtime
│   └── go_adapter.py      # Go runtime
├── config/                # Configuration
│   ├── profiles.json      # Benchmark profiles
│   ├── scenarios.json     # Scenario definitions
│   └── thresholds.json    # Regression thresholds
├── fixtures/              # Benchmark code
│   ├── radon/             # Radon implementations
│   ├── python/            # Python implementations
│   └── go/                # Go implementations
├── baselines/             # Saved baselines for comparison
├── trends/                # Aggregated trend data
├── results/               # Benchmark output
│   ├── latest/            # Most recent run
│   └── history/           # Historical runs
│       └── index.json     # History registry for GitHub Pages
└── Makefile               # Convenience targets

Examples

Example 1: Track Radon Improvement Over Releases

# Before release v0.1.0
python runner/main.py --profile standard --baseline radon-v0.0.1

# ... make improvements to Radon ...

# After release v0.1.0
python runner/main.py --profile standard --baseline radon-v0.1.0 --compare radon-v0.0.1

# View trends
python runner/main.py --trends

Example 2: Quick Pre-Commit Check

# Fast smoke test before committing
python runner/main.py --profile smoke --runtimes radon python --scenarios simple_sum fib_20

Example 3: Full Release Benchmark

# Deep benchmark with all runtimes, save as release baseline
python runner/main.py --profile deep --runtimes radon python go --baseline radon-v1.0.0

# Generate comparison against previous release
python runner/main.py --diff radon-v0.9.0 radon-v1.0.0

Example 4: CI Pipeline Integration

# Run benchmarks
python runner/main.py --profile standard --baseline current-run

# Check for regressions (fails if >10% slower)
python runner/main.py --check-regressions --against previous-release --threshold 10
echo "Exit code: $?"  # 0 = pass, 1 = regression detected

Makefile Reference

The Makefile provides convenient shortcuts for common operations.

Benchmark Profiles

make smoke              # Quick sanity check (1 warmup, 3 repeats)
make standard           # Default run (2 warmups, 10 repeats)
make deep               # Release-quality (5 warmups, 30 repeats)

# Run with specific scenarios
make smoke SCENARIOS='simple_sum fib_20'

# Save result as baseline
make standard NAME=radon-v0.1.0

Runtime Selection

make radon-only         # Benchmark only Radon
make python-only        # Benchmark only Python  
make compare-all        # Benchmark all runtimes (radon, python, go)

# Runtime targets also support NAME=
make radon-only NAME=radon-only-v0.1.0

Baseline & Comparison

# Save a baseline
make baseline NAME=radon-v0.1.0

# List all saved baselines
make list-baselines

# Run benchmarks and compare against baseline
make compare BASELINE=radon-v0.0.1

# Diff two baselines (no new benchmark run)
make diff BASE1=radon-v0.0.1 BASE2=radon-v0.1.0

# View trends across all baselines
make trends

# CI regression check
make check-regressions BASELINE=radon-v0.0.1
make check-regressions BASELINE=radon-v0.0.1 THRESHOLD=10

Maintenance

make clean              # Remove results/latest/*
make clean-go           # Remove compiled Go binaries
make clean-baselines    # Remove all saved baselines
make help               # Show all available targets

Variables

Variable	Description	Example
`NAME`	Baseline name for saving	`NAME=radon-v0.1.0`
`BASELINE`	Baseline to compare against	`BASELINE=radon-v0.0.1`
`BASE1`, `BASE2`	Baselines for diff	`BASE1=radon-v0.0.1 BASE2=radon-v0.1.0`
`SCENARIOS`	Scenarios to run	`SCENARIOS='simple_sum fib_20'`
`THRESHOLD`	Regression threshold (%)	`THRESHOLD=10`

CLI Reference

python runner/main.py [OPTIONS]

Benchmark Execution:
  --profile, -p {smoke,standard,deep}  Benchmark profile (default: standard)
  --runtimes, -r RUNTIME [...]         Runtimes to benchmark (default: all)
  --scenarios, -s SCENARIO [...]       Scenarios to run (default: all)
  --output-dir, -o DIR                 Output directory (default: results/latest)
  --no-html                            Skip HTML report generation

Baseline Management:
  --baseline, -b NAME                  Save current run as named baseline
  --list-baselines                     List all saved baselines
  --delete-baseline NAME               Delete a saved baseline

Comparison:
  --compare, -c BASELINE               Compare against a saved baseline
  --diff BASELINE1 BASELINE2           Diff two baselines (no benchmark run)
  --matrix BASELINE [...]              Multi-version matrix comparison

Trend Analysis:
  --trends                             Show performance trends
  --export FILE                        Export trend data to JSON

CI Integration:
  --check-regressions                  Check for regressions (exit code 1 if found)
  --threshold PERCENT                  Regression threshold (default: 15)
  --against BASELINE                   Baseline to compare against

License

Part of the Radon Programming Language project.
