radon-project/benchmark

Radon Benchmark Suite

A benchmark framework for comparing the performance of the Radon language against Python and Go. It provides regression testing, version comparison, trend tracking, and interactive HTML reports.

Features

  • Multi-Runtime Support - Compare Radon, Python, and Go side-by-side
  • Regression Testing - Detect performance regressions across versions
  • Version Comparison - Track Radon's improvement over time
  • Trend Analysis - Visualize how the gap to Python closes
  • Beautiful Reports - Interactive HTML reports with Chart.js visualizations
  • CI Ready - Exit codes for regression detection in pipelines
  • Configurable Thresholds - Customize what counts as a regression
  • Multiple Profiles - Smoke, standard, and deep benchmark modes

Quick Start

cd benchmarks

# Run a quick smoke test (fastest)
python runner/main.py --profile smoke

# Run standard benchmarks with all runtimes
python runner/main.py --profile standard --runtimes radon python go

# Save as baseline and view HTML report
python runner/main.py --profile standard --baseline radon-v0.1.0
# Open results/latest/report.html in your browser

Or use Make:

make smoke       # Quick sanity check
make standard    # Development runs
make deep        # Release benchmarks

Installation

Requirements

  • Python 3.10+ (for running the benchmark harness)
  • Radon (the language being benchmarked)
  • Go 1.20+ (optional, for Go comparisons)

Dependencies

pip install psutil matplotlib

Verify Setup

cd benchmarks
python runner/main.py --profile smoke --scenarios simple_sum

Usage

Running Benchmarks

Basic Usage

# Run all scenarios with all runtimes
python runner/main.py --profile standard

# Run specific runtimes
python runner/main.py --profile standard --runtimes radon python

# Run specific scenarios
python runner/main.py --profile smoke --scenarios simple_sum fib_20

# Combine options
python runner/main.py --profile standard --runtimes radon python --scenarios simple_sum

Profiles

Profile    Warmups  Repeats  Use Case
smoke      1        3        Quick sanity check (~30s)
standard   2        10       Development runs (~2-5min)
deep       5        30       Release benchmarks (~10-15min)

python runner/main.py --profile smoke      # Fast
python runner/main.py --profile standard   # Balanced
python runner/main.py --profile deep       # Thorough
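
To make the warmup and repeat counts concrete, here is a minimal sketch of a measurement loop. It assumes that warmup runs are executed but not recorded and that only the repeats produce samples; the helper `measure` and the `PROFILES` dict are hypothetical and are not taken from the harness itself.

```python
import statistics
import time

# Warmup and repeat counts copied from the profile table.
PROFILES = {
    "smoke": {"warmups": 1, "repeats": 3},
    "standard": {"warmups": 2, "repeats": 10},
    "deep": {"warmups": 5, "repeats": 30},
}


def measure(fn, warmups, repeats):
    """Run fn `warmups` times untimed, then return `repeats` timings in ms."""
    for _ in range(warmups):
        fn()  # warmup: executed, but no sample is kept (assumption)
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


samples = measure(lambda: sum(range(1, 1001)), **PROFILES["smoke"])
print(f"{len(samples)} samples, median {statistics.median(samples):.4f} ms")
```

More repeats give a steadier median at the cost of run time, which is why the deep profile is reserved for release benchmarks.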

Saving Baselines

Save benchmark results as a named baseline for future comparison:

# Save after a benchmark run
python runner/main.py --profile standard --baseline radon-v0.1.0

# List all saved baselines
python runner/main.py --list-baselines

# Delete a baseline
python runner/main.py --delete-baseline radon-v0.0.1

Baseline files are stored in baselines/ as JSON and include:

  • Runtime versions (Radon, Python, Go)
  • Git commit hash
  • Per-scenario timing data
  • Cross-language ratios

Comparing Versions

Compare Against Baseline

Run new benchmarks and compare against a saved baseline:

python runner/main.py --profile standard --compare radon-v0.0.1

Output:

📊 Version Comparison: 0.0.1 → 0.1.0
==================================================
✅ Overall: 8.5% faster

  Regressions: 0
  Warnings: 1
  Improvements: 3
  Stable: 1
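
A percentage such as "8.5% faster" can be read as a relative change in run time. The sketch below shows that arithmetic under the assumption that the change is computed as (current - baseline) / baseline; how the suite aggregates scenarios into one overall figure is not documented here, and the timings are illustrative, not real measurements.

```python
def percent_change(baseline_ms, current_ms):
    """Relative change in run time; negative means the current run is faster."""
    if baseline_ms <= 0:
        raise ValueError("baseline time must be positive")
    return (current_ms - baseline_ms) / baseline_ms * 100.0


# Illustrative timings only: 200 ms in the baseline, 183 ms now.
change = percent_change(200.0, 183.0)
direction = "faster" if change < 0 else "slower"
print(f"{abs(change):.1f}% {direction}")  # prints "8.5% faster"
```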

Diff Two Baselines (No Benchmark Run)

Compare two saved baselines without running new benchmarks:

python runner/main.py --diff radon-v0.0.1 radon-v0.1.0

Matrix Comparison

Compare multiple versions at once:

python runner/main.py --matrix radon-v0.0.1 radon-v0.1.0 radon-v0.2.0

Trend Analysis

View performance trends across all saved baselines:

python runner/main.py --trends

Output:

📈 Performance Trends (3 versions)
============================================================

  RADON vs python:
    First: 6.0x slower
    Latest: 4.2x slower
    ✅ Gap closed by 30%!
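
The 30% figure in this sample output is consistent with a simple relative reduction of the slowdown ratio. The function below is a hypothetical illustration of that arithmetic; the formula the tool actually uses may differ.

```python
def gap_closed_percent(first_ratio, latest_ratio):
    """Share of the original slowdown ratio that has been removed."""
    return (first_ratio - latest_ratio) / first_ratio * 100.0


# Ratios from the sample output: 6.0x slower at first, 4.2x slower now.
print(f"Gap closed by {gap_closed_percent(6.0, 4.2):.0f}%")  # prints "Gap closed by 30%"
```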

Export trend data:

python runner/main.py --trends --export trends.json

CI Integration

Regression Check

Check if the latest run has regressions (for CI pipelines):

# Exit code 0 = no regressions, 1 = regressions found
python runner/main.py --check-regressions --against radon-v0.0.1

# Custom threshold (default: 15%)
python runner/main.py --check-regressions --threshold 10
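
If a pipeline step is written in Python rather than shell, the exit code can be read with the standard `subprocess` module. This is a minimal sketch: `run_check` is a hypothetical helper, and treating any exit code other than 0 or 1 as an error is an assumption, since only 0 and 1 are documented above.

```python
import subprocess
import sys


def run_check(cmd):
    """Run a command and map its exit code to a CI verdict."""
    completed = subprocess.run(cmd)
    if completed.returncode == 0:
        return "pass"        # documented: no regressions
    if completed.returncode == 1:
        return "regression"  # documented: regressions found
    return "error"           # assumption: anything else is a tooling failure


# Command from the example above; run it from the benchmarks directory.
CHECK_CMD = [
    sys.executable, "runner/main.py",
    "--check-regressions", "--against", "radon-v0.0.1",
]
```

Calling `run_check(CHECK_CMD)` then gives a verdict that the surrounding script can log or act on.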

GitHub Actions Example

name: Performance Regression Check

on:
  push:
    tags: ['v*']

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      
      - name: Install dependencies
        run: pip install psutil matplotlib
      
      - name: Run benchmarks
        run: |
          cd benchmarks
          python runner/main.py --profile standard --baseline current
      
      - name: Check for regressions
        run: |
          cd benchmarks
          python runner/main.py --check-regressions --against previous-release --threshold 15
      
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: benchmarks/results/latest/

Configuration

Profiles

Edit config/profiles.json:

{
  "smoke": {
    "description": "Quick sanity check",
    "warmups": 1,
    "repeats": 3,
    "timeout_ms": 300000
  },
  "standard": {
    "description": "Default developer run",
    "warmups": 2,
    "repeats": 10,
    "timeout_ms": 300000
  },
  "deep": {
    "description": "Release-quality with more samples",
    "warmups": 5,
    "repeats": 30,
    "timeout_ms": 600000
  }
}

Scenarios

Edit config/scenarios.json:

{
  "scenarios": [
    {
      "id": "simple_sum",
      "family": "algorithm",
      "description": "Sum 1 to 1000",
      "expected_output": "500500",
      "comparable": true,
      "tags": ["loop", "arithmetic"]
    }
  ],
  "runtimes": ["radon", "python", "go"],
  "baseline_runtime": "python"
}
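
The `expected_output` field lets the harness confirm that every runtime computes the same answer before timings are compared. The sketch below shows one plausible check for the `simple_sum` scenario. Comparing stdout after stripping surrounding whitespace is an assumption, and `output_matches` is a hypothetical helper, not the suite's own function.

```python
import json

# Trimmed copy of the scenario entry shown above.
SCENARIO_JSON = """
{
  "id": "simple_sum",
  "description": "Sum 1 to 1000",
  "expected_output": "500500"
}
"""


def simple_sum():
    """Python version of the workload: the sum of the integers 1 to 1000."""
    return sum(range(1, 1001))


def output_matches(scenario, stdout_text):
    """True if captured stdout equals expected_output, ignoring outer whitespace."""
    return stdout_text.strip() == scenario["expected_output"]


scenario = json.loads(SCENARIO_JSON)
print(output_matches(scenario, f"{simple_sum()}\n"))  # prints True
```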

Thresholds

Edit config/thresholds.json:

{
  "improved_threshold": -10.0,
  "stable_threshold": 5.0,
  "warning_threshold": 15.0
}

Change         Status          CI Action
≤ -10%         ✅ Improved     Pass
-10% to +5%    ⚡ Stable       Pass
+5% to +15%    ⚠️ Warning      Pass + Alert
> +15%         🔴 Regression   Fail CI

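The threshold bands can be expressed as a small classification function. This is a sketch under one assumption: values exactly on a boundary (+5% or +15%) fall into the lower band, which the table leaves open. The function `classify` is hypothetical and only mirrors the keys of config/thresholds.json.

```python
import json

# Same values as the config/thresholds.json example.
THRESHOLDS_JSON = """
{
  "improved_threshold": -10.0,
  "stable_threshold": 5.0,
  "warning_threshold": 15.0
}
"""


def classify(change_pct, thresholds):
    """Map a percentage change in run time to a status label."""
    if change_pct <= thresholds["improved_threshold"]:
        return "improved"
    if change_pct <= thresholds["stable_threshold"]:
        return "stable"
    if change_pct <= thresholds["warning_threshold"]:
        return "warning"
    return "regression"  # only this status fails CI


thresholds = json.loads(THRESHOLDS_JSON)
for change in (-12.0, 0.0, 8.0, 20.0):
    print(f"{change:+.1f}% -> {classify(change, thresholds)}")
```
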
Output Formats

Results are saved to results/latest/:

File                          Description
results.json                  Raw benchmark data (machine-readable)
summary.md                    Markdown report (human-readable)
report.html                   Interactive HTML report with charts
chart_execution_time.png      Bar chart of execution times
chart_performance_ratio.png   Ratio chart vs Python
chart_overview.png            Overview doughnut chart

HTML Report Features

  • 🌙 Dark/Light theme toggle
  • 📊 Interactive Chart.js visualizations
  • 💻 Detailed system information
  • 📋 Per-scenario results table
  • 📈 Version comparison section (when using --compare)

GitHub Pages

The benchmark suite includes a landing page (index.html) for browsing results on GitHub Pages.

Features:

  • Modern UI with Space Grotesk font
  • Latest benchmark run summary with runtime versions
  • History list auto-populated from results/history/index.json
  • Dark/Light theme toggle
  • Direct links to each run's HTML report

How History Works:

The runner/main.py script automatically:

  1. Saves each run to results/history/<timestamp>/
  2. Updates results/history/index.json with metadata
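
The index update in step 2 can be pictured as an add-or-replace on a list of entries. The sketch below assumes the `{"entries": [...]}` shape and the `id` and `label` keys used by the Manual History Rebuild script later in this section; `register_run` is a hypothetical helper and the run ids are placeholders, not real timestamps.

```python
import json
import tempfile
from pathlib import Path


def register_run(index_path, entry):
    """Add an entry to the history index, replacing any entry with the same id."""
    index_path = Path(index_path)
    if index_path.exists():
        index = json.loads(index_path.read_text())
    else:
        index = {"entries": []}
    entries = [e for e in index["entries"] if e["id"] != entry["id"]]
    entries.append(entry)
    entries.sort(key=lambda e: e["id"])  # ids sort like folder names
    index["entries"] = entries
    index_path.write_text(json.dumps(index, indent=2))
    return index


# Demonstration in a temporary directory, so no real results are touched.
with tempfile.TemporaryDirectory() as tmp:
    idx = Path(tmp) / "index.json"
    register_run(idx, {"id": "run-b", "label": "Smoke benchmark run"})
    register_run(idx, {"id": "run-a", "label": "Standard benchmark run"})
    result = register_run(idx, {"id": "run-b", "label": "Deep benchmark run"})
    print([e["id"] for e in result["entries"]])  # prints ['run-a', 'run-b']
```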

To enable GitHub Pages:

  1. Go to repo Settings → Pages
  2. Set source to "Deploy from a branch", then select branch main and folder / (root)
  3. Visit https://radon-project.github.io/benchmark/

Manual History Rebuild:

If history entries are missing from index.json:

python -c "
import json
from pathlib import Path

history = Path('results/history')
entries = []
for folder in sorted(history.iterdir()):
    if not folder.is_dir(): continue
    rf = folder / 'results.json'
    if rf.exists():
        data = json.loads(rf.read_text())
        entries.append({
            'id': folder.name,
            'label': f\"{data['profile'].capitalize()} benchmark run\",
            'radon_version': data['host'].get('radon_version', '-'),
            'python_version': '.'.join(data['host'].get('python_version', '').split('.')[:2]),
            'go_version': data['host'].get('go_version') or '-',
            'scenarios': len(set(c['scenario_id'] for c in data.get('cases', [])))
        })
(history / 'index.json').write_text(json.dumps({'entries': entries}, indent=2))
print(f'Indexed {len(entries)} entries')
"

Adding New Scenarios

1. Define the Scenario

Add to config/scenarios.json:

{
  "id": "my_scenario",
  "family": "algorithm",
  "description": "Description of what it tests",
  "expected_output": "expected stdout",
  "comparable": true,
  "tags": ["cpu_bound", "loop"]
}

2. Create Fixture Files

Create matching implementations in each runtime:

fixtures/radon/my_scenario.rn

# My benchmark scenario - Radon
# Expected output: 12345

var result = 0
# ... benchmark code ...
print(result)

fixtures/python/my_scenario.py

# My benchmark scenario - Python
# Expected output: 12345

result = 0
# ... benchmark code ...
print(result)

fixtures/go/my_scenario.go

// My benchmark scenario - Go
// Expected output: 12345
package main

import "fmt"

func main() {
    result := 0
    // ... benchmark code ...
    fmt.Println(result)
}

3. Run and Verify

python runner/main.py --profile smoke --scenarios my_scenario
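
Before running the harness, it can help to confirm that every runtime has a fixture for the new scenario. The sketch below relies only on the fixtures/<runtime>/<id>.<ext> layout shown in step 2; `missing_fixtures` is a hypothetical helper and is not part of the suite.

```python
import tempfile
from pathlib import Path

# File extensions taken from the fixture paths in step 2.
FIXTURE_EXTENSIONS = {"radon": ".rn", "python": ".py", "go": ".go"}


def missing_fixtures(fixtures_dir, scenario_id, runtimes=("radon", "python", "go")):
    """Return the fixture paths that do not exist for the given scenario."""
    root = Path(fixtures_dir)
    missing = []
    for runtime in runtimes:
        path = root / runtime / f"{scenario_id}{FIXTURE_EXTENSIONS[runtime]}"
        if not path.is_file():
            missing.append(path)
    return missing


# Demonstration: only the Python fixture exists in this temporary tree.
with tempfile.TemporaryDirectory() as tmp:
    py_dir = Path(tmp) / "python"
    py_dir.mkdir()
    (py_dir / "my_scenario.py").write_text("print(0)\n")
    print([p.name for p in missing_fixtures(tmp, "my_scenario")])
    # prints ['my_scenario.rn', 'my_scenario.go']
```

From the benchmarks directory, `missing_fixtures("fixtures", "my_scenario")` returning an empty list means all three implementations are in place.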

Architecture

benchmarks/
├── index.html             # GitHub Pages landing page
├── runner/                 # Benchmark harness
│   ├── main.py            # CLI entry point
│   ├── orchestrator.py    # Runs benchmarks across runtimes
│   ├── reporter.py        # Generates reports (JSON, MD, HTML, PNG)
│   └── comparator.py      # Version comparison engine
├── adapters/              # Runtime adapters
│   ├── base_adapter.py    # Abstract base class
│   ├── radon_adapter.py   # Radon runtime
│   ├── python_adapter.py  # Python runtime
│   └── go_adapter.py      # Go runtime
├── config/                # Configuration
│   ├── profiles.json      # Benchmark profiles
│   ├── scenarios.json     # Scenario definitions
│   └── thresholds.json    # Regression thresholds
├── fixtures/              # Benchmark code
│   ├── radon/             # Radon implementations
│   ├── python/            # Python implementations
│   └── go/                # Go implementations
├── baselines/             # Saved baselines for comparison
├── trends/                # Aggregated trend data
├── results/               # Benchmark output
│   ├── latest/            # Most recent run
│   └── history/           # Historical runs
│       └── index.json     # History registry for GitHub Pages
└── Makefile               # Convenience targets
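
To illustrate the adapter layer, here is a minimal sketch of an abstract base class with one concrete runtime. The class and method names (`BaseAdapter`, `command`, `run`) are assumptions made for illustration; the actual interface in adapters/base_adapter.py is not shown in this README and may differ.

```python
import subprocess
import sys
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BaseAdapter(ABC):
    """Hypothetical adapter interface: one subclass per runtime."""

    name = "base"

    @abstractmethod
    def command(self, fixture_path):
        """Return the argv list that executes one fixture file."""

    def run(self, fixture_path, timeout_s=300):
        """Execute the fixture and return its captured stdout."""
        completed = subprocess.run(
            self.command(fixture_path),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,  # a non-zero exit raises CalledProcessError
        )
        return completed.stdout


class PythonAdapter(BaseAdapter):
    name = "python"

    def command(self, fixture_path):
        # Reuse the interpreter that is running this script.
        return [sys.executable, str(fixture_path)]


# Demonstration with a throwaway fixture file.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "simple_sum.py"
    fixture.write_text("print(sum(range(1, 1001)))\n")
    print(PythonAdapter().run(fixture).strip())  # prints 500500
```

Keeping the per-runtime differences inside `command` means an orchestrator can treat Radon, Python, and Go uniformly.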

Examples

Example 1: Track Radon Improvement Over Releases

# Before release v0.1.0
python runner/main.py --profile standard --baseline radon-v0.0.1

# ... make improvements to Radon ...

# After release v0.1.0
python runner/main.py --profile standard --baseline radon-v0.1.0 --compare radon-v0.0.1

# View trends
python runner/main.py --trends

Example 2: Quick Pre-Commit Check

# Fast smoke test before committing
python runner/main.py --profile smoke --runtimes radon python --scenarios simple_sum fib_20

Example 3: Full Release Benchmark

# Deep benchmark with all runtimes, save as release baseline
python runner/main.py --profile deep --runtimes radon python go --baseline radon-v1.0.0

# Generate comparison against previous release
python runner/main.py --diff radon-v0.9.0 radon-v1.0.0

Example 4: CI Pipeline Integration

# Run benchmarks
python runner/main.py --profile standard --baseline current-run

# Check for regressions (fails if >10% slower)
python runner/main.py --check-regressions --against previous-release --threshold 10
echo "Exit code: $?"  # 0 = pass, 1 = regression detected

Makefile Reference

The Makefile provides convenient shortcuts for common operations.

Benchmark Profiles

make smoke              # Quick sanity check (1 warmup, 3 repeats)
make standard           # Default run (2 warmups, 10 repeats)
make deep               # Release-quality (5 warmups, 30 repeats)

# Run with specific scenarios
make smoke SCENARIOS='simple_sum fib_20'

# Save result as baseline
make standard NAME=radon-v0.1.0

Runtime Selection

make radon-only         # Benchmark only Radon
make python-only        # Benchmark only Python  
make compare-all        # Benchmark all runtimes (radon, python, go)

# Runtime targets also support NAME=
make radon-only NAME=radon-only-v0.1.0

Baseline & Comparison

# Save a baseline
make baseline NAME=radon-v0.1.0

# List all saved baselines
make list-baselines

# Run benchmarks and compare against baseline
make compare BASELINE=radon-v0.0.1

# Diff two baselines (no new benchmark run)
make diff BASE1=radon-v0.0.1 BASE2=radon-v0.1.0

# View trends across all baselines
make trends

# CI regression check
make check-regressions BASELINE=radon-v0.0.1
make check-regressions BASELINE=radon-v0.0.1 THRESHOLD=10

Maintenance

make clean              # Remove results/latest/*
make clean-go           # Remove compiled Go binaries
make clean-baselines    # Remove all saved baselines
make help               # Show all available targets

Variables

Variable       Description                   Example
NAME           Baseline name for saving      NAME=radon-v0.1.0
BASELINE       Baseline to compare against   BASELINE=radon-v0.0.1
BASE1, BASE2   Baselines for diff            BASE1=v0.0.1 BASE2=v0.1.0
SCENARIOS      Scenarios to run              SCENARIOS='simple_sum fib_20'
THRESHOLD      Regression threshold (%)      THRESHOLD=10

CLI Reference

python runner/main.py [OPTIONS]

Benchmark Execution:
  --profile, -p {smoke,standard,deep}  Benchmark profile (default: standard)
  --runtimes, -r RUNTIME [...]         Runtimes to benchmark (default: all)
  --scenarios, -s SCENARIO [...]       Scenarios to run (default: all)
  --output-dir, -o DIR                 Output directory (default: results/latest)
  --no-html                            Skip HTML report generation

Baseline Management:
  --baseline, -b NAME                  Save current run as named baseline
  --list-baselines                     List all saved baselines
  --delete-baseline NAME               Delete a saved baseline

Comparison:
  --compare, -c BASELINE               Compare against a saved baseline
  --diff BASELINE1 BASELINE2           Diff two baselines (no benchmark run)
  --matrix BASELINE [...]              Multi-version matrix comparison

Trend Analysis:
  --trends                             Show performance trends
  --export FILE                        Export trend data to JSON

CI Integration:
  --check-regressions                  Check for regressions (exit code 1 if found)
  --threshold PERCENT                  Regression threshold (default: 15)
  --against BASELINE                   Baseline to compare against

License

Part of the Radon Programming Language project.
