
First version, so it might be buggy on some platforms. Please open an issue if you run into problems or have suggestions for new modules or features.

torture-bench

A cross-platform CPU benchmark that measures real performance and detects when hardware is secretly cheating.

One command. Any machine. Fair scores.


What is this?

Benchmarking isn’t always as straightforward as it looks. Around major product launches, some benchmark results can reflect highly optimized or carefully selected scenarios rather than typical real-world performance. That doesn’t necessarily mean anything dishonest is happening, but it does mean the numbers can be misleading if taken at face value.

This becomes especially important with modern SoCs. Many benchmark tools—particularly closed-source ones—don’t always make it clear what hardware is actually being used. A workload presented as a “CPU score” may in reality be offloading parts of the work to GPUs, NPUs, AI accelerators, or other specialized hardware. That’s a valid way to measure total system capability, but it’s not a pure CPU comparison.

In contrast, traditional desktop CPUs—like typical x86 chips—tend to rely more on general-purpose cores, with only limited acceleration (such as SIMD extensions like AVX). So when you compare results across platforms, you’re often comparing very different execution models, even if the benchmark labels look similar.

The result is that benchmarking, especially across heterogeneous systems, is less of a pure measurement and more of an interpretation. Understanding what is actually being tested—CPU-only performance vs. full system acceleration—is what separates a meaningful comparison from a misleading one.

And this isn’t new. The challenge of interpreting benchmarks has been around long before SoCs showed up in PCs—it’s just become more pronounced now that systems rely heavily on specialized accelerators.



A good example is Apple Silicon's comparatively weak performance in RandomX, a hash-based workload designed to stress general-purpose CPUs. Because it introduces randomness and frequent data dependencies at each step, the execution path is difficult to predict and hard to accelerate with external hardware. That limits opportunities to offload work to GPUs, NPUs, or other specialized units, making it closer to a “pure CPU” test.

In scenarios like this, some architectures that perform well in highly optimized or accelerator-friendly benchmarks may not stand out as much. That doesn’t mean they’re weak overall—it just highlights that their strengths are tied to different kinds of workloads.

The broader issue is that we don’t have many widely accepted, open, and transparent benchmarks that clearly separate CPU-only performance from system-level acceleration. Even when tools are open-source, they can be modified or extended in ways that make comparisons less consistent across environments.

So again, benchmarking becomes less about a single score and more about understanding what kind of work is being measured—and what hardware is actually doing it.

torture-bench answers both questions. It runs 16 different tests on your CPU — math, memory, encryption, graphics, AI workloads — and produces a score. If any test detects that the CPU is using hidden acceleration instead of raw compute power, it flags it.

You can run it on any Windows, Mac, or Linux machine and compare results on a live scoreboard.


Quick Start — One Line, Any Platform

Mac, Linux, or WSL

Open Terminal and paste:

curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | bash

Windows

Open PowerShell and paste:

irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

That's it. The script will:

  1. Download the code
  2. Compile it for your machine
  3. Run the full benchmark suite (~3 minutes)
  4. Save your results as JSON plus a detailed text report
  5. Optionally publish detailed Pages data and upload your score to the community scoreboard

The GitHub Pages site is the hosted viewer. The benchmark itself runs locally on the user's machine. If you enable publishing, the script then commits detailed data into docs/data/ for the scoreboard to display.

When it's done, you'll see files like:

  • results/bench_linux_x86_64_mypc_20260318_141500.json — full structured benchmark data
  • results/bench_linux_x86_64_mypc_20260318_141500.txt — detailed human-readable report

What You'll Need

For a local benchmark run you need three things installed. For optional scoreboard publishing, install Python 3 as well.

| Tool | What it is | How to install |
|------|------------|----------------|
| git | Downloads code from GitHub | git-scm.com |
| cmake | Configures the build | cmake.org |
| A C compiler | Compiles the code | See below |

If you want your run to appear on the hosted GitHub Pages scoreboard, you also need:

| Tool | Why it matters |
|------|----------------|
| Python 3 | Merges your run into docs/data/*.json |
| Repo access | Lets the script push the updated Pages data |

Installing a C compiler

Mac: Open Terminal and type xcode-select --install. This installs Apple's compiler (clang). That's all you need.

Linux / WSL (Ubuntu/Debian): Run sudo apt install build-essential cmake git. This installs gcc and everything else.

Linux (Fedora): Run sudo dnf install gcc cmake git.

Windows: Install Visual Studio Community (free). During setup, check "Desktop development with C++". Or install MSYS2 for a lighter gcc-based toolchain.

Notes that prevent common setup headaches

  • Windows: run the PowerShell command (bench.ps1) from PowerShell. Don't pipe bench.sh into a Windows shell unless you intentionally want the Bash/MSYS/WSL path.
  • Python 3: without it, the benchmark still runs locally, but the hosted scoreboard data can't be updated when you opt in to publishing.
  • GitHub Pages: the included deploy workflow auto-enables Pages when the repo has permission to do so, then verifies that the live site serves the published JSON.
  • One-liner scripts: set BENCH_PUBLISH_PAGES=1 (or $env:BENCH_PUBLISH_PAGES=1 on PowerShell) when you want the one-liner to publish and push scoreboard data.

Understanding Your Results

After the benchmark runs, you'll see output like this:

  ╔═══════════════════════════════════════════════════════╗
  ║         TORTURE-BENCH  v1.0  CPU Fairness             ║
  ╚═══════════════════════════════════════════════════════╝

  Platform:
  OS      : macOS
  Arch    : arm64
  CPU     : Apple M2 Pro
  Cores   : 12 logical
  RAM     : 32.0 GB
  SIMD    : NEON
  SOC     : Apple Silicon

Then each module runs and prints its score. At the end, you get:

  Composite Score: 5068.30
  Coprocessor Warnings: 1
  Verdict: MINOR_ACCELERATION_DETECTED

What the verdict means

| Verdict | Meaning |
|---------|---------|
| PURE_CPU_FAIR | All tests ran on the CPU with no hidden acceleration. Clean score. |
| MINOR_ACCELERATION_DETECTED | One or two tests may have used hardware accelerators (such as AES-NI for encryption). Scores are still mostly fair. |
| SIGNIFICANT_ACCELERATION_DETECTED | Multiple tests detected coprocessor use. The composite score is inflated relative to pure CPU performance. |

A "minor" verdict is normal on modern hardware — almost every CPU made after 2015 has AES-NI encryption acceleration. It doesn't mean anything is wrong; it just means the score isn't 100% raw CPU.

What the modules test

| Module | What it measures | In plain English |
|--------|------------------|------------------|
| cpu_single | Single-core integer + floating point | How fast is one core? |
| cpu_parallel | All cores running simultaneously | How fast are all cores together? |
| cpu_sustained | Performance over time (sampled every 0.2s) | Does it slow down when it gets hot? |
| memory_bandwidth | STREAM triad (read+write throughput) | How fast can it move data? |
| memory_latency | Pointer-chasing random access | How quickly can it find data in RAM? |
| cache_thrash | L1/L2/L3 cache separately | How fast are the CPU's built-in caches? |
| branch_chaos | Unpredictable if/else decisions | How well does it guess what code does next? |
| hash_chain | SHA-256 cryptographic hashing | Raw crypto speed (detects SHA hardware) |
| raytracer | 3D path tracing, no GPU | Pure CPU graphics rendering |
| simd_dispatch | NEON/AVX2 vector math vs scalar | Does the CPU have wide math instructions? |
| crypto_stress | AES + ChaCha20 encryption | Detects hardware crypto engines |
| ml_matmul | Matrix multiply (FP32/INT8/BF16) | AI/ML inference speed |
| lattice_geometry | Post-quantum crypto operations | Kyber/Dilithium lattice math |
| linear_algebra | GEMM / LU / Cholesky decomposition | Dense math workloads |
| exotic_chaos | Random mix of 10 different algorithms | Unpredictable mixed workload |
| ips_micro | Instructions per second + latency | Raw instruction throughput |

The composite score

The composite score is the average of all 16 module scores. Because modules measure very different things (memory latency in nanoseconds vs matrix multiply in GFLOPS), the raw numbers vary wildly. The composite is useful for comparing the same machine over time or similar machines against each other, but comparing an M2 Mac composite against an Intel desktop composite is apples-to-oranges — look at individual module scores instead.


Viewing the Scoreboard

Visit sunprojectca.github.io/Benchmarks to see all submitted results.

The scoreboard shows:

  • Composite score timeline — how scores change over time across different machines
  • Selected run module chart — visual breakdown of strengths and warnings for one run
  • Module comparison bars — pick any module and compare across all visible runs
  • System spec panel — CPU, RAM, caches, SIMD flags, source, commit, and config
  • Detailed module table — score, ops/sec, wall time, flags, and notes for every module
  • Leaderboard — filter by OS, architecture, verdict, and search text

Submitting Your Results

If you have write access to the repo

Run the one-liner with publishing enabled and it will update docs/data/, commit the published result, and try to push it:

curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_PUBLISH_PAGES=1 bash

On Windows PowerShell:

$env:BENCH_PUBLISH_PAGES = 1
irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

If you prefer the repo-local helper scripts instead of the remote one-liners:

# Local-only run
./run_and_submit.sh

# Local run + update docs/data for GitHub Pages, then try to push
./run_and_submit.sh --publish-pages

On Windows (cmd):
REM Local-only run
run_and_submit.bat

REM Local run + update docs/data for GitHub Pages, then try to push
run_and_submit.bat --publish-pages

If you don't have write access (most people)

  1. Run the one-liner — it saves your JSON file locally
  2. Fork the repo
  3. Run bash tools/run_benchmark_publish.sh --label "My machine" in your fork to update docs/data/
     • Alternative helpers: ./run_and_submit.sh --publish-pages or run_and_submit.bat --publish-pages to update docs/data/, commit, and try to push
  4. Open a Pull Request

Customizing Your Run

Set environment variables before running:

# Run each test for 30 seconds instead of 10 (more accurate, takes longer)
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_DURATION=30 bash

# Use only 4 threads instead of all cores
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_THREADS=4 bash

# Publish to GitHub Pages and try to push the scoreboard update
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_PUBLISH_PAGES=1 bash

# Don't push results to GitHub
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_NO_PUSH=1 bash

# Clone to a specific directory
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_DIR=/tmp/mybench bash

# Add a label to the hosted scoreboard entry
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_LABEL="My Linux box" bash

# Attach notes to the published run
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_NOTES="Fresh thermal paste, plugged in" bash

On Windows PowerShell:

$env:BENCH_DURATION = 30
$env:BENCH_LABEL = "My Windows box"
$env:BENCH_PUBLISH_PAGES = 1
irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

For Developers

Manual build

git clone https://github.com/sunprojectca/Benchmarks.git
cd Benchmarks
bash build.sh

On Windows with Visual Studio:

git clone https://github.com/sunprojectca/Benchmarks.git
cd Benchmarks
mkdir build && cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release

Running manually

# Run the anti-cheat probe first, then full benchmark
./build/tune-probe
./build/torture-bench --tune -d 10 -o results.json --json

# Or choose a custom text report path explicitly
./build/torture-bench --tune -d 10 -o results.json --txt results_report.txt

# Quick 3-second pass
./build/torture-bench -d 3

# Run only one module
./build/torture-bench --only raytracer -d 60

# Skip specific modules
./build/torture-bench --skip raytracer --skip ml_matmul

# Set a deterministic seed for reproducibility
./build/torture-bench -s deadbeefcafebabe

All CLI options

| Option | Argument | Default | Description |
|--------|----------|---------|-------------|
| -d | \<sec\> | 10 | Duration per module |
| -t | \<n\> | 0 (all cores) | Thread count |
| -s | \<hex\> | time-based | Initial chain seed |
| -o | \<file\> | none | Write JSON to file and a matching .txt report |
| --txt | \<file\> | auto sidecar | Write a detailed human-readable text report |
| -c | \<file\> | none | Append CSV row |
| --tune | — | off | Run anti-cheat probe first |
| --verbose | — | off | Extra output |
| --list | — | — | List modules and exit |
| --only | \<name\> | — | Run only this module |
| --skip | \<name\> | — | Skip this module (repeatable) |
| --json | — | off | Print JSON to stdout |

Platform support

| OS | Architecture | Compiler | Status |
|----|--------------|----------|--------|
| Linux | x86_64 | gcc, clang | Supported |
| Linux | ARM64 | gcc, clang | Supported |
| WSL2 | x86_64 / ARM64 | gcc, clang | Supported |
| macOS | ARM64 (Apple Silicon) | clang | Supported |
| macOS | x86_64 (Intel) | clang | Supported |
| Windows | x86_64 | MSVC, MinGW, clang | Supported |
| Windows | ARM64 (Snapdragon) | MSVC | Supported |

How chaining works

Every module receives a chain_seed from the previous module's output. It mixes that seed into its workload and produces a chain_out that feeds the next module. This creates a cryptographic proof that all 16 modules ran in order — you can't skip a module or reorder them without breaking the final chain_proof_hash.

Anti-cheat detection thresholds

| Detection | Method | Threshold |
|-----------|--------|-----------|
| Cache pre-seeding | Cold vs warm run ratio | >2× |
| AES-NI | AES vs ChaCha20 speed ratio | >10× |
| SHA-NI | Hashes/sec vs scalar ceiling | >500k/s |
| AMX/BLAS | GEMM GFLOPS vs scalar ceiling | >10 GFLOPS |
| GPU raytracing | Rays/sec vs scalar ceiling | >2M rays/s |
| Thermal throttle | First vs last 3 samples | >15% drop |
| Turbo boost | 1s burst vs 10s sustained | >30% gap |

CI / GitHub Actions

Every push to main triggers builds on Linux, macOS, and Windows via GitHub Actions. Changes under docs/ also trigger the GitHub Pages deploy workflow, which auto-enables Pages when needed and verifies that the live site serves data/latest.json and data/history.json. See .github/workflows/bench.yml and .github/workflows/deploy-pages.yml.

Adding a new module

  1. Create modules/your_module.c implementing bench_result_t module_your_module(uint64_t chain_seed, int thread_count, int duration_sec)
  2. Add the extern declaration in harness/orchestrator.c
  3. Add to the MODULE_TABLE with name + description
  4. Add the source file to CMakeLists.txt
  5. Rebuild and test
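A minimal module skeleton matching the signature in step 1 might look like this. The bench_result_t fields shown here are illustrative guesses, not the project's actual definition — the real struct presumably lives in harness/common.h (shared types).

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical result struct -- illustrative guesses only; the project's
   real definition presumably lives in harness/common.h. */
typedef struct {
    double   score;          /* normalized module score */
    double   ops_per_sec;
    uint64_t chain_out;      /* feeds the next module's chain_seed */
    int      coproc_warning; /* set when acceleration is suspected */
} bench_result_t;

bench_result_t module_your_module(uint64_t chain_seed, int thread_count,
                                  int duration_sec) {
    bench_result_t r = {0};
    uint64_t x = chain_seed ? chain_seed : 1;
    uint64_t ops = 0;
    time_t stop = time(NULL) + duration_sec;
    (void)thread_count;  /* single-threaded sketch; real modules spawn workers */
    do {
        /* xorshift64: a trivially data-dependent stand-in workload */
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        ops++;
    } while (time(NULL) < stop);
    r.ops_per_sec = duration_sec > 0 ? (double)ops / duration_sec : (double)ops;
    r.score = r.ops_per_sec / 1e6;
    r.chain_out = x;  /* result-dependent, so the chain proof stays valid */
    return r;
}
```

Note that chain_out is derived from the work actually performed, which is what lets the orchestrator's chain verify that the module really ran.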

Project Structure

Benchmarks/
├── bench.sh              ← one-liner for Mac/Linux
├── bench.ps1             ← one-liner for Windows
├── CMakeLists.txt        ← cross-platform build config
├── build.sh / build.bat  ← platform build scripts
├── harness/
│   ├── main.c            ← entry point, CLI parsing
│   ├── orchestrator.c/h  ← runs modules in sequence
│   ├── reporter.c/h      ← JSON + CSV output
│   ├── platform.c/h      ← OS/CPU/SIMD detection
│   ├── common.h          ← shared types + timing
│   └── bench_thread.h    ← portable threading (pthreads / Win32)
├── modules/
│   ├── cpu_single.c      ← single-core torture
│   ├── cpu_parallel.c    ← all-core parallel
│   ├── ... (16 modules)
│   └── anticache_guard.c ← cache flush / anti-cheat
├── tools/
│   └── tune_probe.c      ← standalone anti-cheat diagnostics
├── results/              ← benchmark JSON/CSV outputs
├── docs/
│   ├── index.html        ← GitHub Pages scoreboard
│   └── data/runs.json    ← historical results database
└── .github/workflows/
    └── bench.yml         ← CI: build + run + publish

License

MIT


Built by @sunprojectca

About

With so many people making decisions based on Geekbench and other closed-source software, I became skeptical of certain numbers and promotional scores whenever a company releases a new chip.
