
First version, so it might be buggy on some platforms. Please open an issue if you run into problems or have suggestions for new modules or features.

torture-bench

A cross-platform CPU benchmark that measures real performance and detects when hardware is secretly cheating.

One command. Any machine. Fair scores.


What is this?

Benchmarking isn’t always as straightforward as it looks. Around major product launches, some benchmark results can reflect highly optimized or carefully selected scenarios rather than typical real-world performance. That doesn’t necessarily mean anything dishonest is happening, but it does mean the numbers can be misleading if taken at face value.

This becomes especially important with modern SoCs. Many benchmark tools—particularly closed-source ones—don’t always make it clear what hardware is actually being used. A workload presented as a “CPU score” may in reality be offloading parts of the work to GPUs, NPUs, AI accelerators, or other specialized hardware. That’s a valid way to measure total system capability, but it’s not a pure CPU comparison.

In contrast, traditional desktop CPUs—like typical x86 chips—tend to rely more on general-purpose cores, with only limited acceleration (such as SIMD extensions like AVX). So when you compare results across platforms, you’re often comparing very different execution models, even if the benchmark labels look similar.

The result is that benchmarking, especially across heterogeneous systems, is less of a pure measurement and more of an interpretation. Understanding what is actually being tested—CPU-only performance vs. full system acceleration—is what separates a meaningful comparison from a misleading one.

And this isn’t new. The challenge of interpreting benchmarks has been around long before SoCs showed up in PCs—it’s just become more pronounced now that systems rely heavily on specialized accelerators.



A good example is Apple Silicon's comparatively weak performance in RandomX, a hash-based workload designed to stress general-purpose CPUs. Because it introduces randomness and frequent data dependencies at each step, the execution path is difficult to predict and hard to accelerate with external hardware. That limits opportunities to offload work to GPUs, NPUs, or other specialized units, making it closer to a “pure CPU” test.

In scenarios like this, some architectures that perform well in highly optimized or accelerator-friendly benchmarks may not stand out as much. That doesn’t mean they’re weak overall—it just highlights that their strengths are tied to different kinds of workloads.

The broader issue is that we don’t have many widely accepted, open, and transparent benchmarks that clearly separate CPU-only performance from system-level acceleration. Even when tools are open-source, they can be modified or extended in ways that make comparisons less consistent across environments.

So again, benchmarking becomes less about a single score and more about understanding what kind of work is being measured—and what hardware is actually doing it.

torture-bench answers both questions. It runs 16 different tests on your CPU — math, memory, encryption, graphics, AI workloads — and produces a score. If any test detects that the CPU is using hidden acceleration instead of raw compute power, it flags it.

You can run it on any Windows, Mac, or Linux machine and compare results on a live scoreboard.


Quick Start — One Line, Any Platform

Mac, Linux, or WSL

Open Terminal and paste:

curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | bash

Windows

Open PowerShell and paste:

irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

That's it. The script will:

  1. Download the code
  2. Compile it for your machine
  3. Run the full benchmark suite (~3 minutes)
  4. Save your results as JSON plus a detailed text report
  5. Optionally publish detailed Pages data and upload your score to the community scoreboard

The GitHub Pages site is the hosted viewer. The benchmark itself runs locally on the user's machine. If you enable publishing, the script then commits detailed data into docs/data/ for the scoreboard to display.

When it's done, you'll see files like:

  • results/bench_linux_x86_64_mypc_20260318_141500.json — full structured benchmark data
  • results/bench_linux_x86_64_mypc_20260318_141500.txt — detailed human-readable report

What You'll Need

For a local benchmark run you need three things installed. For optional scoreboard publishing, install Python 3 as well.

| Tool | What it is | How to install |
|------|------------|----------------|
| git | Downloads code from GitHub | git-scm.com |
| cmake | Configures the build | cmake.org |
| A C compiler | Compiles the code | See below |

If you want your run to appear on the hosted GitHub Pages scoreboard, you also need:

| Tool | Why it matters |
|------|----------------|
| Python 3 | Merges your run into docs/data/*.json |
| Repo access | Lets the script push the updated Pages data |

Installing a C compiler

Mac: Open Terminal and type xcode-select --install. This installs Apple's compiler (clang). That's all you need.

Linux / WSL (Ubuntu/Debian): Run sudo apt install build-essential cmake git. This installs gcc and everything else.

Linux (Fedora): Run sudo dnf install gcc cmake git.

Windows: Install Visual Studio Community (free). During setup, check "Desktop development with C++". Or install MSYS2 for a lighter gcc-based toolchain.

Notes that prevent common setup headaches

  • Windows: run the PowerShell command (bench.ps1) from PowerShell. Don't pipe bench.sh into a Windows shell unless you intentionally want the Bash/MSYS/WSL path.
  • Python 3: without it, the benchmark still runs locally, but the hosted scoreboard data can't be updated when you opt in to publishing.
  • GitHub Pages: the included deploy workflow auto-enables Pages when the repo has permission to do so, then verifies that the live site serves the published JSON.
  • One-liner scripts: set BENCH_PUBLISH_PAGES=1 (or $env:BENCH_PUBLISH_PAGES=1 on PowerShell) when you want the one-liner to publish and push scoreboard data.

Understanding Your Results

After the benchmark runs, you'll see output like this:

  ╔═══════════════════════════════════════════════════════╗
  ║         TORTURE-BENCH  v1.0  CPU Fairness             ║
  ╚═══════════════════════════════════════════════════════╝

  Platform:
  OS      : macOS
  Arch    : arm64
  CPU     : Apple M2 Pro
  Cores   : 12 logical
  RAM     : 32.0 GB
  SIMD    : NEON
  SOC     : Apple Silicon

Then each module runs and prints its score. At the end, you get:

  Composite Score: 5068.30
  Coprocessor Warnings: 1
  Verdict: MINOR_ACCELERATION_DETECTED

What the verdict means

| Verdict | Meaning |
|---------|---------|
| PURE_CPU_FAIR | All tests ran on the CPU with no hidden acceleration. Clean score. |
| MINOR_ACCELERATION_DETECTED | One or two tests may have used hardware accelerators (such as AES-NI for encryption). Scores are still mostly fair. |
| SIGNIFICANT_ACCELERATION_DETECTED | Multiple tests detected coprocessor use. The composite score is inflated relative to pure CPU performance. |

A "minor" verdict is normal on modern hardware — almost every CPU made after 2015 has AES-NI encryption acceleration. It doesn't mean anything is wrong; it just means the score isn't 100% raw CPU.

What the modules test

| Module | What it measures | In plain English |
|--------|------------------|------------------|
| cpu_single | Single-core integer + floating point | How fast is one core? |
| cpu_parallel | All cores running simultaneously | How fast are all cores together? |
| cpu_sustained | Performance over time (sampled every 0.2s) | Does it slow down when it gets hot? |
| memory_bandwidth | STREAM triad (read+write throughput) | How fast can it move data? |
| memory_latency | Pointer-chasing random access | How quickly can it find data in RAM? |
| cache_thrash | L1/L2/L3 cache separately | How fast are the CPU's built-in caches? |
| branch_chaos | Unpredictable if/else decisions | How well does it guess what code does next? |
| hash_chain | SHA-256 cryptographic hashing | Raw crypto speed (detects SHA hardware) |
| raytracer | 3D path tracing, no GPU | Pure CPU graphics rendering |
| simd_dispatch | NEON/AVX2 vector math vs scalar | Does the CPU have wide math instructions? |
| crypto_stress | AES + ChaCha20 encryption | Detects hardware crypto engines |
| ml_matmul | Matrix multiply (FP32/INT8/BF16) | AI/ML inference speed |
| lattice_geometry | Post-quantum crypto operations | Kyber/Dilithium lattice math |
| linear_algebra | GEMM / LU / Cholesky decomposition | Dense math workloads |
| exotic_chaos | Random mix of 10 different algorithms | Unpredictable mixed workload |
| ips_micro | Instructions per second + latency | Raw instruction throughput |

The composite score

The composite score is the average of all 16 module scores. Because modules measure very different things (memory latency in nanoseconds vs matrix multiply in GFLOPS), the raw numbers vary wildly. The composite is useful for comparing the same machine over time or similar machines against each other, but comparing an M2 Mac composite against an Intel desktop composite is apples-to-oranges — look at individual module scores instead.


Viewing the Scoreboard

Visit sunprojectca.github.io/Benchmarks to see all submitted results.

The scoreboard shows:

  • Composite score timeline — how scores change over time across different machines
  • Selected run module chart — visual breakdown of strengths and warnings for one run
  • Module comparison bars — pick any module and compare across all visible runs
  • System spec panel — CPU, RAM, caches, SIMD flags, source, commit, and config
  • Detailed module table — score, ops/sec, wall time, flags, and notes for every module
  • Leaderboard — filter by OS, architecture, verdict, and search text

Submitting Your Results

If you have write access to the repo

Run the one-liner with publishing enabled and it will update docs/data/, commit the published result, and try to push it:

curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_PUBLISH_PAGES=1 bash

On Windows PowerShell:

$env:BENCH_PUBLISH_PAGES = 1
irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

If you prefer the repo-local helper scripts instead of the remote one-liners:

# Local-only run
./run_and_submit.sh

# Local run + update docs/data for GitHub Pages, then try to push
./run_and_submit.sh --publish-pages

On Windows (cmd):
REM Local-only run
run_and_submit.bat

REM Local run + update docs/data for GitHub Pages, then try to push
run_and_submit.bat --publish-pages

If you don't have write access (most people)

  1. Run the one-liner — it saves your JSON file locally
  2. Fork the repo
  3. Run bash tools/run_benchmark_publish.sh --label "My machine" in your fork to update docs/data/
     • Alternative helpers: ./run_and_submit.sh --publish-pages or run_and_submit.bat --publish-pages to update docs/data/, commit, and try to push
  4. Open a Pull Request

Customizing Your Run

Set environment variables before running:

# Run each test for 30 seconds instead of 10 (more accurate, takes longer)
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_DURATION=30 bash

# Use only 4 threads instead of all cores
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_THREADS=4 bash

# Publish to GitHub Pages and try to push the scoreboard update
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_PUBLISH_PAGES=1 bash

# Don't push results to GitHub
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_NO_PUSH=1 bash

# Clone to a specific directory
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_DIR=/tmp/mybench bash

# Add a label to the hosted scoreboard entry
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_LABEL="My Linux box" bash

# Attach notes to the published run
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_NOTES="Fresh thermal paste, plugged in" bash

On Windows PowerShell:

$env:BENCH_DURATION = 30
$env:BENCH_LABEL = "My Windows box"
$env:BENCH_PUBLISH_PAGES = 1
irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

For Developers

Manual build

git clone https://github.com/sunprojectca/Benchmarks.git
cd Benchmarks
bash build.sh

On Windows with Visual Studio:

git clone https://github.com/sunprojectca/Benchmarks.git
cd Benchmarks
mkdir build && cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release

Running manually

# Run the anti-cheat probe first, then full benchmark
./build/tune-probe
./build/torture-bench --tune -d 10 -o results.json --json

# Or choose a custom text report path explicitly
./build/torture-bench --tune -d 10 -o results.json --txt results_report.txt

# Quick 3-second pass
./build/torture-bench -d 3

# Run only one module
./build/torture-bench --only raytracer -d 60

# Skip specific modules
./build/torture-bench --skip raytracer --skip ml_matmul

# Set a deterministic seed for reproducibility
./build/torture-bench -s deadbeefcafebabe

All CLI options

| Option | Argument | Default | Description |
|--------|----------|---------|-------------|
| -d | \<sec\> | 10 | Duration per module |
| -t | \<n\> | 0 (all cores) | Thread count |
| -s | \<hex\> | time-based | Initial chain seed |
| -o | \<file\> | none | Write JSON to file and a matching .txt report |
| --txt | \<file\> | auto sidecar | Write a detailed human-readable text report |
| -c | \<file\> | none | Append CSV row |
| --tune | — | off | Run anti-cheat probe first |
| --verbose | — | off | Extra output |
| --list | — | — | List modules and exit |
| --only | \<name\> | — | Run only this module |
| --skip | \<name\> | — | Skip this module (repeatable) |
| --json | — | off | Print JSON to stdout |

Platform support

| OS | Architecture | Compiler | Status |
|----|--------------|----------|--------|
| Linux | x86_64 | gcc, clang | Supported |
| Linux | ARM64 | gcc, clang | Supported |
| WSL2 | x86_64 / ARM64 | gcc, clang | Supported |
| macOS | ARM64 (Apple Silicon) | clang | Supported |
| macOS | x86_64 (Intel) | clang | Supported |
| Windows | x86_64 | MSVC, MinGW, clang | Supported |
| Windows | ARM64 (Snapdragon) | MSVC | Supported |

How chaining works

Every module receives a chain_seed from the previous module's output. It mixes that seed into its workload and produces a chain_out that feeds the next module. This creates a cryptographic proof that all 16 modules ran in order — you can't skip a module or reorder them without breaking the final chain_proof_hash.

Anti-cheat detection thresholds

| Detection | Method | Threshold |
|-----------|--------|-----------|
| Cache pre-seeding | Cold vs warm run ratio | >2× |
| AES-NI | AES vs ChaCha20 speed ratio | >10× |
| SHA-NI | Hashes/sec vs scalar ceiling | >500k/s |
| AMX/BLAS | GEMM GFLOPS vs scalar ceiling | >10 GFLOPS |
| GPU raytracing | Rays/sec vs scalar ceiling | >2M rays/s |
| Thermal throttle | First vs last 3 samples | >15% drop |
| Turbo boost | 1s burst vs 10s sustained | >30% gap |

CI / GitHub Actions

Every push to main triggers builds on Linux, macOS, and Windows via GitHub Actions. Changes under docs/ also trigger the GitHub Pages deploy workflow, which auto-enables Pages when needed and verifies that the live site serves data/latest.json and data/history.json. See .github/workflows/bench.yml and .github/workflows/deploy-pages.yml.

Adding a new module

  1. Create modules/your_module.c implementing bench_result_t module_your_module(uint64_t chain_seed, int thread_count, int duration_sec)
  2. Add the extern declaration in harness/orchestrator.c
  3. Add to the MODULE_TABLE with name + description
  4. Add the source file to CMakeLists.txt
  5. Rebuild and test
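A minimal module skeleton matching the signature in step 1 might look like this. The bench_result_t fields shown here are illustrative guesses, not the project's actual definition — the real struct presumably lives in harness/common.h (shared types).

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical result struct -- illustrative guesses only; the project's
   real definition presumably lives in harness/common.h. */
typedef struct {
    double   score;          /* normalized module score */
    double   ops_per_sec;
    uint64_t chain_out;      /* feeds the next module's chain_seed */
    int      coproc_warning; /* set when acceleration is suspected */
} bench_result_t;

bench_result_t module_your_module(uint64_t chain_seed, int thread_count,
                                  int duration_sec) {
    bench_result_t r = {0};
    uint64_t x = chain_seed ? chain_seed : 1;
    uint64_t ops = 0;
    time_t stop = time(NULL) + duration_sec;
    (void)thread_count;  /* single-threaded sketch; real modules spawn workers */
    do {
        /* xorshift64: a trivially data-dependent stand-in workload */
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        ops++;
    } while (time(NULL) < stop);
    r.ops_per_sec = duration_sec > 0 ? (double)ops / duration_sec : (double)ops;
    r.score = r.ops_per_sec / 1e6;
    r.chain_out = x;  /* result-dependent, so the chain proof stays valid */
    return r;
}
```

Note that chain_out is derived from the work actually performed, which is what lets the orchestrator's chain verify that the module really ran.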

Project Structure

Benchmarks/
├── bench.sh              ← one-liner for Mac/Linux
├── bench.ps1             ← one-liner for Windows
├── CMakeLists.txt        ← cross-platform build config
├── build.sh / build.bat  ← platform build scripts
├── harness/
│   ├── main.c            ← entry point, CLI parsing
│   ├── orchestrator.c/h  ← runs modules in sequence
│   ├── reporter.c/h      ← JSON + CSV output
│   ├── platform.c/h      ← OS/CPU/SIMD detection
│   ├── common.h          ← shared types + timing
│   └── bench_thread.h    ← portable threading (pthreads / Win32)
├── modules/
│   ├── cpu_single.c      ← single-core torture
│   ├── cpu_parallel.c    ← all-core parallel
│   ├── ... (16 modules)
│   └── anticache_guard.c ← cache flush / anti-cheat
├── tools/
│   └── tune_probe.c      ← standalone anti-cheat diagnostics
├── results/              ← benchmark JSON/CSV outputs
├── docs/
│   ├── index.html        ← GitHub Pages scoreboard
│   └── data/runs.json    ← historical results database
└── .github/workflows/
    └── bench.yml         ← CI: build + run + publish

License

MIT


Built by @sunprojectca

About

With so many people making decisions based on Geekbench and other closed-source software, I became skeptical of certain numbers and promotional scores whenever a company releases a new chip.
