First version, so it might be buggy on some platforms. Please open an issue if you run into any problems or have suggestions for new modules or features.
A cross-platform CPU benchmark that measures real performance and detects when hardware is secretly cheating.
One command. Any machine. Fair scores.
Benchmarking isn’t always as straightforward as it looks. Around major product launches, some benchmark results can reflect highly optimized or carefully selected scenarios rather than typical real-world performance. That doesn’t necessarily mean anything dishonest is happening, but it does mean the numbers can be misleading if taken at face value.
This becomes especially important with modern SoCs. Many benchmark tools—particularly closed-source ones—don’t always make it clear what hardware is actually being used. A workload presented as a “CPU score” may in reality be offloading parts of the work to GPUs, NPUs, AI accelerators, or other specialized hardware. That’s a valid way to measure total system capability, but it’s not a pure CPU comparison.
In contrast, traditional desktop CPUs—like typical x86 chips—tend to rely more on general-purpose cores, with only limited acceleration (such as SIMD extensions like AVX). So when you compare results across platforms, you’re often comparing very different execution models, even if the benchmark labels look similar.
The result is that benchmarking, especially across heterogeneous systems, is less of a pure measurement and more of an interpretation. Understanding what is actually being tested—CPU-only performance vs. full system acceleration—is what separates a meaningful comparison from a misleading one.
And this isn’t new. The challenge of interpreting benchmarks has been around long before SoCs showed up in PCs—it’s just become more pronounced now that systems rely heavily on specialized accelerators.
A good example is Apple's poor performance in RandomX, a hash-based workload designed to stress general-purpose CPUs. Because it introduces randomness and frequent data dependencies at each step, the execution path is difficult to predict and hard to accelerate using external hardware. That limits opportunities to offload work to GPUs, NPUs, or other specialized units, making it closer to a “pure CPU” test.
In scenarios like this, some architectures that perform well in highly optimized or accelerator-friendly benchmarks may not stand out as much. That doesn’t mean they’re weak overall—it just highlights that their strengths are tied to different kinds of workloads.
The broader issue is that we don’t have many widely accepted, open, and transparent benchmarks that clearly separate CPU-only performance from system-level acceleration. Even when tools are open-source, they can be modified or extended in ways that make comparisons less consistent across environments.
So again, benchmarking becomes less about a single score and more about understanding what kind of work is being measured—and what hardware is actually doing it.
torture-bench answers both questions. It runs 16 different tests on your CPU — math, memory, encryption, graphics, AI workloads — and produces a score. If any test detects that the CPU is using hidden acceleration instead of raw compute power, it flags it.
You can run it on any Windows, Mac, or Linux machine and compare results on a live scoreboard.
Mac / Linux: open Terminal and paste:
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | bash
Windows: open PowerShell and paste:
irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex
That's it. The script will:
- Download the code
- Compile it for your machine
- Run the full benchmark suite (~3 minutes)
- Save your results as JSON plus a detailed text report
- Optionally publish detailed Pages data and upload your score to the community scoreboard
The GitHub Pages site is the hosted viewer. The benchmark itself runs locally on the user's machine. If you enable publishing, the script then commits detailed data into `docs/data/` for the scoreboard to display.
When it's done, you'll see files like:
- `results/bench_linux_x86_64_mypc_20260318_141500.json` — full structured benchmark data
- `results/bench_linux_x86_64_mypc_20260318_141500.txt` — detailed human-readable report
For a local benchmark run you need three things installed. For optional scoreboard publishing, install Python 3 as well.
| Tool | What it is | How to install |
|---|---|---|
| git | Downloads code from GitHub | git-scm.com |
| cmake | Configures the build | cmake.org |
| A C compiler | Compiles the code | See below |
If you want your run to appear on the hosted GitHub Pages scoreboard, you also need:
| Tool | Why it matters |
|---|---|
| Python 3 | Merges your run into docs/data/*.json |
| Repo access | Lets the script push the updated Pages data |
Mac: Open Terminal and type `xcode-select --install`. This installs Apple's compiler (clang). That's all you need.
Linux / WSL (Ubuntu/Debian): Run `sudo apt install build-essential cmake git`. This installs gcc and everything else.
Linux (Fedora): Run `sudo dnf install gcc cmake git`.
Windows: Install Visual Studio Community (free). During setup, check "Desktop development with C++". Or install MSYS2 for a lighter gcc-based toolchain.
- Windows: run the PowerShell command (`bench.ps1`) from PowerShell. Don't pipe `bench.sh` into a Windows shell unless you intentionally want the Bash/MSYS/WSL path.
- Python 3: without it, the benchmark still runs locally, but the hosted scoreboard data can't be updated when you opt in to publishing.
- GitHub Pages: the included deploy workflow auto-enables Pages when the repo has permission to do so, then verifies that the live site serves the published JSON.
- One-liner scripts: set `BENCH_PUBLISH_PAGES=1` (or `$env:BENCH_PUBLISH_PAGES=1` in PowerShell) when you want the one-liner to publish and push scoreboard data.
After the benchmark runs, you'll see output like this:
╔═══════════════════════════════════════════════════════╗
║ TORTURE-BENCH v1.0 CPU Fairness ║
╚═══════════════════════════════════════════════════════╝
Platform:
OS : macOS
Arch : arm64
CPU : Apple M2 Pro
Cores : 12 logical
RAM : 32.0 GB
SIMD : NEON
SOC : Apple Silicon
Then each module runs and prints its score. At the end, you get:
Composite Score: 5068.30
Coprocessor Warnings: 1
Verdict: MINOR_ACCELERATION_DETECTED
| Verdict | Meaning |
|---|---|
| PURE_CPU_FAIR | All tests ran on the CPU with no hidden acceleration. Clean score. |
| MINOR_ACCELERATION_DETECTED | One or two tests may have used hardware accelerators (like AES-NI for encryption). Scores are still mostly fair. |
| SIGNIFICANT_ACCELERATION_DETECTED | Multiple tests detected coprocessor use. The composite score is inflated compared to pure CPU performance. |
A "minor" verdict is normal on modern hardware — almost every CPU made after 2015 has AES-NI encryption acceleration. It doesn't mean anything is wrong; it just means the score isn't 100% raw CPU.
| Module | What it measures | In plain English |
|---|---|---|
| cpu_single | Single-core integer + floating point | How fast is one core? |
| cpu_parallel | All cores running simultaneously | How fast are all cores together? |
| cpu_sustained | Performance over time (sampled every 0.2s) | Does it slow down when it gets hot? |
| memory_bandwidth | STREAM triad (read+write throughput) | How fast can it move data? |
| memory_latency | Pointer-chasing random access | How quickly can it find data in RAM? |
| cache_thrash | L1/L2/L3 cache separately | How fast are the CPU's built-in caches? |
| branch_chaos | Unpredictable if/else decisions | How well does it guess what code does next? |
| hash_chain | SHA-256 cryptographic hashing | Raw crypto speed (detects SHA hardware) |
| raytracer | 3D path tracing, no GPU | Pure CPU graphics rendering |
| simd_dispatch | NEON/AVX2 vector math vs scalar | Does the CPU have wide math instructions? |
| crypto_stress | AES + ChaCha20 encryption | Detects hardware crypto engines |
| ml_matmul | Matrix multiply (FP32/INT8/BF16) | AI/ML inference speed |
| lattice_geometry | Post-quantum crypto operations | Kyber/Dilithium lattice math |
| linear_algebra | GEMM / LU / Cholesky decomposition | Dense math workloads |
| exotic_chaos | Random mix of 10 different algorithms | Unpredictable mixed workload |
| ips_micro | Instructions per second + latency | Raw instruction throughput |
The composite score is the average of all 16 module scores. Because modules measure very different things (memory latency in nanoseconds vs matrix multiply in GFLOPS), the raw numbers vary wildly. The composite is useful for comparing the same machine over time or similar machines against each other, but comparing an M2 Mac composite against an Intel desktop composite is apples-to-oranges — look at individual module scores instead.
Visit sunprojectca.github.io/Benchmarks to see all submitted results.
The scoreboard shows:
- Composite score timeline — how scores change over time across different machines
- Selected run module chart — visual breakdown of strengths and warnings for one run
- Module comparison bars — pick any module and compare across all visible runs
- System spec panel — CPU, RAM, caches, SIMD flags, source, commit, and config
- Detailed module table — score, ops/sec, wall time, flags, and notes for every module
- Leaderboard — filter by OS, architecture, verdict, and search text
Run the one-liner with publishing enabled and it will update docs/data/, commit the published result, and try to push it:
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_PUBLISH_PAGES=1 bash

On Windows PowerShell:
$env:BENCH_PUBLISH_PAGES = 1
irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

If you prefer the repo-local helper scripts instead of the remote one-liners:
# Local-only run
./run_and_submit.sh
# Local run + update docs/data for GitHub Pages, then try to push
./run_and_submit.sh --publish-pages

REM Local-only run
run_and_submit.bat
REM Local run + update docs/data for GitHub Pages, then try to push
run_and_submit.bat --publish-pages

- Run the one-liner — it saves your JSON file locally
- Fork the repo
- Run `bash tools/run_benchmark_publish.sh --label "My machine"` in your fork to update `docs/data/`
- Alternative helpers: `./run_and_submit.sh --publish-pages` or `run_and_submit.bat --publish-pages` to update `docs/data/`, commit, and try to push
- Open a Pull Request
Set environment variables before running:
# Run each test for 30 seconds instead of 10 (more accurate, takes longer)
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_DURATION=30 bash
# Use only 4 threads instead of all cores
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_THREADS=4 bash
# Publish to GitHub Pages and try to push the scoreboard update
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_PUBLISH_PAGES=1 bash
# Don't push results to GitHub
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_NO_PUSH=1 bash
# Clone to a specific directory
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_DIR=/tmp/mybench bash
# Add a label to the hosted scoreboard entry
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_LABEL="My Linux box" bash
# Attach notes to the published run
curl -sL https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.sh | BENCH_NOTES="Fresh thermal paste, plugged in" bash

On Windows PowerShell:
$env:BENCH_DURATION = 30
$env:BENCH_LABEL = "My Windows box"
$env:BENCH_PUBLISH_PAGES = 1
irm https://raw.githubusercontent.com/sunprojectca/Benchmarks/main/bench.ps1 | iex

To build from source manually:

git clone https://github.com/sunprojectca/Benchmarks.git
cd Benchmarks
bash build.sh

On Windows with Visual Studio:
git clone https://github.com/sunprojectca/Benchmarks.git
cd Benchmarks
mkdir build && cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release

# Run the anti-cheat probe first, then full benchmark
./build/tune-probe
./build/torture-bench --tune -d 10 -o results.json --json
# Or choose a custom text report path explicitly
./build/torture-bench --tune -d 10 -o results.json --txt results_report.txt
# Quick 3-second pass
./build/torture-bench -d 3
# Run only one module
./build/torture-bench --only raytracer -d 60
# Skip specific modules
./build/torture-bench --skip raytracer --skip ml_matmul
# Set a deterministic seed for reproducibility
./build/torture-bench -s deadbeefcafebabe

| Option | Argument | Default | Description |
|---|---|---|---|
| `-d` | `<sec>` | 10 | Duration per module |
| `-t` | `<n>` | 0 (all cores) | Thread count |
| `-s` | `<hex>` | time-based | Initial chain seed |
| `-o` | `<file>` | none | Write JSON to file and a matching `.txt` report |
| `--txt` | `<file>` | auto sidecar | Write a detailed human-readable text report |
| `-c` | `<file>` | none | Append CSV row |
| `--tune` | — | off | Run anti-cheat probe first |
| `--verbose` | — | off | Extra output |
| `--list` | — | — | List modules and exit |
| `--only` | `<name>` | — | Run only this module |
| `--skip` | `<name>` | — | Skip this module (repeatable) |
| `--json` | — | off | Print JSON to stdout |
| OS | Architecture | Compiler | Status |
|---|---|---|---|
| Linux | x86_64 | gcc, clang | ✅ |
| Linux | ARM64 | gcc, clang | ✅ |
| WSL2 | x86_64 / ARM64 | gcc, clang | ✅ |
| macOS | ARM64 (Apple Silicon) | clang | ✅ |
| macOS | x86_64 (Intel) | clang | ✅ |
| Windows | x86_64 | MSVC, MinGW, clang | ✅ |
| Windows | ARM64 (Snapdragon) | MSVC | ✅ |
Every module receives a chain_seed from the previous module's output. It mixes that seed into its workload and produces a chain_out that feeds the next module. This creates a cryptographic proof that all 16 modules ran in order — you can't skip a module or reorder them without breaking the final chain_proof_hash.
| Detection | Method | Threshold |
|---|---|---|
| Cache pre-seeding | Cold vs warm run ratio | >2× |
| AES-NI | AES vs ChaCha20 speed ratio | >10× |
| SHA-NI | Hashes/sec vs scalar ceiling | >500k/s |
| AMX/BLAS | GEMM GFLOPS vs scalar ceiling | >10 GFLOPS |
| GPU raytracing | Rays/sec vs scalar ceiling | >2M rays/s |
| Thermal throttle | First vs last 3 samples | >15% drop |
| Turbo boost | 1s burst vs 10s sustained | >30% gap |
Every push to main triggers builds on Linux, macOS, and Windows via GitHub Actions. Changes under docs/ also trigger the GitHub Pages deploy workflow, which auto-enables Pages when needed and verifies that the live site serves data/latest.json and data/history.json. See .github/workflows/bench.yml and .github/workflows/deploy-pages.yml.
- Create `modules/your_module.c` implementing `bench_result_t module_your_module(uint64_t chain_seed, int thread_count, int duration_sec)`
- Add the extern declaration in `harness/orchestrator.c`
- Add to the `MODULE_TABLE` with name + description
- Add the source file to `CMakeLists.txt`
- Rebuild and test
Benchmarks/
├── bench.sh ← one-liner for Mac/Linux
├── bench.ps1 ← one-liner for Windows
├── CMakeLists.txt ← cross-platform build config
├── build.sh / build.bat ← platform build scripts
├── harness/
│ ├── main.c ← entry point, CLI parsing
│ ├── orchestrator.c/h ← runs modules in sequence
│ ├── reporter.c/h ← JSON + CSV output
│ ├── platform.c/h ← OS/CPU/SIMD detection
│ ├── common.h ← shared types + timing
│ └── bench_thread.h ← portable threading (pthreads / Win32)
├── modules/
│ ├── cpu_single.c ← single-core torture
│ ├── cpu_parallel.c ← all-core parallel
│ ├── ... (16 modules)
│ └── anticache_guard.c ← cache flush / anti-cheat
├── tools/
│ └── tune_probe.c ← standalone anti-cheat diagnostics
├── results/ ← benchmark JSON/CSV outputs
├── docs/
│ ├── index.html ← GitHub Pages scoreboard
│ └── data/runs.json ← historical results database
└── .github/workflows/
└── bench.yml ← CI: build + run + publish
MIT
Built by @sunprojectca