Comprehensive performance benchmarks for the Parallax GPU offload compiler.
This directory contains:
- micro/ - Microbenchmarks for individual algorithms
- apps/ - Real-world application benchmarks
- results/ - Benchmark results and performance data
- src/ - Shared benchmark infrastructure
# Build all benchmarks
mkdir build && cd build
cmake ..
make -j$(nproc)
# Run microbenchmarks
./micro/bench_for_each
./micro/bench_transform
./micro/bench_performance
# Run application benchmarks
./apps/bench_image_processing

Microbenchmarks:
- bench_for_each - std::for_each performance across dataset sizes
- bench_transform - std::transform performance
- bench_performance - Comprehensive performance suite
- bench_correctness - Correctness validation

Application benchmarks:
- bench_image_processing - Image filtering, transformations
- bench_scientific_computing - Monte Carlo, numerical methods
Latest results on NVIDIA GeForce GTX 980M (December 2025):
| Dataset Size | Time (ms) | Throughput (M elem/s) |
|---|---|---|
| 1K | 5.57 | 0.18 |
| 10K | 0.27 | 36.77 |
| 100K | 0.44 | 228.06 |
| 1M | 1.34 | 744.36 |
| Dataset Size | Time (ms) | Throughput (M elem/s) |
|---|---|---|
| 1K | 2.61 | 0.38 |
| 10K | 0.27 | 36.63 |
| 100K | 0.41 | 243.33 |
| 1M | 1.37 | 732.34 |
Key Findings:
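- Throughput scales well on large datasets (>100K elements): roughly 230-240 M elem/s at 100K
- 700+ million elements/second on the 1M dataset
- Fixed overhead dominates on small datasets (<10K): the 1K runs take 2.6-5.6 ms, longer than the 1M runs
- Run time grows sub-linearly with dataset size: going from 10K to 1M elements (100x) costs only about 5x more time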
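
The throughput column and the scaling behavior can be re-derived from the raw timings. The sketch below does this for the first results table; the helper names `throughput_melem_per_s` and `time_growth` are hypothetical and not part of the benchmark suite. Because the published times are rounded to two decimals, the recomputed throughput differs slightly from the table (for example about 746 instead of 744.36 M elem/s at 1M).

```python
# Timings copied from the first results table: (elements, time in ms).
measurements = [
    (1_000, 5.57),
    (10_000, 0.27),
    (100_000, 0.44),
    (1_000_000, 1.34),
]


def throughput_melem_per_s(elements: int, time_ms: float) -> float:
    """Millions of elements processed per second."""
    seconds = time_ms / 1000.0
    return elements / seconds / 1e6


def time_growth(small: tuple, large: tuple) -> float:
    """How many times longer the larger run takes than the smaller one."""
    return large[1] / small[1]


for elements, time_ms in measurements:
    rate = throughput_melem_per_s(elements, time_ms)
    print(f"{elements:>9} elements: {rate:8.2f} M elem/s")

# 100x more elements (10K -> 1M) for only a few times more run time.
growth = time_growth(measurements[1], measurements[3])
print(f"time growth 10K -> 1M: {growth:.1f}x")
```

Comparing run time rather than throughput makes the fixed overhead visible: a run whose time barely changes as the element count grows is dominated by launch and transfer cost rather than by per-element work.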
# Run single benchmark
./micro/bench_for_each
# Run with specific size
./micro/bench_performance --size 1000000
# Run with iterations
./micro/bench_performance --iterations 100
# Save results
./micro/bench_performance --output results/my_results.json
# Compare with CPU baseline
./micro/bench_performance --compare-cpu

| Dataset Size | Expected Behavior | Use Case |
|---|---|---|
| < 1K | CPU likely faster | Don't use GPU |
| 1K - 10K | Break-even point | Context dependent |
| 10K - 100K | 10-100x speedup | Good for GPU |
| > 100K | 100-1000x speedup | Excellent for GPU |
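
The guidance table can be turned into a simple size check. This is a minimal sketch: the function names `expected_behavior` and `should_offload` are hypothetical, and two details are assumptions rather than anything the table states, namely how exact boundary sizes (1K, 10K, 100K) are assigned and that the "context dependent" break-even range defaults to staying on the CPU.

```python
def expected_behavior(num_elements: int) -> str:
    """Map a dataset size to the 'Expected Behavior' column of the table.

    Boundary handling is an assumption: 1K counts as break-even,
    10K and 100K both count as the 10-100x range.
    """
    if num_elements < 1_000:
        return "CPU likely faster"
    if num_elements < 10_000:
        return "Break-even point"
    if num_elements <= 100_000:
        return "10-100x speedup"
    return "100-1000x speedup"


def should_offload(num_elements: int) -> bool:
    """Conservative choice: offload only when the table predicts a speedup.

    Treating the break-even range as 'stay on CPU' is an assumption;
    the table calls that range context dependent.
    """
    return num_elements >= 10_000


for size in (500, 5_000, 50_000, 1_000_000):
    print(f"{size:>9}: {expected_behavior(size)} (offload: {should_offload(size)})")
```

The actual break-even point depends on the hardware and the per-element work, so the thresholds are best treated as starting values to tune against measured results.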
- Vulkan 1.2+ capable GPU
- 4GB+ GPU memory recommended
- CPU with AVX2 for baseline comparisons