NVIDIA/SOL-ExecBench

SOL ExecBench

Speed-Of-Light ExecBench is a rigorous GPU kernel evaluation and benchmarking framework for AI-generated kernel solutions written in the variety of DSLs that NVIDIA hardware supports.

Kernels are:

  • Checked for various forms of reward hacking
  • Tested against a reference solution for numerical correctness
  • Timed under reproducible conditions
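The correctness step is, at its core, an elementwise tolerance comparison against the reference output. A minimal sketch of that idea (the real harness and its per-workload tolerance fields are defined by the workload schema, not by this snippet):

```python
def check_correctness(candidate, reference, rtol=1e-3, atol=1e-3):
    """Compare a candidate kernel's output against the reference output.

    rtol/atol stand in for the per-workload tolerance thresholds; the
    actual field names come from workload.jsonl, not this sketch.
    Uses the standard |c - r| <= atol + rtol * |r| criterion.
    """
    if len(candidate) != len(reference):
        return False  # shape mismatch is an automatic failure
    return all(abs(c - r) <= atol + rtol * abs(r)
               for c, r in zip(candidate, reference))

# A result within tolerance passes; a large deviation fails.
print(check_correctness([1.0, 2.0], [1.0, 2.0001]))  # True
print(check_correctness([1.0, 2.0], [1.0, 3.0]))     # False
```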

Leaderboard submissions are ranked by SOL-Score: a metric that grades custom kernel performance against the theoretical roofline of an NVIDIA B200 GPU (obtained analytically with SOLAR).
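SOLAR's analytic derivation is not reproduced here, but the underlying roofline idea is standard: a kernel's best-case time is bounded by whichever of compute throughput or memory traffic dominates, and a speed-of-light score compares measured time to that bound. A hedged sketch of the concept, where the peak rates, FLOP counts, and byte counts are placeholder numbers rather than real B200 figures, and the scoring function is illustrative rather than the actual SOL-Score definition:

```python
def roofline_time_s(flops, bytes_moved, peak_flops, peak_bw):
    # Speed-of-light time: limited by compute or memory, whichever is slower.
    return max(flops / peak_flops, bytes_moved / peak_bw)

def sol_score(measured_s, flops, bytes_moved, peak_flops, peak_bw):
    # Fraction of the theoretical best achieved (1.0 = at the roofline).
    # The real SOL-Score is defined by SOLAR; this only illustrates the idea.
    return roofline_time_s(flops, bytes_moved, peak_flops, peak_bw) / measured_s

# Placeholder hardware numbers, NOT real B200 specs:
PEAK_FLOPS = 1e15  # 1 PFLOP/s
PEAK_BW = 5e12     # 5 TB/s

t_best = roofline_time_s(flops=2e12, bytes_moved=4e9,
                         peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW)
print(t_best)  # 0.002 s -- compute-bound in this example
print(sol_score(0.004, 2e12, 4e9, PEAK_FLOPS, PEAK_BW))  # 0.5
```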

Supported kernel languages: PyTorch, Triton, CUTLASS, cuDNN, CuTe DSL, cuTile, CUDA C++.

Prerequisites

Setup

1. Download benchmark data (one-time)

./scripts/download_data.sh

This downloads the SOL-ExecBench and FlashInfer Trace datasets into data/.

2. Build and launch the Docker container

./scripts/run_docker.sh --build

This builds the image and drops you into an interactive shell inside the container. The repo's src/, tests/, and downloaded data are mounted automatically.

Evaluating a Solution

Inside the container, use the sol-execbench CLI:

# Evaluate using a problem directory (contains definition.json + workload.jsonl)
sol-execbench <problem_dir> --solution solution.json

# Or specify files explicitly
sol-execbench --definition def.json --workload wkl.jsonl --solution sol.json

Example

# From the host — build, launch, and evaluate in one command:
./scripts/run_docker.sh --build -- \
  sol-execbench examples/cute_dsl/jamba_attn_proj \
    --solution examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json

# Or from inside the container:
sol-execbench examples/cute_dsl/jamba_attn_proj \
  --solution examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json

CLI Options

Flag               Description
--compile-timeout  Compilation timeout in seconds (default: 120)
--timeout          Evaluation timeout in seconds (default: 600)
-o, --output       Write JSONL traces to a file
--json             Print traces as JSON to stdout
--lock-clocks      Lock GPU clocks for stable benchmarks
--keep-staging     Preserve the staging directory after a run
-v, --verbose      Show subprocess output
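These flags compose. For example, a run that locks GPU clocks, writes traces to a file, and shows subprocess output (the paths are taken from the example above; `traces.jsonl` is an illustrative output filename):

```shell
sol-execbench examples/cute_dsl/jamba_attn_proj \
  --solution examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json \
  --lock-clocks \
  -o traces.jsonl \
  -v
```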

Running a Dataset

Use scripts/run_dataset.py to evaluate an entire dataset (or a single problem) in batch. By default it runs each definition's reference implementation as the solution; pass --solution-name to evaluate your own solution files instead.

# Run all problems in the benchmark.
# Auto builds solution.json from a single code file
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --solution-name solution.py

# Run specific categories with multiple solution code files
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --category L1 L2 --solution-name solution.json

# Run a single problem
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark/L1/my_problem

# Limit number of problems and workloads
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --limit 5 --max-workloads 3 -o ./results

Results (traces and a summary JSON) are written to out/run_dataset/ by default (override with -o). Problems that already passed are skipped on subsequent runs unless --rerun is specified.

Problem Format

A problem directory contains:

  • definition.json — Kernel specification: function signature, tensor shapes, dtypes, reference implementation.
  • workload.jsonl — One JSON object per line, each defining input shapes, values, and tolerance thresholds.

A solution is a separate JSON file referencing source files with the kernel implementation.
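As a purely illustrative sketch of how the three files relate, the snippet below writes a toy problem directory. Every field name here is a hypothetical placeholder; the authoritative field names live in the schema docs.

```python
import json
import pathlib

# All keys below are HYPOTHETICAL placeholders for illustration only;
# consult the Definition/Workload/Solution schema docs for the real fields.
problem = pathlib.Path("my_problem")
problem.mkdir(exist_ok=True)

# definition.json: one JSON object specifying the kernel.
(problem / "definition.json").write_text(json.dumps({
    "name": "my_problem",              # placeholder key
    "signature": "out = kernel(a, b)", # placeholder key
    "reference": "ref.py",             # placeholder key
}, indent=2))

# workload.jsonl: one JSON object per line, each a concrete input config.
workloads = [
    {"shape": [1024, 1024], "dtype": "bf16", "rtol": 1e-2},  # placeholder keys
    {"shape": [4096, 4096], "dtype": "bf16", "rtol": 1e-2},
]
(problem / "workload.jsonl").write_text(
    "\n".join(json.dumps(w) for w in workloads) + "\n")

# solution.json: a separate file referencing the implementation sources.
(problem / "solution.json").write_text(json.dumps({
    "language": "triton",        # placeholder key
    "sources": ["solution.py"],  # placeholder key
}, indent=2))
```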

See the full schema docs:

  • Definition — Kernel specification (function signature, tensor shapes, dtypes, reference code)
  • Workload — Concrete input configurations and tolerance thresholds
  • Solution — Source files and build specs for a kernel implementation
  • Trace — Evaluation output (correctness and performance results)

License

Apache-2.0. See LICENSE. Contributions require DCO sign-off — see CONTRIBUTING.md.
