SOL ExecBench

Hugging Face Dataset | Leaderboard | arXiv (coming soon)

Speed-Of-Light (SOL) ExecBench is a rigorous GPU kernel evaluation framework for benchmarking AI-generated kernels written in any of the DSLs that NVIDIA hardware supports.

Kernels are:

Checked for various forms of reward hacking
Tested against a reference solution for numerical correctness
Timed under reproducible conditions

Leaderboard submissions are ranked by SOL-Score, a metric that grades a custom kernel's performance against the theoretical roofline of an NVIDIA B200 GPU (obtained analytically with SOLAR).
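To make the roofline idea concrete, here is a minimal sketch of a generic roofline bound and of how a measured time compares against it. This is not the SOL-Score formula or the SOLAR model, neither of which is specified here. The function names are hypothetical and the peak throughput values are placeholders, not B200 specifications.

```python
def roofline_time_s(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Lower bound on runtime: the slower of the compute and memory limits."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / peak_bytes_per_s
    return max(compute_s, memory_s)


def fraction_of_roofline(measured_s, bound_s):
    """Illustrative ratio capped at 1.0; NOT the official SOL-Score formula."""
    return min(1.0, bound_s / measured_s)


# Placeholder peaks (hypothetical values, not B200 specifications).
PEAK_FLOPS = 1.0e15  # floating-point operations per second
PEAK_BW = 4.0e12     # bytes per second

# Example workload: a 4096 x 4096 x 4096 matmul with 2-byte elements.
m = n = k = 4096
flops = 2 * m * n * k                      # one multiply and one add per term
bytes_moved = 2 * (m * k + k * n + m * n)  # read A and B, write C, once each

bound = roofline_time_s(flops, bytes_moved, PEAK_FLOPS, PEAK_BW)
print(f"roofline bound: {bound * 1e6:.1f} us")
print(f"fraction at 200 us measured: {fraction_of_roofline(200e-6, bound):.2f}")
```

With these placeholder peaks the example is compute-bound, so the bound is set by the FLOP count rather than by memory traffic.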

Supported kernel languages: PyTorch, Triton, CUTLASS, cuDNN, CuTe DSL, cuTile, CUDA C++.

Prerequisites

Docker with NVIDIA Container Toolkit
Hugging Face CLI (pip install "huggingface-hub[cli]")
NVIDIA driver version 580+

Setup

1. Download benchmark data (one-time)

./scripts/download_data.sh

This downloads the SOL-ExecBench and FlashInfer Trace datasets into data/.

2. Build and launch the Docker container

./scripts/run_docker.sh --build

This builds the image and drops you into an interactive shell inside the container. The repo's src/, tests/, and downloaded data are mounted automatically.

Evaluating a Solution

Inside the container, use the sol-execbench CLI:

# Evaluate using a problem directory (contains definition.json + workload.jsonl)
sol-execbench <problem_dir> --solution solution.json

# Or specify files explicitly
sol-execbench --definition def.json --workload wkl.jsonl --solution sol.json

Example

# From the host — build, launch, and evaluate in one command:
./scripts/run_docker.sh --build -- \
  sol-execbench examples/cute_dsl/jamba_attn_proj \
    --solution examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json

# Or from inside the container:
sol-execbench examples/cute_dsl/jamba_attn_proj \
  --solution examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json

CLI Options

Flag	Description
`--compile-timeout`	Compilation timeout in seconds (default: 120)
`--timeout`	Evaluation timeout in seconds (default: 600)
`-o, --output`	Write JSONL traces to file
`--json`	Print traces as JSON to stdout
`--lock-clocks`	Lock GPU clocks for stable benchmarks
`--keep-staging`	Preserve staging directory after run
`-v, --verbose`	Show subprocess output
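When scripting many runs, it can help to assemble the command line programmatically. The sketch below builds an argument list from the flags in the table above. The helper build_command and the output file name traces.jsonl are hypothetical, and the sketch assumes that value-taking flags accept their value as the next argument.

```python
import shlex


def build_command(problem_dir, solution, compile_timeout=None, timeout=None,
                  output=None, lock_clocks=False, verbose=False):
    """Assemble a sol-execbench argv list (hypothetical helper)."""
    argv = ["sol-execbench", str(problem_dir), "--solution", str(solution)]
    # Omitted options fall back to the CLI defaults listed in the table.
    if compile_timeout is not None:
        argv += ["--compile-timeout", str(compile_timeout)]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if output is not None:
        argv += ["--output", str(output)]
    if lock_clocks:
        argv.append("--lock-clocks")
    if verbose:
        argv.append("--verbose")
    return argv


argv = build_command(
    "examples/cute_dsl/jamba_attn_proj",
    "examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json",
    compile_timeout=300,
    lock_clocks=True,
    output="traces.jsonl",  # hypothetical output path
)
print(shlex.join(argv))
```

Inside the container, the resulting list could be passed to subprocess.run(argv, check=True). Passing a list rather than a single string avoids shell quoting issues with paths.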

Running a Dataset

Use scripts/run_dataset.py to evaluate an entire dataset (or a single problem) in batch. Unless --solution-name is specified, it runs the definition's reference implementation as the solution. Output is saved to ./out/{subset} by default.

# Run all problems in the benchmark.
# Auto builds solution.json from a single code file
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --solution-name solution.py

# Run specific categories with multiple solution code files
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --category L1 L2 --solution-name solution.json

# Run a single problem
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark/L1/my_problem

# Limit number of problems and workloads
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --limit 5 --max-workloads 3 -o ./results

Results (traces and a summary JSON) are written to out/run_dataset/ by default (override with -o). Problems that already passed are skipped on subsequent runs unless --rerun is specified.

Problem Format

A problem directory contains:

definition.json — Kernel specification: function signature, tensor shapes, dtypes, reference implementation.
workload.jsonl — One JSON object per line, each defining input shapes, values, and tolerance thresholds.
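The two files above can be read with the standard library alone. The following is a minimal sketch, assuming only what is stated here: definition.json holds a single JSON document and workload.jsonl holds one JSON object per line. The helper load_problem and the keys in the demo data are hypothetical; the real keys are defined by the schema docs.

```python
import json
import tempfile
from pathlib import Path


def load_problem(problem_dir):
    """Return (definition, workloads) parsed from a problem directory."""
    root = Path(problem_dir)
    definition = json.loads((root / "definition.json").read_text())
    workloads = []
    with (root / "workload.jsonl").open() as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                workloads.append(json.loads(line))
    return definition, workloads


# Demo on a throwaway directory with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "definition.json").write_text(json.dumps({"name": "demo"}))
    (root / "workload.jsonl").write_text('{"id": 0}\n{"id": 1}\n')
    definition, workloads = load_problem(root)
    print(len(workloads), "workloads")
```

Reading the workload file line by line keeps each record independent, so one malformed line can be located by its line number rather than failing the whole file parse.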

A solution is a separate JSON file referencing source files with the kernel implementation.

See the full schema docs:

Definition — Kernel specification (function signature, tensor shapes, dtypes, reference code)
Workload — Concrete input configurations and tolerance thresholds
Solution — Source files and build specs for a kernel implementation
Trace — Evaluation output (correctness and performance results)

License

Apache-2.0. See LICENSE. Contributions require DCO sign-off — see CONTRIBUTING.md.
