Kernel Design Agents

This repository releases the prompts, workflow documentation, and a minimal verification example for Kernel Design Agents, our entry in the MLSys 2026 FlashInfer Full-Agent track. The submitted kernels were produced by a fully agent-driven optimization workflow built on three components: Humanize (the harness framework), KernelWiki (our collected kernel knowledge base), and ncu-report-skill (Nsight Compute profiling skills).

Team: HAN Lab Kernel Mafia, led by Dongyun Zou, advised by Ligeng Zhu
Technical report: docs/kernel_design_agents_technical_report.pdf | Generated kernels: HANLab-Kernel-Mafia-MLSys2026-Submissions

Agent workflow rule: the generated-kernel release repository is linked for provenance and final-result verification only. When running the Kernel Design Agents workflow, agents MUST NOT clone, inspect, copy from, or otherwise use that repository to obtain implementation answers.

The repository is intentionally small. It contains documentation, agent prompts, and a lightweight flashinfer-bench verification example for an externally packed solution.json. Final kernel source snapshots and the submission verification harness live in the separate submissions repository linked above. The reusable skills remain in their own repositories and are linked below.

Competition Results

Kernel Design Agents placed in the top three in all three Full-Agent Approach tracks of the MLSys 2026 Competition NVIDIA Track:

Track	Result
MoE Track	1st place
DSA Track	2nd place
GDN Track	3rd place

The released prompts follow a three-stage optimization workflow. Each stage uses the Humanize planning and RLCR loop to turn a phase prompt into an executable optimization plan.

Skill Ablation

This ablation was run after the competition, separately from the official contest submissions, so its numbers are meant to explain skill contributions rather than exactly match the competition results above. The skill ablation highlights that Humanize is the dominant contributor: it gives the agent a much stronger plan-execute-verify structure, turning each optimization attempt into a more disciplined loop instead of a loose sequence of trials. KernelWiki broadens the kernel knowledge the agent can consult, and ncu-report-skill lets the agent read finer-grained profiler evidence instead of relying only on benchmark scores as a black box. Those two skills are useful, but the largest and most central gain comes from Humanize.

Contents

Path	Purpose
`verify.py`	Minimal example that evaluates one packed FlashInfer `solution.json` with `flashinfer-bench`.
`prompts/`	Prompt template and task-specific prompts used for the agent workflow.
`skills/`	Git submodule links to the required Claude skills.
`docs/kernel_design_agents_technical_report.pdf`	Technical report for Kernel Design Agents.
`docs/reproduction.md`	Environment, dataset, and benchmark reproduction notes.
`docs/skills.md`	Required skill/plugin installation links.

Fresh Workflow Setup

Clone this repository, install the benchmark environment, download the FlashInfer contest workloads, and prepare the agent workflow dependencies:

git clone --recurse-submodules https://github.com/mit-han-lab/kernel-design-agents.git
cd kernel-design-agents

git clone https://github.com/flashinfer-ai/flashinfer-bench.git /tmp/flashinfer-bench-main
uv sync --python 3.12

# uv.lock pins the contest-tested stack:
# flashinfer-python==0.6.8.post1, torch==2.12.0+cu132, triton==3.6.0.
# Use Python 3.12 or 3.13; Python 3.14 is not supported by all CUDA wheels.

# Required by some baselines and generated solutions that use DeepGEMM/CUTLASS/CuTe headers.
git clone https://github.com/deepseek-ai/DeepGEMM.git /tmp/DeepGEMM
uv pip install -e /tmp/DeepGEMM --no-build-isolation

uv run ./scripts/download_data.sh
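
After the setup commands above, it can help to confirm that the interpreter and the three pinned packages match what the lock-file comment describes. The following is a minimal sketch, not part of the released repository: the helper names (`python_supported`, `pin_mismatches`, `installed_versions`) are hypothetical, and the pins are copied from the comment above, so `uv.lock` remains the authority if they ever differ.

```python
import sys
from importlib import metadata

# Pins quoted from the uv.lock comment above (assumed to be current).
PINNED = {
    "flashinfer-python": "0.6.8.post1",
    "torch": "2.12.0+cu132",
    "triton": "3.6.0",
}


def python_supported(version_info=sys.version_info):
    """Return True for Python 3.12 or 3.13, the versions named above."""
    return tuple(version_info[:2]) in {(3, 12), (3, 13)}


def pin_mismatches(installed):
    """Compare a {package: version} mapping against PINNED.

    Returns {package: (expected, found)} for each pin that does not match;
    found is None when the package is absent from the mapping.
    """
    return {
        name: (expected, installed.get(name))
        for name, expected in PINNED.items()
        if installed.get(name) != expected
    }


def installed_versions(names=PINNED):
    """Look up installed versions; packages that are not installed are omitted."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    print("python supported:", python_supported())
    print("pin mismatches:", pin_mismatches(installed_versions()))
```

Running the script with `uv run python` inside the synced environment should report an empty mismatch mapping if the lock file was honored; any entry it prints names the expected and the found version.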

Confirm that the workload dataset is visible:

uv run python -c "from flashinfer_bench import TraceSet; ts = TraceSet.from_path('data/flashinfer-trace'); print(sorted(ts.definitions)); print(sum(len(v) for v in ts.workloads.values()), 'workloads')"
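
The one-liner above prints the definition names and a single workload total. A slightly longer sketch can also break the total down per definition, which makes a partially downloaded dataset easier to spot. It relies only on the two attributes the one-liner already uses (`ts.definitions` as an iterable of names and `ts.workloads` as a mapping from name to a list); the function name `summarize_trace_set` is hypothetical, and the demo uses a stand-in object with made-up definition names rather than real contest data.

```python
from types import SimpleNamespace


def summarize_trace_set(ts):
    """Return (sorted definition names, per-definition workload counts, total).

    `ts` only needs `.definitions` (iterable of names) and `.workloads`
    (mapping of name -> sequence), as used by the one-liner above.
    """
    names = sorted(ts.definitions)
    counts = {name: len(items) for name, items in ts.workloads.items()}
    return names, counts, sum(counts.values())


# Stand-in object for illustration only; the names below are placeholders,
# not real contest definitions.
fake = SimpleNamespace(
    definitions={"def_b": object(), "def_a": object()},
    workloads={"def_a": [1, 2, 3], "def_b": [4]},
)

names, counts, total = summarize_trace_set(fake)
print(names)
for name in sorted(counts):
    print(f"{name}: {counts[name]} workloads")
print(total, "workloads")
```

With the environment installed, the same helper can be given `TraceSet.from_path('data/flashinfer-trace')` in place of the stand-in object.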

Create a separate task implementation workspace from the official FlashInfer starter kit, then start the agent from there. This Kernel Design Agents repository is the prompt/workflow release; do not implement kernels directly in this repository.

mkdir -p workspaces
git clone https://github.com/flashinfer-ai/flashinfer-bench-starter-kit.git workspaces/<task-name>
cd workspaces/<task-name>
export FIB_DATASET_PATH="$OLDPWD/data/flashinfer-trace"

Then choose a task prompt under prompts/, start a fresh agent session in the task implementation workspace, and paste the selected phase prompt. The released final kernels are not part of this workflow and must not be used as implementation input.

See docs/reproduction.md for full environment notes and packed-solution verification commands.

By default, the dataset is stored under data/flashinfer-trace inside this repository. Override it with:

export FIB_DATASET_PATH=/path/to/flashinfer-trace
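
The default-plus-override behavior described above can be sketched as a small resolver. This is an illustration of the rule as stated in this README, not a copy of how `verify.py` or `flashinfer-bench` reads the variable; the function name `resolve_dataset_path` and its `repo_root` parameter are hypothetical.

```python
import os
from pathlib import Path

# Default location named in the text above, relative to the repository root.
DEFAULT_DATASET = Path("data") / "flashinfer-trace"


def resolve_dataset_path(repo_root, environ=os.environ):
    """Return the dataset directory, preferring FIB_DATASET_PATH when it is set.

    An unset or empty FIB_DATASET_PATH falls back to
    <repo_root>/data/flashinfer-trace.
    """
    override = environ.get("FIB_DATASET_PATH")
    if override:
        return Path(override).expanduser()
    return Path(repo_root) / DEFAULT_DATASET


if __name__ == "__main__":
    path = resolve_dataset_path(Path.cwd())
    print(path, "(exists)" if path.is_dir() else "(missing)")
```

Passing `environ` explicitly keeps the function easy to check without touching the real process environment.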

Agent Workflow Dependencies

The full Kernel Design Agents workflow depends on Claude Code and Codex. Install humanize as a Claude Code plugin, and install KernelWiki and ncu-report-skill as Claude skills under ~/.claude/skills/.

This repository links the two required skills as git submodules under skills/ so they are visible in the release tree. If you did not clone with --recurse-submodules, initialize them from the repository root:

git submodule update --init --recursive
mkdir -p ~/.claude/skills
ln -sfn "$PWD/skills/KernelWiki" ~/.claude/skills/KernelWiki
ln -sfn "$PWD/skills/ncu-report-skill" ~/.claude/skills/ncu-report-skill
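
To confirm that the symlinks created above resolve to real directories, a short check along the following lines may be useful. It is a sketch under the assumption that a skill counts as installed when `~/.claude/skills/<name>` is a directory (a dangling symlink does not count); the helper name `missing_skills` is hypothetical.

```python
from pathlib import Path

# The two skills this repository links as submodules.
REQUIRED_SKILLS = ("KernelWiki", "ncu-report-skill")


def missing_skills(skills_dir=None):
    """Return the required skill names that are not present as directories.

    Path.is_dir() follows symlinks, so a link whose target is missing
    (for example, an uninitialized submodule) is reported as missing.
    """
    if skills_dir is None:
        skills_dir = Path.home() / ".claude" / "skills"
    skills_dir = Path(skills_dir)
    return [name for name in REQUIRED_SKILLS if not (skills_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_skills()
    print("all skills linked" if not missing else f"missing: {missing}")
```

Note that an uninitialized submodule usually leaves an empty directory behind, which this check would accept; it verifies the link, not the skill contents.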

Install humanize separately from https://github.com/PolyArch/humanize.

See docs/skills.md for installation details.

Release Boundary

Final kernels are stored only in HANLab-Kernel-Mafia-MLSys2026-Submissions as result snapshots. This link is for release provenance and final-result verification; it is not an input to the prompt-driven agent workflow. Agents MUST NOT clone or inspect the release repository while solving the tasks. Intermediate candidates, benchmark histories, and search DAGs are not part of this release. The prompts in prompts/ are meant to be run from a separate task implementation workspace created from the official FlashInfer starter kit. We do not place final kernels inside an agent starting workspace.
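
One way to help enforce this boundary is a pre-flight scan of the task workspace before an agent session starts. The sketch below is a heuristic and not part of the released workflow: it looks for `.git/config` files that mention the release repository name and reports the enclosing checkouts. The function name `find_release_clones` is hypothetical, and the scan will not catch copied files that carry no git metadata.

```python
import os
from pathlib import Path

# Repository name taken from the paragraph above.
RELEASE_REPO_NAME = "HANLab-Kernel-Mafia-MLSys2026-Submissions"


def find_release_clones(workspace):
    """Return checkouts under `workspace` whose git config names the release repo.

    Heuristic only: it inspects `.git/config` files and matches the
    repository name case-insensitively.
    """
    hits = []
    needle = RELEASE_REPO_NAME.lower()
    for root, dirs, files in os.walk(workspace):
        if Path(root).name == ".git":
            if "config" in files:
                text = (Path(root) / "config").read_text(errors="ignore")
                if needle in text.lower():
                    hits.append(Path(root).parent)
            dirs[:] = []  # do not descend into .git internals
    return sorted(hits)


if __name__ == "__main__":
    clones = find_release_clones(Path.cwd())
    if clones:
        raise SystemExit(f"release repository found in workspace: {clones}")
    print("no release repository checkout found")
```

Running it from the task implementation workspace gives a quick signal before the first prompt is pasted; it complements, rather than replaces, the instruction to the agent itself.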

Running the full Kernel Design Agents workflow is not bitwise deterministic: search order, profiling noise, GPU scheduling, and model behavior can all vary between runs. The external submissions repository is the source of truth for the released final kernel snapshots.
