KernelWiki — Blackwell & Hopper Kernel Optimization Knowledge Base

Important

This skill is maintained as a standalone submodule of Kernel Design Agents (KDA) for easy installation.

For bug reports, feature requests, and discussions, please use the main KDA repository: https://github.com/mit-han-lab/kernel-design-agents

Knowledge cutoff: 2026-04-27. All upstream PRs, blog snapshots, and version-claim entries are anchored to upstream state on or before this date (recorded in data/refresh-cutoff.yaml). Triton claims pin to release 3.6.0 (released 2026-01-21); CUTLASS claims pin to 4.5.0 (released 2026-03-27); see data/tool-versions.yaml for all tracked tools. To advance the cutoff, run scripts/refresh_candidate_ledger.py, regenerate PR pages, and bump the cutoff date file.

A structured knowledge base of NVIDIA Blackwell (SM100, B200) and Hopper (SM90, H100) GPU kernel optimization, packaged as a Claude Code skill. The repository root is the skill directory — clone it directly into ~/.claude/skills/ and it works out of the box.

Install as a Claude Code Skill

git clone git@github.com:DongyunZou/KernelWiki.git ~/.claude/skills/KernelWiki
pip install -r ~/.claude/skills/KernelWiki/requirements.txt

That's it. The skill auto-registers (because SKILL.md lives at the clone root), and the query scripts auto-resolve the wiki root to their own directory — no environment variable required.

Smoke test:

cd ~/.claude/skills/KernelWiki
python3 scripts/query.py --tag nvfp4 --type kernel --compact
python3 scripts/get_page.py kernel-flash-attention-4 --frontmatter-only

Optional override for relocating the scripts:

export BLACKWELL_WIKI_ROOT=/path/to/KernelWiki

What's Here

2,179 PR references from NVIDIA/cutlass (32), sgl-project/sglang (645), vllm-project/vllm (833), flashinfer-ai/flashinfer (583), pytorch/pytorch (85), deepseek-ai/DeepGEMM (1) — Jan 2025 – Apr 2026
48 synthesized wiki pages — hardware features, techniques, kernel case studies, problem patterns, DSL guides, migration guides
20 community blog summaries, 11 official doc summaries, 7 competition pages (GPU Mode NVFP4 hackathon, FlashInfer MLSys 2026)
89 verbatim/extracted/derived asset bundles under artifacts/ (PR diffs, kernel files, blog code) — pinned to upstream SHAs via PROVENANCE.yaml
6 auto-generated cross-reference indices — by problem / technique / hardware feature / repo / kernel type / language
6 candidate ledgers tracking 4,222 merged PRs with include/defer/exclude decisions
Hybrid version-claim registry (data/version-claims.yaml) — per-page version_sensitive: <id> pointers + central registry, validated for bidirectional consistency

Query Tools

All tools run from the skill root, no env var needed.

Tool	Purpose
`scripts/query.py`	Unified search across 2,265 pages (keywords + filters + alias-aware)
`scripts/get_page.py`	Fetch any page by `id` or path; `--follow-sources` expands cited sources
`scripts/grep_wiki.py`	Regex text search across wiki bodies and PR pages

Examples:

python3 scripts/query.py "ping-pong attention" --limit 5
python3 scripts/query.py --tag UMMA --type hardware --compact          # alias → tcgen05
python3 scripts/query.py --architecture B200 --type kernel             # alias → sm100
python3 scripts/get_page.py kernel-flash-attention-4 --follow-sources
python3 scripts/grep_wiki.py "tcgen05\\.fence" --only wiki

Companion Docs

SKILL.md — Skill entry point: when to engage, 5 navigation paths, output contract.
references/primer.md — Topic map: hardware features, techniques, kernels, symptoms → canonical page IDs.
references/schema.md — Frontmatter schema, confidence rules, reproducibility ladder, controlled vocabulary, canonical aliases.
references/examples.md — 10 worked query patterns (user question → command sequence → synthesis).
CLAUDE.md — Extended schema + navigation reference for Claude Code.
index.md — Human-facing curated top-level index.

Architecture

Three layers (inspired by Karpathy's LLM Wiki pattern):

sources/ — Raw data. Immutable summaries of PRs, blogs, docs, contests.
wiki/ — Synthesized knowledge pages. Cross-referenced by id. All have YAML frontmatter.
queries/ — Auto-generated cross-reference indices. Do not edit manually; regenerate via scripts/generate-indices.py.

Supporting files:

data/schemas.yaml — Required/optional fields per page type
data/tags.yaml — Controlled vocabulary (80+ tags)
data/aliases.yaml — Canonical → synonym mappings
data/version-claims.yaml — Central registry for version-sensitive claims (DEC-1 hybrid)
data/tool-versions.yaml — Snapshot of tracked tool releases (Triton, CUTLASS, CUDA, PTX, …)
data/refresh-cutoff.yaml — Single source of truth for the knowledge cutoff date
candidates/ — Reviewed PR candidate ledgers (per repo)
artifacts/ — Verbatim / extracted / derived asset bundles, each with PROVENANCE.yaml

Maintenance Tooling

Script	Purpose
`scripts/validate.py`	Validate YAML frontmatter, enforce schema, check link integrity
`scripts/generate-indices.py`	Regenerate `queries/*.md` from frontmatter
`scripts/generate-pr-pages.py`	Batch-generate source PR pages from candidate ledgers

pip install -r requirements.txt
python3 scripts/validate.py            # reports 2265 files / 89 bundles / 6 ledgers, 0 errors
python3 scripts/generate-indices.py    # regenerate query indices

Quality Gates (knowledge cutoff: 2026-04-27)

2,265 files, 2,217 source IDs, 0 validation errors
89 asset bundles validated (verbatim=64, extracted=13, derived=12)
6 candidate ledgers normalized
0 broken links across all internal references
All verified wiki pages have official-doc + upstream-code evidence (enforced by evidence_basis field)
All technique/kernel/language pages have compilable code snippets (reproducibility >= snippet)
All Hopper-inclusive pages explain their blackwell_relevance
Version-sensitive claims (Triton 3.6, CUTLASS 4.5, etc.) carry version_sensitive: <id> pointers resolving to the central registry

Scope Rules

Blackwell-first — SM100 content is primary. SM90 requires explicit blackwell_relevance field.
Kernel-only — No distributed-system topics (DeepEP, DualPipe, EPLB are out of scope).
English canonical — All content in English.
First-class DSLs — CuTe DSL, CUDA C++, PTX, Triton. TileLang / cuTile / JAX-Pallas mentioned but no dedicated guides.

Repository Layout

KernelWiki/                             (= ~/.claude/skills/KernelWiki/)
├── SKILL.md                           # Skill entry point
├── README.md                          # This file
├── CLAUDE.md                          # Extended navigation + schema reference
├── index.md                           # Curated top-level index
├── requirements.txt                   # PyYAML
│
├── scripts/                           # Query tools + maintenance tooling
│   ├── query.py                       # Unified search
│   ├── get_page.py                    # Page fetcher
│   ├── grep_wiki.py                   # Regex search
│   ├── _wiki_root.py                  # Shared root resolver
│   ├── validate.py                    # Schema validator
│   ├── generate-indices.py            # Query-index generator
│   └── generate-pr-pages.py           # Batch PR page generator
│
├── references/                        # Skill knowledge layer
│   ├── primer.md                      # Topic map
│   ├── schema.md                      # Condensed schema reference
│   └── examples.md                    # 10 worked query patterns
│
├── data/                              # Schema + vocabulary
│   ├── schemas.yaml
│   ├── tags.yaml
│   └── aliases.yaml
│
├── candidates/                        # Reviewed PR ledgers (ingestion source of truth)
│   ├── cutlass.yaml
│   ├── sglang.yaml
│   ├── vllm.yaml
│   ├── flashinfer.yaml
│   ├── pytorch.yaml
│   └── deepgemm.yaml
│
├── sources/                           # Layer 1: raw data
│   ├── prs/{repo}/PR-{N}.md
│   ├── contests/{contest}/
│   ├── docs/
│   └── blogs/
│
├── wiki/                              # Layer 2: synthesized knowledge
│   ├── hardware/
│   ├── techniques/
│   ├── kernels/
│   ├── patterns/
│   ├── languages/
│   └── migration/
│
└── queries/                           # Layer 3: auto-generated indices
    ├── by-problem.md
    ├── by-technique.md
    ├── by-hardware-feature.md
    ├── by-repo.md
    ├── by-kernel-type.md
    └── by-language.md

License

Summaries and wiki syntheses in this repository are derivative works citing upstream PRs, blogs, and docs. The tooling (scripts/, references/, data/) is MIT-style; see individual files for any exceptions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KernelWiki — Blackwell & Hopper Kernel Optimization Knowledge Base

Install as a Claude Code Skill

What's Here

Query Tools

Companion Docs

Architecture

Maintenance Tooling

Quality Gates (knowledge cutoff: 2026-04-27)

Scope Rules

Repository Layout

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
artifacts		artifacts
candidates		candidates
data		data
queries		queries
references		references
scripts		scripts
sources		sources
tests		tests
wiki		wiki
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
SKILL.md		SKILL.md
index.md		index.md
requirements.txt		requirements.txt
research-contests.md		research-contests.md
research-deepseek-qwen.md		research-deepseek-qwen.md

Folders and files

Latest commit

History

Repository files navigation

KernelWiki — Blackwell & Hopper Kernel Optimization Knowledge Base

Install as a Claude Code Skill

What's Here

Query Tools

Companion Docs

Architecture

Maintenance Tooling

Quality Gates (knowledge cutoff: 2026-04-27)

Scope Rules

Repository Layout

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages