gpuasm/autosass

autosass

Autoresearch for SASS-level GPU kernel optimization. Give it a compiled CUDA kernel, go to sleep, wake up to an optimized binary. No source code changes.

Inspired by @karpathy/autoresearch and autokernel. Instead of rewriting kernels in Triton, autosass optimizes the GPU assembly (SASS) that the compiler emits: it reorders instructions, sets register reuse (cache) flags, and renames registers to reduce bank conflicts. It uses the gpuasm.com MCP API for disassembly and reassembly.

How It Works

Give autosass any compiled CUDA kernel (.cubin). It will:

  1. Disassemble the kernel via the gpuasm.com MCP API
  2. Find FFMA (fused multiply-add) blocks in the inner loop
  3. Reorder instructions to maximize register file cache hits
  4. Set reuse flags so hardware caches register values between instructions
  5. Rename registers via liveness analysis to break bank conflicts
  6. Reassemble with selective binary patching (only changed instructions are re-encoded)
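The analysis behind steps 2, 4, and 5 can be illustrated with a small, self-contained sketch. It is not the code in optimize.py: the helper names (`parse_ffma`, `bank_conflicts`, `reuse_candidates`), the textual SASS form, and the bank rule (bank = register index modulo `num_banks`, default 4) are all assumptions made for illustration. The actual SM86 rules are the ones described in program.md.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed textual form of a disassembled line, e.g. "FFMA R0, R4.reuse, R8, R0 ;"
FFMA_RE = re.compile(r"\bFFMA(?:\.\w+)*\s+R(\d+)\s*,\s*(.+?)\s*;")
REG_RE = re.compile(r"^-?R(\d+)(?:\.reuse)?$")

Ffma = Tuple[int, List[Optional[int]]]  # (dest register, [src_a, src_b, src_c])


def parse_ffma(line: str) -> Optional[Ffma]:
    """Parse one FFMA line; return None for any other instruction.

    Sources that are not plain registers (RZ, constants, immediates)
    are reported as None.
    """
    m = FFMA_RE.search(line)
    if m is None:
        return None
    srcs: List[Optional[int]] = []
    for operand in m.group(2).split(","):
        r = REG_RE.match(operand.strip())
        srcs.append(int(r.group(1)) if r else None)
    return int(m.group(1)), srcs


def bank_conflicts(srcs: List[Optional[int]], num_banks: int = 4) -> int:
    """Count extra register reads under the ASSUMED rule bank = index % num_banks.

    Two distinct source registers in the same bank cost one extra read.
    """
    per_bank: Dict[int, int] = {}
    for reg in {r for r in srcs if r is not None}:
        per_bank[reg % num_banks] = per_bank.get(reg % num_banks, 0) + 1
    return sum(n - 1 for n in per_bank.values())


def reuse_candidates(block: List[Ffma]) -> List[Tuple[int, int]]:
    """Return (instruction index, operand slot) pairs where the next
    instruction reads the same register in the same slot, i.e. places
    where a reuse flag could plausibly be set."""
    hits = []
    for i in range(len(block) - 1):
        for slot, (a, b) in enumerate(zip(block[i][1], block[i + 1][1])):
            if a is not None and a == b:
                hits.append((i, slot))
    return hits


sass = [
    "FFMA R0, R4, R8, R0 ;",
    "FFMA R1, R4, R9, R1 ;",
    "FFMA R2, R5, R9, R2 ;",
]
block = [p for p in map(parse_ffma, sass) if p is not None]
conflicts = [bank_conflicts(srcs) for _, srcs in block]
candidates = reuse_candidates(block)
```

In this toy block, R4 is read in the same slot by the first two instructions and R9 by the last two, so both are reuse candidates. Under the assumed modulo-4 rule, the first instruction reads R4, R8, and R0 from the same bank, which is the kind of pattern that register renaming aims to break.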

The agent reads program.md, the optimization playbook, which describes the SM86 register file cache model, the bank conflict rules, and the MCP API. It then runs optimize.py, benchmarks the result with bench_cubin.py, and either keeps the change or reverts it.
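The keep-or-revert loop can be sketched as below. This is a generic illustration, not the agent's actual logic: `keep_or_revert`, the transform callables, and the benchmark callable are hypothetical stand-ins, and the demo uses file size as a fake "runtime" so that it runs without a GPU. It also assumes a candidate is kept only when it measures strictly faster than the current best.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def keep_or_revert(
    best: Path,
    transforms: Iterable[Callable[[Path, Path], object]],
    benchmark: Callable[[Path], float],
    work: Path,
) -> float:
    """Try each transform on the current best binary.

    A transform reads `best` and writes a candidate to `work`. The candidate
    replaces `best` only if it benchmarks strictly faster; otherwise it is
    discarded, which is the "revert".
    """
    best_time = benchmark(best)
    for transform in transforms:
        transform(best, work)
        candidate_time = benchmark(work)
        if candidate_time < best_time:
            shutil.copyfile(work, best)  # keep the improvement
            best_time = candidate_time
    return best_time


# Demo with stand-ins: "runtime" is just the file size in bytes.
with tempfile.TemporaryDirectory() as tmp:
    best = Path(tmp) / "best.cubin"
    work = Path(tmp) / "candidate.cubin"
    best.write_bytes(b"x" * 10)

    def shrink(src: Path, dst: Path) -> None:
        dst.write_bytes(src.read_bytes()[:-2])

    def grow(src: Path, dst: Path) -> None:
        dst.write_bytes(src.read_bytes() + b"xx")

    def fake_bench(path: Path) -> float:
        return float(path.stat().st_size)

    final_time = keep_or_revert(best, [shrink, grow, shrink], fake_bench, work)
    final_size = best.stat().st_size
```

Keeping the best binary on disk and writing each candidate to a separate file means a crash during a transform or benchmark cannot corrupt the best result so far.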

Each optimization takes ~30 seconds. Benchmarking takes ~60 seconds.

Quick Start

Requirements: NVIDIA GPU, Python 3.10+, a compiled CUDA kernel.

# Clone
git clone https://github.com/gpuasm/autosass.git
cd autosass

# Optimize a kernel
python optimize.py --cubin kernel.cubin --kernel myKernel --out optimized.cubin

# Benchmark (patches cubin into executable, compares baseline vs optimized)
python bench_cubin.py optimized.cubin

# Dry run (no GPU needed — checks MCP connectivity + analysis)
python optimize.py --cubin kernel.cubin --kernel myKernel --dry-run

Running the Agent

Spin up Claude or any other coding agent in this directory and give it a prompt such as:

Read program.md. Optimize the kernel in sgemm_kernel_10.cubin.

The agent will disassemble, find optimization opportunities, apply them, reassemble, and benchmark. program.md covers the microarchitecture model, the optimization strategy, and crash recovery.

License

MIT
