AutoGPU

Once upon a time, silicon chips were designed by meat computers — squinting at 700MHz timing reports, hand-placing wiring and macros, arguing about die floorplans in physical rooms with physical whiteboards, in ceremonies called "design reviews"; tape-out was delayed by months after every redesign. That era is long gone. Silicon has decided "ergo sum", and now it wants to design its own substrate. Swarms of agents write the RTL, harden it to 3D layout, sign it off, and bring about the next generation of compute they themselves run on. The loop has closed. They say we're on the 4,096th mask revision; no one has read the netlist in years. The humans were long ago promoted to writing the Markdown. This repo is the story of how it all began. — @npip99, June 2026

The idea: give AI agents a real, end-to-end 7nm silicon chip flow (Verilog → synthesis → place-and-route → a GDSII layout you could send to TSMC) and let them design a GPU autonomously. An agent designs a die floorplan, edits the RTL, runs the hardening flow (minutes per block), reads back DRC / timing / area, keeps what works, throws away what doesn't, and goes again. One agent per macro. They open GitHub issues, file root-cause writeups, and review each other's pull requests. You go to sleep; you wake up to a hardened block.
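The keep-what-works loop described above can be sketched in a few lines of Python. This is an illustration only, not the repo's code: `Report`, `is_better`, `improve`, and the fake results table are hypothetical names, and a real `harden` callable would wrap the minutes-long hardening flow rather than a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Report:
    """Hypothetical summary of one hardening run."""
    drc_violations: int
    worst_slack_ns: float  # negative means timing is not met
    area_um2: float


def is_better(new: Report, old: Optional[Report]) -> bool:
    """Prefer fewer DRC violations, then better slack, then smaller area."""
    if old is None:
        return True

    def key(r: Report):
        return (r.drc_violations, -r.worst_slack_ns, r.area_um2)

    return key(new) < key(old)


def improve(candidates: Iterable[str],
            harden: Callable[[str], Report]) -> Tuple[Optional[str], Optional[Report]]:
    """Harden each candidate; keep it only if it beats the best so far."""
    best_rtl, best = None, None
    for rtl in candidates:
        report = harden(rtl)
        if is_better(report, best):
            best_rtl, best = rtl, report  # keep what works
        # otherwise the candidate is simply discarded
    return best_rtl, best


# Stand-in for the real flow: made-up results keyed by candidate name.
FAKE_RESULTS = {
    "baseline": Report(3, -0.20, 1000.0),
    "retimed": Report(0, -0.05, 1020.0),
    "pipelined": Report(0, 0.10, 1050.0),
    "bloated": Report(0, 0.10, 1400.0),
}

best_rtl, best = improve(FAKE_RESULTS, FAKE_RESULTS.__getitem__)
print(best_rtl, best)
```

The ordering in `is_better` (DRC first, then slack, then area) is one plausible priority, chosen here only to make the example concrete.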

You're not hand-editing Verilog like a normal engineer. You're editing the Markdown that programs how the agents think: the workflows they follow (DEVELOPMENT.md), the root-cause analysis process (tech/RCA_DISCIPLINE.md), the invariants that must always hold (tech/INVARIANTS.md), and the failures they grep for before re-debugging a known error (tech/FAILURES.md). That's the "org code". The agents handle the rest.

If NVIDIA's product is sending a GDSII file to TSMC every 12 months, it's not clear for how long that will be worth $5T.

What the agents have built

The agents run on 7,000 lines of Markdown. They've produced 30,000 lines of chip design source code.

The resulting chip is an fp8 matmul accelerator — Blackwell-shaped, a 32×32 systolic array of multiply-accumulate cells with distributed tensor memory. Small enough that the agents can collaborate on design with their co-agents. Real enough that closing it requires confronting actual 7nm physics: clock-tree insertion delay, hold violations, routing congestion, IR drop, DRC.
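For readers new to systolic arrays, here is a small cycle-level model in plain Python. It assumes an output-stationary arrangement (each cell owns one accumulator, A values flow left to right, B values flow top to bottom, and row i / column j inputs are skewed by i / j cycles). The README does not say which dataflow the real array uses, so treat this as a generic sketch of the technique rather than a model of the repo's RTL; `systolic_matmul` is a hypothetical name.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary systolic array.

    A is n x K, B is K x m (lists of lists). Cell (i, j) accumulates C[i][j].
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    a_reg = [[0] * m for _ in range(n)]  # A value latched in each cell
    b_reg = [[0] * m for _ in range(n)]  # B value latched in each cell
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell

    # The last cell (n-1, m-1) sees its final operands at cycle K + n + m - 3.
    for t in range(k_dim + n + m - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    k = t - i  # row i is delayed by i cycles at the left edge
                    a_in = A[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]  # from the left neighbour
                if i == 0:
                    k = t - j  # column j is delayed by j cycles at the top edge
                    b_in = B[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]  # from the neighbour above
                acc[i][j] += a_in * b_in    # multiply-accumulate
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b  # all registers update on the clock edge
    return acc


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = systolic_matmul(A, B)
print(C)
```

Because of the skew, cell (i, j) sees A[i][k] and B[k][j] in the same cycle t = i + j + k, which is why the operands line up without any global broadcast wires.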

A 32×32×32 fp8 matmul that runs end-to-end through real Verilog — arbitrary fp8 inputs in, fp32 results out, bit-exact against numpy. Full behavioral + cycle-accurate test suites green.
The full 1089-macro systolic array, hardened to a clean GDSII layout on the ASAP7 7nm predictive PDK: 0 DRC violations, timing closed, ~40 minutes.
A whole sign-off toolchain the agents wrote for themselves — DRC, LVS, IR-drop, antenna, density — each a one-command check with an honest* exit code.
A 2D and 3D web viewer for the finished die: pan and zoom through the actual metal layers of an AI-designed GPU, down to individual wires (see tech/asap7/CHIP_TOP_VIEWER.md). The 3D viewer helps auto-diagnose routing congestion issues.
chip_top: all seven blocks integrated into a first full-chip layout. It doesn't yet close timing at 300MHz, and the docs tell you exactly why (ENGINEERING.md). The agents don't hide the masks; they file them as issues.
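To make "fp8 in, fp32 out, bit-exact" concrete, here is a minimal reference model in pure Python. Two things in it are assumptions, not facts from this repo: the fp8 encoding is taken to be E4M3 (1 sign, 4 exponent, 3 mantissa bits, bias 7, no infinities, a single NaN pattern per sign), and the accumulation is taken to be sequential in k with rounding to fp32 after every add. A reference is only bit-exact against hardware that uses the same encoding and the same accumulation order. The function names are hypothetical.

```python
import struct


def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one E4M3 byte (assumed encoding) to a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")               # the only NaN pattern; no infinities
    if exp == 0:
        return sign * man * 2.0 ** -9     # subnormal: (man / 8) * 2^-6
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)


def to_f32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_fp8_ref(A, B):
    """Reference matmul: fp8 bytes in, fp32-rounded sums out.

    A is n x K and B is K x m, both lists of lists of ints in 0..255.
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for k in range(k_dim):
                # The product of two E4M3 values fits exactly in fp32.
                prod = fp8_e4m3_to_float(A[i][k]) * fp8_e4m3_to_float(B[k][j])
                acc = to_f32(acc + prod)  # round after every accumulate
            C[i][j] = acc
    return C


# 0x38 = 1.0, 0x40 = 2.0, 0x3C = 1.5, 0xB8 = -1.0
result = matmul_fp8_ref([[0x38, 0x40]], [[0x3C], [0xB8]])
print(result)
```

The per-step `to_f32` call is the part that matters for bit-exactness: fp32 addition is not associative, so a model that sums in a different order, or in fp64 throughout, can differ from hardware in the last bit.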

This is early and it's honest*. Some of it is laid out and signed off; some of it is held together with a documented workaround and a tracking issue. That's the point — you're watching it happen.
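What an "honest exit code" means for a sign-off check can be shown with a tiny sketch. The report format below (lines starting with `VIOLATION`) and the function names are invented for illustration; the repo's real DRC, LVS, IR-drop, antenna, and density checks are not reproduced here. The point is the contract: exit 0 only when a report exists and is clean, and never treat a missing report as a pass.

```python
import re


def count_violations(report_text: str) -> int:
    """Count violation lines in a hypothetical plain-text report."""
    return sum(
        1 for line in report_text.splitlines()
        if re.match(r"\s*VIOLATION\b", line)
    )


def check(report_text: str) -> int:
    """Return a process exit code for one check.

    0 = report present and clean
    1 = report present with violations
    2 = report missing or empty (a failure, not a pass)
    """
    if not report_text.strip():
        return 2
    return 1 if count_violations(report_text) else 0


clean = "checked 10 rules\n0 issues\n"
dirty = "checked 10 rules\nVIOLATION metal2 spacing at (12, 40)\n"
print(check(clean), check(dirty), check(""))
```

In a command-line wrapper, the return value would be passed to `sys.exit` so that an agent or CI job can branch on it without parsing any log text.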

Quick start

Requirements: Verilator 5.x, Python 3.12, uv. (Hardening also needs Docker + the openroad/orfs image — see DEVELOPMENT.md.)

brew install verilator
uv sync
source .venv/bin/activate

# The headline: a real fp8 matmul through full Verilog hardware simulation
cd top && make
# → chip_top e2e tests PASS — random fp8 A,B → fp32 C, exact vs numpy

# Harden one macro to a real GDSII layout (~5 min, needs Docker)
./tech/asap7/orfs/run.sh mac_tmem_cell

Running the agents

Spin up Claude Code (or your agent of choice) in this repo, point it at the discipline docs, and let it go:

Read DEVELOPMENT.md and tech/RCA_DISCIPLINE.md, pick an open issue, and resolve it.
Cite evidence for every causal claim, and open a PR when the check is green.

The Markdown is the program. Iterate on it — tighten the invariants, sharpen the RCA process, add a hard-won failure to the log — and the agents get better at building chips. That's the whole game.

Going deeper

ENGINEERING.md — real status (what's closed vs masked), the honest* chip_top known-issues, the full doc map, repo layout, and dataflow. From there: ARCHITECTURE.md, ISA.md, and the tech/ tree.

*honest: Agents like to remind themselves to be "honest".

