gstack-auto

Reinforcement learning applied to the development process itself. Not at the token level — at the product level.

You describe what you want to build. gstack-auto brainstorms the spec with you, then spawns parallel implementations — each one planning, reviewing adversarially, building, testing, and fixing bugs autonomously. The best one wins. Then it does it again. Each round gets better.

  /office-hours (or product-spec.md)
        |
        v
  +-- ROUND LOOP (1..R) --------------------------------+
  |                                                      |
  |   +-- N PARALLEL RUNS ----------------------------+  |
  |   |  Each run:                                    |  |
  |   |    CEO plan > adversarial review              |  |
  |   |    eng plan > adversarial review              |  |
  |   |    design plan > adversarial review           |  |
  |   |    eng plan v2 > adversarial review           |  |
  |   |    implement > ship > QA > fix                |  |
  |   |  Run A biases toward code quality             |  |
  |   |  Run B biases toward UX polish                |  |
  |   |  Run C biases toward robustness               |  |
  |   +-----------------------------------------------+  |
  |          |                                           |
  |          v                                           |
  |   SELECT WINNER (by score, bugs, fix cycles)         |
  |   Commit to git. Feed into next round.               |
  |                                                      |
  +------------------------------------------------------+
        |
        v
  RESULTS: dashboard + email + git history
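
The SELECT WINNER step above can be sketched as a sort over per-run results. This is a minimal illustration, not the pipeline's actual code: the `RunResult` fields and the tie-breaking order (highest score, then fewest bugs, then fewest fix cycles) are assumptions read off the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Hypothetical summary of one parallel run."""
    run_id: str       # e.g. "run-a"
    score: float      # overall score out of 10
    bugs: int         # bugs still open after QA
    fix_cycles: int   # bug-fix cycles consumed


def select_winner(results):
    """Pick the run with the highest score; break ties on bugs, then fix cycles."""
    if not results:
        raise ValueError("no runs to choose from")
    return min(results, key=lambda r: (-r.score, r.bugs, r.fix_cycles))


runs = [
    RunResult("run-a", 8.4, 1, 2),
    RunResult("run-b", 8.4, 0, 1),
    RunResult("run-c", 7.9, 0, 0),
]
print(select_winner(runs).run_id)
```

Here run-a and run-b tie on score, so the run with fewer open bugs is chosen.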

How Rounds Work

Round 1 builds your app from scratch. Three parallel attempts, scored independently. The best one wins.

Round 2 takes the winner's code and improves it. Three more parallel attempts, each starting from that codebase. The best improvement wins.

Round 3 does it again. Scores go up. Bugs go down. Each round's winner is committed to git with a full score card — you can see the progression.

Round 1: Best 7.2/10 (run-b)  ==============        72%
Round 2: Best 8.4/10 (run-a)  ================      84%  (+1.2)
Round 3: Best 9.1/10 (run-c)  ==================    91%  (+0.7)
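
A progression line like the ones above can be produced with a few lines of Python. This is a sketch under assumptions: the 20-character bar width and the function names are hypothetical, chosen only so that 7.2, 8.4, and 9.1 yield 14, 16, and 18 bar characters as in the example.

```python
def score_bar(score, width=20):
    """Return a bar of '=' characters proportional to a score out of 10."""
    filled = int(score / 10 * width)   # round down to whole characters
    return "=" * filled


def format_round(round_no, run_id, score, previous=None, width=20):
    """Format one round's summary line, with a delta when a previous score exists."""
    bar = score_bar(score, width)
    line = f"Round {round_no}: Best {score:.1f}/10 ({run_id})  {bar:<{width}}  {round(score * 10)}%"
    if previous is not None:
        line += f"  ({score - previous:+.1f})"
    return line


history = [("run-b", 7.2), ("run-a", 8.4), ("run-c", 9.1)]
previous = None
for number, (run_id, score) in enumerate(history, start=1):
    print(format_round(number, run_id, score, previous))
    previous = score
```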

The 13 Phases

Each build goes through:

1. CEO Plan: Product vision, MVP scope, risk assessment
2. Adversarial Review: Dual Claude+Codex challenge of the plan
3. Eng Plan: Architecture, file plan, test strategy
4. Adversarial Review: Cross-model engineering challenge
5. Design Plan: Visual hierarchy, typography, spacing, color
6. Adversarial Review: Design critique
7. Eng Plan v2: Reconcile design constraints with architecture
8. Adversarial Review: Final pre-implementation challenge
9. Implement: Write the code in the style of a legendary engineer
10. Ship: Validate, lint, security check
11. QA: Headless browser testing with screenshot evidence
    - Bug fix sub-loop (up to 3 cycles: plan fix → implement → re-QA)
12. Document: Generate README, CHANGELOG
13. Score: Rate on 6 dimensions, write a retrospective
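
The bug fix sub-loop under QA can be pictured as a bounded retry loop. The sketch below is an assumption about its shape, not the pipeline's implementation: `run_qa` and `apply_fix` are hypothetical callables standing in for the QA and fix phases, and the budget of 3 mirrors the cycle limit mentioned above.

```python
def run_fix_loop(run_qa, apply_fix, max_fix_cycles=3):
    """Run QA, then fix and re-test until QA is clean or the budget is spent.

    Returns (remaining_bugs, cycles_used).
    """
    bugs = run_qa()
    cycles = 0
    while bugs and cycles < max_fix_cycles:
        apply_fix(bugs)    # plan the fix and implement it
        cycles += 1
        bugs = run_qa()    # re-QA after the fix
    return bugs, cycles


# Simulated QA reports: two bugs, then one, then none.
reports = iter([["login crash", "bad layout"], ["bad layout"], []])
remaining, used = run_fix_loop(lambda: next(reports), lambda bugs: None)
print(remaining, used)  # [] 2
```

If bugs remain once the budget is exhausted, the loop returns them so the scoring phase can still account for them.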

Every build runs in its own git worktree. Completely isolated. They can't see each other.
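
For readers unfamiliar with worktrees, the isolation can be sketched with the standard `git worktree add` command. The directory layout and the `build/<run-id>` branch naming below are hypothetical; only the git invocation itself is standard.

```python
import subprocess
from pathlib import Path


def worktree_command(repo, worktree_root, run_id, base="HEAD"):
    """Build the git command that creates an isolated checkout for one run."""
    path = Path(worktree_root) / run_id   # hypothetical layout: one directory per run
    branch = f"build/{run_id}"            # hypothetical branch naming
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo, worktree_root, run_id, base="HEAD"):
    """Create the worktree; raises CalledProcessError if git fails."""
    subprocess.run(worktree_command(repo, worktree_root, run_id, base), check=True)


for run_id in ("run-a", "run-b", "run-c"):
    print(" ".join(worktree_command(".", "../worktrees", run_id)))
```

Each worktree has its own working directory and branch while sharing one object store, so runs cannot overwrite each other's files.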

Getting Started

1. Prerequisites

Install Conductor (the AI development environment) and gstack (the skill system). To install gstack, paste this prompt into Claude Code:

Install gstack: run git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp__claude-in-chrome__* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /setup-browser-cookies, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.

2. Start Mission Control

Mission Control is a Flask web service that manages your builds. Start it and open the browser:

python3 scripts/setup-server.py
# opens at http://localhost:8080

Sign in with Google OAuth. First-time users land on the waitlist — an admin must approve your account before you can access the app.

3. Configure your build

Option A (recommended): Use Office Hours in Mission Control. A Claude-powered chat helps you define what to build and produces a structured product spec. When you're done, click "Complete" to generate the spec and proceed to the build handoff.

Option B (power users): Edit product-spec.md directly with what you want built. Be specific:

# Product Spec

## What It Does
A personal bookshelf app where I can track books I've read
and want to read, with a simple rating system.

## Core Interaction
User opens the page and sees two lists: "Want to Read" and "Read."
They can add books by title, move them between lists, and rate
books they've read from 1-5 stars.

## Constraints
- Pure HTML/CSS/JS, no frameworks
- Data stored in localStorage
- Must work on mobile

Vague specs produce vague software.
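
One way to catch an incomplete spec before a run is to check that the three sections from the example are present. This is an illustrative helper, not part of gstack-auto; treating those three headings as required is an assumption based on the sample spec.

```python
import re

# Section names taken from the example spec; treating them as required is an assumption.
REQUIRED_SECTIONS = ("What It Does", "Core Interaction", "Constraints")


def missing_sections(spec_text):
    """Return the required '## ' headings that the spec does not contain."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+)$", spec_text, re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]


spec = """# Product Spec

## What It Does
A personal bookshelf app.

## Constraints
- Pure HTML/CSS/JS, no frameworks
"""
print(missing_sections(spec))  # ['Core Interaction']
```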

Email (optional): Copy .env.example to .env, add your Gmail App Password, and update email.to in pipeline/config.yml. Run python3 scripts/send-email.py --probe to verify. Or set email.method: "file-only" and skip it — results are always saved to disk.

4. Run

Copy the handoff prompt from Mission Control into Conductor and run the pipeline:

Run the gstack-auto pipeline with N=3

Go get coffee. Come back in 30 minutes.

5. Iterate

Once a build completes, Mission Control shows two iteration paths from the build detail page:

Iterate — opens a new Office Hours session pre-seeded with the parent build's context and scores. Full conversational spec refinement before the next build.
Quick Fix — type a one-liner description of what to change. Skips Office Hours and goes straight to the handoff prompt.

Both paths carry the parent build's output into the next round so agents improve the existing code rather than starting from scratch.

What You Get

Mission Control (localhost:8080): build history, detail views with phase progress and scoring breakdowns, iterate and quick-fix buttons, and a Fly.io deploy trigger.

Email (if configured) — ASCII score bar charts, architectural narratives, code highlights, and git branch names for each run.

Git history — each round's winner committed with score card and feature summary. git log tells the story of how the code evolved.

The winning build lives in output/. Open its index.html and see what you got.

Configuration

pipeline/config.yml:

parallel_runs: 3               # How many builds to run simultaneously
rounds: 1                      # Sequential rounds (each improves on the last)
auto_accept_winner: true       # Auto-select best score (false = pick via dashboard)
max_fix_cycles: 3              # Max bug-fix attempts before forced scoring
style: "marlinspike"           # Engineering style (see pipeline/styles/)
adversarial_reviews: [02, 08]  # Which phases get dual Claude+Codex review
follow_up_budget: 3            # Mid-run questions per round (0 = fully autonomous)
email:
  to: "you@gmail.com"
  method: "smtp"               # or "file-only" to skip email

Available styles: carmack, antirez, abramov, metz, holowaychuk, majors, marlinspike. Each encodes concrete coding principles that guide implementation, review, and scoring. Or leave it blank for the default.
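
A quick sanity check on the loaded config might look like the sketch below. It operates on an already-parsed dictionary, and every rule in it (value ranges, the 1..13 phase bound, the allowed email methods) is an assumption inferred from this README rather than the pipeline's own validation. Depending on the YAML loader, unquoted values such as 02 and 08 can arrive as integers or as strings, so the phase IDs are normalized first.

```python
KNOWN_STYLES = {"carmack", "antirez", "abramov", "metz",
                "holowaychuk", "majors", "marlinspike"}


def normalize_phase_ids(values):
    """Coerce phase IDs such as 2, "02", or "08" to plain integers."""
    return [int(str(value), 10) for value in values]


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for key in ("parallel_runs", "rounds"):
        if int(cfg.get(key, 1)) < 1:
            problems.append(f"{key} must be at least 1")
    if int(cfg.get("max_fix_cycles", 0)) < 0:
        problems.append("max_fix_cycles must not be negative")
    style = cfg.get("style") or ""          # blank means the default style
    if style and style not in KNOWN_STYLES:
        problems.append(f"unknown style: {style}")
    for phase in normalize_phase_ids(cfg.get("adversarial_reviews", [])):
        if not 1 <= phase <= 13:
            problems.append(f"phase {phase} is outside 1..13")
    method = cfg.get("email", {}).get("method", "file-only")
    if method not in ("smtp", "file-only"):
        problems.append(f"unknown email method: {method}")
    return problems


cfg = {
    "parallel_runs": 3,
    "rounds": 1,
    "max_fix_cycles": 3,
    "style": "marlinspike",
    "adversarial_reviews": [2, "08"],   # mixed types, as some loaders produce
    "email": {"to": "you@gmail.com", "method": "smtp"},
}
print(validate_config(cfg))  # []
```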

Validation

bash tests/validate-pipeline.sh

All checks should pass before you run the pipeline.

Built with Conductor and gstack.

Special thanks to Garry Tan for building gstack — the skill system that makes this entire pipeline possible.
