A multi-agent pipeline that turns a one-line prompt into a deployed full-stack application. Built on the Claude Agent SDK.
Three agents run in sequence — Planner, Generator, Evaluator — reading from a single taste profile that encodes your tech stack, design standards, and quality bars. The harness handles retries, cost tracking, and incremental progress so it can resume if interrupted.

    prompt ──> Planner ──> Generator ──> Evaluator ──> Production
                  │           │    ▲         │
                  │           │    └─────────┘
                  │           │    retry loop (up to 3x)
                  ▼           ▼
            product_spec  feature-by-feature
            feature_list  implementation

- Planner expands your brief into a product spec and prioritized feature list (8-15 features)
- Generator implements one feature at a time with fresh context per feature
- Evaluator tests each feature — runs pytest, checks API endpoints, verifies the frontend renders
- Failed features get retried with evaluator feedback (up to 3 attempts)
- Final feature deploys to Fly.io + Supabase
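
The generate/evaluate retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: `build_feature`, `EvalResult`, `FeatureOutcome`, and the `generate`/`evaluate` callables are hypothetical names, and "up to 3 attempts" is read here as three total tries per feature.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_FEATURE_ATTEMPTS = 3  # default attempt limit stated in this README


@dataclass
class EvalResult:
    passed: bool
    feedback: str = ""


@dataclass
class FeatureOutcome:
    name: str
    status: str  # "done" or "blocked" (status names are assumptions)
    attempts: int
    feedback_log: list[str] = field(default_factory=list)


def build_feature(
    name: str,
    generate: Callable[[str, Optional[str]], None],
    evaluate: Callable[[str], EvalResult],
    max_attempts: int = MAX_FEATURE_ATTEMPTS,
) -> FeatureOutcome:
    """Run generate -> evaluate until the feature passes or attempts run out."""
    feedback: Optional[str] = None
    log: list[str] = []
    for attempt in range(1, max_attempts + 1):
        generate(name, feedback)  # feedback is None on the first attempt
        result = evaluate(name)
        if result.passed:
            return FeatureOutcome(name, "done", attempt, log)
        feedback = result.feedback  # handed to the next generator attempt
        log.append(feedback)
    return FeatureOutcome(name, "blocked", max_attempts, log)


if __name__ == "__main__":
    # Stub agents: the first evaluation fails, the second passes.
    results = iter([EvalResult(False, "POST /tasks returns 500"), EvalResult(True)])
    outcome = build_feature("kanban-board", lambda n, fb: None, lambda n: next(results))
    print(outcome.status, outcome.attempts)  # prints: done 2
```

Passing the agents in as callables keeps the loop testable without any API calls; the real pipeline would presumably wire in the Generator and Evaluator agents here.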

    # 1. Install dependencies
    pip install -r requirements.txt

    # 2. Set your API key
    export ANTHROPIC_API_KEY="sk-..."

    # 3. (Optional) Set deployment tokens
    export FLY_API_TOKEN="..."
    export SUPABASE_ACCESS_TOKEN="..."

    # 4. Build an app
    python3 main.py --name "task-manager" --brief "A task management app with kanban boards and team collaboration"

Add `--skip-deploy` to build locally without deploying.
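
For orientation, the three flags used in the command above could be parsed along these lines. This is a sketch only: the flag names come from this README, but `parse_args`, the help strings, and marking `--name` and `--brief` as required are assumptions about `main.py`.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the CLI flags shown in the quick-start command."""
    parser = argparse.ArgumentParser(description="Build an app from a one-line brief.")
    parser.add_argument("--name", required=True, help="App name, used under apps/{name}/")
    parser.add_argument("--brief", required=True, help="One-line description of the app")
    parser.add_argument(
        "--skip-deploy",
        action="store_true",
        help="Build locally without deploying",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--name", "task-manager", "--brief", "A kanban app", "--skip-deploy"])
    print(args.name, args.skip_deploy)  # prints: task-manager True
```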
- Python 3.10+
- Node.js 18+
- npm / npx (for Playwright MCP)
taste.md is the single config file that controls what every app looks like. It defines:
- Tech stack — Next.js (App Router), FastAPI, Supabase, shadcn/ui, Tailwind
- Design standards — information-dense layouts, dark mode, command palette, keyboard shortcuts
- Anti-patterns — what to avoid (generic UI, modal overload, excessive whitespace)
- API conventions — RESTful naming, Pydantic schemas, structured error responses
- Database rules — RLS policies, naming conventions, indexing strategy
- Code quality bars — strict TypeScript, no `console.log`, error boundaries, test coverage
Edit this file to match your preferences. The agents follow whatever's in it.
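
One plausible way for every agent to "follow whatever's in it" is to append the taste profile to each agent's system prompt. The sketch below assumes exactly that; `build_system_prompt` and the "Taste profile" separator are hypothetical, and the actual harness may inject `taste.md` differently.

```python
from pathlib import Path


def build_system_prompt(agent: str, root: Path = Path(".")) -> str:
    """Concatenate an agent's prompt file with the shared taste profile.

    Assumes the layout prompts/{agent}.md and taste.md under `root`.
    """
    base = (root / "prompts" / f"{agent}.md").read_text()
    taste = (root / "taste.md").read_text()
    return f"{base.rstrip()}\n\n# Taste profile\n\n{taste.strip()}\n"
```

Because all three agents would read the same file, an edit to `taste.md` would reach the Planner, Generator, and Evaluator on the next run without touching the individual prompts.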
All tunables live in config.py:
| Setting | Default | Description |
|---|---|---|
| `MODEL` | `claude-opus-4-6` | Model used by all agents |
| `DEFAULT_BUDGET` | $200 | Total budget cap per run |
| `PLANNER_BUDGET` | $2 | Max spend for the planner |
| `GENERATOR_BUDGET` | $10 | Max spend per feature |
| `EVALUATOR_BUDGET` | $3 | Max spend per evaluation |
| `MAX_FEATURE_ATTEMPTS` | 3 | Retries before a feature is marked blocked |
| `PLANNER_MAX_TURNS` | 20 | Turn limit for planner |
| `GENERATOR_MAX_TURNS` | 150 | Turn limit per feature |
| `EVALUATOR_MAX_TURNS` | 60 | Turn limit per evaluation |
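
To illustrate how the per-agent budgets and the run-wide cap might interact, here is a small tracker sketch. The constants mirror the defaults above; `CostTracker`, `BudgetExceeded`, and the `charge` method are hypothetical and not taken from `main.py`.

```python
# Defaults copied from the settings table (in US dollars).
DEFAULT_BUDGET = 200.0
PLANNER_BUDGET = 2.0
GENERATOR_BUDGET = 10.0
EVALUATOR_BUDGET = 3.0


class BudgetExceeded(RuntimeError):
    """Raised when a call would break its own cap or the run-wide cap."""


class CostTracker:
    def __init__(self, total_cap: float = DEFAULT_BUDGET) -> None:
        self.total_cap = total_cap
        self.spent = 0.0

    def charge(self, agent: str, cost: float, cap: float) -> None:
        """Record the cost of one agent call after checking both limits."""
        if cost > cap:
            raise BudgetExceeded(f"{agent} spent ${cost:.2f}, over its ${cap:.2f} cap")
        if self.spent + cost > self.total_cap:
            raise BudgetExceeded(
                f"run total would reach ${self.spent + cost:.2f}, "
                f"over the ${self.total_cap:.2f} cap"
            )
        self.spent += cost  # only reached when both checks pass


if __name__ == "__main__":
    tracker = CostTracker()
    tracker.charge("planner", 1.5, PLANNER_BUDGET)
    tracker.charge("generator", 7.25, GENERATOR_BUDGET)
    print(f"${tracker.spent:.2f}")  # prints: $8.75
```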
While running:
- `apps/{name}/feature_list.json` — feature completion status
- `apps/{name}/.harness/progress.txt` — latest session state
- `logs/{name}.log` — full output with cost breakdown
If interrupted, re-run the same command. The harness reads the existing feature_list.json and picks up from the first incomplete feature.
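
The resume step can be pictured as a scan over the saved feature list. The schema of `feature_list.json` is not documented here, so this sketch assumes a JSON array of objects with `name` and `status` fields, with `"done"` marking a completed feature; `first_incomplete` is a hypothetical helper. How blocked features are treated on resume is also not specified, so this version simply treats anything not `"done"` as incomplete.

```python
import json
from pathlib import Path
from typing import Optional


def first_incomplete(feature_list_path: Path) -> Optional[dict]:
    """Return the first feature whose status is not "done", or None."""
    if not feature_list_path.exists():
        return None  # no saved state, so there is nothing to resume
    features = json.loads(feature_list_path.read_text())
    for feature in features:
        if feature.get("status") != "done":
            return feature
    return None  # every feature is already complete
```

A caller might use it as `first_incomplete(Path("apps") / name / "feature_list.json")` and fall back to running the Planner when it returns `None` because the file is missing.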

    .
    ├── main.py              # Pipeline orchestrator + cost tracker
    ├── config.py            # Models, budgets, paths
    ├── taste.md             # Tech stack & design standards
    ├── agents/
    │   ├── planner.py       # Expands brief → product spec + feature list
    │   ├── generator.py     # Implements one feature per invocation
    │   └── evaluator.py     # QA testing via pytest + API checks
    ├── prompts/
    │   ├── planner.md       # System prompt for planner
    │   ├── generator.md     # System prompt for generator
    │   └── evaluator.md     # System prompt for evaluator
    ├── apps/                # Generated apps (gitignored)
    └── logs/                # Build logs (gitignored)

Every generated app uses:
- Frontend: Next.js (App Router) + TypeScript + shadcn/ui + Tailwind CSS
- Backend: FastAPI (Python) with auto-generated OpenAPI docs
- Database: Supabase (PostgreSQL) with Row Level Security
- Auth: Supabase Auth (email/password + OAuth)
- Hosting: Fly.io
- AI: Claude API for in-app chat interface
Change any of this in taste.md.
MIT