Coding Agent Harness

A multi-agent pipeline that turns a one-line prompt into a deployed full-stack application. Built on the Claude Agent SDK.

Three agents (Planner, Generator, Evaluator) run in sequence, all reading from a single taste profile that encodes your tech stack, design standards, and quality bars. The harness handles retries, tracks cost, and saves progress incrementally so an interrupted run can resume.

How it works

prompt ──> Planner ──> Generator ──> Evaluator ──> Production
             │            │    ▲         │
             │            │    └─────────┘
             │            │     retry loop (up to 3x)
             ▼            ▼
        product_spec   feature-by-feature
        feature_list   implementation

Planner expands your brief into a product spec and prioritized feature list (8-15 features)
Generator implements one feature at a time with fresh context per feature
Evaluator tests each feature — runs pytest, checks API endpoints, verifies the frontend renders
Failed features get retried with evaluator feedback (up to 3 attempts)
Final feature deploys to Fly.io + Supabase
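
The generate/evaluate/retry loop for a single feature can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: `build_feature`, `EvalResult`, `FeatureOutcome`, and the `generate`/`evaluate` callables are hypothetical names standing in for the real Generator and Evaluator agents.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

MAX_FEATURE_ATTEMPTS = 3  # mirrors the default described in this README


@dataclass
class EvalResult:
    passed: bool
    feedback: str = ""


@dataclass
class FeatureOutcome:
    name: str
    status: str  # "done" or "blocked" (assumed status names)
    attempts: int
    feedback_log: list[str] = field(default_factory=list)


def build_feature(
    name: str,
    generate: Callable[[str, str], None],
    evaluate: Callable[[str], EvalResult],
    max_attempts: int = MAX_FEATURE_ATTEMPTS,
) -> FeatureOutcome:
    """Generate a feature, evaluate it, and retry with feedback on failure."""
    feedback = ""
    log: list[str] = []
    for attempt in range(1, max_attempts + 1):
        generate(name, feedback)   # hypothetical Generator call
        result = evaluate(name)    # hypothetical Evaluator call
        if result.passed:
            return FeatureOutcome(name, "done", attempt, log)
        feedback = result.feedback  # passed to the next generation attempt
        log.append(feedback)
    return FeatureOutcome(name, "blocked", max_attempts, log)


# Stub agents: the evaluator fails the first attempt, then passes.
calls = {"n": 0}


def fake_generate(name: str, feedback: str) -> None:
    calls["n"] += 1


def fake_evaluate(name: str) -> EvalResult:
    return EvalResult(passed=calls["n"] >= 2, feedback="login endpoint returns 500")


outcome = build_feature("auth", fake_generate, fake_evaluate)
print(outcome.status, outcome.attempts)  # done 2
```

Passing the agents in as callables keeps the retry policy testable without invoking a model.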

Quick start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Set your API key
export ANTHROPIC_API_KEY="sk-..."

# 3. (Optional) Set deployment tokens
export FLY_API_TOKEN="..."
export SUPABASE_ACCESS_TOKEN="..."

# 4. Build an app
python3 main.py --name "task-manager" --brief "A task management app with kanban boards and team collaboration"

Add --skip-deploy to build locally without deploying.
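
For orientation, the three flags used above could be parsed with argparse along these lines. This is only a sketch of one plausible shape; the actual parser in main.py may differ, and `parse_args` is a hypothetical helper name.

```python
from __future__ import annotations

import argparse


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the flags shown in the quick start (assumed, not copied from main.py)."""
    parser = argparse.ArgumentParser(description="Build an app from a one-line brief.")
    parser.add_argument("--name", required=True, help="App name")
    parser.add_argument("--brief", required=True, help="One-line description of the app")
    parser.add_argument(
        "--skip-deploy",
        action="store_true",
        help="Build locally without deploying",
    )
    return parser.parse_args(argv)


args = parse_args(["--name", "task-manager", "--brief", "A task management app", "--skip-deploy"])
print(args.name, args.skip_deploy)  # task-manager True
```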

System requirements

Python 3.10+
Node.js 18+
npm / npx (for Playwright MCP)

The taste profile

taste.md is the one file that controls how every generated app looks and how it is built. It defines:

Tech stack — Next.js (App Router), FastAPI, Supabase, shadcn/ui, Tailwind
Design standards — information-dense layouts, dark mode, command palette, keyboard shortcuts
Anti-patterns — what to avoid (generic UI, modal overload, excessive whitespace)
API conventions — RESTful naming, Pydantic schemas, structured error responses
Database rules — RLS policies, naming conventions, indexing strategy
Code quality bars — strict TypeScript, no console.log, error boundaries, test coverage

Edit this file to match your preferences. The agents follow whatever's in it.
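
One way the shared profile could reach every agent is by appending it to each role's system prompt. The sketch below assumes that approach; `build_system_prompt` and the "Taste profile" section header are hypothetical, and the harness may inject taste.md differently.

```python
import tempfile
from pathlib import Path


def build_system_prompt(role_prompt_path: Path, taste_path: Path) -> str:
    """Concatenate a role prompt (e.g. prompts/planner.md) with the taste profile."""
    role_prompt = role_prompt_path.read_text(encoding="utf-8").strip()
    taste = taste_path.read_text(encoding="utf-8").strip()
    return f"{role_prompt}\n\n# Taste profile\n\n{taste}\n"


# Demo with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    role_file = Path(tmp) / "planner.md"
    taste_file = Path(tmp) / "taste.md"
    role_file.write_text("You are the planner.\n", encoding="utf-8")
    taste_file.write_text("Backend: FastAPI.\n", encoding="utf-8")
    prompt = build_system_prompt(role_file, taste_file)

print(prompt)
```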

Configuration

All tunables live in config.py:

Setting	Default	Description
`MODEL`	`claude-opus-4-6`	Model used by all agents
`DEFAULT_BUDGET`	$200	Total budget cap per run
`PLANNER_BUDGET`	$2	Max spend for the planner
`GENERATOR_BUDGET`	$10	Max spend per feature
`EVALUATOR_BUDGET`	$3	Max spend per evaluation
`MAX_FEATURE_ATTEMPTS`	3	Attempts per feature before it is marked blocked
`PLANNER_MAX_TURNS`	20	Turn limit for planner
`GENERATOR_MAX_TURNS`	150	Turn limit per feature
`EVALUATOR_MAX_TURNS`	60	Turn limit per evaluation
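
As a rough picture, config.py might look something like the sketch below, with values taken from the table above. The `allowed_spend` helper is hypothetical: it illustrates one way a per-call cap could be combined with the total run cap, and is not necessarily how the harness enforces budgets.

```python
# Assumed shape of config.py; values mirror the documented defaults.
MODEL = "claude-opus-4-6"
DEFAULT_BUDGET = 200.0    # USD, total cap per run
PLANNER_BUDGET = 2.0      # USD
GENERATOR_BUDGET = 10.0   # USD, per feature
EVALUATOR_BUDGET = 3.0    # USD, per evaluation
MAX_FEATURE_ATTEMPTS = 3
PLANNER_MAX_TURNS = 20
GENERATOR_MAX_TURNS = 150
EVALUATOR_MAX_TURNS = 60


def allowed_spend(
    per_call_cap: float,
    spent_so_far: float,
    total_cap: float = DEFAULT_BUDGET,
) -> float:
    """Return how much one agent call may spend without exceeding either cap."""
    remaining = max(total_cap - spent_so_far, 0.0)
    return min(per_call_cap, remaining)


print(allowed_spend(GENERATOR_BUDGET, spent_so_far=50.0))   # 10.0
print(allowed_spend(GENERATOR_BUDGET, spent_so_far=195.0))  # 5.0
```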

Monitoring a build

While a build is running, check:

apps/{name}/feature_list.json — feature completion status
apps/{name}/.harness/progress.txt — latest session state
logs/{name}.log — full output with cost breakdown
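
A small script can turn feature_list.json into a status summary. The schema is an assumption here: this sketch treats the file as a JSON array of objects with an optional "status" key, which may not match what the harness actually writes. `summarize_features` is a hypothetical helper.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_features(feature_list_path: Path) -> Counter:
    """Count features by status, treating a missing status as "pending"."""
    features = json.loads(feature_list_path.read_text(encoding="utf-8"))
    return Counter(feature.get("status", "pending") for feature in features)


# Demo against a throwaway file that follows the assumed schema.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "feature_list.json"
    sample = [
        {"name": "auth", "status": "done"},
        {"name": "boards", "status": "done"},
        {"name": "comments", "status": "blocked"},
        {"name": "search"},
    ]
    path.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_features(path)

print(dict(summary))
```

For a real build, point the function at apps/{name}/feature_list.json and adjust the key names to the actual file contents.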

Resuming

If a run is interrupted, re-run the same command. The harness reads the existing feature_list.json and picks up from the first incomplete feature.
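
The resume step amounts to scanning the feature list in order and stopping at the first entry that is not finished. A minimal sketch, assuming each feature is a dict with a "status" key and that "done" marks completion (both are assumptions, as is the name `first_incomplete`):

```python
from __future__ import annotations


def first_incomplete(features: list[dict]) -> dict | None:
    """Return the first feature whose status is not "done", or None if all are done."""
    return next((f for f in features if f.get("status") != "done"), None)


features = [
    {"name": "auth", "status": "done"},
    {"name": "boards", "status": "pending"},
    {"name": "search", "status": "pending"},
]
next_feature = first_incomplete(features)
print(next_feature["name"] if next_feature else "nothing left to build")  # boards
```

Whether a blocked feature counts as incomplete on resume is not specified here; this sketch treats anything other than "done" as incomplete.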

Project structure

.
├── main.py              # Pipeline orchestrator + cost tracker
├── config.py            # Models, budgets, paths
├── taste.md             # Tech stack & design standards
├── agents/
│   ├── planner.py       # Expands brief → product spec + feature list
│   ├── generator.py     # Implements one feature per invocation
│   └── evaluator.py     # QA testing via pytest + API checks
├── prompts/
│   ├── planner.md       # System prompt for planner
│   ├── generator.md     # System prompt for generator
│   └── evaluator.md     # System prompt for evaluator
├── apps/                # Generated apps (gitignored)
└── logs/                # Build logs (gitignored)

Default tech stack

By default, every generated app uses:

Frontend: Next.js (App Router) + TypeScript + shadcn/ui + Tailwind CSS
Backend: FastAPI (Python) with auto-generated OpenAPI docs
Database: Supabase (PostgreSQL) with Row Level Security
Auth: Supabase Auth (email/password + OAuth)
Hosting: Fly.io
AI: Claude API for the in-app chat interface

Change any of this in taste.md.

License

MIT
