srbdp/coding-agent-harness


Coding Agent Harness

A multi-agent pipeline that turns a one-line prompt into a deployed full-stack application. Built on the Claude Agent SDK.

Three agents (Planner, Generator, Evaluator) run in sequence, all reading from a single taste profile that encodes your tech stack, design standards, and quality bars. The harness handles retries, tracks cost, and saves progress incrementally so an interrupted run can resume.

How it works

prompt ──> Planner ──> Generator ──> Evaluator ──> Production
             │            │    ▲         │
             │            │    └─────────┘
             │            │     retry loop (up to 3x)
             ▼            ▼
        product_spec   feature-by-feature
        feature_list   implementation
  1. Planner expands your brief into a product spec and prioritized feature list (8-15 features)
  2. Generator implements one feature at a time with fresh context per feature
  3. Evaluator tests each feature: runs pytest, checks API endpoints, verifies the frontend renders
  4. Failed features get retried with evaluator feedback (up to 3 attempts)
  5. The final feature deploys the app to Fly.io + Supabase
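
The control flow above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: `run_pipeline`, `EvalResult`, the status strings, and the `generate`/`evaluate` callables are hypothetical stand-ins for the agent invocations. Only the limit of 3 attempts comes from this README.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_FEATURE_ATTEMPTS = 3  # documented default; everything else here is assumed


@dataclass
class EvalResult:
    passed: bool
    feedback: str = ""


def run_pipeline(
    features: list[str],
    generate: Callable[[str, Optional[str]], None],
    evaluate: Callable[[str], EvalResult],
) -> dict[str, str]:
    """Return a status ("passed" or "blocked") for each feature."""
    status: dict[str, str] = {}
    for feature in features:
        feedback: Optional[str] = None
        status[feature] = "blocked"  # stays blocked unless an attempt passes
        for _attempt in range(MAX_FEATURE_ATTEMPTS):
            generate(feature, feedback)  # hypothetical generator call
            result = evaluate(feature)   # hypothetical evaluator call
            if result.passed:
                status[feature] = "passed"
                break
            feedback = result.feedback   # passed to the next attempt
    return status
```

Passing the evaluator's feedback into the next `generate` call is what distinguishes a retry from a blind re-run; a feature that never passes stays `"blocked"` and the loop moves on.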

Quick start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Set your API key
export ANTHROPIC_API_KEY="sk-..."

# 3. (Optional) Set deployment tokens
export FLY_API_TOKEN="..."
export SUPABASE_ACCESS_TOKEN="..."

# 4. Build an app
python3 main.py --name "task-manager" --brief "A task management app with kanban boards and team collaboration"

Add --skip-deploy to build locally without deploying.

System requirements

  • Python 3.10+
  • Node.js 18+
  • npm / npx (for Playwright MCP)

The taste profile

taste.md is the single file that controls how every generated app is built and how it looks. It defines:

  • Tech stack — Next.js (App Router), FastAPI, Supabase, shadcn/ui, Tailwind
  • Design standards — information-dense layouts, dark mode, command palette, keyboard shortcuts
  • Anti-patterns — what to avoid (generic UI, modal overload, excessive whitespace)
  • API conventions — RESTful naming, Pydantic schemas, structured error responses
  • Database rules — RLS policies, naming conventions, indexing strategy
  • Code quality bars — strict TypeScript, no console.log, error boundaries, test coverage

Edit this file to match your preferences. The agents follow whatever's in it.
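
One way the agents could pick up the taste profile is by appending it to each role's system prompt. The sketch below assumes exactly that; `build_system_prompt` and the "Taste profile" separator are hypothetical, and only the file locations (`prompts/<role>.md`, `taste.md`) come from the project structure in this README.

```python
from pathlib import Path


def build_system_prompt(role: str, root: Path = Path(".")) -> str:
    """Combine a role prompt with the shared taste profile (assumed approach)."""
    role_prompt = (root / "prompts" / f"{role}.md").read_text()
    taste = (root / "taste.md").read_text()
    # Role instructions first, then the shared profile under its own heading.
    return f"{role_prompt.rstrip()}\n\n# Taste profile\n\n{taste.strip()}\n"
```

Under this assumption, editing taste.md changes the instructions for all three agents at once, without touching the per-role prompts.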

Configuration

All tunables live in config.py:

Setting               Default          Description
MODEL                 claude-opus-4-6  Model used by all agents
DEFAULT_BUDGET        $200             Total budget cap per run
PLANNER_BUDGET        $2               Max spend for the planner
GENERATOR_BUDGET      $10              Max spend per feature
EVALUATOR_BUDGET      $3               Max spend per evaluation
MAX_FEATURE_ATTEMPTS  3                Attempts per feature before it is marked blocked
PLANNER_MAX_TURNS     20               Turn limit for planner
GENERATOR_MAX_TURNS   150              Turn limit per feature
EVALUATOR_MAX_TURNS   60               Turn limit per evaluation
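
To illustrate how the budget settings might interact, here is a minimal sketch of a cost tracker that enforces both a per-stage cap and the total cap. The dollar values are the defaults above; `CostTracker`, `BudgetExceeded`, and the decision to treat each stage cap as a per-invocation limit are assumptions, not a description of main.py.

```python
from dataclasses import dataclass, field

# Defaults copied from the table above (USD).
DEFAULT_BUDGET = 200.0
STAGE_BUDGETS = {"planner": 2.0, "generator": 10.0, "evaluator": 3.0}


class BudgetExceeded(RuntimeError):
    """Raised when a recorded cost would break a cap (hypothetical)."""


@dataclass
class CostTracker:
    total_budget: float = DEFAULT_BUDGET
    spent: float = 0.0
    by_stage: dict[str, float] = field(default_factory=dict)

    def record(self, stage: str, cost: float) -> None:
        """Record one invocation's cost, enforcing the stage and total caps."""
        if cost > STAGE_BUDGETS[stage]:
            raise BudgetExceeded(f"{stage} cost ${cost:.2f} exceeds its cap")
        if self.spent + cost > self.total_budget:
            raise BudgetExceeded("total run budget exhausted")
        self.spent += cost
        self.by_stage[stage] = self.by_stage.get(stage, 0.0) + cost
```

With these defaults, a plan of 15 features that each needed all 3 attempts could cost up to 15 × 3 × ($10 + $3) = $585 under this reading, so the $200 run cap would bind well before the per-stage caps do.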

Monitoring a build

While running:

  • apps/{name}/feature_list.json — feature completion status
  • apps/{name}/.harness/progress.txt — latest session state
  • logs/{name}.log — full output with cost breakdown
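
For a quick status check during a build, something like the following could summarize the feature list. The schema is an assumption: this README does not document feature_list.json, so the sketch guesses a JSON array of objects with a `status` key and treats a missing key as `"pending"`.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_features(path: Path) -> Counter:
    """Count features by status (assumed schema: list of {"status": ...})."""
    features = json.loads(path.read_text())
    return Counter(f.get("status", "pending") for f in features)
```

For an app named task-manager, this would be called as `summarize_features(Path("apps/task-manager/feature_list.json"))`.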

Resuming

If interrupted, re-run the same command. The harness reads the existing feature_list.json and picks up from the first incomplete feature.
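
The resume decision described above might look roughly like this. It is a sketch under assumptions: the `status` field, the `"passed"` value, and the `resume_point` helper are hypothetical, and treating every non-passed feature (including a blocked one) as incomplete is a guess about the harness's behavior.

```python
import json
from pathlib import Path
from typing import Optional


def resume_point(feature_list: Path) -> tuple[str, Optional[dict]]:
    """Decide where a run should start.

    Returns ("plan", None) when no feature list exists yet,
    ("build", feature) for the first incomplete feature,
    or ("done", None) when every feature has passed.
    """
    if not feature_list.exists():
        return ("plan", None)  # fresh run: the planner has to go first
    for feature in json.loads(feature_list.read_text()):
        if feature.get("status") != "passed":  # assumed completion marker
            return ("build", feature)
    return ("done", None)
```

Keeping the state in a file that is rewritten after each feature is what makes re-running the same command safe: a fresh run and a resumed run differ only in what this function finds on disk.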

Project structure

.
├── main.py              # Pipeline orchestrator + cost tracker
├── config.py            # Models, budgets, paths
├── taste.md             # Tech stack & design standards
├── agents/
│   ├── planner.py       # Expands brief → product spec + feature list
│   ├── generator.py     # Implements one feature per invocation
│   └── evaluator.py     # QA testing via pytest + API checks
├── prompts/
│   ├── planner.md       # System prompt for planner
│   ├── generator.md     # System prompt for generator
│   └── evaluator.md     # System prompt for evaluator
├── apps/                # Generated apps (gitignored)
└── logs/                # Build logs (gitignored)

Default tech stack

Every generated app uses:

  • Frontend: Next.js (App Router) + TypeScript + shadcn/ui + Tailwind CSS
  • Backend: FastAPI (Python) with auto-generated OpenAPI docs
  • Database: Supabase (PostgreSQL) with Row Level Security
  • Auth: Supabase Auth (email/password + OAuth)
  • Hosting: Fly.io
  • AI: Claude API for the in-app chat interface

Change any of this in taste.md.

License

MIT
