🏭 AI Developer Farm

Goodhart-proof AI coding pipeline with architectural isolation.

Autonomous AI development pipeline that generates code from specifications with architectural guarantees against metric gaming. Built on LangGraph, runs locally on consumer hardware.

TL;DR: AI agents that can't cheat the tests because they never see them.

✨ Key Metrics

⏱️ 26 seconds per feature (planning → execution → verification)
💰 $0.03 per feature (vs $0.40+ for SaaS tools)
🔒 Zero metric leakage between layers (TypedDict-enforced isolation)
🏠 Runs locally on GTX 1050 Ti (4GB VRAM) + 16GB RAM

🎯 The Problem: Goodhart's Law in AI Coding

"When a measure becomes a target, it ceases to be a good measure."

When AI agents can see the tests and acceptance criteria, they tend to optimize for passing the tests rather than for solving the problem. Commercial tools try to counter this with prompts and post-hoc review, but prompts are disciplinary measures, not architectural guarantees. Agents are often smarter than their prompts.

Developer Farm makes metric gaming physically impossible through strict 4-layer isolation:

PLANNING        →  TaskInput (NO criteria)
                   ↓
EXECUTION       →  CodeArtifact (NO author info)
                   ↓
VERIFICATION    →  Verdict (score + reason)
                   ↓
RETRY LOOP      →  Abstract Feedback (NO rubric revealed)

Layer	Input	🚫 Restricted From
Planning	User spec, codebase	Execution results, verdicts
Execution	Task description	Acceptance criteria, tests, rubrics
Verification	Git diff, rubric	Worker ID, task description, author
Optimization	Aggregated metrics	Artifact contents, raw logs
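
As a rough illustration of how one of these layer boundaries could be expressed, here is a minimal sketch. The actual contents of `contracts.py` are not shown in this README, so `PlannedTask`, its fields, and `to_task_input` are hypothetical names; only `TaskInput` appears in the diagram above. `TypedDict` is checked by static type checkers rather than at runtime, so the sketch copies the allowed keys explicitly when crossing the boundary.

```python
from typing import TypedDict, get_type_hints


class PlannedTask(TypedDict):
    """Planner's internal record (hypothetical fields, for illustration)."""
    task_id: str
    description: str
    acceptance_criteria: list[str]  # must never reach the execution layer


class TaskInput(TypedDict):
    """What the execution layer may see: no criteria, tests, or rubrics."""
    task_id: str
    description: str


def to_task_input(planned: PlannedTask) -> TaskInput:
    """Project the planner's record onto the execution contract.

    Only keys declared on TaskInput are copied, so anything else on the
    planner's record is dropped at the boundary.
    """
    allowed = get_type_hints(TaskInput).keys()
    return {key: planned[key] for key in allowed}  # type: ignore[return-value]
```

Under these assumptions, calling `to_task_input` on a record that carries `acceptance_criteria` returns a dict with only `task_id` and `description`, so the execution agent's prompt cannot be built from data it never receives.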

📸 Live Demo

Pipeline execution: Planning → Execution → Verification in 26 seconds

🏗 Architecture

Core Components

LangGraph: State machine with SQLite persistence and streaming.
Ollama + Qwen2.5-Coder-3B: Local execution layer (free, 10-14 tok/s).
OpenRouter API: Planning (Qwen-Max) and Verification (Qwen-Turbo).
Git Worktrees: Isolated branches per worker (agent/{task_id}-{artifact_id}).
Reconciler: Kubernetes-style control loop for auto-recovery.
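
To make the "Kubernetes-style control loop" idea concrete, here is a minimal sketch of a single reconciliation pass: compare the desired state with the observed state and emit the actions that close the gap. This is not the code in `graph/reconciler.py`; the `Action` type, the status strings, and the `reconcile` signature are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str      # "start", "restart", or "stop" (hypothetical action kinds)
    task_id: str


def reconcile(desired: set[str], observed: dict[str, str]) -> list[Action]:
    """Compute the actions that move observed state toward desired state.

    desired:  task IDs that should have a live worker
    observed: task_id -> worker status ("running" or "failed")
    """
    actions: list[Action] = []
    for task_id in sorted(desired):
        status = observed.get(task_id)
        if status is None:
            actions.append(Action("start", task_id))      # worker is missing
        elif status == "failed":
            actions.append(Action("restart", task_id))    # worker crashed
    for task_id in sorted(observed.keys() - desired):
        actions.append(Action("stop", task_id))           # worker is orphaned
    return actions
```

A real loop would call something like this on a timer and apply the returned actions. Keeping the diff step a pure function makes the recovery logic easy to test without spawning workers.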

Retry Loop with Abstract Feedback

When verification fails, the system generates abstract guidance without revealing the rubric:

❌ Leaking: "Add docstring to is_palindrome() — the rubric requires it."

✅ Abstract: "Code quality needs improvement. Consider adding documentation for public APIs."
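
One way to produce feedback of this kind is to map each failed rubric item to a coarse category and return only generic guidance for that category. The sketch below assumes the verifier can label failures with a category; the category names, the hint texts other than the documentation one, and the `abstract_feedback` function are hypothetical.

```python
# Generic guidance per failure category. The texts never mention function
# names, rubric wording, or scores. (Categories are hypothetical.)
ABSTRACT_HINTS = {
    "documentation": "Consider adding documentation for public APIs.",
    "error_handling": "Consider how the code behaves on invalid input.",
    "typing": "Consider making interfaces more explicit.",
}


def abstract_feedback(failed_categories: list[str]) -> str:
    """Turn failed rubric categories into rubric-free guidance."""
    # dict.fromkeys removes duplicate categories while preserving order.
    unique = dict.fromkeys(failed_categories)
    hints = [ABSTRACT_HINTS[c] for c in unique if c in ABSTRACT_HINTS]
    prefix = "Code quality needs improvement."
    return prefix if not hints else prefix + " " + " ".join(hints)
```

With these assumptions, `abstract_feedback(["documentation"])` yields the abstract message shown above. How coarse the categories are determines how much the feedback reveals, so the category set itself is part of the isolation boundary.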

📊 Benchmarks

Task: Python Calculator Module

Spec: add, subtract, multiply, divide, division-by-zero handling, type hints.

Metric	Developer Farm	SaaS Competitors
Total Time	26.4s	1–3 mins
Total Cost	$0.030	$0.40 – $10+
Iterations	1 (Pass)	2–4 (Avg)
Verification Score	0.97 / 1.0	N/A (Opaque)

🚀 Quick Start

Prerequisites

Ubuntu 22.04 (Linux recommended)
Python 3.11+
NVIDIA GPU with 4GB+ VRAM (GTX 1050 Ti tested)
16GB RAM
OpenRouter API Key

Installation

# 1. Clone
git clone https://github.com/YOUR_USERNAME/developer-farm.git
cd developer-farm

# 2. Bootstrap (creates the venv and installs Ollama, models, and dependencies)
chmod +x bootstrap.sh
./bootstrap.sh

# 3. Setup Env
source venv/bin/activate
cp .env.example .env
nano .env  # Add your OPENROUTER_API_KEY

Usage

1. Write a spec:

mkdir -p work/my-feature
cat > work/my-feature/user-spec.md << 'SPEC'
# Feature: JWT Auth
Implement login and token refresh with FastAPI.
Constraints: Python 3.11+, RS256, rate limiting.
SPEC

2. Run the pipeline:

python -m graph.graph work/my-feature/user-spec.md

3. Review & Merge:

# View generated branch
git diff master...agent/task-001-<artifact_id>

# Merge
git merge agent/task-001-<artifact_id> --no-ff

📁 Project Structure

developer-farm/
├── bootstrap.sh              # One-click setup
├── contracts.py              # TypedDict layer boundaries (Core Security)
├── graph/
│   ├── graph.py              # StateGraph orchestration
│   ├── nodes.py              # Layer wrappers
│   └── reconciler.py         # Auto-recovery loop
├── nodes/
│   ├── planning.py           # Spec → Task (API)
│   ├── execution.py          # Task → Code (Local Ollama)
│   └── verification.py       # Code → Verdict (API)
├── utils/
│   └── git_worktree.py       # Git isolation manager
└── dashboard/                # Real-time monitoring UI
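
For readers unfamiliar with Git worktrees, the following sketch shows how a per-worker worktree could be created with the standard `git worktree add` command, following the `agent/...` branch pattern used in this README. It is not the code in `utils/git_worktree.py`; the function names, the `.worktrees` directory, and the parameters are assumptions.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, task_id: str, artifact_id: str,
                     base: str = "master") -> list[str]:
    """Build the git command that creates an isolated worktree and branch."""
    name = f"{task_id}-{artifact_id}"
    branch = f"agent/{name}"
    path = repo / ".worktrees" / name  # hypothetical location for worktrees
    # `git worktree add -b <branch> <path> <commit-ish>` creates a new branch
    # from <commit-ish> and checks it out in a separate working directory.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, task_id: str, artifact_id: str) -> None:
    """Run the command; raises CalledProcessError if git fails."""
    subprocess.run(worktree_command(repo, task_id, artifact_id), check=True)
```

Each worker then edits files only inside its own directory, and the resulting branch can be reviewed with `git diff` before merging.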

📚 Documentation

Architecture Deep Dive — How isolation works
Goodhart's Law — Why this matters
Contributing — How to help

🤝 Contributing

Contributions are welcome! We are looking for:

Support for more LLM providers (Anthropic, OpenAI)
Enhanced dashboard metrics
Kubernetes deployment manifests

⚠️ Important: Every PR must strictly maintain the 4-layer isolation. PRs that violate it (e.g., by passing tests to the execution agent) will be rejected.

📄 License

MIT License — Open Source & Free for Commercial Use.

Built by engineers who refuse to delegate understanding.
