Autonomous multi-agent development orchestrator that drives Claude Code through TDD loops with security scanning and conventional commits.
You describe what to build. Forge builds it — test-first, secure, and documented.
- TDD Enforcement — Red-Green-Refactor cycle tracked and enforced. Tests are written before code, every time.
- Multi-Agent Teams — 6 specialized roles (Architect, Implementer, Tester, Reviewer, Security, Documenter) with automatic task-to-agent matching
- Quality Gates with Auto-Fix — 5-gate pipeline (tests, coverage, security, lint, commit) with blocking/warning severity. When gates fail, Claude automatically attempts to fix the issues before giving up.
- Security Scanning — Secret detection, SAST, dependency audit on every iteration
- Conventional Commits — Automatic commit per TDD phase (
test:,feat:,refactor:) - Live TUI Dashboard — Real-time Claude output, cost tracking, TDD pipeline visualization, quality gate status. Press
dfor detailed dashboard overlay,qfor graceful quit. - Cost Tracking — Real-time cost per task, per execution, and per phase with averages
- Smart Task Ordering — Tasks sorted by priority (critical → low) and dependency graph depth (foundational tasks first)
- Auto Task Decomposition — Large complex tasks are automatically split into TDD-friendly subtasks
- Session Continuity — Unified TDD prompt preserves Claude's context across Red/Green/Refactor phases. Resume loops across restarts with persistent state.
- Spec-Kit Integration — Use GitHub's spec-kit for planning, Forge for execution
- Human-in-the-Loop — When a task fails repeatedly, Forge pauses and prompts you: retry with guidance, defer to later, skip permanently, or abort. Your guidance is injected into the next attempt.
- Deferred Tasks — Skip a task for now and come back to it later. Deferred tasks run after all other pending work completes.
- Circuit Breaker — Stagnation detection with auto-recovery (Nygard pattern)
- Stream Output — Real-time structured JSON streaming from Claude CLI for live TUI feedback
# Install
npm install -g @redgreen-labs/forge-cli
# Run inside your existing project directory
cd my-project
forge init
# Import requirements
forge import requirements.md
# Start the development loop
forge run --iterations 20
# Check progress
forge statusForge supports three task formats, auto-detected in priority order:
Use spec-kit for the planning phase, then Forge for execution:
# Generate specs with spec-kit
npx spec-kit specify
npx spec-kit plan
npx spec-kit tasks
# Forge auto-detects specs/tasks.md
forge runForge reads from specs/:
| File | Purpose |
|---|---|
specs/tasks.md |
Task list with T-IDs, phases, dependencies |
specs/constitution.md |
Project principles — injected into agent prompts |
specs/spec.md |
Detailed requirements — injected into agent prompts |
specs/plan.md |
Architecture decisions — injected into agent prompts |
Spec-kit task format:
## Phase 1: Setup (Shared Infrastructure)
- [ ] T001 [P] Initialize project structure
- [ ] T002 Configure CI pipeline (depends on T001)
## Phase 2: User Story 1 - Authentication (Priority: P1)
- [ ] T003 [US1] Implement login endpoint
- Returns JWT token on success
- Returns 401 on invalid credentialsMarkers: [P] = parallelizable, [US1] = user story ref, (depends on T001) = dependency.
forge import requirements.md # Parses to .forge/prd.json
forge run# Place a tasks.md in .forge/
forge run| Command | Description |
|---|---|
forge init |
Initialize project, auto-detect workspaces and language |
forge import <file> |
Import PRD, scan for existing implementations, auto-decompose |
forge run |
Start the autonomous development loop |
forge status |
Show session progress and quality metrics |
forge report |
Generate a project health report |
forge decompose |
Decompose large tasks into smaller TDD-friendly subtasks |
forge agents |
List available agent roles and their tools |
-n, --name <name> Project name
-i, --interactive Guided PRD creation
-f, --force Overwrite existing .forge directory
--no-scan Skip workspace auto-detection
-v, --verbose Show detailed scan output
--no-scan Skip codebase scan for existing implementations
--no-decompose Skip automatic decomposition of large tasks
-v, --verbose Show detailed scan output
-n, --iterations <n> Maximum iterations (default: 50)
--resume Resume from previous run, skipping completed tasks
--no-tui Disable live TUI (plain text output)
-v, --verbose Show detailed executor output
--solo Single agent mode (no team rotation)
--dry-run Simulate execution without running Claude
--json Output as JSON
-w, --watch Refresh status every few seconds
--interval <seconds> Watch interval in seconds (default: 3)
-f, --format <type> Output format: terminal, html, or json (default: terminal)
--threshold <n> Complexity threshold 1-10 (tasks above this are decomposed)
--max-subtasks <n> Max subtasks per parent task
--dry-run Show which tasks would be decomposed without calling Claude
-v, --verbose Show detailed output
Each iteration follows this pipeline:
Select Task → TDD Red (write failing test)
→ TDD Green (implement to pass)
→ TDD Refactor (clean up)
→ Security Scan
→ Quality Gates ──→ Pass → Commit → Next Task
└→ Fail → Auto-Fix → Re-run Gates (up to 3 retries)
Tasks are selected by priority (critical first) and dependency graph depth (foundational tasks that unblock others run first). The loop stops when all tasks are complete, max iterations reached, or the circuit breaker trips (repeated failures).
The live dashboard shows real-time progress:
╭──────────────── FORGE Development Loop ─────────────────╮
│ │
├──────────────────────────────────────────────────────────┤
│ Phase: IMPLEMENTING Tasks: 3/10 Iter: 5 │
│ Elapsed: 04:32 Cost: $0.45 Commits: 8 Files: 3 │
│ │
│ [████████████░░] 85% │
│ Task: Implement user authentication │
│ │
│ ✓Red → ✓Green → ●Refactor → ○Gates (2 cycles) │
│ Gates: ✓tests ✓security ✓lint ✗coverage │
├──────────────────────────────────────────────────────────┤
│ Claude Output │
│ ⚡ Writing src/auth/login.ts... │
│ Running npm test -- --reporter verbose │
│ Tests 42 passed (42) │
├──────────────────────────────────────────────────────────┤
│ [d] Dashboard [q] Quit ✓tests ✓sec ✓lint ✗cov │
╰──────────────────────────────────────────────────────────╯
Press d to toggle the dashboard overlay with cost breakdown, coverage, security findings, and code quality metrics.
When a task fails maxTaskFailures times (default 3), Forge pauses and shows an interactive prompt:
╭─ Task Failed (3x) ──────────────────────────────────────╮
│ │
│ Automated UI tests (integration_test package) │
│ Green phase failed: Process timed out (exit code 143) │
│ │
│ What would you like to do? │
│ │
│ ▸ Retry with guidance — provide a hint to help │
│ Skip for now — defer to later │
│ Skip permanently — won't retry │
│ Abort session — stop forge │
│ │
│ Use ↑↓ arrows and Enter to select │
╰──────────────────────────────────────────────────────────╯
- Retry with guidance: Type a hint (e.g., "Use widget tests instead of integration tests"). Your guidance is injected into the next attempt's prompt.
- Skip for now (defer): The task moves to the back of the queue. Other tasks run first, then it retries with a fresh failure count.
- Skip permanently: The task won't be retried (current default behavior).
- Abort session: Stops Forge immediately.
In non-interactive mode (--no-tui), tasks are auto-skipped after maxTaskFailures (no prompt shown).
.forge/forge.config.json:
{
"maxIterations": 50,
"maxCallsPerHour": 100,
"timeoutMinutes": 15,
"tdd": {
"enabled": true,
"requireFailingTestFirst": true,
"commitPerPhase": true
},
"coverage": {
"lineThreshold": 80,
"branchThreshold": 70
},
"security": {
"enabled": true,
"sast": true,
"dependencyAudit": true,
"secretScanning": true,
"blockOnSeverity": "high"
},
"agents": {
"team": ["architect", "implementer", "tester", "reviewer"],
"soloMode": false
}
}| Variable | Effect |
|---|---|
FORGE_MAX_ITERATIONS |
Override max loop iterations |
FORGE_MAX_CALLS_PER_HOUR |
Override API rate limit |
FORGE_TDD_ENABLED |
Enable/disable TDD enforcement |
FORGE_SECURITY_ENABLED |
Enable/disable security scanning |
src/
├── agents/ # Multi-agent role system (roles, handoff, teams)
├── commands/ # CLI commands (init, import, run, status, report)
├── commits/ # Conventional commit classification and planning
├── config/ # Zod-validated config with env var overrides
├── docs/ # Changelog and ADR generation
├── gates/ # Quality gate pipeline with auto-fix retry
├── logging/ # Pino structured logger with file transport
├── loop/ # Core engine, orchestrator, executor, circuit breaker
├── metrics/ # Code complexity and test ratio analysis
├── prd/ # PRD parsing, task graph (DAG), auto-decomposition
├── security/ # Secret detection, SAST, dependency audit
├── speckit/ # Spec-kit format parser and context injection
├── tdd/ # Red-Green-Refactor enforcement
├── tui/ # Ink live dashboard with cost tracking and overlays
├── cli.ts # CLI entry point
└── index.ts # Public API
npm install
npm test # Run tests
npm run test:coverage # Coverage report
npm run typecheck # TypeScript strict mode
npm run build # Build with tsup- Node.js >= 20
- Claude Code CLI installed and authenticated
MIT