Forge CLI

Autonomous multi-agent development orchestrator that drives Claude Code through TDD loops with security scanning and conventional commits.

You describe what to build. Forge builds it — test-first, secure, and documented.

Features

  • TDD Enforcement — Red-Green-Refactor cycle tracked and enforced. Tests are written before code, every time.
  • Multi-Agent Teams — 6 specialized roles (Architect, Implementer, Tester, Reviewer, Security, Documenter) with automatic task-to-agent matching
  • Quality Gates with Auto-Fix — 5-gate pipeline (tests, coverage, security, lint, commit) with blocking/warning severity. When gates fail, Claude automatically attempts to fix the issues before giving up.
  • Security Scanning — Secret detection, SAST, dependency audit on every iteration
  • Conventional Commits — Automatic commit per TDD phase (test:, feat:, refactor:)
  • Live TUI Dashboard — Real-time Claude output, cost tracking, TDD pipeline visualization, quality gate status. Press d for detailed dashboard overlay, q for graceful quit.
  • Cost Tracking — Real-time cost per task, per execution, and per phase with averages
  • Smart Task Ordering — Tasks sorted by priority (critical → low) and dependency graph depth (foundational tasks first)
  • Auto Task Decomposition — Large complex tasks are automatically split into TDD-friendly subtasks
  • Session Continuity — Unified TDD prompt preserves Claude's context across Red/Green/Refactor phases. Resume loops across restarts with persistent state.
  • Spec-Kit Integration — Use GitHub's spec-kit for planning, Forge for execution
  • Human-in-the-Loop — When a task fails repeatedly, Forge pauses and prompts you: retry with guidance, defer to later, skip permanently, or abort. Your guidance is injected into the next attempt.
  • Deferred Tasks — Skip a task for now and come back to it later. Deferred tasks run after all other pending work completes.
  • Circuit Breaker — Stagnation detection with auto-recovery (Nygard pattern)
  • Stream Output — Real-time structured JSON streaming from Claude CLI for live TUI feedback

Quick Start

# Install
npm install -g @redgreen-labs/forge-cli

# Run inside your existing project directory
cd my-project
forge init

# Import requirements
forge import requirements.md

# Start the development loop
forge run --iterations 20

# Check progress
forge status

Task Sources

Forge supports three task formats, auto-detected in priority order:

1. Spec-Kit (recommended for new projects)

Use spec-kit for the planning phase, then Forge for execution:

# Generate specs with spec-kit
npx spec-kit specify
npx spec-kit plan
npx spec-kit tasks

# Forge auto-detects specs/tasks.md
forge run

Forge reads from specs/:

File                   Purpose
specs/tasks.md         Task list with T-IDs, phases, dependencies
specs/constitution.md  Project principles, injected into agent prompts
specs/spec.md          Detailed requirements, injected into agent prompts
specs/plan.md          Architecture decisions, injected into agent prompts

Spec-kit task format:

## Phase 1: Setup (Shared Infrastructure)

- [ ] T001 [P] Initialize project structure
- [ ] T002 Configure CI pipeline (depends on T001)

## Phase 2: User Story 1 - Authentication (Priority: P1)

- [ ] T003 [US1] Implement login endpoint
  - Returns JWT token on success
  - Returns 401 on invalid credentials

Markers: [P] = parallelizable, [US1] = user story ref, (depends on T001) = dependency.
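
The markers above are regular enough to parse with a few regular expressions. The following is a minimal sketch of such a parser, not Forge's actual implementation: the `ParsedTask` shape and the `parseTaskLine` name are hypothetical, and it assumes task IDs always look like `T` followed by digits.

```typescript
// Hypothetical result shape; Forge's internal task type may differ.
interface ParsedTask {
  id: string;            // e.g. "T001"
  done: boolean;         // true for "- [x]"
  parallel: boolean;     // true when the [P] marker is present
  story: string | null;  // e.g. "US1", or null when absent
  dependsOn: string[];   // IDs listed in "(depends on ...)"
  title: string;         // text left after the markers are stripped
}

// Parses one top-level task line; returns null for anything else
// (headings, indented acceptance criteria, blank lines).
function parseTaskLine(line: string): ParsedTask | null {
  const m = /^- \[([ xX])\] (T\d+)\s+(.*)$/.exec(line.trim());
  if (m === null) return null;
  const rest = m[3] ?? "";

  const depMatch = /\(depends on ([^)]*)\)/i.exec(rest);
  const dependsOn = depMatch === null
    ? []
    : (depMatch[1] ?? "").split(/[,\s]+/).filter((id) => /^T\d+$/.test(id));

  const storyMatch = /\[(US\d+)\]/.exec(rest);

  // Remove every marker, then collapse the leftover whitespace.
  const title = rest
    .replace(/\[P\]/g, "")
    .replace(/\[US\d+\]/g, "")
    .replace(/\(depends on [^)]*\)/gi, "")
    .replace(/\s+/g, " ")
    .trim();

  return {
    id: m[2] ?? "",
    done: (m[1] ?? " ") !== " ",
    parallel: /\[P\]/.test(rest),
    story: storyMatch === null ? null : (storyMatch[1] ?? null),
    dependsOn,
    title,
  };
}
```

Calling `parseTaskLine("- [ ] T002 Configure CI pipeline (depends on T001)")` yields a task with `id` `"T002"`, `dependsOn` `["T001"]`, and `title` `"Configure CI pipeline"`. Indented sub-bullets such as the acceptance criteria under T003 return `null` and would need to be attached to the preceding task by the caller.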

2. Forge PRD (JSON)

forge import requirements.md    # Parses to .forge/prd.json
forge run

3. Markdown Task List

# Place a tasks.md in .forge/
forge run

Commands

Command              Description
forge init           Initialize project, auto-detect workspaces and language
forge import <file>  Import PRD, scan for existing implementations, auto-decompose
forge run            Start the autonomous development loop
forge status         Show session progress and quality metrics
forge report         Generate a project health report
forge decompose      Decompose large tasks into smaller TDD-friendly subtasks
forge agents         List available agent roles and their tools

forge init Options

-n, --name <name>     Project name
-i, --interactive     Guided PRD creation
-f, --force           Overwrite existing .forge directory
--no-scan             Skip workspace auto-detection
-v, --verbose         Show detailed scan output

forge import Options

--no-scan             Skip codebase scan for existing implementations
--no-decompose        Skip automatic decomposition of large tasks
-v, --verbose         Show detailed scan output

forge run Options

-n, --iterations <n>  Maximum iterations (default: 50)
--resume              Resume from previous run, skipping completed tasks
--no-tui              Disable live TUI (plain text output)
-v, --verbose         Show detailed executor output
--solo                Single agent mode (no team rotation)
--dry-run             Simulate execution without running Claude

forge status Options

--json                Output as JSON
-w, --watch           Refresh status every few seconds
--interval <seconds>  Watch interval in seconds (default: 3)

forge report Options

-f, --format <type>   Output format: terminal, html, or json (default: terminal)

forge decompose Options

--threshold <n>       Complexity threshold 1-10 (tasks above this are decomposed)
--max-subtasks <n>    Max subtasks per parent task
--dry-run             Show which tasks would be decomposed without calling Claude
-v, --verbose         Show detailed output

How It Works

Each iteration follows this pipeline:

Select Task → TDD Red (write failing test)
            → TDD Green (implement to pass)
            → TDD Refactor (clean up)
            → Security Scan
            → Quality Gates ──→ Pass → Commit → Next Task
                             └→ Fail → Auto-Fix → Re-run Gates (up to 3 retries)
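
The fail branch of the pipeline above amounts to a bounded retry loop. The sketch below illustrates that idea under stated assumptions: `runGatesWithAutoFix`, `GateResult`, and the callback signatures are hypothetical names, the real pipeline is presumably asynchronous, and it assumes that only gates with blocking severity trigger an auto-fix.

```typescript
type Severity = "blocking" | "warning";

// Hypothetical gate result; field names are illustrative only.
interface GateResult {
  name: string;
  passed: boolean;
  severity: Severity;
}

interface PipelineOutcome {
  passed: boolean;
  attempts: number;        // total gate runs, including the first
  failures: GateResult[];  // blocking failures from the last run
}

// Runs the gates once, then re-runs them after each auto-fix,
// allowing at most `maxRetries` re-runs (3 in the diagram above).
function runGatesWithAutoFix(
  runGates: () => GateResult[],
  autoFix: (failures: GateResult[]) => void,
  maxRetries = 3
): PipelineOutcome {
  let attempts = 0;
  for (;;) {
    attempts += 1;
    // Assumption: warning-severity failures are reported but do not block.
    const blocking = runGates().filter(
      (g) => !g.passed && g.severity === "blocking"
    );
    if (blocking.length === 0) {
      return { passed: true, attempts, failures: [] };
    }
    // attempts - 1 is the number of retries already used.
    if (attempts - 1 >= maxRetries) {
      return { passed: false, attempts, failures: blocking };
    }
    autoFix(blocking);
  }
}
```

With the default of 3 retries, a task whose gates never pass is evaluated four times in total (one initial run plus three re-runs) before the loop gives up.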

Tasks are selected by priority (critical first), then by dependency graph depth, so foundational tasks that unblock others run first. The loop stops when all tasks are complete, the maximum number of iterations is reached, or the circuit breaker trips after repeated failures.
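
One way to read the selection rule is sketched below. It is an illustration, not Forge's scheduler: the `Task` shape and function names are hypothetical, the intermediate priority levels (`high`, `medium`) are assumed, "depth" is taken to mean the length of the longest dependency chain beneath a task, and the dependency graph is assumed to be acyclic.

```typescript
type Priority = "critical" | "high" | "medium" | "low";

// Hypothetical task shape for illustration.
interface Task {
  id: string;
  priority: Priority;
  dependsOn: string[];
}

const PRIORITY_RANK: Record<Priority, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

// Longest chain of dependencies beneath a task (0 = no dependencies).
// Assumes the graph is a DAG; a cycle would recurse forever.
function dependencyDepth(
  id: string,
  byId: Map<string, Task>,
  memo: Map<string, number>
): number {
  const cached = memo.get(id);
  if (cached !== undefined) return cached;
  const task = byId.get(id);
  let depth = 0;
  if (task !== undefined) {
    for (const dep of task.dependsOn) {
      depth = Math.max(depth, 1 + dependencyDepth(dep, byId, memo));
    }
  }
  memo.set(id, depth);
  return depth;
}

// Picks the next runnable task: dependencies satisfied, highest
// priority first, then shallowest depth, then ID as a tiebreaker.
function selectNextTask(tasks: Task[], completed: Set<string>): Task | null {
  const byId = new Map<string, Task>();
  for (const t of tasks) byId.set(t.id, t);
  const memo = new Map<string, number>();

  const ready = tasks.filter(
    (t) => !completed.has(t.id) && t.dependsOn.every((d) => completed.has(d))
  );
  ready.sort(
    (a, b) =>
      PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
      dependencyDepth(a.id, byId, memo) - dependencyDepth(b.id, byId, memo) ||
      a.id.localeCompare(b.id)
  );
  return ready[0] ?? null;
}
```

Note that in this reading a critical task still waits for its dependencies: if it depends on a low-priority task, the low-priority task has to complete first.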

TUI Dashboard

The live dashboard shows real-time progress:

╭──────────────── FORGE Development Loop ─────────────────╮
│                                                          │
├──────────────────────────────────────────────────────────┤
│ Phase: IMPLEMENTING   Tasks: 3/10   Iter: 5              │
│ Elapsed: 04:32   Cost: $0.45   Commits: 8   Files: 3    │
│                                                          │
│ [████████████░░] 85%                                     │
│ Task: Implement user authentication                      │
│                                                          │
│ ✓Red → ✓Green → ●Refactor → ○Gates (2 cycles)           │
│ Gates: ✓tests ✓security ✓lint ✗coverage                  │
├──────────────────────────────────────────────────────────┤
│ Claude Output                                            │
│ ⚡ Writing src/auth/login.ts...                           │
│ Running npm test -- --reporter verbose                   │
│ Tests  42 passed (42)                                    │
├──────────────────────────────────────────────────────────┤
│ [d] Dashboard  [q] Quit          ✓tests ✓sec ✓lint ✗cov │
╰──────────────────────────────────────────────────────────╯

Press d to toggle the dashboard overlay with cost breakdown, coverage, security findings, and code quality metrics.

Human-in-the-Loop

When a task fails maxTaskFailures times (default 3), Forge pauses and shows an interactive prompt:

╭─ Task Failed (3x) ──────────────────────────────────────╮
│                                                          │
│ Automated UI tests (integration_test package)            │
│ Green phase failed: Process timed out (exit code 143)    │
│                                                          │
│ What would you like to do?                               │
│                                                          │
│ ▸ Retry with guidance — provide a hint to help           │
│   Skip for now — defer to later                          │
│   Skip permanently — won't retry                         │
│   Abort session — stop forge                             │
│                                                          │
│ Use ↑↓ arrows and Enter to select                        │
╰──────────────────────────────────────────────────────────╯

  • Retry with guidance: Type a hint (e.g., "Use widget tests instead of integration tests"). Your guidance is injected into the next attempt's prompt.
  • Skip for now (defer): The task moves to the back of the queue. Other tasks run first, then it retries with a fresh failure count.
  • Skip permanently: The task won't be retried (current default behavior).
  • Abort session: Stops Forge immediately.

In non-interactive mode (--no-tui), tasks are auto-skipped after maxTaskFailures (no prompt shown).
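
The four choices map naturally onto small state transitions over the task queue. The sketch below shows one possible model; `LoopState`, `Decision`, and `applyDecision` are hypothetical names rather than Forge's API. Resetting the failure count on retry is an assumption (the text above only states that deferred tasks get a fresh count).

```typescript
type Decision =
  | { kind: "retry"; guidance: string }
  | { kind: "defer" }
  | { kind: "skip" }
  | { kind: "abort" };

// Hypothetical loop state; the real session state is likely richer.
interface LoopState {
  queue: string[];                   // pending task IDs, head runs next
  failures: Record<string, number>;  // failure count per task
  guidance: Record<string, string>;  // hints to inject into the next prompt
  skipped: string[];                 // permanently skipped task IDs
  aborted: boolean;
}

// Returns a new state; the input state is not mutated.
function applyDecision(
  state: LoopState,
  taskId: string,
  decision: Decision
): LoopState {
  const rest = state.queue.filter((id) => id !== taskId);
  const failures = { ...state.failures };
  const guidance = { ...state.guidance };

  if (decision.kind === "retry") {
    // Task stays where it is; the hint is stored for the next attempt.
    guidance[taskId] = decision.guidance;
    failures[taskId] = 0; // assumption: retry starts with a fresh count
    return { ...state, failures, guidance };
  }
  if (decision.kind === "defer") {
    // Move to the back of the queue with a fresh failure count.
    failures[taskId] = 0;
    return { ...state, queue: [...rest, taskId], failures };
  }
  if (decision.kind === "skip") {
    return { ...state, queue: rest, skipped: [...state.skipped, taskId] };
  }
  // "abort": stop the session, leaving the queue untouched.
  return { ...state, aborted: true };
}
```

In this model, the non-interactive behavior described above corresponds to applying `{ kind: "skip" }` automatically once a task's failure count reaches `maxTaskFailures`.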

Configuration

.forge/forge.config.json:

{
  "maxIterations": 50,
  "maxCallsPerHour": 100,
  "timeoutMinutes": 15,
  "tdd": {
    "enabled": true,
    "requireFailingTestFirst": true,
    "commitPerPhase": true
  },
  "coverage": {
    "lineThreshold": 80,
    "branchThreshold": 70
  },
  "security": {
    "enabled": true,
    "sast": true,
    "dependencyAudit": true,
    "secretScanning": true,
    "blockOnSeverity": "high"
  },
  "agents": {
    "team": ["architect", "implementer", "tester", "reviewer"],
    "soloMode": false
  }
}

Environment Variable Overrides

Variable                  Effect
FORGE_MAX_ITERATIONS      Override max loop iterations
FORGE_MAX_CALLS_PER_HOUR  Override API rate limit
FORGE_TDD_ENABLED         Enable/disable TDD enforcement
FORGE_SECURITY_ENABLED    Enable/disable security scanning
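
As an illustration of how such overrides can be layered on top of the file-based config, here is a minimal sketch. It is not Forge's config loader: the mapping from each variable to a config field is inferred from the names, the accepted boolean spellings (`true`/`false`/`1`/`0`) are an assumption, and `applyEnvOverrides` is a hypothetical helper. Invalid values fall back to the file setting.

```typescript
// Only the fields touched by the four documented variables.
interface ForgeConfigSubset {
  maxIterations: number;
  maxCallsPerHour: number;
  tdd: { enabled: boolean };
  security: { enabled: boolean };
}

type Env = Record<string, string | undefined>;

function intOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const n = parseInt(raw, 10);
  return isNaN(n) ? fallback : n;
}

// Assumption: "true"/"1" and "false"/"0" are the accepted spellings.
function boolOr(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined) return fallback;
  const v = raw.trim().toLowerCase();
  if (v === "true" || v === "1") return true;
  if (v === "false" || v === "0") return false;
  return fallback;
}

// Returns a new config in which any set environment variable wins
// over the value loaded from .forge/forge.config.json.
function applyEnvOverrides(
  config: ForgeConfigSubset,
  env: Env
): ForgeConfigSubset {
  return {
    maxIterations: intOr(env["FORGE_MAX_ITERATIONS"], config.maxIterations),
    maxCallsPerHour: intOr(
      env["FORGE_MAX_CALLS_PER_HOUR"],
      config.maxCallsPerHour
    ),
    tdd: { enabled: boolOr(env["FORGE_TDD_ENABLED"], config.tdd.enabled) },
    security: {
      enabled: boolOr(env["FORGE_SECURITY_ENABLED"], config.security.enabled),
    },
  };
}
```

In a Node.js process this would be called as `applyEnvOverrides(fileConfig, process.env)`, so that, for example, running with `FORGE_MAX_ITERATIONS=20` caps the loop at 20 iterations regardless of the file value.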

Architecture

src/
├── agents/          # Multi-agent role system (roles, handoff, teams)
├── commands/        # CLI commands (init, import, run, status, report)
├── commits/         # Conventional commit classification and planning
├── config/          # Zod-validated config with env var overrides
├── docs/            # Changelog and ADR generation
├── gates/           # Quality gate pipeline with auto-fix retry
├── logging/         # Pino structured logger with file transport
├── loop/            # Core engine, orchestrator, executor, circuit breaker
├── metrics/         # Code complexity and test ratio analysis
├── prd/             # PRD parsing, task graph (DAG), auto-decomposition
├── security/        # Secret detection, SAST, dependency audit
├── speckit/         # Spec-kit format parser and context injection
├── tdd/             # Red-Green-Refactor enforcement
├── tui/             # Ink live dashboard with cost tracking and overlays
├── cli.ts           # CLI entry point
└── index.ts         # Public API

Development

npm install
npm test              # Run tests
npm run test:coverage # Coverage report
npm run typecheck     # TypeScript strict mode
npm run build         # Build with tsup

Requirements

License

MIT
