GitHub - gaebalai/hdd-dev: HDD(Harness-Driven Development) - Keep AI-built systems coherent when requirements change.

HDD — Harness-Driven Development
Keep AI-built systems coherent when requirements change.

Harnesses tell agents how to work. HDD keeps artifacts coherent.

pip install hdd-dev

v1.0.0 — init / scan / impact are stable. generate / implement / assemble / validate / extract are alpha.

Why HDD?

AI can generate code from specs. But what happens when requirements change mid-project?

Which design docs are affected?
Which tests need updating?
Which API contracts broke?
Did anyone forget to update the database migration?

Spec Kit and OpenSpec answer "how do I start?" HDD answers "how do I keep going when things change?"

How It Works

Requirements (human)  →  Design docs (AI)  →  Code & tests (AI)
                              ↑
                    hdd scan builds the
                     dependency graph
                              ↓
            Something changes? hdd impact tells you
             exactly what's affected — automatically.

The Three Layers

Harness (CLAUDE.md, Hooks, Skills)   ← Rules, guardrails, workflow
  └─ HDD (methodology)              ← Harness across changes
       └─ Design docs (docs/*.md)    ← Artifacts HDD manages

HDD is harness-agnostic — works with Claude Code, Copilot, Cursor, or any agent framework.

Core Principle: Derive, Don't Configure

Architecture	Derived test strategy	Config needed?
Next.js + Supabase	vitest + Playwright	None
FastAPI + Python	pytest + httpx	None
CLI tool in Go	go test	None

Upstream determines downstream. You define requirements and constraints. AI derives everything else.
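
The table above can be read as a lookup from architecture to test tooling. The sketch below illustrates that reading only: `derive_test_strategy`, `DERIVATION_RULES`, and the keyword matching are hypothetical names and logic, not part of HDD, which leaves the derivation to the AI.

```python
# Hypothetical illustration of "derive, don't configure".
# The rules mirror the table above; names and matching logic are assumptions.
DERIVATION_RULES = [
    ({"next.js", "supabase"}, ["vitest", "Playwright"]),
    ({"fastapi", "python"}, ["pytest", "httpx"]),
    ({"cli", "go"}, ["go test"]),
]


def derive_test_strategy(architecture: str) -> list[str]:
    """Return the test tools implied by an architecture description."""
    # Normalize "Next.js + Supabase" into a set of lowercase keywords.
    words = {w.strip("+,").lower() for w in architecture.split()}
    for keywords, tools in DERIVATION_RULES:
        if keywords <= words:  # every keyword of the rule is present
            return tools
    raise ValueError(f"no rule for architecture: {architecture!r}")
```

The caller supplies only the architecture; the test strategy has no configuration entry of its own.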

Quick Start

Greenfield (new project)

pip install hdd-dev
mkdir my-project && cd my-project && git init

# Initialize — pass your requirements file, any format works
hdd init --project-name "my-project" --language "typescript" \
  --requirements spec.txt

# AI designs the document dependency graph
hdd plan --init

# Generate design docs wave by wave
waves=$(hdd plan --waves)
for wave in $(seq 1 $waves); do
  hdd generate --wave $wave
done

# Quality gate — catch AI laziness (TODOs, placeholders)
hdd validate

# Generate code from design docs
sprints=$(hdd plan --sprints)
for sprint in $(seq 1 $sprints); do
  hdd implement --sprint $sprint
done

# Assemble code fragments into a buildable project
hdd assemble

Brownfield (existing project)

hdd extract              # Reverse-engineer design docs from code
hdd plan --init          # Generate wave_config from extracted docs
hdd scan                 # Build dependency graph
hdd impact               # Change impact analysis

5-Minute Greenfield Demo — Spec to Working App

37 lines of spec → 6 design docs (1,353 lines) → 102 code files (6,445 lines) → TypeScript strict build passes.

Step 1: Write your requirements

# TaskFlow — Personal Todo App

## Functional Requirements
- Task CRUD: create, read, update, delete tasks
- Each task has: title, description (optional), due date (optional),
  priority (low/medium/high), completed status
- Task list with filtering by: status (all/active/completed), priority
- Local state management (no backend, localStorage)

## UI Requirements
- Single-page app with responsive layout (mobile-first)
- Dark theme with accent color (#3b82f6)
- Floating action button opens a modal form
- Toast notifications on create/update/delete
- Keyboard shortcuts: Enter to submit, Escape to close modal

## Constraints
- Next.js 15 App Router with React Server Components
- Tailwind CSS
- TypeScript strict mode
- Deploy-ready as static export

Step 2: Run the pipeline

pip install hdd-dev
hdd init --requirements spec.md
hdd plan --init                          # AI designs the wave structure

waves=$(hdd plan --waves)                # → 4
for wave in $(seq 1 $waves); do
  hdd generate --wave $wave              # design docs, wave by wave
done

hdd validate                             # quality gate

sprints=$(hdd plan --sprints)            # → 17
for sprint in $(seq 1 $sprints); do
  hdd implement --sprint $sprint         # code from design docs
done

hdd assemble                             # integrate into buildable project
npm run build                             # TypeScript strict, zero errors

No interactive AI chat at any step. Every AI call goes through claude --print — prompt in, text out. Harness as Code: the entire workflow is a shell script.
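
Because every step is a plain command, the same pipeline can be driven from any scripting language. A minimal Python sketch, assuming a hypothetical `build_pipeline` helper that takes the wave and sprint counts as arguments (in the shell version they come from `hdd plan --waves` and `hdd plan --sprints`):

```python
def build_pipeline(waves: int, sprints: int) -> list[list[str]]:
    """Return the hdd commands from Step 2 as argv lists, in execution order."""
    commands = [
        ["hdd", "init", "--requirements", "spec.md"],
        ["hdd", "plan", "--init"],
    ]
    # Design docs, one command per wave.
    commands += [["hdd", "generate", "--wave", str(w)] for w in range(1, waves + 1)]
    commands.append(["hdd", "validate"])
    # Code generation, one command per sprint.
    commands += [["hdd", "implement", "--sprint", str(s)] for s in range(1, sprints + 1)]
    commands.append(["hdd", "assemble"])
    return commands
```

Each entry can then be passed to `subprocess.run(cmd, check=True)` so the run stops at the first failing step.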

Step 3: Model role separation

# Design docs — needs judgment, use Opus
hdd generate --wave 1 --ai-cmd 'claude --print --model claude-opus-4-6 --tools ""'

# Code generation — needs volume, use Codex (or Sonnet)
hdd implement --sprint 1 --ai-cmd 'codex --full-auto -q'

5-Minute Brownfield Demo — Change Impact Analysis

Already have a codebase? HDD tracks what's affected when requirements change.

Step 1: Write requirements and generate design docs

# TaskFlow — Requirements

## Functional Requirements
- User auth (email + Google OAuth)
- Workspace management (teams, roles, invites)
- Task CRUD with assignees, labels, due dates
- Real-time updates (WebSocket)
- File attachments (S3)
- Notification system (in-app + email)

## Constraints
- Next.js + Prisma + PostgreSQL
- Row-level security for workspace isolation
- All API endpoints rate-limited

hdd init --requirements spec.txt
hdd plan --init
waves=$(hdd plan --waves)
for wave in $(seq 1 $waves); do hdd generate --wave $wave; done
hdd scan

Scan complete:
  Documents with frontmatter: 7
  Graph: 7 nodes, 15 edges

Step 2: Change requirements mid-project

Your PM asks for SSO and audit logging. Add to docs/requirements/requirements.md:

## Additional Requirements (v1.1)
- SAML SSO (enterprise customers)
- Audit logging (record & export all operations)

hdd impact    # detects uncommitted changes automatically

# HDD Impact Report

## Green Band (high confidence, auto-propagate)
| Target                  | Depth | Confidence |
|-------------------------|-------|------------|
| design:system-design    | 1     | 0.90       |
| design:api-design       | 1     | 0.90       |
| detail:db-design        | 2     | 0.90       |
| detail:auth-design      | 2     | 0.90       |

## Amber Band (must review)
| Target                  | Depth | Confidence |
|-------------------------|-------|------------|
| test:test-strategy      | 2     | 0.90       |

2 lines changed → 6 out of 7 docs affected. Green band: AI auto-updates. Amber: human reviews. You know exactly what to fix before anything breaks.
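
The Depth column can be understood as the distance from the changed document in the dependency graph. The sketch below shows one way to compute it with a breadth-first walk. The edge list and the `req:requirements` node id are hypothetical, chosen so the result matches the depths in the report; this is not HDD's actual implementation, and it does not model confidence or bands.

```python
from collections import deque

# Hypothetical edges: each key is a document, each value lists the documents
# that depend on it. Chosen to match the depths in the report above.
DEPENDENTS = {
    "req:requirements": ["design:system-design", "design:api-design"],
    "design:system-design": ["detail:db-design", "detail:auth-design"],
    "design:api-design": ["test:test-strategy"],
}


def impacted(changed: str, dependents: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first walk from the changed node; returns {target: depth}."""
    depths: dict[str, int] = {}
    queue = deque([(changed, 0)])
    while queue:
        node, depth = queue.popleft()
        for target in dependents.get(node, []):
            # The first visit is the shortest path, so later visits are skipped.
            if target not in depths and target != changed:
                depths[target] = depth + 1
                queue.append((target, depth + 1))
    return depths
```

Calling `impacted("req:requirements", DEPENDENTS)` yields the two design docs at depth 1 and the two detail docs plus the test strategy at depth 2.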

Wave-Based Generation

Design docs are generated in dependency order — each Wave depends on the previous:

Wave 1  Acceptance criteria + ADR       ← requirements only
Wave 2  System design                   ← req + Wave 1
Wave 3  DB design + API design          ← req + Wave 1-2
Wave 4  UI/UX design                    ← req + Wave 1-3
Wave 5  Implementation plan             ← all above
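
A wave order like this falls out of the dependency graph: a document can be generated as soon as everything it depends on exists. A minimal sketch of that layering, assuming hypothetical document names and a hypothetical `assign_waves` helper (requirements are left out because they are the human-written input to every wave):

```python
# Hypothetical dependencies between generated docs, mirroring the list above.
DOCS = {
    "acceptance-criteria": set(),
    "adr": set(),
    "system-design": {"acceptance-criteria", "adr"},
    "db-design": {"system-design"},
    "api-design": {"system-design"},
    "ui-ux-design": {"db-design", "api-design"},
    "implementation-plan": {"ui-ux-design"},
}


def assign_waves(depends_on: dict[str, set[str]]) -> dict[str, int]:
    """Place each doc in the earliest wave after all of its dependencies."""
    waves: dict[str, int] = {}
    remaining = dict(depends_on)
    wave = 0
    while remaining:
        # A doc is ready once every dependency already has a wave.
        ready = [d for d, deps in remaining.items() if deps <= set(waves)]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        wave += 1
        for doc in ready:
            waves[doc] = wave
            del remaining[doc]
    return waves
```

With the sample graph, the two Wave 1 docs have no upstream docs, and the implementation plan lands in Wave 5.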

Verification runs bottom-up (V-Model):

Unit tests        ← verifies detailed design
Integration       ← verifies system design
E2E / System      ← verifies requirements + acceptance criteria

Frontmatter = Single Source of Truth

Dependencies are declared in Markdown frontmatter. No separate config files.

---
hdd:
  node_id: "design:api-design"
  modules: ["api", "auth"]        # ← links to source code modules
  depends_on:
    - id: "design:system-design"
      relation: derives_from
    - id: "req:my-project-requirements"
      relation: implements
---

The modules field enables reverse traceability: when source code changes, hdd extract identifies affected modules, and the modules field maps those modules back to the design docs that need updating.
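
The reverse lookup itself is a simple set intersection. The sketch below assumes the frontmatter has already been parsed into a dict of `node_id` to `modules`; the `docs_for_modules` helper and the `detail:db-design` entry are hypothetical.

```python
# node_id -> modules, as declared in each doc's frontmatter.
# The first entry is the example above; the second is made up for contrast.
DOCS = {
    "design:api-design": ["api", "auth"],
    "detail:db-design": ["db"],
}


def docs_for_modules(docs: dict[str, list[str]], changed: set[str]) -> list[str]:
    """Return node_ids whose `modules` list overlaps the changed modules."""
    return sorted(node for node, modules in docs.items() if changed & set(modules))
```

A change in the `auth` module maps back to `design:api-design` only; a change in `db` maps to `detail:db-design`.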

hdd/scan/ is a cache — regenerated on every hdd scan.

AI Model Configuration

HDD calls an external AI CLI for document generation. The default is Claude Opus:

# hdd.yaml
ai_command: "claude --print --model claude-opus-4-6"

Per-Command Override

Different commands can use different models. For example, use Opus for design doc generation but Codex for code implementation:

ai_command: "claude --print --model claude-opus-4-6"   # global default
ai_commands:
  generate: "claude --print --model claude-opus-4-6"    # design doc generation
  restore: "claude --print --model claude-opus-4-6"     # brownfield reconstruction
  review: "claude --print --model claude-opus-4-6"      # quality evaluation
  plan_init: "claude --print --model claude-sonnet-4-6" # wave_config planning
  implement: "codex --print"                             # code generation

Resolution priority: CLI --ai-cmd flag > ai_commands.{command} > ai_command > built-in default (Opus).

Config Directory Discovery

By default, hdd init creates a hdd/ directory. If your project already has a hdd/ directory (e.g., it's your source code package), use --config-dir:

hdd init --config-dir .hdd --project-name "my-project" --language "python"

All other commands (scan, impact, generate, etc.) automatically discover whichever config directory exists — hdd/ first, then .hdd/. No extra flags needed.
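
The discovery order can be sketched as follows. `discover_config_dir` is a hypothetical helper that only checks whether each directory exists; how HDD tells a config directory apart from a source package named hdd/ is not covered here.

```python
from pathlib import Path
from typing import Optional

# Search order described above: hdd/ first, then .hdd/.
CANDIDATES = ("hdd", ".hdd")


def discover_config_dir(project_root: Path) -> Optional[Path]:
    """Return the first existing candidate directory, or None if neither exists."""
    for name in CANDIDATES:
        candidate = project_root / name
        if candidate.is_dir():
            return candidate
    return None
```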

Brownfield? Start Here

Already have a codebase? HDD provides a full brownfield workflow — from code extraction to design doc reconstruction.

Step 1: Extract structure from code

hdd extract reverse-engineers design documents from your source code. No AI required — pure static analysis.

cd existing-project
hdd extract

Extracted: 13 modules from 45 files (12,340 lines)
Output: hdd/extracted/
  system-context.md     # Module map + dependency graph
  modules/auth.md       # Per-module design doc
  modules/api.md
  modules/db.md
  ...

Step 2: Generate wave_config from extracted docs

hdd plan --init automatically detects extracted docs and generates a wave_config — no requirement docs needed.

hdd plan --init    # Detects hdd/extracted/, builds brownfield wave_config

Each artifact in the generated wave_config includes a modules field linking it to source code modules — enabling reverse traceability from code changes back to design docs.

Step 3: Restore design documents

hdd restore reconstructs design documents from extracted facts. Unlike hdd generate (which creates docs from requirements), restore asks "what IS the current design?" — reconstructing intent from code structure.

hdd restore --wave 2   # Reconstruct system design from extracted facts
hdd restore --wave 3   # Reconstruct DB/API design

Step 4: Build the graph

hdd scan
hdd impact

Philosophy: In V-Model, intent lives only in requirements. Architecture, design, and tests are structural facts — extractable from code. hdd extract gets the structure; hdd restore reconstructs the design; you add the "why" later.

Greenfield vs Brownfield

Aspect	Greenfield	Brownfield
Starting point	Requirements (human-written)	Existing codebase
Planning	`hdd plan --init` (from requirements)	`hdd plan --init` (from extracted docs)
Doc generation	`hdd generate` (forward: requirements → design)	`hdd restore` (backward: code facts → design)
Traceability	`modules` field links docs → code	`modules` field links docs → code
Modification	`hdd propagate` (code → affected docs → optional AI update)	Same flow

Commands

Command	Status	Description
`hdd init`	Stable	Initialize HDD in any project (`--config-dir .hdd` for projects where `hdd/` exists)
`hdd scan`	Stable	Build dependency graph from frontmatter
`hdd impact`	Stable	Change impact analysis (Green / Amber / Gray)
`hdd validate`	Alpha	Frontmatter integrity & graph consistency check
`hdd generate`	Experimental	Generate design docs in Wave order (greenfield)
`hdd restore`	Experimental	Reconstruct design docs from extracted facts (brownfield)
`hdd plan`	Experimental	Wave execution status (`--init` supports brownfield fallback)
`hdd verify`	Experimental	V-Model verification
`hdd implement`	Experimental	Design-to-code generation
`hdd propagate`	Experimental	Reverse-propagate source code changes to design docs
`hdd review`	Experimental	AI-powered artifact quality evaluation (LLM-as-Judge)
`hdd extract`	Alpha	Reverse-engineer design docs from existing code

Claude Code Integration

HDD ships with slash-command Skills for Claude Code. Instead of running CLI commands yourself, use Skills — Claude reads the project context and runs the right command with the right flags.

Skills Demo — Same TaskFlow App, Zero CLI

You:  /hdd-init
      → Claude: hdd init --project-name "taskflow" --language "typescript" \
                  --requirements spec.txt

You:  /hdd-generate
      → Claude: hdd generate --wave 2 --path .
      → Claude reads every generated doc, checks scope, validates frontmatter
      → "I've reviewed the Wave 2 design docs. Proceed to Wave 3?"

You:  yes

You:  /hdd-generate
      → Claude: hdd generate --wave 3 --path .

You:  /hdd-scan
      → Claude: hdd scan --path .
      → Reports: "7 documents, 15 edges. No warnings."

You:  (edit requirements — add SSO + audit logging)

You:  /hdd-impact
      → Claude: hdd impact --path .
      → Green Band: auto-updates system-design, api-design, db-design, auth-design
      → Amber Band: "test-strategy is affected. Update it?"

You:  (modify source code — implement the SSO feature)

You:  /hdd-propagate
      → Claude: hdd propagate --path .
      → "3 files changed in auth module. 2 design docs affected:
         design:system-design, detail:auth-design"
      → "Run with --update to update these docs?"

You:  yes
      → Claude: hdd propagate --path . --update
      → Reviews updated docs, confirms changes are accurate

Key difference: Skills add human-in-the-loop gates. /hdd-generate pauses between waves for approval. /hdd-impact follows the Green/Amber/Gray protocol — auto-updating safe changes, asking before risky ones.

Hook Integration — Set It Once, Never Think Again

Add this hook to your Claude Code settings and hdd scan runs after every Edit or Write tool call, so the dependency graph stays current without manual scans:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "hdd scan --path ."
      }]
    }]
  }
}

With hooks active, your entire workflow becomes: edit files normally, then run /hdd-impact when you want to know what's affected. That's it. The graph maintenance is invisible.

Available Skills

Skill	What it does
`/hdd-init`	Initialize + import requirements
`/hdd-generate`	Generate design docs wave-by-wave with HITL gates (greenfield)
`/hdd-restore`	Reconstruct design docs from extracted code facts (brownfield)
`/hdd-scan`	Rebuild dependency graph
`/hdd-impact`	Change impact analysis with Green/Amber/Gray protocol
`/hdd-validate`	Frontmatter & dependency consistency check
`/hdd-propagate`	Reverse-propagate source code changes to design docs
`/hdd-review`	AI quality review with PASS/FAIL verdict and feedback

See docs/claude-code-setup.md for complete setup.

Autonomous Quality Loop

hdd review evaluates artifacts using AI (LLM-as-Judge), and --feedback feeds results back into generation. Together they enable a fully autonomous quality loop:

# Generate → Review → Regenerate with feedback until PASS
hdd generate --wave 2 --force

# Run the review once and read both fields from the same result.
# hdd review exits 1 on FAIL; `|| true` keeps the script alive under `set -e`.
result=$(hdd review --path . --json || true)
verdict=$(echo "$result" | jq -r '.results[0].verdict')
feedback=$(echo "$result" | jq -r '.results[0].feedback')

while [ "$verdict" = "FAIL" ]; do
  hdd generate --wave 2 --force --feedback "$feedback"
  result=$(hdd review --path . --json || true)
  verdict=$(echo "$result" | jq -r '.results[0].verdict')
  feedback=$(echo "$result" | jq -r '.results[0].feedback')
done
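
The shell loop runs until the reviewer returns PASS, which may never happen. A bounded variant is sketched below in Python. The `generate` and `review` callables and the `max_attempts` limit are assumptions for illustration; in practice they would wrap `hdd generate --feedback` and `hdd review --json`, whose JSON is read as `.results[0].verdict` and `.results[0].feedback` in the loop above.

```python
from typing import Callable


def quality_loop(
    generate: Callable[[str], None],
    review: Callable[[], dict],
    max_attempts: int = 3,
) -> bool:
    """Regenerate with reviewer feedback until PASS or attempts run out."""
    feedback = ""  # the first generation has no feedback yet
    for _ in range(max_attempts):
        generate(feedback)
        result = review()["results"][0]
        if result["verdict"] == "PASS":
            return True
        feedback = result["feedback"]
    return False
```

Returning `False` after the last attempt lets the caller escalate to a human instead of spending further AI calls.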

Review criteria are type-specific:

Doc Type	Criteria
Requirement	Completeness, consistency, testability, ambiguity
Design	Architecture soundness, API quality, security, upstream consistency
Detailed Design	Implementation clarity, data model, error handling, interface contracts
Test	Coverage, edge cases, independence, traceability

Scoring: 80+ = PASS. CRITICAL issues auto-cap at 59. Exit code 1 on FAIL — loop-friendly.
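
The scoring rule can be written out as a small function. The names `verdict`, `PASS_THRESHOLD`, and `CRITICAL_CAP` are hypothetical; only the numbers come from the rule above.

```python
PASS_THRESHOLD = 80  # 80+ = PASS
CRITICAL_CAP = 59    # a CRITICAL issue caps the score here


def verdict(score: int, has_critical: bool = False) -> tuple[int, str]:
    """Apply the critical cap, then compare against the pass threshold."""
    final = min(score, CRITICAL_CAP) if has_critical else score
    return final, ("PASS" if final >= PASS_THRESHOLD else "FAIL")
```

Because the cap sits below the threshold, any artifact with a CRITICAL issue fails regardless of its raw score.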

Model allocation: Use Opus for review (ai_commands.review), Codex for implementation (ai_commands.implement). The ai_commands config makes this a one-line change.

How HDD Differs from Other Spec-Driven Tools

All major spec-driven tools focus on creating design documents. None address what happens when those documents change. HDD fills that gap with a dependency graph, impact analysis, and a band-based update protocol.

Aspect	spec-kit (GitHub)	Kiro (AWS)	cc-sdd (gotalab)	HDD
Focus	Spec creation (req -> design -> tasks -> code)	Agentic IDE with native SDD pipeline	Kiro-style SDD for Claude Code	Post-creation coherence maintenance
Stars	83.7k	N/A (proprietary IDE)	3k	--
Change propagation	No	No	No	`hdd impact` + dependency graph
Impact analysis	No	No	No	Green / Amber / Gray bands
Spec notation	Markdown + 40 extensions	EARS notation	Quality gates + git worktree	Frontmatter `depends_on`
Harness lock-in	GitHub Copilot	Kiro IDE	Claude Code	Any agent / IDE

In short: spec-kit, Kiro, and cc-sdd answer "how do I create specs?" HDD answers "how do I keep specs, code, and tests coherent when requirements change?"

Comparison

Feature	Spec Kit	OpenSpec	HDD
Spec-first generation	Yes	Yes	Yes
Change propagation	No	No	Dependency graph + impact analysis
Derive test strategy	No	No	Automatic from architecture
V-Model verification	No	No	Unit → Integration → E2E
Impact analysis	No	No	`hdd impact`
Harness-agnostic	Copilot focused	Multi-agent	Any harness

Real-World Usage

Battle-tested on a production web app — 18 design docs connected by a dependency graph. All docs, code, and tests generated by AI following HDD. When requirements changed mid-project, hdd impact identified affected artifacts and AI fixed them automatically.

docs/
├── requirements/       # What to build (human input — plain text)
├── design/             # System design, API, DB, UI (AI-generated)
├── detailed_design/    # Module-level specs (AI-generated)
├── governance/         # ADRs (AI-generated)
├── plan/               # Implementation plan
├── test/               # Acceptance criteria, test strategy
├── operations/         # Runbooks
└── infra/              # Infrastructure design

HDD Manages Its Own Development

HDD dogfoods itself. The .hdd/ directory contains HDD's own config, and hdd extract reverse-engineers design docs from its own source code. The full V-Model lifecycle runs on itself:

hdd init --config-dir .hdd --project-name "hdd-dev" --language "python"
hdd extract          # 15 modules → design docs with dependency frontmatter
hdd scan             # 49 nodes, 83 edges
hdd verify           # mypy + pytest (127/127 tests pass)

If HDD can't manage itself, it shouldn't manage your project.

Roadmap

License

MIT
