codeharness

Makes autonomous coding agents produce software that actually works — not software that passes tests.

codeharness is an npm CLI + Claude Code plugin that packages verification-driven development as an installable tool: black-box verification via Docker, agent-first observability via VictoriaMetrics, and mechanical enforcement via hooks that make skipping verification architecturally impossible.

What it does

Verifies features work — not just that tests pass. Black-box verification runs the built CLI inside a Docker container with no source code access. If the feature doesn't work from a user's perspective, verification fails.
Fixes what it finds — verification failures with code bugs automatically return to development with specific findings. The dev agent gets told exactly what's broken and why.
Runs sprints autonomously — reads your sprint plan, picks the highest-priority story, implements it, reviews it, verifies it, and moves to the next one. Cross-epic prioritization, retry management, and session handoff built in.
Makes agents see runtime — ephemeral VictoriaMetrics stack (logs, metrics, traces) that agents query programmatically during development. No guessing at what the code does at runtime.

Installation

Two components — install both:

# CLI (npm package)
npm install -g codeharness

# Claude Code plugin (slash commands, hooks, skills)
claude plugin install github:iVintik/codeharness

Quick Start

# Initialize in your project
codeharness init

# Start autonomous sprint execution (inside Claude Code)
/harness-run

How it works

As a CLI (`codeharness`)

The CLI handles all mechanical work — stack detection, Docker management, verification, coverage, retry state.

Command	Purpose
`codeharness init`	Detect stack, install dependencies, start observability, scaffold docs
`codeharness run`	Execute the autonomous coding loop (Ralph)
`codeharness verify --story <key>`	Run verification pipeline for a story
`codeharness status`	Show harness health, sprint progress, Docker stack
`codeharness coverage`	Run tests with coverage and evaluate against targets
`codeharness onboard epic`	Scan codebase for gaps, generate onboarding stories
`codeharness retry --status`	Show retry counts and flagged stories
`codeharness retry --reset`	Clear retry state for re-verification
`codeharness verify-env build`	Build Docker image for black-box verification
`codeharness stack start`	Start the shared observability stack
`codeharness teardown`	Remove harness from project

All commands support --json for machine-readable output.

As a Claude Code plugin (`/harness-*`)

The plugin provides slash commands that orchestrate the CLI within Claude Code sessions:

Command	Purpose
`/harness-run`	Autonomous sprint execution — picks stories by priority, runs create → dev → review → verify loop
`/harness-init`	Interactive project initialization
`/harness-status`	Quick overview of sprint progress and harness health
`/harness-onboard`	Scan project and generate onboarding plan
`/harness-verify`	Verify a story with real-world evidence

BMAD Method integration

codeharness integrates with BMAD Method for structured sprint planning:

Phase	Commands
Analysis	`/create-brief`, `/brainstorm-project`, `/market-research`
Planning	`/create-prd`, `/create-ux`
Solutioning	`/create-architecture`, `/create-epics-stories`
Implementation	`/sprint-planning`, `/create-story`, then `/harness-run`

Verification architecture

┌─────────────────────────────────────────┐
│  Claude Code Session                     │
│  /harness-run picks next story           │
│  → create-story → dev → review → verify  │
└────────────────────┬────────────────────┘
                     │ verify
                     ▼
┌─────────────────────────────────────────┐
│  Docker Container (no source code)       │
│  - codeharness CLI installed from tarball│
│  - claude CLI for nested verification    │
│  - curl/jq for observability queries     │
│  Exercises CLI as a real user would      │
└────────────────────┬────────────────────┘
                     │ queries
                     ▼
┌─────────────────────────────────────────┐
│  Observability Stack (VictoriaMetrics)   │
│  - VictoriaLogs  :9428 (LogQL)          │
│  - VictoriaMetrics :8428 (PromQL)       │
│  - OTEL Collector :4318                  │
└─────────────────────────────────────────┘

When verification finds code bugs → story returns to dev with findings → dev fixes → re-verify. This loop runs up to 10 times per story. Infrastructure failures (timeouts, Docker errors) retry 3 times then skip.

Requirements

Node.js >= 18
Docker (for observability and verification)
Claude Code (for plugin features)

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 321 Commits
.claude-plugin		.claude-plugin
.claude		.claude
.github/workflows		.github/workflows
.ralph		.ralph
_bmad-output		_bmad-output
_bmad		_bmad
agents		agents
bin		bin
commands		commands
coverage		coverage
docs		docs
hooks		hooks
knowledge		knowledge
patches		patches
ralph		ralph
scripts		scripts
skills		skills
src		src
templates		templates
test		test
tests		tests
verification		verification
.circuit_breaker_history		.circuit_breaker_history
.circuit_breaker_state		.circuit_breaker_state
.dockerignore		.dockerignore
.gitignore		.gitignore
AGENTS.md		AGENTS.md
ARCHITECTURE.md		ARCHITECTURE.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.harness.yml		docker-compose.harness.yml
eslint.config.js		eslint.config.js
otel-collector-config.yaml		otel-collector-config.yaml
package-lock.json		package-lock.json
package.json		package.json
sprint-state.json		sprint-state.json
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

codeharness

What it does

Installation

Quick Start

How it works

As a CLI (`codeharness`)

As a Claude Code plugin (`/harness-*`)

BMAD Method integration

Verification architecture

Requirements

License

About

Uh oh!

Releases 62

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

codeharness

What it does

Installation

Quick Start

How it works

As a CLI (codeharness)

As a Claude Code plugin (/harness-*)

BMAD Method integration

Verification architecture

Requirements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 62

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

As a CLI (`codeharness`)

As a Claude Code plugin (`/harness-*`)

Packages