fnusatvik07/autonomous-coding-ralph-loop

Ralph Loop

Autonomous Coding Agent

Describe it. Ralph builds it.

From a single task description to tested, committed, production-ready code
with human approval at every step.


[Screenshot: Ralph Loop Dashboard]


Overview

Ralph Loop takes a plain-English task description and autonomously builds the entire project — specification, task breakdown, code, tests, QA review, and git commits. You approve the spec and task list before any coding begins. Every change is reviewed by a separate QA agent. If something fails, a healer agent fixes it automatically.

The result: tested, committed code with clean git history, delivered in minutes.


At a Glance

What you provide

  • A task description in plain English
  • Your API key
  • Budget limit (optional)

What Ralph delivers

  • Application specification (spec.md)
  • Atomic task breakdown (prd.json)
  • Working code with tests
  • Clean git history (1 commit per task)
  • Cost and analytics dashboard

Proven Results

These are actual runs with real API calls — not benchmarks, not mocks.

Project                                                       Tasks   Tests Generated   Coverage   Cost    Time
Todo API: FastAPI + SQLite + CRUD + validation                10/10   47 pass           -          $2.48   20 min
URL Shortener: Cache + rate limiting + click tracking         6/6     35 pass           -          $2.81   20 min
Unit Converter: CLI + 3 unit types + registry pattern         12/12   66 pass           98%        $5.73   30 min
Existing Codebase: Add search to Todo API (zero regressions)  2/2     58 pass           -          $0.89   9 min

35 out of 35 real API tasks completed. 158 framework tests passing.


How It Works

┌─────────────────────────────────────────────────────────────────┐
│                                                                   │
│   You: "Build a REST API with FastAPI for managing todo items"    │
│                                                                   │
│         │                                                         │
│         ▼                                                         │
│   ┌─────────────┐                                                │
│   │ SPEC GEN    │  LLM writes spec.md                            │
│   └──────┬──────┘  (architecture, models, API, tests)            │
│          │                                                        │
│          ▼                                                        │
│   ┌─────────────┐                                                │
│   │ YOU REVIEW  │  Full-screen markdown viewer                   │
│   │ & APPROVE   │  Edit, download, or reject                     │
│   └──────┬──────┘                                                │
│          │                                                        │
│          ▼                                                        │
│   ┌─────────────┐                                                │
│   │ TASK SPLIT  │  spec.md → atomic tasks (prd.json)             │
│   └──────┬──────┘  Each with acceptance criteria                 │
│          │                                                        │
│          ▼                                                        │
│   ┌─────────────┐  For each task:                                │
│   │ CODE LOOP   │  Code → Test → QA Review → Heal → Commit      │
│   │             │  Fresh context per iteration                    │
│   └──────┬──────┘  Separate QA sentinel per task                 │
│          │                                                        │
│          ▼                                                        │
│   ┌─────────────┐                                                │
│   │ DELIVERED   │  All tests pass. Clean git. Analytics.         │
│   └─────────────┘                                                │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘
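
The flow in the diagram can be sketched as a small gated pipeline. This is an illustrative sketch, not Ralph's actual code: `Task`, `run_pipeline`, and the callables passed to it are hypothetical names, and stopping at the first failed task is an assumption. Only the order of the stages and the two approval gates come from this README.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One atomic task (hypothetical shape, not Ralph's real data model)."""
    title: str
    done: bool = False


def run_pipeline(
    description: str,
    generate_spec: Callable[[str], str],
    split_tasks: Callable[[str], list[Task]],
    approve: Callable[[str], bool],
    run_task: Callable[[Task], bool],
) -> list[Task]:
    """Spec -> human approval -> task split -> human approval -> per-task loop."""
    spec = generate_spec(description)
    if not approve(spec):
        return []  # no coding happens if the spec is rejected

    tasks = split_tasks(spec)
    if not approve("\n".join(t.title for t in tasks)):
        return []  # no coding happens if the task list is rejected

    for task in tasks:
        # run_task stands in for: code -> test -> QA review -> heal -> commit
        task.done = run_task(task)
        if not task.done:
            break  # assumption: stop at the first task that cannot be completed
    return tasks


if __name__ == "__main__":
    result = run_pipeline(
        "Build a REST API with FastAPI for managing todo items",
        generate_spec=lambda d: f"# Spec\n{d}",
        split_tasks=lambda s: [Task("Create models"), Task("Add CRUD routes")],
        approve=lambda text: True,  # auto-approve for the demo
        run_task=lambda t: True,    # pretend every task succeeds
    )
    print([(t.title, t.done) for t in result])
```

Passing the stages in as callables keeps the sketch runnable without an LLM; in a real run each callable would wrap an agent session or a human prompt.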

Setup

Prerequisites

Requirement         Why
Python 3.12+        Runtime
Claude Code CLI     npm install -g @anthropic-ai/claude-code
Anthropic API key   Or Azure Foundry endpoint, or OpenAI key
Node.js 18+         Only if modifying the web dashboard

Install

git clone https://github.com/fnusatvik07/autonomous-coding-ralph-loop.git
cd autonomous-coding-ralph-loop

# With uv (recommended)
uv pip install -e ".[web]"

# Or with pip
pip install -e ".[web]"

Drop [web] if you only want the CLI without the dashboard.

Configure

cp .env.example .env

Then set your API key in .env:

Option A: Anthropic API (simplest)

ANTHROPIC_API_KEY=sk-ant-your-key-here

Option B: Azure Foundry

CLAUDE_CODE_USE_FOUNDRY=1
ANTHROPIC_FOUNDRY_API_KEY=your-foundry-key
ANTHROPIC_FOUNDRY_BASE_URL=https://your-endpoint.azure.com/anthropic/
ANTHROPIC_DEFAULT_SONNET_MODEL=claude-opus-4-6

Option C: OpenAI (via Deep Agents)

OPENAI_API_KEY=sk-proj-your-key-here
RALPH_PROVIDER=deep-agents
RALPH_MODEL=openai:gpt-4o
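
Before a long run it can help to check which of the three options is actually set. The following is a minimal sketch of such a check. The variable names are the ones listed above; the function `detect_provider`, its return labels, and the order in which options are checked are assumptions, and Ralph's own configuration loading may differ. The sketch reads the process environment, so values in `.env` must already be exported or loaded.

```python
import os
from typing import Mapping


def detect_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return a label for the first configured option (hypothetical helper)."""
    # Option C: OpenAI through Deep Agents
    if env.get("RALPH_PROVIDER") == "deep-agents" and env.get("OPENAI_API_KEY"):
        return "openai-via-deep-agents"

    # Option B: Azure Foundry needs both a key and a base URL
    if env.get("CLAUDE_CODE_USE_FOUNDRY") == "1":
        missing = [
            name
            for name in ("ANTHROPIC_FOUNDRY_API_KEY", "ANTHROPIC_FOUNDRY_BASE_URL")
            if not env.get(name)
        ]
        if missing:
            raise ValueError(f"Foundry selected but missing: {', '.join(missing)}")
        return "azure-foundry"

    # Option A: plain Anthropic API key
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"

    raise ValueError("No provider configured; set one of the options in .env")


if __name__ == "__main__":
    try:
        print(detect_provider())
    except ValueError as exc:
        print(f"configuration problem: {exc}")
```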

Verify

ralph --version
ralph --help

Usage

CLI

ralph run "Build a REST API with FastAPI for a todo app"

ralph run "Build a CLI tool" -m claude-opus-4-20250514     # specific model
ralph run "Build something" --budget 10.00                  # budget cap
ralph run "Add auth" -w ./my-project                        # existing project

ralph resume -w ./my-project                                # continue previous run
ralph status -w ./my-project                                # check progress
ralph analytics -w ./my-project                             # cost breakdown
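
To drive the CLI from a script, the commands above can be assembled as an argument list. This is a small sketch that uses only the flags shown in this section (`-m`, `--budget`, `-w`); the helper `build_run_command` is hypothetical and not part of Ralph.

```python
import shlex
from typing import Optional


def build_run_command(
    task: str,
    workspace: Optional[str] = None,
    budget: Optional[float] = None,
    model: Optional[str] = None,
) -> list[str]:
    """Build the argument list for `ralph run` from the flags shown above."""
    cmd = ["ralph", "run", task]
    if model is not None:
        cmd += ["-m", model]
    if budget is not None:
        cmd += ["--budget", f"{budget:.2f}"]
    if workspace is not None:
        cmd += ["-w", workspace]
    return cmd


if __name__ == "__main__":
    cmd = build_run_command("Add auth", workspace="./my-project", budget=10)
    # Print the shell-quoted command; to execute it, pass the list to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell quoting problems when the task description contains quotes or spaces.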

Web Dashboard

ralph web                    # opens http://localhost:8420
ralph web -w ./my-project    # point at specific workspace
ralph web -p 9000            # custom port

The dashboard walks you through: task input → spec review → task approval → live coding terminal → results browser


Key Features

2-Step Spec Flow
Task → spec.md → human review → prd.json → human review → code. Nothing runs without approval.

QA Sentinel
A separate LLM session reviews every code change. Blocks on failing tests, security issues, or missing coverage.

Healer Loop
When QA fails, a debugging specialist iterates up to 5 times. Auto-rollback on final failure.
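
The bounded retry described here can be sketched as follows. This is an assumption-laden illustration rather than the code in `qa/healer.py`: `heal` and its callable parameters are hypothetical, and only the limit of 5 attempts and the rollback on final failure come from the description.

```python
from typing import Callable

MAX_HEAL_ATTEMPTS = 5  # limit stated in the feature description


def heal(
    run_qa: Callable[[], bool],
    attempt_fix: Callable[[int], None],
    rollback: Callable[[], None],
    max_attempts: int = MAX_HEAL_ATTEMPTS,
) -> bool:
    """Apply a fix and re-run QA up to max_attempts times; roll back if QA never passes."""
    for attempt in range(1, max_attempts + 1):
        attempt_fix(attempt)
        if run_qa():
            return True  # QA passed, keep the changes
    rollback()  # every attempt failed: restore the last known-good state
    return False


if __name__ == "__main__":
    state = {"attempts": 0}

    def fake_fix(attempt: int) -> None:
        state["attempts"] = attempt

    # Demo: QA starts passing once three fixes have been applied.
    healed = heal(
        run_qa=lambda: state["attempts"] >= 3,
        attempt_fix=fake_fix,
        rollback=lambda: print("rolled back"),
    )
    print(healed, state["attempts"])
```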

Multi-Model Routing
Haiku for scaffolding. Sonnet for features. Opus for architecture. 60% cost reduction.
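
A routing rule of this kind can be as simple as a lookup table. The sketch below is illustrative only: the tier names come from the line above, while the task-kind labels, `pick_model_tier`, and the fallback default are assumptions and may not match `routing.py`.

```python
# Tier names come from the feature description; the task kinds are hypothetical labels.
MODEL_TIER_BY_KIND = {
    "scaffolding": "haiku",
    "feature": "sonnet",
    "architecture": "opus",
}


def pick_model_tier(task_kind: str, default: str = "sonnet") -> str:
    """Return the model tier for a task kind, or the default for unknown kinds."""
    return MODEL_TIER_BY_KIND.get(task_kind, default)


if __name__ == "__main__":
    for kind in ("scaffolding", "feature", "architecture", "docs"):
        print(f"{kind} -> {pick_model_tier(kind)}")
```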

Reflexion
LLM analyzes why it failed and stores the lesson. Future iterations read these before starting.

Git Checkpoints
Tags before each task. Rollback to last known-good state on failure. Clean squash on success.
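
The tag, rollback, and squash steps map onto standard git commands. The following is a minimal sketch under stated assumptions: the helper names and the `ralph-checkpoint-` tag prefix are hypothetical, and `checkpoint.py` may work differently. The git subcommands themselves (`tag --force`, `reset --hard`, `reset --soft`, `commit -m`) are standard.

```python
import subprocess


def git(*args: str, cwd: str = ".") -> str:
    """Run a git command in cwd and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def checkpoint_tag(task_id: str) -> str:
    """Hypothetical tag naming scheme, e.g. ralph-checkpoint-task-3."""
    return f"ralph-checkpoint-{task_id}"


def create_checkpoint(task_id: str, cwd: str = ".") -> str:
    """Tag the current HEAD before work on a task starts."""
    tag = checkpoint_tag(task_id)
    git("tag", "--force", tag, cwd=cwd)
    return tag


def rollback(tag: str, cwd: str = ".") -> None:
    """Return tracked files and history to the tagged state (untracked files stay)."""
    git("reset", "--hard", tag, cwd=cwd)


def squash_since(tag: str, message: str, cwd: str = ".") -> None:
    """Collapse every commit made after the tag into a single commit.

    Raises CalledProcessError if nothing changed since the tag.
    """
    git("reset", "--soft", tag, cwd=cwd)
    git("commit", "-m", message, cwd=cwd)
```

A task would then call `create_checkpoint` first, `squash_since` when QA passes, and `rollback` when healing gives up.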

Budget Control
Set a max spend with --budget. Warning at 80%. Hard stop when exceeded.
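
The warn-then-stop behavior can be sketched with a small tracker. `BudgetTracker` and `BudgetExceeded` are hypothetical names used for illustration; only the 80% warning threshold and the hard stop once the limit is exceeded come from the description above.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recorded spend goes over the configured limit."""


class BudgetTracker:
    def __init__(self, limit_usd: float, warn_fraction: float = 0.8) -> None:
        self.limit_usd = limit_usd
        self.warn_fraction = warn_fraction
        self.spent_usd = 0.0
        self.warned = False

    def record(self, cost_usd: float) -> None:
        """Add the cost of one session; warn once at the threshold, stop past the limit."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f} budget"
            )
        if not self.warned and self.spent_usd >= self.warn_fraction * self.limit_usd:
            self.warned = True
            print(f"warning: ${self.spent_usd:.2f} of ${self.limit_usd:.2f} used")


if __name__ == "__main__":
    budget = BudgetTracker(limit_usd=10.00)
    try:
        for session_cost in (4.00, 4.50, 2.00):
            budget.record(session_cost)
    except BudgetExceeded as exc:
        print(f"stopping: {exc}")
```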

Full Observability
sessions.jsonl with cost/duration per session. Structured logging. Web analytics dashboard.
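
Because `sessions.jsonl` holds one JSON object per line, totals can be computed with a few lines of standard-library code. This is a sketch only: the field names `cost_usd` and `duration_s` are assumptions, since the actual schema is not documented here, so adjust them to whatever keys the file really contains.

```python
import json
from pathlib import Path


def summarize_sessions(path: Path) -> dict[str, float]:
    """Sum cost and duration over a JSON Lines file (field names are assumed)."""
    total_cost = 0.0
    total_duration = 0.0
    count = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            total_cost += float(record.get("cost_usd", 0.0))
            total_duration += float(record.get("duration_s", 0.0))
            count += 1
    return {
        "sessions": count,
        "cost_usd": round(total_cost, 2),
        "duration_s": total_duration,
    }


if __name__ == "__main__":
    log = Path(".ralph") / "sessions.jsonl"
    if log.exists():
        print(summarize_sessions(log))
    else:
        print(f"{log} not found; run Ralph in this directory first")
```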

Safety
15 regex patterns blocking dangerous shell commands. acceptEdits permission model. Env filtering.
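
A regex deny-list for shell commands might look like the sketch below. The five patterns are illustrative examples chosen for this sketch; they are not Ralph's actual 15 patterns, and `is_blocked` is a hypothetical helper. A deny-list like this catches obvious cases only and is not a substitute for sandboxing.

```python
import re

# Illustrative patterns only; these are NOT the project's real rule set.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+(-\w+\s+)*/(\s|$)"),               # rm aimed at the filesystem root
    re.compile(r"\bgit\s+push\s+.*--force\b"),             # force push
    re.compile(r"\bcurl\b[^|]*\|\s*(sudo\s+)?(ba)?sh\b"),  # pipe a download into a shell
    re.compile(r"\bmkfs(\.\w+)?\b"),                       # format a filesystem
    re.compile(r"\bdd\s+.*\bof=/dev/"),                    # raw write to a device
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any deny-list pattern."""
    return any(pattern.search(command) for pattern in DANGEROUS_PATTERNS)


if __name__ == "__main__":
    for cmd in ("rm -rf /", "rm -rf ./build", "pytest tests/ -v"):
        print(f"{cmd!r}: {'blocked' if is_blocked(cmd) else 'allowed'}")
```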


Project Structure

ralph/
├── cli.py                # CLI commands
├── config.py             # Configuration
├── loop.py               # Main orchestrator
├── models.py             # Data models
├── providers/
│   ├── claude_sdk.py     # Claude Agent SDK
│   └── deep_agents.py    # Deep Agents SDK (any LLM)
├── prompts/
│   └── templates.py      # All prompt templates
├── spec/
│   └── generator.py      # spec.md → prd.json
├── qa/
│   ├── sentinel.py       # Quality gate
│   └── healer.py         # Fix loop
├── routing.py            # Model routing by complexity
├── reflexion.py          # Failure analysis
├── checkpoint.py         # Git checkpoints
├── observability.py      # Logging + analytics
├── web/
│   ├── server.py         # FastAPI + WebSocket
│   ├── runner.py         # WebRalphLoop
│   └── events.py         # Event bus
└── memory/
    ├── progress.py       # Iteration log
    └── guardrails.py     # Failure memory
frontend/                 # React + TypeScript + Tailwind
tests/                    # 158 tests, 20 files
.claude/skills/           # /spec, /code, /qa, /status

Workspace Output

When Ralph runs, it creates .ralph/ in the project directory:

File             Purpose
spec.md          Application specification (human-readable)
prd.json         Task queue with status tracking
progress.md      Iteration log with learnings
guardrails.md    Failure signs for future iterations
reflections.md   LLM failure analysis
sessions.jsonl   Per-session cost, duration, tools
ralph.log        Structured debug log

CLI Reference

Command                Description
ralph run "task"       Start the coding loop
ralph run -f task.md   Task from a file
ralph resume           Continue from existing PRD
ralph status           Show task progress
ralph analytics        Cost and session analytics
ralph web              Launch web dashboard
ralph progress         Iteration log
ralph guardrails       Failure memory
ralph index            Codebase index

Tests

python -m pytest tests/ -v       # 158 tests across 20 files

License

MIT
