Autonomous software development using AI coding agents in a loop.
A Docker-based system that runs Kilo Code CLI in continuous loops to build software from a Product Requirements Document (PRD). Inspired by Geoffrey Huntley's Ralph Loop and the BMAD Method.
You provide a PRD. The system runs specialized AI agents in a loop until the project is complete.
┌─────────────────────────────────────────────────────────────────┐
│ THE LOOP │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ WORKER │ │ JANITOR │ │ ARCHITECT │ │
│ │ (PROMPT) │ │ │ │ │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ Every tick │ │ Every 4 │ │ Every 8 │ │
│ │ │ │ ticks │ │ ticks │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ Implements │ │ Cleans up │ │ Reviews │ │
│ │ one task │ │ tech debt │ │ architecture│ │
│ │ from TODO │ │ and drift │ │ and planning│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Key insight: Each agent runs with fresh context. Memory persists via git history and markdown files (TODO.md, ARCHITECTURE.md, LEARNINGS.md), not the LLM's context window. This prevents context pollution and allows indefinite operation.
We've tested this system extensively:
| Metric | Result |
|---|---|
| Longest stable run | 10+ hours without divergence |
| Tasks completed per hour | ~4-6 (depends on complexity) |
| Context pollution | None (fresh context each tick) |
| Human intervention required | Minimal (via comms/ system) |
The system successfully bootstrapped projects from just a PRD, created architecture docs, generated task lists, and implemented features—all autonomously.
- Docker (v20.10+)
- Docker Compose (v2.0+)
- Kilo Code account (for API access)
git clone https://github.com/your-repo/agent-coding-container.git
cd agent-coding-container
# Create workspace with your PRD
mkdir -p workspace
cp your-prd.md workspace/PRD.md
Copy your Kilo Code config:
cp -r ~/.kilocode .kilocode/
Then start the loop:
docker compose up
That's it. The system will:
- Read your PRD
- Bootstrap the project structure
- Generate TODO.md with tasks
- Implement tasks one at a time
- Continue until a .done file is created
The system uses three specialized prompts that run at different intervals:
| Agent | File | Frequency | Role |
|---|---|---|---|
| Worker | PROMPT.md | Every tick | Implements one task from TODO.md |
| Janitor | JANITOR.md | Every 4 ticks | Cleans up drift, prunes completed TODOs |
| Architect | ARCHITECT.md | Every 8 ticks | Gap analysis, breaks down vague tasks |
This separation of concerns prevents any single agent from both planning and executing, which reduces drift and maintains focus.
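The schedule in the table can be sketched as a small function. This is an illustration, not the project's actual run.js: it assumes ticks are counted from 1 and that the Janitor and Architect run in addition to the Worker on their ticks, not instead of it. The name `agentsForTick` is hypothetical.

```javascript
// Hypothetical sketch of the tick schedule; not taken from run.js.
// Assumptions: ticks start at 1, and the periodic agents run in
// addition to the Worker rather than replacing it.
const SCHEDULE = [
  { agent: "Worker", file: "PROMPT.md", every: 1 },
  { agent: "Janitor", file: "JANITOR.md", every: 4 },
  { agent: "Architect", file: "ARCHITECT.md", every: 8 },
];

// Return the prompt files due on a given tick, in table order.
function agentsForTick(tick) {
  return SCHEDULE
    .filter(({ every }) => tick % every === 0)
    .map(({ file }) => file);
}

for (const tick of [1, 4, 8]) {
  console.log(tick, agentsForTick(tick).join(", "));
}
```

Under these assumptions, tick 8 is the only one of the three where all prompts are due, so the Architect's planning pass always coincides with a Janitor cleanup.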
- Fresh context each iteration: no context pollution from accumulated conversation
- Git as memory: all progress persists in files and commits
- Single task enforcement: each session completes ONE task, then stops
- Specialized roles: planning, execution, and cleanup are separate concerns
- Human-in-the-loop option: the comms/ system allows async communication
After bootstrap, your workspace will look like:
workspace/
├── PRD.md # Your requirements (input)
├── TODO.md # Task list (auto-generated, auto-maintained)
├── ARCHITECTURE.md # Key decisions (auto-generated)
├── LEARNINGS.md # Patterns discovered (auto-updated)
├── BLOCKERS.md # Issues preventing progress
├── .state.json # Loop state persistence
├── .done # Completion marker
├── comms/
│ ├── inbox/ # Human → Agent messages
│ ├── outbox/ # Agent → Human questions
│ └── archive/ # Processed messages
└── src/ # Your actual code
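Because each tick starts with no memory, the loop counter has to live on disk. The following is a minimal sketch of that idea, assuming .state.json holds a single `iteration` field (the real schema is not documented here). The helper names `loadState` and `saveState` are hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Read the state file, falling back to iteration 1 when it is missing
// or unreadable. The { iteration } shape is an assumption.
function loadState(statePath) {
  try {
    const parsed = JSON.parse(fs.readFileSync(statePath, "utf8"));
    return Number.isInteger(parsed.iteration) ? parsed : { iteration: 1 };
  } catch {
    return { iteration: 1 };
  }
}

// Write to a temp file and rename it into place, so a crash mid-write
// cannot leave a truncated state file behind.
function saveState(statePath, state) {
  const tmp = `${statePath}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
  fs.renameSync(tmp, statePath);
}

// Demo against a throwaway directory instead of a real workspace.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "loop-state-"));
const statePath = path.join(dir, ".state.json");
const state = loadState(statePath); // file is missing, so iteration is 1
saveState(statePath, { iteration: state.iteration + 1 });
console.log(loadState(statePath).iteration);
```

In this sketch a missing file silently restarts the count at 1, which would produce the "Starting from iteration 1" symptom listed in the troubleshooting table.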
The default tick interval is 10 minutes (600 seconds). To change it, pass the interval in seconds as an argument:
# 5-minute ticks
docker compose run --rm agent_coding_container node /home/automation/run.js 300
# 15-minute ticks
docker compose run --rm agent_coding_container node /home/automation/run.js 900
Or via .env:
DELAY_SECONDS=300
To run the loop against an existing project directory on the host, set MOUNT_HOST_DIR:
MOUNT_HOST_DIR=/path/to/your/project docker compose up
The agents can ask questions when blocked. Check workspace/comms/outbox/ for RFIs (Requests for Information).
To respond:
- Read the question in comms/outbox/
- Create your response in comms/inbox/
- The next iteration will pick it up
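Answering an RFI comes down to plain file operations, so it is easy to script. The sketch below assumes questions are Markdown files and that a reply reuses the question's file name. Neither convention is confirmed by this README, and the helper names and the sample question are made up for illustration.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// List pending question files in comms/outbox/ (assumed to be *.md).
function pendingQuestions(commsDir) {
  const outbox = path.join(commsDir, "outbox");
  if (!fs.existsSync(outbox)) return [];
  return fs.readdirSync(outbox).filter((name) => name.endsWith(".md")).sort();
}

// Drop a reply into comms/inbox/ under the same file name (an assumed
// convention) so the next iteration can find it.
function reply(commsDir, questionName, answer) {
  const inbox = path.join(commsDir, "inbox");
  fs.mkdirSync(inbox, { recursive: true });
  const target = path.join(inbox, questionName);
  fs.writeFileSync(target, answer);
  return target;
}

// Demo in a temporary directory standing in for workspace/comms.
const commsDir = fs.mkdtempSync(path.join(os.tmpdir(), "comms-"));
fs.mkdirSync(path.join(commsDir, "outbox"));
fs.writeFileSync(
  path.join(commsDir, "outbox", "001-database.md"),
  "Postgres or SQLite?"
);

for (const name of pendingQuestions(commsDir)) {
  reply(commsDir, name, "Use SQLite for now.");
}
console.log(fs.readdirSync(path.join(commsDir, "inbox")));
```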
The loop runs until a .done file exists. To stop it manually, create the file yourself:
touch workspace/.done
The system considers itself complete when:
- All TODO.md items are checked
- All tests pass
- The app builds successfully
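The first condition can be checked mechanically by counting checkbox items, using the same line patterns as the grep commands elsewhere in this README. The sketch below is illustrative only: `todoProgress` and `isComplete` are hypothetical names, and the test and build results are passed in as booleans because obtaining them is project-specific.

```javascript
// Count checked and unchecked items in TODO.md-style markdown.
// Matches lines starting with "- [x]" or "- [ ]".
function todoProgress(markdown) {
  const lines = markdown.split(/\r?\n/);
  const done = lines.filter((line) => /^- \[x\]/.test(line)).length;
  const remaining = lines.filter((line) => /^- \[ \]/.test(line)).length;
  return { done, remaining };
}

// Hypothetical completion gate combining the three conditions.
function isComplete(markdown, testsPass, buildSucceeds) {
  const { remaining } = todoProgress(markdown);
  return remaining === 0 && testsPass && buildSucceeds;
}

const sample = [
  "- [x] Set up project",
  "- [x] Add login",
  "- [ ] Add logout",
].join("\n");

console.log(todoProgress(sample));
console.log(isComplete(sample, true, true));
```

With one unchecked item left, the sample is not complete even though tests and build are reported as passing.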
Beyond the build loop, you can run specialized loops for code quality:
Bug-fix loop, which uses BUGFIXER.md and BUGFIXER_BUGCHECK.md:
- Discovers bugs via static analysis, test failures, code smells
- Fixes one bug per session with regression tests
- Tracks progress in BUGS.md
Test coverage loop:
- Identifies untested code paths
- Adds tests systematically
- Targets 80%+ coverage
Type safety loop:
- Finds any types and unsafe assertions
- Adds proper typing incrementally
- Escalates tsconfig strictness
# docker-compose.multi.yml
services:
  build_loop:
    build: .
    volumes:
      - ./workspace:/home/workspace
    command: ["node", "/home/automation/run.js", "600"]
  bugfix_loop:
    build: .
    volumes:
      - ./workspace:/home/workspace
    command: ["node", "/home/automation/run-bugfix.js", "900"]
Mount the prompts directory for live editing:
volumes:
  - ./workspace:/home/workspace
  - ./automation/prompts:/home/automation/prompts
To cap CPU and memory, set resource limits:
services:
  agent_coding_container:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
Follow the logs:
docker compose logs -f
# View current state
cat workspace/.state.json
# Count completed tasks
grep -c "^\- \[x\]" workspace/TODO.md
# Count remaining tasks
grep -c "^\- \[ \]" workspace/TODO.md
Open a shell in the running container:
docker exec -it agent_coding_container /bin/bash

| Issue | Solution |
|---|---|
| Container exits immediately | Check logs: docker compose logs |
| Changes not persisting | Verify volume mount: docker compose config |
| Automation stuck | Check BLOCKERS.md and comms/outbox/ |
| Starting from iteration 1 | State file missing—check .state.json |
| Prompts not updating | Rebuild image: docker compose build |
| Approach | Planning | Execution | Memory | Human Involvement |
|---|---|---|---|---|
| This System | Architect agent | Worker agent | Git + files | Optional (comms/) |
| BMAD Method | Heavy upfront | In-session | Agent handoffs | High |
| Ralph Loop | Minimal | Fresh each loop | Git only | Low (AFK) |
| Claude Code | Interactive | Interactive | Session | High |
PRs welcome. Key areas:
- Additional specialized loops (performance, security, accessibility)
- Better progress reporting and dashboards
- Integration with issue trackers (GitHub Issues, Linear)
- Multi-repo orchestration
License: MIT
- Geoffrey Huntley for the Ralph Loop concept
- BMAD Method for multi-agent architecture patterns
- Kilo Code for the CLI that makes this possible