Run an AI coding agent against a multi-task backlog overnight, with independent critic processes and per-task git rollback — using your existing ChatGPT Plus or Gemini subscription, no API keys, no Docker.
Supported CLIs: OpenAI Codex (default), Google Gemini
Main supported entrypoint: ./loop.sh init to create and approve a backlog, then ./loop.sh run to execute it. Legacy run.sh document-driven workflows are retained only for compatibility and are documented in docs/legacy_run.md.
Most autonomous coding tools fall into one of two camps:
- Single-shot or interactive (
aider, Codex CLI/goal, Cursor) — great for one task, but on long backlogs the context drifts and the same model "self-reviews" its own work inside one session. - Heavy autonomous frameworks (OpenHands, MetaGPT, Devin) — powerful, but require Docker, API keys, per-token billing, and a lot of setup.
LoopDex sits in a narrow gap between them:
- You have a backlog of tasks, not a single goal.
- You want the agent to run for hours, unattended, picking tasks one by one.
- You want review independence — a separate process that did not see the implementer's reasoning.
- You want per-task review gates: each task either passes review and commits, or LoopDex attempts a git-based rollback within the project.
- You want to pay nothing per token — your ChatGPT Plus or Gemini OAuth quota covers it.
If that matches you, LoopDex is built for that exact loop.
The Planner, Plan Critic, Implementer, and Impl Critic each run as a fresh process with its own context. The Critic never sees what the Planner or Implementer "thought" — it only sees the artifact (plan.md, the diff, the verify output).
In single-context loops (including Codex /goal), the model that wrote the code also reviews it, carrying its own justifications forward. Here, the reviewer comes in cold.
After the deterministic final decision gates pass, the change is committed. On FAIL, LoopDex attempts to roll back the project working tree to the last good state. After 5 consecutive failures the task is marked BLOCKED and the loop moves on. You can wake up to a clear list of what passed, what was skipped, and why.
When ChatGPT or Gemini rate-limits you, LoopDex rolls back any in-progress task, exits with code 2, and leaves state intact. Re-running the same command after the limit resets resumes from the next pending task. Failure count is not incremented.
TASK_PLANNING_FAILURE_PATTERNS.md is a living document of recurring planning failures observed across real runs — task scope too broad, missing verify commands, keyword-only validation, BLOCKED mishandling, and more. The Setup Agent and Setup Critic both consult this file when generating and validating backlogs, so the same mistake is less likely to repeat.
No OPENAI_API_KEY or ANTHROPIC_API_KEY. The loop drives the official codex or gemini CLI, which authenticates against your existing ChatGPT Plus subscription or Google account OAuth. A long overnight run costs you whatever you already pay monthly — not a token bill.
Two shell scripts, two Python helpers, six prompt files. You can read the entire orchestration in an afternoon and modify it in place.
Be honest with yourself. If any of these match, use something else:
- A single, well-scoped task. Use Codex
/goaloraider— one command beats a backlog. - You want interactive, conversational coding. Use
aideror Claude Code. - You don't have planning docs and don't want to write any. Use
gpt-engineeror describe-and-go tools. - You need a sandboxed VM with a web UI. Use OpenHands.
- Your account / employer pays per token via API. OpenHands, AutoGen, or any direct-API agent will give you more flexibility than wrapping a CLI.
LoopDex shines specifically when you have a multi-task backlog and want it ground through unattended with hard review gates.
LoopDex runs AI agents autonomously and executes code they generate without human approval at each step. By default, LOOP_RISK_MODE=unattended keeps permissive CLI flags so the loop can run unattended:
- The bundled
.claude/settings.jsonsets"defaultMode": "bypassPermissions". - Codex runs with
--dangerously-bypass-approvals-and-sandbox. - The default Gemini flags include
--yolo, which bypasses the CLI's sandbox/approval prompts. - The loop calls
git commitandgit rollbackon the target project automatically.
Set LOOP_RISK_MODE=safe to avoid LoopDex's built-in Codex and Gemini bypass flags where the wrapped CLI supports that. Safe mode is not a sandbox; the agent and verify commands still run with your shell permissions.
See docs/security.md for security boundaries and known limitations.
Because of this, you should:
- Run only against a dedicated project directory under version control. Never point it at your home directory, system folders, or a repo with uncommitted work you cannot afford to lose.
- Start run mode from a clean tree. Explicit
runrequires no uncommitted project changes before agent work begins, so rollback has a clear pre-task snapshot. - Run unattended work on a dedicated git branch. LoopDex does not switch branches by default; create and switch to the branch yourself before running it. Optionally set
LOOP_REQUIRE_BRANCH_PREFIXto enforce a branch prefix before explicitrunstarts. - Prefer an isolated environment (a fresh VM, container, or throwaway workspace) — at minimum a dedicated user account.
- Choose
LOOP_RISK_MODE=safeor review.claude/settings.jsonandLOOP_GEMINI_FLAGSif you do not want built-in bypass-permission flags. - Treat planning docs as untrusted input if they came from outside your team — prompt-injected docs can steer the agent.
The authors provide this software as-is, with no warranty. You are responsible for what the agent does on your machine.
loop.sh is the runtime orchestrator. It selects the next backlog task, writes the agent input files, runs the Planner, Plan Critic, Implementer, and Impl Critic as separate CLI processes, executes backlog verify commands, applies deterministic PASS/FAIL gates, and commits or rolls back the project at the task boundary.
The current architecture keeps the CLI boundary explicit: Codex and Gemini are adapters selected by --cli, with risk-mode flags and model settings normalized before each agent call. The intended extraction work is still tracked separately, so these adapter boundaries may still live in the shell runtime until that cleanup lands.
Evidence responsibilities are separated from decision responsibilities. Evidence helpers collect shell facts such as verify output, changed files, diff stats, out-of-scope paths, and retained evidence directories. Decision helpers use that evidence plus critic verdicts, backlog scope, state-file protection, transaction state, and commit results to decide PASS, FAIL, BLOCKED, rollback, or proposal-only outcomes. See docs/dogfood.md for repository self-upgrade workflow notes and docs/security.md for security boundaries.
Runs automatically when backlog.md doesn't exist.
Setup Agent — reads all planning docs and code, generates the full task list (backlog.md)
Setup Critic — validates quality (file paths, verify commands, dependency order)
Human review — enter y / e / n
Picks tasks from backlog.md one at a time.
1. Select next task from backlog (first with all dependencies satisfied)
2. Planner — breaks the task into steps and writes a plan
3. Plan Critic — independently reviews the plan
4. Implementer — writes code and runs the verify command
5. Impl Critic — independently reviews the implementation
6. Shell Report — shell accumulates the report automatically
Each stage runs as an independent process — no context from the previous stage carries over, which is what makes the critic stages meaningful.
PASS → final gates pass + commit + mark task complete + print progress FAIL → git rollback + failure count +1 (5 failures → BLOCKED)
| Situation | Behavior | Exit code |
|---|---|---|
| All backlog tasks done | Exits immediately 🎉 | 0 |
| N iterations exhausted, tasks remain | Prints progress + resume instructions | 0 |
| Only BLOCKED tasks remain | Prints environment guidance | 1 |
| Rate limit reached | Safe exit + auto rollback | 2 |
| Ctrl+C | State files auto-restored | 130 |
Completed backlog is a successful terminal state: when there is no pending work, LoopDex reports success and exits 0 even if no task ran in that invocation.
Ordinary FAIL is reserved for implementation, review, verify, scope-gate, or no-change failures that roll back work and increment Fail count. SCOPE_EXPAND and SPLIT_TASK are blocked policy outcomes, not ordinary failures. They create review proposals, block the active task, report the reason and action required, and leave Fail count unchanged.
Explicit run mode does not automatically modify semantic backlog fields for these outcomes. SCOPE_EXPAND blocks without incrementing Fail count; it records a proposal instead of editing Files, Depends, verify commands, completion criteria, description, or ordering constraints. SPLIT_TASK also blocks without incrementing Fail count; it records a split proposal instead of inserting child tasks or rewriting backlog structure. A human must review the proposal and intentionally edit or leave the backlog blocked before the task is retried.
- macOS / Linux / Windows (Git Bash)
- Node.js 18+
- Python 3.8+ (
pythoncommand — Microsoft Store launcher on Windows is not supported) - Depending on your CLI:
- codex: ChatGPT Plus or higher subscription
- gemini: Gemini CLI authenticated. Supports Google account OAuth or
GEMINI_API_KEY/ Vertex AI. API key and Vertex AI usage may incur charges.
npm install -g @openai/codex
codex login # opens browser for ChatGPT account loginLoopDex does not call the Gemini API directly — it runs the gemini CLI command.
Authentication method determines quota and cost:
- Google account OAuth login: uses Gemini CLI / Code Assist account quota
GEMINI_API_KEY: uses Google AI Studio / Gemini API key, may apply free tier limits or charges- Vertex AI: billed under your Google Cloud / Vertex AI project
Avoid unexpected charges by verifying which authentication method is active.
git clone https://github.com/minseo2222/LoopDex.git
cd LoopDex
chmod +x loop.sh # not needed on Windows Git BashPlace your requirements or design documents in the project folder. File names don't matter.
myproject/
SPEC.md ← Setup Agent reads these
REQUIREMENTS.md automatically
docs/design/*.md
See example_doc.md for a concrete example.
Start by generating and approving the backlog:
./loop.sh init --project <project> --cli codexThen run the loop against that backlog:
./loop.sh run --project <project> --iterations 5 --cli codex--cli is optional, defaults to codex. Options: codex | gemini.
# Codex
./loop.sh init --project /path/to/myproject --cli codex
./loop.sh run --project /path/to/myproject --iterations 5 --cli codex
# Gemini
./loop.sh init --project /path/to/myproject --cli gemini
./loop.sh run --project /path/to/myproject --iterations 5 --cli gemini
# Windows (Git Bash)
./loop.sh init --project "/c/Users/yourname/myproject" --cli gemini
./loop.sh run --project "/c/Users/yourname/myproject" --iterations 5 --cli geminiExplicit run requires an existing .loop-agent/backlog.md. If the backlog does not exist yet, run init first.
Explicit run is unattended and does not ask for human input while executing tasks.
At startup, run prints an informational, non-interactive safety summary before agent work begins. It reports the normalized project path, selected CLI, LOOP_RISK_MODE, clean-tree preflight status, branch-prefix requirement status, and backlog lint status. This banner is a visibility aid only; it is not a sandbox, permission prompt, or approval gate.
Explicit run mode requires a clean tree before agent calls begin. Commit or stash your own work first; this keeps user edits out of task rollback boundaries.
Dedicated branch example:
git checkout -b loop/my-work
LOOP_REQUIRE_BRANCH_PREFIX=loop/ ./loop.sh run --project <project> --iterations 5 --cli codexLoopDex does not switch branches by default. LOOP_REQUIRE_BRANCH_PREFIX is optional; when set for explicit run, startup fails unless the current git branch starts with the configured prefix.
Risk mode examples:
# Default: preserves built-in bypass flags for unattended runs
LOOP_RISK_MODE=unattended ./loop.sh run --project /path/to/myproject --iterations 5 --cli codex
# Avoids LoopDex's built-in bypass flags where the wrapped CLI supports that
LOOP_RISK_MODE=safe ./loop.sh run --project /path/to/myproject --iterations 5 --cli geminiThe legacy compatibility syntax is still supported, but explicit init and run are recommended:
./loop.sh <iterations> <project-folder> [cli]Legacy status: loop.sh init and loop.sh run are the supported entry points. run.sh and older prompt compatibility surfaces are retained for existing workflows and tests, but new usage should prefer the explicit commands above. See docs/legacy_run.md for the retained run.sh behavior. Keep security-sensitive usage aligned with docs/security.md; repository self-runs are covered by docs/dogfood.md.
Status and doctor commands:
./loop.sh status --project <project>
./loop.sh doctor --project <project>status reports the current run state from backlog and event data, including task progress and blocked work.
doctor checks the local environment, backlog lint, and git cleanliness before you start or resume a run.
See tests/README.md for E2E test notes. The fake CLI tests use local deterministic test doubles and do not require Codex or Gemini.
For using LoopDex to upgrade this repository itself, see the dogfood guide.
Run the happy-path E2E test:
bash tests/e2e_pass_fake_cli.shRun the archive compaction regression after changes to backlog archive compaction, cleanup, or lint behavior:
bash tests/e2e_archive_compaction_fake_cli.shRun the CI-equivalent local checks:
bash -n loop.sh run.sh && python -m py_compile backlog_manager.py progress_window.py && bash tests/e2e_pass_fake_cli.sh && bash tests/e2e_rate_limit_fake_cli.sh && bash tests/e2e_archive_compaction_fake_cli.shThe fake CLI is for deterministic testing only, not production use.
The benchmark runner exercises known PASS and FAIL scenarios and reports whether the loop made the correct final decision. A key metric is the false PASS rate: cases where LoopDex incorrectly accepts a bad implementation. Benchmark reporting derives metrics from machine-readable events, backlog state, and git commits, not from .loop-agent/progress_window.md.
The backlog is the single source of truth for what LoopDex builds. Four files are involved:
| File | Role |
|---|---|
backlog_guide.md |
Human-readable spec: backlog format, task sizing rules, quality checklist. Read this to understand or manually edit a backlog. |
agents/setup_agent.md |
Prompt that drives the Setup Agent. Reads your planning docs and generates .loop-agent/backlog.md automatically. |
agents/setup_critic.md |
Prompt that drives the Setup Critic. Validates the generated backlog and requests a retry if quality checks fail. |
TASK_PLANNING_FAILURE_PATTERNS.md |
Living document of recurring failure patterns. Both Setup Agent and Setup Critic reference this when generating and validating backlogs. |
Backlog tasks use [ ] for pending work, [x] for completed work, and [!] for blocked work that needs manual action. During archive compaction, completed task bodies move out of the active backlog, and active backlog.md should not retain empty repeated ## Tasks headings or other empty task headings after completed work is archived.
LoopDex retains the current markdown backlog format in .loop-agent/backlog.md. Each task must have a valid task ID and these fields: Size, Files, Description, Completion criteria, Depends, Fail count, and at least one verify: command in the completion criteria. Size must be Small or Medium, and Files must list non-empty relative project paths.
Check a backlog before running it with:
python backlog_manager.py lint .loop-agent/backlog.mdExplicit run mode performs this lint check before agent calls. If backlog lint fails, the run stops before Planner, Implementer, or Critic processes are started.
If you prefer to generate the backlog yourself instead of running the Setup Phase:
- Open your project folder in Claude Code
- Copy the contents of
backlog_guide.mdinto the conversation - Claude will analyze your planning docs and write
.loop-agent/backlog.md - Review and edit as needed, then run
./loop.sh run --project . --iterations Nto start looping
| Variable | Default | Purpose |
|---|---|---|
COMMIT_ON_PASS |
1 |
Auto-commit on PASS. Set to 0 to accumulate in working tree |
LOOP_RISK_MODE |
unattended |
unattended preserves built-in bypass flags; safe avoids those built-in bypass flags where supported |
LOOP_REQUIRE_BRANCH_PREFIX |
unset | Optional explicit run preflight requiring the current git branch to start with this prefix |
LOOP_EVIDENCE_KEEP_RUNS |
10 |
Keep the newest N loop evidence directories; set to 0 to disable evidence retention |
PROGRESS_SIZE_THRESHOLD |
524288 (512KB) |
Trim progress.txt when it exceeds this size |
PROGRESS_KEEP_ENTRIES |
50 |
Number of recent sections to keep after trim |
| Variable | Default | Purpose |
|---|---|---|
CODEX_MODEL |
gpt-5.5 |
Codex model to use (override to whatever ID your account currently has) |
CODEX_MODEL=gpt-5.4 ./loop.sh run --project . --iterations 5 --cli codex| Variable | Default | Purpose |
|---|---|---|
LOOP_GEMINI_MODEL |
gemini-3.1-pro-preview |
Model ID (override to a model your account has) |
LOOP_GEMINI_FLAGS |
--yolo in unattended mode, empty in safe mode |
Explicit Gemini flags; set this to override the risk-mode default |
LOOP_GEMINI_MODEL_FLAG |
--model |
Flag used to specify the model |
LOOP_GEMINI_USE_PROMPT_ARG |
0 |
Set to 1 to pass prompt via -p instead of stdin |
# Different model
LOOP_GEMINI_MODEL=gemini-2.5-pro ./loop.sh run --project . --iterations 5 --cli gemini
# For versions that don't support stdin
LOOP_GEMINI_USE_PROMPT_ARG=1 ./loop.sh run --project . --iterations 5 --cli geminiThe default model identifiers above are placeholders. Use whatever IDs your subscription currently exposes.
Task completion is decided by deterministic runtime gates, not by agent text alone. The final gate order is:
- Proposal verdicts (
SCOPE_EXPANDorSPLIT_TASK) create proposal files for review and block the task without applying backlog semantic changes. - Impl Critic must return critic PASS.
- Shell verify commands from the backlog must pass.
- Scope and state-file gates must pass.
- If
COMMIT_ON_PASS=1, the git commit must succeed. - The runtime updates lifecycle state only after the previous gates pass.
critic PASS is necessary but not sufficient. A task cannot complete on critic PASS alone; verify failure, out-of-scope changes, state-file modification, malformed or non-PASS verdicts, and commit failure all prevent PASS. FAIL paths roll back implementation changes as appropriate, preserve shell evidence, record the failure, and update lifecycle state.
When a ChatGPT or Gemini rate limit is detected:
- If in Implementer or Impl Critic stage, auto git rollback to the pre-task snapshot
- Failure count is not incremented
- Exits safely with code 2
- Resume with the same command after the limit resets
This is recovery from a provider quota interruption, not a task failure.
Each task may fail up to five consecutive implementation/review attempts. After the fifth consecutive failure, the runtime marks the task BLOCKED, records the reason, and moves on to the next runnable task. BLOCKED tasks are not retried automatically in the same run; update the backlog or task scope intentionally before trying them again.
The loop records in-progress lifecycle updates in .loop-agent/current_transaction.json. If a previous run exits during a state transition, the next startup checks that transaction file and either finishes or rolls back the pending state change before selecting another task. This protects loop-owned state transitions, but it is not a guarantee of perfect crash recovery for every possible OS, disk, or git failure.
Each run acquires .loop-agent/loop.lock before modifying loop state. A second run against the same project exits instead of racing the active run. Normal exits remove the lock; stale locks from dead processes are cleaned up on startup when the owning process is no longer running.
Rollback is per task and git-based. On implementation failure, verify failure, critic failure, out-of-scope changes, state-file modification, or rate-limit exit during implementation/review, LoopDex restores the project to the pre-task working tree state. Shell evidence and loop-owned reports under .loop-agent/ may be preserved for debugging, and files that were already committed by earlier passed tasks are not rolled back. Rollback depends on git state and the current working tree; it is not protection against commands that affect files outside the project, external services, credentials, system settings, or changes already committed in earlier passed tasks.
Safety failures are recorded in raw .loop-agent/progress.txt evidence. When .loop-agent/progress_window.md is available, it summarizes recent progress and failure context for agent prompts, but it is not the system source of truth.
If an agent modifies state files under .loop-agent/ (backlog, progress, etc.), they are automatically restored. Legitimate agent outputs (plan.md, impl_summary.md, etc.) are not affected.
After Implementer runs, LoopDex blocks changes to obvious secret-like paths such as .env, private key filenames, and paths containing private_key. This guard is path-based only: it does not scan file contents for credentials, does not catch every secret naming pattern, and does not make LoopDex a sandbox.
Backlog Verify commands in .loop-agent/backlog.md are authoritative. Planner output, Implementer summaries, and critic summaries cannot override, replace, or skip the backlog-defined verify command.
After implementation, the loop runs the verify command from the user's shell and records the shell evidence. A nonzero exit code or verify timeout is treated as verify failure and prevents PASS, even if the Impl Critic says PASS.
This verify policy is a PASS gate, not a complete security sandbox. Dangerous commands in trusted backlog verify fields can still run with the user's shell permissions, so backlog verify commands must be reviewed before running the loop.
Each loop writes an evidence directory under .loop-agent/ for shell-collected review data. This evidence includes git data such as changed files, diff stats, and out-of-scope files, and the evidence directory is preserved for inspection even when implementation changes are rolled back.
Evidence retention keeps the current loop evidence directory and the newest LOOP_EVIDENCE_KEEP_RUNS loop evidence directories. Older unprotected loop evidence is archived as .tar.gz files, or compacted to a summary file if archiving is unavailable. Evidence referenced by blocked tasks is preserved by default.
The scope gate uses git shell evidence as the authoritative source for actual changed files. The backlog Files field defines the approved write scope; out-of-scope changes prevent PASS. impl_summary.md is an agent report, not the source of truth for changed files.
Explicit run mode treats backlog semantics as immutable. A semantic mutation is any change that alters what a backlog task means, including task scope, Files, Depends, verify commands, completion criteria, description, or ordering constraints. Runtime lifecycle fields such as status, fail count, and completion state are loop-owned execution state; they may change as the loop runs, but they do not redefine the task.
Run mode preserves task meaning by refusing automatic edits to scope, Files, Depends, verify commands, or completion criteria. If implementation discovers that a task needs more files or a different dependency shape, the agent reports that need instead of changing the backlog. This keeps review boundaries stable: Files remains the approved write scope, and Depends remains the approved execution order.
Semantic backlog mutation is disabled by default, and mutation verdicts are proposal-only by default. SCOPE_EXPAND and SPLIT_TASK recommendations create standardized proposal files for review; they do not automatically expand scope, split tasks, insert dependencies, or apply backlog semantic changes by default. Deeper rules are documented in docs/design_invariants.md.
SCOPE_EXPAND is proposal-only in normal run mode. It blocks the active task and records the requested scope change for review, but it does not mutate semantic fields such as Files, Depends, verify commands, completion criteria, or description, and it does not increment Fail count. A human must intentionally update the backlog scope or dependencies before retrying the task.
SPLIT_TASK is also proposal-only in normal run mode. It is used when the active task is too broad or discovery shows that ordered child tasks are needed. The loop blocks the active task for human review, does not increment Fail count, and does not automatically split the task, insert child tasks, rewrite dependencies, or otherwise change backlog structure. A human must edit the backlog intentionally before retrying.
Future automatic scope expansion, task splitting, or dependency insertion remains disabled by default and is governed by docs/backlog_mutation_policy.md. The experimental flags LOOP_ALLOW_AUTO_SCOPE_EXPAND=1, LOOP_ALLOW_AUTO_TASK_SPLIT=1, and LOOP_ALLOW_AUTO_DEPENDENCY_INSERT=1 exist only for guarded experiments and are not recommended for normal users.
- Ctrl+C restores any in-progress changes automatically
- If SIGKILL or a system crash prevented cleanup, the next run cleans up leftover backups on startup
backlog.md updates use a temp file + os.replace pattern. Interruptions or disk-full conditions cannot corrupt the file.
╔══════════════════════════════════════╗
║ Setup Phase · generating backlog.md
╚══════════════════════════════════════╝
── Setup Agent ──
[Setup Agent] running... (cli: codex, model: gpt-5.5, reasoning: high)
✓ [Setup Agent] done
Approve (y/e/n): y
✓ backlog.md approved.
Progress: ░░░░░░░░░░░░░░░░░░░░ 0/12 Tasks (0%)
╔══════════════════════════════════════╗
║ Loop 1 / 5
╚══════════════════════════════════════╝
Current task: Task 1.1 — Define core models
── Phase 1 · Planner ──
Plan goal: Add GenreProfile schema
✓ Plan PASS
── Phase 3 · Implementer ──
Completed steps:
✓ Task 1: Add tropeEmphasis field to GenreProfile schema
✓ Implementation PASS
✓ backlog updated: Task 1.1 complete
Progress: ██░░░░░░░░░░░░░░░░░░ 1/12 Tasks (8%)
Blocked proposal outcomes are reported as blocked progress, not ordinary failures:
Task 3.2 blocked with SCOPE_EXPAND
Blocked reason: implementation requires docs/api_contract.md, which is outside the approved Files scope.
Evidence/proposal: .loop-agent/evidence/loop-4/scope_expand_proposal.md
Human action required: accept by editing backlog Files or dependencies, or reject by revising/leaving the task blocked before retry.
Fail count: unchanged at 0.
Task 5.1 blocked with SPLIT_TASK
Blocked reason: current task requires separate storage and UI changes that need ordered child tasks.
Evidence/proposal: .loop-agent/evidence/loop-7/split_task_proposal.md
Human action required: accept by replacing the backlog task with reviewed child tasks, or reject by revising/leaving the task blocked before retry.
Fail count: unchanged at 1.
The backlog and .loop-agent/events.jsonl are the durable inputs for status and final reports. Reports and status output should use backlog state and machine-readable events, not parse agent-facing Markdown.
.loop-agent/progress.txt is the raw per-loop progress and debug log. It is useful when investigating a failure, and it may be trimmed when it exceeds the configured size threshold.
.loop-agent/progress_window.md is bounded Markdown context for agents. It summarizes recent progress and failure context for prompts, but it is not the source of truth for status, reporting, or task state.
.loop-agent/events.jsonl is the machine-readable system event log for run lifecycle, task progress, blocked-task state, and report/status generation.
.loop-agent/current_transaction.json is recovery state for in-progress lifecycle updates. It lets startup finish or roll back a pending state transition before another task is selected.
Final report files, including .loop-agent/report.md and per-loop files under .loop-agent/reports/ when present, summarize completed work, failures, blocked tasks, verify results, and evidence pointers after a run.
Loop evidence is stored under .loop-agent/evidence/. Evidence retention keeps the current loop evidence directory and the newest LOOP_EVIDENCE_KEEP_RUNS loop evidence directories. Evidence referenced by blocked tasks is preserved by default. Older unprotected loop evidence may be archived as .tar.gz files or compacted to a summary file if archiving is unavailable. Retention is for recent debugging context, not permanent archival.
myproject/
.loop-agent/
backlog.md ← full task list + completion status
progress.txt ← raw per-loop progress/debug log (auto-trimmed)
progress_window.md ← bounded Markdown context window for agents; not system truth
report.md ← cumulative report
codex.log ← agent stderr log (for debugging)
events.jsonl ← machine-readable system event log
current_transaction.json ← recovery state for pending lifecycle updates
reports/ ← per-loop detail (when present)
codex loginexport GEMINI_API_KEY=<your_key>
# or
gemini # OAuth loginnpm install -g @openai/codex # codex
npm install -g @google/gemini-cli # geminiInstall Python 3.8+. On Windows, the Microsoft Store launcher (WindowsApps/python3) is not supported.
Likely a CLI version compatibility issue. Override flags with LOOP_GEMINI_* environment variables.
echo 'export PATH="$PATH:/c/Users/yourname/AppData/Roaming/npm"' >> ~/.bashrc
source ~/.bashrcIf the process exited with code 2, wait for the limit to reset and run the same command again. Already-completed tasks are skipped automatically.
./loop.sh run --project ./myproject --iterations 5 --cli codex # hits rate limit, exit 2
# (wait)
./loop.sh run --project ./myproject --iterations 5 --cli codex # resumes where it left offLoopDex/
loop.sh ← entry point
backlog_manager.py ← backlog.md parsing and atomic updates
progress_window.py ← progress.txt sliding window + truncation
backlog_guide.md ← backlog format spec and generation instructions
TASK_PLANNING_FAILURE_PATTERNS.md ← recurring failure patterns and how to avoid them
agents/
setup_agent.md ← reads planning docs, generates backlog.md
setup_critic.md ← validates backlog.md quality
planner.md ← breaks task into steps
plan_critic.md ← reviews the plan independently
implementer.md ← writes code and runs verify
impl_critic.md ← reviews implementation independently
Each loop triggers up to 5 Codex calls. The Plus plan has usage limits — many iterations may hit the cap. The agent exits safely on detection, and you can resume after the limit resets.
Watch usage and cost when using an API key. The free tier has per-minute request limits.
| LoopDex | Codex /goal |
aider | OpenHands | gpt-engineer | |
|---|---|---|---|---|---|
| Multi-task backlog from docs | ✅ | ❌ | ❌ | partial | ❌ |
| Independent-process critic | ✅ | ❌ | ❌ | ❌ | ❌ |
| Per-task git rollback | ✅ | ❌ | partial | ❌ | ❌ |
| Resume on rate limit | ✅ | ❌ | n/a | n/a | n/a |
| Multi-vendor (Codex + Gemini) | ✅ | ❌ | ✅ | ✅ | ❌ |
| Subscription-only billing | ✅ | ✅ | ❌ | ❌ | ❌ |
| Zero-Docker install | ✅ | ✅ | ✅ | ❌ | ✅ |
| Web UI / dashboard | ❌ | ❌ | ❌ | ✅ | ❌ |
| Interactive conversation | ❌ | partial | ✅ | ✅ | ❌ |
/goal, aider, OpenHands, and gpt-engineer are all excellent at what they do. LoopDex fills the specific slot of unattended multi-task batches with hard review gates on top of subscription CLIs.
- Third-party services. LoopDex invokes external CLIs (OpenAI Codex, Google Gemini, and optionally Anthropic Claude Code). Your planning documents, source code, and prompts are transmitted to those providers under their own privacy and usage policies. Do not feed sensitive personal data, regulated information, or third-party confidential material unless your use of those providers permits it.
- Terms of Service. You are responsible for ensuring that your use of LoopDex — including automated, looped invocation of paid CLIs — complies with the Terms of Service of OpenAI, Google, Anthropic, and any other upstream provider. The authors of LoopDex do not represent that any particular usage pattern is permitted by those providers, and accept no liability for account suspensions, billing, or other consequences arising from such use.
- Trademarks. "ChatGPT", "Codex", "GPT" are trademarks of OpenAI. "Gemini", "Vertex AI", "Google Cloud", "Google AI Studio" are trademarks of Google LLC. "Claude", "Claude Code" are trademarks of Anthropic, PBC. "Windows", "Microsoft Store" are trademarks of Microsoft Corporation. All other product names mentioned are property of their respective owners. Use here is nominative and does not imply endorsement or affiliation.
- Model names. Default model identifiers in this repository (e.g.
gpt-5.5,gemini-3.1-pro-preview) are examples and may not match the model IDs your account currently has access to. Override them via the documented environment variables. - Autonomous execution. As stated in the Safety notice above, LoopDex runs AI agents that generate and execute code without per-step human approval. The user is solely responsible for selecting an appropriate target directory and runtime environment.
- Input content. You retain ownership of any planning documents and source code you provide. By running LoopDex you confirm you have the right to share that content with the upstream LLM providers you have configured.
MIT — see LICENSE.
The software is provided "as is", without warranty of any kind, express or implied. See LICENSE for full terms.