A tool-agnostic autonomous software-delivery system where agents implement, review, and merge, and only real decisions reach the human.
This repo is a durable record of design and knowledge, not installable software. It describes an autonomous software-delivery system in which agents run implementation, review, and merge, and only the genuine decisions are escalated to a human. Those decisions accumulate into a persistent Knowledge Graph that makes the next task smarter. The Orchestrator and its current Runtime are swappable parts above the line, and the concrete implementation is confined to a single doc, 10-runtime-mapping. The models, contracts, and decisions captured in these documents are the durable substrate below that line.
The human's role is decision-maker. They do not write code. They settle only the real decisions that remain after the machine Gauntlet has filtered everything else, and each decision becomes knowledge so the same question never reaches them twice.
- Engineers designing autonomous or semi-autonomous delivery pipelines who need a tool-agnostic model rather than vendor lock-in.
- Teams that want agents to ship code unattended while keeping a hard, forgery-proof boundary between machine approval and human decision.
- Architects interested in an Evergreen Decision Graph as a source of truth (SoT) that compounds over time, kept in plain markdown + git.
- Anyone evaluating how to let an Agent and its harness or coding tool run safely behind a Merge Gate without ceding judgment to a single model's "done" signal.
Issue → implement (agent) → multi-model gauntlet (Claude review → other-model review → CI gate)
→ [safe lane] auto-merge / [real decision] escalate to human → decide → accrue as ADR
→ the next task's agent reads that ADR (compounding)
An issue is implemented by an Agent, then passes through a multi-model review Gauntlet (Claude review, then a different model's review, then the CI gate). On the safe lane it auto-merges; on a real decision it escalates to the human, who decides, and the decision is captured as an ADR. The Agent for the next task reads that ADR, which is the compounding effect.
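The routing step of this loop can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the names `Outcome`, `StageResult`, and `run_gauntlet` are hypothetical, and the `REWORK` outcome (sending a failed change back to the Agent) is an assumption about what happens when a stage fails.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Outcome(Enum):
    AUTO_MERGE = "auto-merge"  # safe lane
    ESCALATE = "escalate"      # a real decision goes to the human
    REWORK = "rework"          # assumption: a failed stage returns work to the Agent


@dataclass
class StageResult:
    passed: bool
    needs_decision: bool = False  # True when the stage detected a real decision


# A stage inspects a change (here just a string) and reports a result.
Stage = Callable[[str], StageResult]


def run_gauntlet(change: str, stages: List[Stage]) -> Outcome:
    """Run the stages in order and stop at the first decision or failure."""
    if not stages:
        # An empty gauntlet must never count as an approval.
        raise ValueError("gauntlet needs at least one stage")
    for stage in stages:
        result = stage(change)
        if result.needs_decision:
            return Outcome.ESCALATE
        if not result.passed:
            return Outcome.REWORK
    return Outcome.AUTO_MERGE
```

In this sketch the three stages would be the Claude review, the other-model review, and the CI gate, passed in that order. Only a change that clears all of them without raising a decision reaches `Outcome.AUTO_MERGE`.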
- Identity separation: the party that produces an approval (a review verdict) and the party that merges are physically separated. Merge credentials cannot forge an approval.
- No guessing: when a real decision is required, the Agent stops and escalates to the human. It never proceeds on a guess.
- Durable + compounding: every decision becomes a grep-able node in markdown + git, and the next task reads it before acting.
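The invariants above can be read as small, checkable rules. The sketch below is illustrative only: `Verdict`, `may_merge`, `resolve`, and `NeedsHumanDecision` are hypothetical names, and in a real deployment identity separation would be enforced by separate credentials rather than by application code like this.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Verdict:
    reviewer_id: str  # identity that produced the review verdict
    approved: bool


class NeedsHumanDecision(Exception):
    """Raised instead of guessing when a real decision is required."""


def may_merge(merger_id: str, verdicts: List[Verdict],
              trusted_reviewers: FrozenSet[str]) -> bool:
    """Identity separation: the merging identity can never supply an approval."""
    if merger_id in trusted_reviewers:
        return False  # one identity must not hold both roles
    # Count only verdicts produced by trusted reviewer identities.
    valid = [v for v in verdicts if v.reviewer_id in trusted_reviewers]
    return bool(valid) and all(v.approved for v in valid)


def resolve(question: str, recorded_decisions: Dict[str, str]) -> str:
    """No guessing: reuse a recorded decision, otherwise stop and escalate."""
    if question in recorded_decisions:
        return recorded_decisions[question]  # compounding: already decided once
    raise NeedsHumanDecision(question)
```

A verdict signed by the merging identity is simply ignored here, so a merger that "approves" its own change still cannot merge. `resolve` shows the other two invariants together: a question already answered is read back from the record, and an unanswered one raises rather than proceeding.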
| Document | Contents |
|---|---|
| 00-overview | Vision, goals, design principles, scope |
| 01-architecture | End-to-end flow, components, trust boundaries |
| 02-knowledge-graph | Evergreen Decision Graph: knowledge atoms, projection, maturity, visibility |
| 03-review-gauntlet | Multi-model review Gauntlet, forgery-proof verdicts, loop limits |
| 04-merge-gate | Ruleset, aggregated CI checks, risk classification, CODEOWNERS, tiers |
| 05-escalation | Trigger policy, Decision Card, channels, answer paths |
| 06-knowledge-loop | ADR lifecycle, Scribe, compounding metrics |
| 07-unattended-ops | External dead-man-switch, quota backoff, kill-switch, daily caps |
| 08-view-layer | Decision inbox, rendering, notifications |
| 09-multi-repo-docs | Repo registry / onboarding, evolving docs |
| 10-runtime-mapping | Mapping from the abstract Runtime to the current implementation (tool dependence lives only here), swap guide |
| 11-access-and-search | Access control, sharing, search, and human views over git as SoT (federation tiers) |
| 12-agents | Concrete agent roster: roles, models, triggers, and instruction prompts (build-ready) |
| 13-project-lifecycle | Kickoff, design-first phase, milestone checkpoints, deliverables, sub-issue decomposition |
| 14-reporting | Operator dashboard + client report (two audiences, one source) |
| 15-quality-gates | Pervasive machine-checked Definition-of-Done as hard blocking gates |
| 16-self-improvement | autodev improves autodev: observe -> propose -> gate -> measure |
| glossary | Glossary |
| roadmap | Improvement backlog |
| adr/ | Records this design's own decisions in its own ADR format (self-dogfooding) |
- Tools are swappable, knowledge is durable: markdown + git is the source of truth. No vendor store ever becomes canonical.
- completed != correct: a task-completion signal is not a quality signal. Quality gates (CI + independent review + human) are stacked explicitly.
- Headless first: under unattended operations there may be no per-tool plugin (MCP and the like). Every path must be reachable via git / REST / CLI.
- Visibility tiering: personal OSS and company-confidential content are split by a single frontmatter flag, and several chokepoints prevent leakage.
- Start small, one at a time: drive the single safest lane end to end to build trust, then expand.
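As one illustration of the visibility-tiering principle, a single chokepoint could filter documents by their frontmatter before anything is published. This is a sketch under assumptions: the flag name `visibility` and its value `public` are hypothetical, as are the function names, and the parser handles only flat `key: value` frontmatter.

```python
from typing import Dict, List


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # unterminated frontmatter is treated as missing


def publishable(docs: Dict[str, str]) -> List[str]:
    """Return names of docs explicitly marked public; anything else is withheld."""
    return sorted(
        name for name, text in docs.items()
        if read_frontmatter(text).get("visibility") == "public"
    )
```

The check fails closed: a document with no flag, a malformed header, or any value other than `public` is withheld, so a missing flag cannot leak confidential content.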