autodev — autonomous development cycle for Claude Code

🌐 Language: English · 한국어

Tell Claude Code one line — "build me a markdown editor with live preview" — and this skill drives the whole project: research → plan → design → milestone-by-milestone implementation → tests → review → commit. You step in only at the very end, or when something genuinely needs you.

The skill includes proven patterns from two other open-source projects, so you don't need to install them separately:

From Superpowers by Jesse Vincent — write tests first, debug step by step, verify before committing, use git worktrees safely, request code reviews, brainstorm a design before coding.
From gstack by Garry Tan — QA test in a real browser, security audit (OWASP / STRIDE / LLM / skill-supply-chain), spot scope drift in pull requests, investigate root causes.

Everything is MIT-licensed; full attribution is in NOTICES.md.

Use it for

A new project — you have a one-line goal and want it autonomously split into milestones and built.
One milestone in an existing project — you've already defined M2 and want it specced, implemented, tested, and committed without manual steps.

Skip it for

A quick one-line patch ("add a missing semicolon")
A question you just want answered without code changes

Install (no other dependencies)

git clone https://github.com/lwc0917/autodev ~/.claude/skills/autodev

That's it. In Claude Code:

/autodev project-init "your project goal in one line"
/autodev M2

How it works — the pipeline

When you say "build X for me", this is what runs end-to-end:

flowchart TD
    Start([You: 'Build X' or<br/>'Finish milestone M2']) --> Mode{Which mode?}

    Mode -->|"Mode 1: whole project"| P1
    Mode -->|"Mode 2: one milestone"| TP

    P1["1. Research<br/>Find similar projects,<br/>common patterns, pitfalls"] --> P2
    P2["2. Plan<br/>Split into milestones,<br/>each with its features"] --> P3
    P3["3. Design<br/>Pick tech stack +<br/>write decision records"] --> Loop
    Loop[Next milestone] --> TP
    TP["4a. Test plan<br/>(exact commands per feature)"] --> Spec

    Spec["4b. Break milestone<br/>into work items"] --> Impl
    Impl["4c. For each work item,<br/>one at a time:<br/>write code → check diff →<br/>run tests → commit"] --> AfterImpl{All units done?}

    AfterImpl -->|No: next unit| Impl
    AfterImpl -->|Yes| Review["4d. Two reviewers in parallel:<br/>architecture check +<br/>independent user-perspective check"]

    Review --> Verdict{Both pass?}
    Verdict -->|Yes: next milestone| Loop
    Verdict -->|Conflict / fail| Stop["⏸ Stop & ask you"]

    Loop -->|All milestones done| Final[5. Final report]
    Stop --> You([You decide:<br/>retry / adjust / abort])
    Final --> You

What each step does, in plain words

Research — Claude searches the web for projects similar to yours (e.g. "browser markdown editor live preview") and reports the closest matches with their tech stacks and known pitfalls. How many it returns is up to the agent based on what actually exists — a popular domain may yield 6-8 references, a novel one may yield just 1-2. Output: docs/autoplan/research.md.
Plan — Splits your goal into milestones (M0 is always "build setup + hello world", M1 onwards are features). The agent decides how many milestones make sense for your project — a tiny CLI tool might be 2 milestones, a full product might be 12. Each milestone gets a feature list at whatever granularity fits. (For reference: ~4-8 milestones × 3-7 features per milestone is a comfortable size for a single autonomous run, but it's guidance, not a cap.) Output: docs/autoplan/plan.md.
Design — Picks specific libraries (e.g. "Vite over Webpack because faster HMR for small projects") and writes each choice as an architecture decision record (so future milestones won't drift from the plan). Output: docs/autoplan/architecture.md and docs/autoplan/adr/.
Per milestone:
- Test plan — exact commands that prove each feature works (one of: pixel-by-pixel screenshot match, log-line check, smoke test that the program starts without crashing).
- Work units — the milestone is broken into chunks (the "work items" in the diagram above). The agent decides the chunk count: a simple milestone might be one unit, a complex one might be ten. Units are always processed one at a time, never in parallel, because parallel coding agents tend to make conflicting assumptions (the well-known "Cognition Flappy Bird" failure case).
- Implement → diff check → test → commit for each chunk.
- Two reviewers then check the milestone independently: one against the architecture decisions, one purely from a user-visible-behavior angle. They run in parallel and must agree before moving on.
Final report at the end summarises every milestone + any items still needing your eye.
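
The per-milestone loop above can be pictured with a small Python sketch. This is only an illustration of the control flow described here (sequential work units, a retry budget of 3, a review gate at the end), not autodev's actual implementation. The names `run_milestone`, `run_unit`, `review`, and `MilestoneResult` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_ATTEMPTS = 3  # the README's loop budget: 3 tries on one milestone


@dataclass
class MilestoneResult:
    name: str
    status: str    # "PASS" or "STOP" (stop and ask the user)
    attempts: int


def run_milestone(name: str,
                  work_units: List[str],
                  run_unit: Callable[[str], bool],
                  review: Callable[[str], bool]) -> MilestoneResult:
    """Run work units strictly one at a time, then gate on the review.

    run_unit and review are hypothetical callbacks standing in for
    "implement, diff check, test, commit" and "both reviewers pass".
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # all() short-circuits, so a failing unit ends this attempt early
        # and no later unit runs before the failure is handled.
        if all(run_unit(unit) for unit in work_units):
            # A failed or conflicting review is not retried: it stops the run.
            status = "PASS" if review(name) else "STOP"
            return MilestoneResult(name, status, attempt)
    # Budget exhausted without a green milestone: hand control back.
    return MilestoneResult(name, "STOP", MAX_ATTEMPTS)
```

In this sketch a review failure stops immediately rather than consuming another attempt, which mirrors the "Conflict / fail" edge in the diagram; the retry budget only covers getting the work units green.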

Visual checks — what's automatic vs. what asks you

The skill tries three automatic checks before bothering you, in this order:

Pixel-by-pixel screenshot match — when a "correct image" already exists in tests/golden/.
Log inspection — e.g. "the dev server printed 'preview ready' within 3 seconds".
Claude looks at the screenshot itself — for new features without a baseline image, Claude attaches the screenshot and judges in plain words whether it matches the feature description.

Only when all three fail or the feature is purely subjective (color choice, micro-animation timing) does the skill stop and put it on the ✋ visual queue for you to review at the end.
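
As a rough illustration of that ordering, the sketch below tries a list of checks in sequence and falls back to the queue. It assumes each check can be represented as a function returning True or False; the function name `verify_visual` and the result strings are hypothetical, and the real checks (screenshot diffing, log inspection, model judgment) are not shown.

```python
from typing import Callable, List, Tuple

# A check is a zero-argument function: True means the feature was verified.
Check = Callable[[], bool]


def verify_visual(feature: str,
                  checks: List[Tuple[str, Check]],
                  subjective: bool = False) -> str:
    """Return "pass:<label>" for the first check that succeeds, else "queued".

    Purely subjective features skip the automatic checks and go straight
    to the visual queue for the user.
    """
    if subjective:
        return "queued"
    for label, check in checks:
        if check():
            return f"pass:{label}"
    # Every automatic check failed: leave it for the user to confirm.
    return "queued"
```

A caller would pass the checks in the README's order, for example `[("pixel", ...), ("log", ...), ("model", ...)]`, so the cheaper deterministic checks run before the model looks at the screenshot.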

When the skill stops and asks you

You stay hands-off unless one of these fires:

Signal	What it means	What you do
✋ visual queue	A few features couldn't be fully auto-checked	Open each URL, click as described, confirm it looks right
Loop budget exceeded (3 tries on one milestone)	The skill couldn't get one milestone green after 3 attempts	Read the last error + diff, decide: try a different approach, simplify the milestone, or stop
Architecture drift detected	Implementation diverged from the design records	Either accept the proposed decision-record update, or rework that part
Independent reviewer says FAIL	The user-perspective reviewer found a silent failure or scope creep	Read the findings, fix, re-run that milestone
Two reviewers disagree (one PASS, one FAIL)	Usually means the independent reviewer caught something the architecture-focused one missed	Read both reports, decide manually
Diff over 5000 lines per work item	Likely scope creep	Confirm the diff is legitimate, or split the work item
`OPEN_QUESTIONS:` from an agent	A spec was ambiguous	Answer the question, the skill resumes

If none fire, you don't open the chat until the final report.
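
A subset of the signals in the table can be expressed as simple threshold checks. The sketch below is an assumption-laden illustration: the `WorkState` fields and signal strings are hypothetical, only the two numeric limits (3 attempts, 5000 diff lines) come from the table, and architecture drift and the visual queue are left out.

```python
from dataclasses import dataclass, field
from typing import List

LOOP_BUDGET = 3        # tries allowed on one milestone
MAX_DIFF_LINES = 5000  # diff size limit per work item


@dataclass
class WorkState:
    """Hypothetical snapshot of one milestone or work item."""
    failed_attempts: int = 0
    diff_lines: int = 0
    arch_verdict: str = "PASS"       # architecture-focused reviewer
    external_verdict: str = "PASS"   # independent user-perspective reviewer
    open_questions: List[str] = field(default_factory=list)


def fired_signals(state: WorkState) -> List[str]:
    """List every stop signal that applies; an empty list means keep going."""
    signals = []
    if state.failed_attempts >= LOOP_BUDGET:
        signals.append("loop budget exceeded")
    if state.diff_lines > MAX_DIFF_LINES:
        signals.append("diff over limit")
    if state.external_verdict == "FAIL":
        signals.append("independent reviewer FAIL")
    if state.arch_verdict != state.external_verdict:
        signals.append("reviewers disagree")
    if state.open_questions:
        signals.append("open questions")
    return signals
```

Note that one situation can fire two signals at once: an independent FAIL next to an architecture PASS is both a failure and a disagreement.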

Files in your project after a run

your-project/
├── docs/autoplan/                # everything Mode 1 produces
│   ├── research.md
│   ├── plan.md
│   ├── architecture.md
│   ├── adr/                      # architecture decision records (one per choice)
│   ├── m0/, m1/, ...             # per-milestone folders
│   │   ├── feature-checklist.md  # frozen at milestone start
│   │   ├── test-plan.md
│   │   ├── spec.md
│   │   ├── exec-validation.md    # test results + health score
│   │   ├── arch-review.md
│   │   ├── external-review.md
│   │   └── state.json
│   └── final-report.md
├── tests/golden/                 # auto-saved screenshots awaiting your sign-off
├── CHANGELOG.md                  # updated per milestone
└── (commits, local-only — never auto-pushed)
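
If you want to sanity-check a run against this layout, a short script along the following lines could report which per-milestone files are missing. It is a sketch that relies only on the file names in the tree above; the helper name `missing_artifacts` is hypothetical, and nothing is assumed about the contents of any file (including `state.json`).

```python
from pathlib import Path
from typing import List

# Per-milestone files, copied from the tree above.
EXPECTED_FILES = [
    "feature-checklist.md",
    "test-plan.md",
    "spec.md",
    "exec-validation.md",
    "arch-review.md",
    "external-review.md",
    "state.json",
]


def missing_artifacts(project_root: Path, milestone: str) -> List[str]:
    """Return the expected files absent from docs/autoplan/<milestone>/."""
    folder = project_root / "docs" / "autoplan" / milestone
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

For example, `missing_artifacts(Path("your-project"), "m0")` returns an empty list when the M0 folder is complete.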

Compatibility with Superpowers / gstack installed alongside

You can install all three together. autodev includes a safety check: just before it writes code in the implementation phase, it scans the installed skill list and warns you if any "parallel agent dispatch" skill (e.g. Superpowers' dispatching-parallel-agents) might auto-activate. That kind of skill conflicts with autodev's one-at-a-time coding policy (the policy that prevents conflicting assumptions between parallel agents — see Cognition's Don't Build Multi-Agents writeup). If detected, the skill asks you to disable that one skill for the implementation phase. Everything else from Superpowers / gstack composes cleanly.
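
The README does not spell out how the scan works, so the following is only a guess at its general shape: walk a skills directory and flag any skill folder whose name suggests parallel dispatch. The keyword list and the function name `find_parallel_skills` are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Assumption: a folder name containing one of these keywords marks a
# parallel-dispatch skill (e.g. "dispatching-parallel-agents").
KEYWORDS = ("parallel",)


def find_parallel_skills(skills_root: Path) -> List[str]:
    """Return sorted names of skill folders that look like parallel dispatchers."""
    if not skills_root.is_dir():
        return []
    hits = [
        path.name
        for path in skills_root.rglob("*")
        if path.is_dir() and any(word in path.name.lower() for word in KEYWORDS)
    ]
    return sorted(hits)
```

Pointing it at the install location used earlier, `find_parallel_skills(Path.home() / ".claude" / "skills")`, would list candidates to disable during the implementation phase. Matching on folder names is crude and may produce false positives.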

Walkthrough — building a markdown editor

A condensed example of what you'd see in the chat when you say:

/autodev project-init "browser-based markdown editor with split-pane live preview"

[Bootstrap]    Goal captured. Project type: web frontend. Starting research.
[Research]    Found 6 similar tools (StackEdit, HackMD, Marked, ...). Common stack:
              Vite + textarea + marked.js + iframe preview. 4 pitfalls noted
              (XSS in unsanitised HTML, scroll-sync flicker, ...).
[Plan]        6 milestones, 23 features. Cycle-free.
[Design]      4 ADRs: Vite, marked.js, iframe sandbox, localStorage.
[M0 build]    test-plan ready (3 features). 1 work item. PASS. Reviewers: PASS+PASS.
[M1 editor]   test-plan ready (4 features). 4 work items. PASS. Reviewers: PASS+PASS.
[M2 render]   test-plan ready (4 features). 4 work items. WU-3 visual: golden image
              auto-created → ✋ added to queue. PASS. Reviewers: PASS+PASS.
[M3 sync]     ...
[M5 deploy]   PASS.

✋ visual queue (1):
  WU-3 preview pane (M2)
    URL:      http://localhost:5173
    Action:   type "# Hello world" in the left pane
    Expected: right pane shows "Hello world" rendered as <h1>

Final report saved to docs/autoplan/final-report.md.
Total: 45 minutes, 23/23 features green, 1 visual to confirm.

That's it for the user-visible part. Detailed phase docs are in references/.

Requirements

Claude Code (latest stable). Older versions use the deprecated Task tool name; autodev uses the current Agent tool but is backward-compatible.
Git — commits are local-only by default, never auto-pushed.
Bash + standard Unix tools — recipes assume Linux/macOS/WSL. Windows users should expect to substitute screenshot/display tools.
Optional: web access for the research phase (it falls back gracefully if blocked).

Limitations to know up front

Long autonomous runs (5+ milestones) are not bulletproof. Industry benchmarks (e.g. SWE-EVO) show a success rate of about 25% for multi-milestone autonomous coding. autodev's two-reviewer step at milestone boundaries is designed to catch drift, but very long projects may still need you to check in.
Coding agents at the implementation step are not allowed to run in parallel. This is by design (parallel coding agents tend to make incompatible assumptions). If you've seen videos of "5 agents in parallel build a feature in 10 minutes", that is the workflow autodev explicitly does not do.
Cost. Default model is opus for accuracy-critical steps (architecture, review). You can override per step if cost matters more than precision.
Subagents can't spawn further subagents (Claude Code platform limit). This skill must run in your main Claude Code session, not nested inside another agent's context.

Why absorb other projects' patterns instead of depending on them?

No second install. A user who installs only autodev gets the full set of patterns and checklists — no dependency chain to manage.
No runtime conflicts. autodev copies only the patterns that align with its one-at-a-time coding policy and explicitly excludes the parallel-dispatch ones, which avoids the conflicts described in safety rails anti-pattern E.
Version-pinned. The absorbed text is checked into this repo, so upstream changes don't silently alter behavior. Original sources retain attribution and are linked from NOTICES.md.

Contributing

PRs welcome, especially:

More recipes in references/execution-tests.md (per project type)
Windows / macOS guards for Linux-specific shell tricks
English translation of the body docs (currently Korean-first)
Real post-mortems of runaway autonomous loops — we'll add them to the anti-pattern catalogue

Please open an issue first for major changes (new phases, new agent roles).

License + attribution

MIT — see LICENSE.

Detailed change list: CHANGELOG.md
Third-party attribution (which patterns came from which project, into which file): NOTICES.md
Body documentation (currently Korean-first): SKILL.md, references/
