🌐 Language: English · 한국어
Tell Claude Code one line — "build me a markdown editor with live preview" — and this skill drives the whole project: research → plan → design → milestone-by-milestone implementation → tests → review → commit. You step in only at the very end, or when something genuinely needs you.
The skill includes proven patterns from two other open-source projects, so you don't need to install them separately:
- From Superpowers by Jesse Vincent — write tests first, debug step by step, verify before committing, use git worktrees safely, request code reviews, brainstorm a design before coding.
- From gstack by Garry Tan — QA test in a real browser, security audit (OWASP / STRIDE / LLM / skill-supply-chain), spot scope drift in pull requests, investigate root causes.
Everything is MIT-licensed; full attribution is in NOTICES.md.
Use it for:
- A new project — you have a one-line goal and want it autonomously split into milestones and built.
- One milestone in an existing project — you've already defined `M2` and want it specced, implemented, tested, and committed without manual steps.
Skip it for:
- A quick one-line patch ("add a missing semicolon")
- A question you just want answered without code changes
git clone https://github.com/lwc0917/autodev ~/.claude/skills/autodev

That's it. In Claude Code:
/autodev project-init "your project goal in one line"
/autodev M2
When you say "build X for me", this is what runs end-to-end:
flowchart TD
Start([You: 'Build X' or<br/>'Finish milestone M2']) --> Mode{Which mode?}
Mode -->|"Mode 1: whole project"| P1
Mode -->|"Mode 2: one milestone"| TP
P1["1. Research<br/>Find similar projects,<br/>common patterns, pitfalls"] --> P2
P2["2. Plan<br/>Split into milestones,<br/>each with its features"] --> P3
P3["3. Design<br/>Pick tech stack +<br/>write decision records"] --> Loop
Loop[Next milestone] --> TP
TP["4a. Test plan<br/>(exact commands per feature)"] --> Spec
Spec["4b. Break milestone<br/>into work items"] --> Impl
Impl["4c. For each work item,<br/>one at a time:<br/>write code → check diff →<br/>run tests → commit"] --> AfterImpl{All units done?}
AfterImpl -->|No: next unit| Impl
AfterImpl -->|Yes| Review["4d. Two reviewers in parallel:<br/>architecture check +<br/>independent user-perspective check"]
Review --> Verdict{Both pass?}
Verdict -->|Yes: next milestone| Loop
Verdict -->|Conflict / fail| Stop["⏸ Stop & ask you"]
Loop -->|All milestones done| Final[5. Final report]
Stop --> You([You decide:<br/>retry / adjust / abort])
Final --> You
- Research — Claude searches the web for projects similar to yours (e.g. "browser markdown editor live preview") and reports the closest matches with their tech stacks and known pitfalls. How many it returns is up to the agent based on what actually exists — a popular domain may yield 6-8 references, a novel one may yield just 1-2. Output: `docs/autoplan/research.md`.
- Plan — Splits your goal into milestones (`M0` is always "build setup + hello world", `M1` onwards are features). The agent decides how many milestones make sense for your project — a tiny CLI tool might be 2 milestones, a full product might be 12. Each milestone gets a feature list at whatever granularity fits. (For reference: ~4-8 milestones × 3-7 features per milestone is a comfortable size for a single autonomous run, but it's guidance, not a cap.) Output: `docs/autoplan/plan.md`.
- Design — Picks specific libraries (e.g. "Vite over Webpack because faster HMR for small projects") and writes each choice as an architecture decision record (so future milestones won't drift from the plan). Output: `docs/autoplan/architecture.md` and `docs/autoplan/adr/`.
- Per milestone:
- Test plan — exact commands that prove each feature works (one of: pixel-by-pixel screenshot match, log-line check, smoke test that the program starts without crashing).
- Work units — milestone broken into chunks. The agent decides the chunk count — a simple milestone might be one unit, a complex one might be ten. Always one at a time, never in parallel — parallel coding agents tend to make conflicting assumptions (the well-known "Cognition Flappy Bird" failure case).
- Implement → diff check → test → commit for each chunk.
- Two reviewers then check the milestone independently: one against the architecture decisions, one purely from a user-visible-behavior angle. They run in parallel and must agree before moving on.
- Final report at the end summarises every milestone + any items still needing your eye.
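The per-milestone loop above can be sketched roughly as follows. This is an illustration, not autodev's actual code: `WorkItem`, `run_milestone`, and the step callables are hypothetical names, and treating one "try" as one pass over the remaining work items is an assumption about how the 3-try loop budget is counted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkItem:
    name: str
    # Hypothetical stand-ins for the real agent steps.
    implement: Callable[[], None]
    diff_ok: Callable[[], bool]
    tests_pass: Callable[[], bool]


def run_milestone(
    items: List[WorkItem],
    commit: Callable[[str], None],
    max_attempts: int = 3,  # loop budget per milestone
) -> str:
    """Process work items strictly one at a time; return 'PASS' or 'STOP'."""
    done = set()
    for _attempt in range(max_attempts):
        for item in items:
            if item.name in done:
                continue  # already committed in an earlier attempt
            item.implement()
            if item.diff_ok() and item.tests_pass():
                commit(item.name)
                done.add(item.name)
            else:
                break  # retry from this item on the next attempt
        else:
            return "PASS"  # every item committed without a break
    return "STOP"  # budget exhausted: hand control back to the user
```

Nothing here runs concurrently: the next work item starts only after the previous one has been committed, which mirrors the one-at-a-time policy.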
The skill tries three automatic checks before bothering you, in this order:
- Pixel-by-pixel screenshot match — when a "correct image" already exists in `tests/golden/`.
- Log inspection — e.g. "the dev server printed 'preview ready' within 3 seconds".
- Claude looks at the screenshot itself — for new features without a baseline image, Claude attaches the screenshot and judges in plain words whether it matches the feature description.
Only when all three fail or the feature is purely subjective (color choice, micro-animation timing) does the skill stop and put it on the ✋ visual queue for you to review at the end.
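A minimal sketch of that ordering, assuming each check can be reduced to a function that returns `True` or `False`. The names `verify_feature` and `log_line_within`, the result strings, and the `(seconds, line)` event format are all hypothetical and only illustrate the fall-through logic.

```python
from typing import Callable, Iterable, Sequence, Tuple

Check = Tuple[str, Callable[[], bool]]


def log_line_within(
    events: Iterable[Tuple[float, str]], needle: str, limit_s: float
) -> bool:
    """True if a log line containing `needle` appeared within `limit_s` seconds.

    `events` holds (seconds_since_start, line) pairs (assumed format).
    """
    return any(needle in line and t <= limit_s for t, line in events)


def verify_feature(checks: Sequence[Check], subjective: bool = False) -> str:
    """Try the checks in order; queue the feature for a human if none passes."""
    if subjective:
        return "visual-queue"  # e.g. color choice, micro-animation timing
    for name, check in checks:
        if check():
            return f"pass:{name}"
    return "visual-queue"
```

For example, with a pixel check that fails (no golden image yet) followed by a log check for `"preview ready"` within 3 seconds, the feature passes on the log check and never reaches the screenshot-judging step.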
You stay hands-off unless one of these fires:
| Signal | What it means | What you do |
|---|---|---|
| ✋ visual queue | A few features couldn't be fully auto-checked | Open each URL, click as described, confirm it looks right |
| Loop budget exceeded (3 tries on one milestone) | The skill couldn't get one milestone green after 3 attempts | Read the last error + diff, decide: try a different approach, simplify the milestone, or stop |
| Architecture drift detected | Implementation diverged from the design records | Either accept the proposed decision-record update, or rework that part |
| Independent reviewer says FAIL | The user-perspective reviewer found a silent failure or scope creep | Read the findings, fix, re-run that milestone |
| Two reviewers disagree (one PASS, one FAIL) | Usually means the independent reviewer caught something the architecture-focused one missed | Read both reports, decide manually |
| Diff over 5000 lines per work item | Likely scope creep | Confirm the change is legitimate, or split the work item |
| `OPEN_QUESTIONS:` from an agent | A spec was ambiguous | Answer the question, the skill resumes |
If none fire, you don't open the chat until the final report.
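Two of these signals are mechanical enough to sketch in code. The example below assumes the diff size is measured from `git diff --numstat` output (tab-separated added, deleted, path, with `-` for binary files) and that each reviewer reports the string `PASS` or `FAIL`. The function names and signal labels are hypothetical, not autodev's own.

```python
from typing import Optional

DIFF_LINE_LIMIT = 5000  # per work item, from the table above


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` text."""
    total = 0
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) < 3:
            continue  # blank or malformed row
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary file: git reports no line counts
        total += int(added) + int(deleted)
    return total


def diff_too_large(numstat: str) -> bool:
    return changed_lines(numstat) > DIFF_LINE_LIMIT


def review_signal(arch: str, external: str) -> Optional[str]:
    """Return a stop signal unless both reviewers say PASS."""
    if arch != external:
        return "reviewers-disagree"
    if arch == "FAIL":
        return "review-failed"
    return None
```

Only a `None` from `review_signal` together with a `False` from `diff_too_large` would let the run continue without stopping for you.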
your-project/
├── docs/autoplan/ # everything Mode 1 produces
│ ├── research.md
│ ├── plan.md
│ ├── architecture.md
│ ├── adr/ # architecture decision records (one per choice)
│ ├── m0/, m1/, ... # per-milestone folders
│ │ ├── feature-checklist.md # frozen at milestone start
│ │ ├── test-plan.md
│ │ ├── spec.md
│ │ ├── exec-validation.md # test results + health score
│ │ ├── arch-review.md
│ │ ├── external-review.md
│ │ └── state.json
│ └── final-report.md
├── tests/golden/ # auto-saved screenshots awaiting your sign-off
├── CHANGELOG.md # updated per milestone
└── (commits, local-only — never auto-pushed)
You can install all three together. autodev includes a safety check: just before it writes code in the implementation phase, it scans the installed skill list and warns you if any "parallel agent dispatch" skill (e.g. Superpowers' dispatching-parallel-agents) might auto-activate. That kind of skill conflicts with autodev's one-at-a-time coding policy (the policy that prevents conflicting assumptions between parallel agents — see Cognition's Don't Build Multi-Agents writeup). If detected, the skill asks you to disable that one skill for the implementation phase. Everything else from Superpowers / gstack composes cleanly.
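One way such a scan could work is sketched below. It assumes skills live as folders containing a `SKILL.md` under a root such as `~/.claude/skills`, and that a conflicting skill can be recognised by its folder name. Both the matching rule and the names `PARALLEL_SKILLS` and `find_conflicts` are assumptions for illustration; autodev's real check may work differently.

```python
from pathlib import Path
from typing import Iterable, List

# Only this one skill name comes from the README; matching by folder
# name is an assumption made for this sketch.
PARALLEL_SKILLS = {"dispatching-parallel-agents"}


def find_conflicts(
    skills_root: Path, blocked: Iterable[str] = PARALLEL_SKILLS
) -> List[str]:
    """List installed skill folders whose name is on the blocked list."""
    blocked = set(blocked)
    if not skills_root.is_dir():
        return []
    return sorted(
        str(skill_file.parent.relative_to(skills_root))
        for skill_file in skills_root.rglob("SKILL.md")
        if skill_file.parent.name in blocked
    )
```

Calling `find_conflicts(Path.home() / ".claude" / "skills")` would return an empty list when nothing on the blocked list is installed, and the relative paths of any matches otherwise.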
A condensed example of what you'd see in the chat when you say:
/autodev project-init "browser-based markdown editor with split-pane live preview"
[Bootstrap] Goal captured. Project type: web frontend. Starting research.
[Research] Found 6 similar tools (StackEdit, HackMD, Marked, ...). Common stack:
Vite + textarea + marked.js + iframe preview. 4 pitfalls noted
(XSS in unsanitised HTML, scroll-sync flicker, ...).
[Plan] 6 milestones, 23 features. Cycle-free.
[Design] 4 ADRs: Vite, marked.js, iframe sandbox, localStorage.
[M0 build] test-plan ready (3 features). 1 work item. PASS. Reviewers: PASS+PASS.
[M1 editor] test-plan ready (4 features). 4 work items. PASS. Reviewers: PASS+PASS.
[M2 render] test-plan ready (4 features). 4 work items. WU-3 visual: golden image
auto-created → ✋ added to queue. PASS. Reviewers: PASS+PASS.
[M3 sync] ...
[M5 deploy] PASS.
✋ visual queue (1):
WU-3 preview pane (M2)
URL: http://localhost:5173
Action: type "# Hello world" in the left pane
Expected: right pane shows "Hello world" rendered as <h1>
Final report saved to docs/autoplan/final-report.md.
Total: 45 minutes, 23/23 features green, 1 visual to confirm.
That's it for the user-visible part. Detailed phase docs are in references/.
- Claude Code (latest stable). Older versions use the deprecated `Task` tool name; autodev uses the current `Agent` tool but is backward-compatible.
- Git — commits are local-only by default, never auto-pushed.
- Bash + standard Unix tools — recipes assume Linux/macOS/WSL. Windows users should expect to substitute screenshot/display tools.
- Optional: web access for the research phase (it falls back gracefully if blocked).
- Long autonomous runs (5+ milestones) are not bulletproof. Industry benchmarks (e.g. SWE-EVO) show ~25% success rate for multi-milestone autonomous coding. autodev's two-reviewer step at milestone boundaries is designed to catch drift, but very long projects may still need you to check in.
- Coding agents at the implementation step are not allowed to run in parallel. This is by design (parallel coding agents tend to make incompatible assumptions). If you've seen videos of "5 agents in parallel build a feature in 10 minutes", that is the workflow autodev explicitly does not do.
- Cost. Default model is `opus` for accuracy-critical steps (architecture, review). You can override per step if cost matters more than precision.
- Subagents can't spawn further subagents (Claude Code platform limit). This skill must run in your main Claude Code session, not nested inside another agent's context.
- No second install. A user who installs only `autodev` gets the full set of patterns and checklists — no dependency chain to manage.
- No runtime conflicts. Because autodev copies only the patterns that align with its one-at-a-time coding policy, and explicitly excludes the parallel-dispatch ones, it avoids the conflicts described in safety rails anti-pattern E.
- Version-pinned. The absorbed text is checked into this repo, so upstream changes don't silently alter behavior. Original sources retain attribution and are linked from NOTICES.md.
PRs welcome, especially:
- More recipes in `references/execution-tests.md` (per project type)
- Windows / macOS guards for Linux-specific shell tricks
- English translation of the body docs (currently Korean-first)
- Real post-mortems of runaway autonomous loops — we'll add them to the anti-pattern catalogue
Please open an issue first for major changes (new phases, new agent roles).
MIT — see LICENSE.
- Detailed change list: CHANGELOG.md
- Third-party attribution (which patterns came from which project, into which file): NOTICES.md
- Body documentation (currently Korean-first): SKILL.md, references/