craft-cli
A developer workflow toolkit for Claude Code
Install · Skills · How It Works · Examples · Report Bug
This is a personal project. I built craft-cli to codify my own development workflow — the way I think about scoping, designing, challenging, planning, reviewing, and shipping software. It reflects my beliefs about quality and how I want to work with AI. It may not fit your workflow exactly, and that's fine. Fork it, rip out what you don't need, add what you do. This is the beginning of a journey, not the destination.
craft-cli is a Claude Code plugin that turns Claude into a structured development partner. Instead of ad-hoc prompting, you get 13 skills that chain together, 2 specialized agents, and 5 automated quality gates — all working as a connected workflow.
The idea is simple: every phase of development — from "what should we build?" to "is this deployed and working?" — should have a structured approach that passes context forward.
You don't need to memorize commands. Describe a design problem and /think activates. Say "ship it" and /ship runs the full pipeline. Report a bug and /debug takes over. The skills auto-trigger based on what you're saying and doing.
Most Claude Code plugins give you isolated commands. You run one, get a result, and start from scratch on the next. Context is lost between steps.
craft-cli is different:
- Skills chain together.
/scopefeeds into/think, which feeds into/challenge, which feeds into/plan. Each skill reads what the previous one wrote. - Context is structured. Every skill writes YAML-frontmattered artifacts to
.craft/context/. Downstream skills parse these programmatically — not just as text blobs. - Knowledge persists across sessions. Postmortems, design decisions, and risk patterns are saved to
.craft/knowledge/and checked automatically when relevant. - Quality gates are automated. Five hooks run on every edit, every commit, every session start, and every session end. No opt-out.
- Auto-triggering works. Skills activate based on natural language cues. You don't need to remember which command to run.
| Other plugins | craft-cli | |
|---|---|---|
| Scope | Individual commands | Connected workflow (13 skills, 2 agents, 5 hooks) |
| Context | Lost between commands | Passed forward via structured .craft/context/ artifacts |
| Memory | None | .craft/knowledge/ persists insights across sessions |
| Triggering | Manual slash commands | Auto-triggers on natural language patterns |
| Chaining | None | Each skill recommends the logical next step |
| Quality | Trust the developer | Five automated gates that enforce standards |
| Reliability | Skills loaded on-demand (~53% invocation rate) | Three-layer injection: CLAUDE.md + SessionStart hook + meta-skill |
The reliability layer is worth highlighting. Vercel's research found that skills alone only get invoked ~53% of the time. craft-cli solves this by injecting workflow rules at three levels — your project's CLAUDE.md (always in context), a SessionStart hook (loaded every session), and the meta-skill as a detailed fallback.
curl -fsSL https://raw.githubusercontent.com/itsbariscan/craft-cli/main/install.sh | bashThis does three things:
- Installs the plugin —
claude plugin add itsbariscan/craft-cli - Appends workflow rules to your project's
CLAUDE.md— never overwrites existing content, uses HTML markers for clean updates - Adds
.craft/to.gitignore— session artifacts and knowledge entries shouldn't be in source control
The script is idempotent. Run it again and it skips steps that are already done.
If you prefer to do it step by step:
# Install the plugin
claude plugin add itsbariscan/craft-cli
# In your project root, run the setup script
cd your-project
bash ~/.claude/plugins/cache/craft-cli/install.shThe script appends a block wrapped in <!-- craft-cli:start --> and <!-- craft-cli:end --> markers. Your existing content is never touched. To remove craft-cli rules later, just delete everything between those markers.
The appended rules include:
- Auto-trigger conditions (11 natural language patterns → skill mappings)
- Skill chaining recommendations
- Context passing table
- Knowledge system configuration
- Verification requirements
claude plugin remove craft-cliThen remove the <!-- craft-cli:start --> ... <!-- craft-cli:end --> block from your CLAUDE.md if you added it.
craft-cli organizes development into a directed flow. Each skill knows what comes before it and what comes after.
/scope Define what you're building — and what you're not
│
/think Design with three gears: expand, hold, or reduce
│
/challenge Stress-test the design before committing
│
/plan Break the design into ordered, verifiable steps
│
implement Write code, step by step
│ ╰── /docs Fetch library docs when you hit an unfamiliar API
│ ╰── /debug When something breaks → test-writer agent
│
/review Two-pass code review with confidence scoring
│
/ship Build → test → eval → review → commit → PR
│
/qa Test the live deployment
You don't need to follow the full flow every time. Fixing a bug? Start at /debug. Quick change? Skip to /review. The workflow adapts to the size of the task.
Every skill writes a structured artifact to .craft/context/ with YAML frontmatter. The next skill reads it and makes decisions based on the structured data — not just free text.
# Example: .craft/context/challenge.md
---
skill: challenge
mode: invert
timestamp: 2025-01-15T10:30:00Z
verdict: proceed_with_mitigations
risk_count: { fatal: 0, high: 2, medium: 1, low: 3 }
mitigations:
- Add email prompt for OAuth users without email scope
- Set up production-scale test data before writing resolvers
---
## Pre-mortem: GraphQL Migration
Failure Path 1: The Long Tail ...When /plan reads this file, it sees verdict: proceed_with_mitigations and automatically adds each mitigation as a plan step. If the verdict were reconsider, it would warn you before planning.
| Skill | Writes | Read by | Key frontmatter fields |
|---|---|---|---|
/scope |
scope.md |
/think, /plan, /challenge |
suggested_gear, constraints |
/think |
design.md |
/challenge, /plan |
gear, problem, decisions |
/challenge |
challenge.md |
/plan |
verdict, mitigations, risk_count |
/plan |
plan.md |
implementation | steps, risks, out_of_scope |
/review |
review.md |
/ship |
status (clean/has_criticals) |
/eval |
eval.md |
/ship |
has_regressions, overall_pass_rate |
/debug |
postmortem.md |
/review, test-writer agent |
root_cause, affected_files |
/qa |
qa-report.md |
/debug |
health_score, lighthouse |
Some things are worth remembering across sessions. craft-cli persists insights to .craft/knowledge/:
/debugauto-saves postmortems after--postmortem/thinkauto-saves design decisions (ADRs)/challengeauto-saves risk patterns when Fatal risks are found/remembersaves anything manually
Before starting work, skills check knowledge automatically. /debug looks for past postmortems in the same files. /think checks for previous design decisions. /challenge looks for known risk patterns. This means Claude gets smarter about your project over time.
You don't need to type slash commands. Skills activate on natural language:
| You say... | craft-cli invokes |
|---|---|
| "I want to build...", "new feature:", unclear scope | /scope |
| "How should we approach...", choosing between options | /think |
| "What could go wrong?", "poke holes", high-conviction idea | /challenge |
| Design is agreed, needs implementation steps | /plan |
| "How does X work?" about a library | /docs |
| Error appears, tests fail, bug report | /debug |
| "Review this", "check the code" | /review |
| "Ship it", "create a PR", "push this" | /ship |
| URL to test, deployment completed | /qa |
| Prompt quality, LLM evaluation | /eval |
| "Remember this", "have we seen this before?" | /remember |
After each skill completes, it recommends the next step:
/scope → /think (with gear suggestion) or /plan (if trivial)
/think → /challenge (stress-test) or /plan (low-risk)
/challenge proceed → /plan with mitigations
/challenge reconsider → back to /think
/plan → start implementing step 1
/debug → test-writer agent → /review
/review clean → /ship
/review criticals → fix, then re-review
/ship → /qa <deployed-url>
/qa → /debug (if issues) or close the loop
The first step. Before designing anything, define what you're building, what constrains you, and what you're not building.
Three fields:
- Building — one sentence, no ambiguity
- Constraints — time, complexity, dependencies, tech stack
- Not Building — explicit exclusions that prevent scope creep
Based on your constraints, /scope recommends a /think gear: tight deadline suggests REDUCE, open exploration suggests EXPAND, spec'd out design suggests HOLD. For trivial tasks, it recommends skipping /think entirely and going straight to /plan.
/craft-cli:scope add user notifications to the app
/craft-cli:scope rebuild the auth system
Three-gear brainstorming for non-trivial decisions.
| Gear | When | What it does |
|---|---|---|
| EXPAND | New idea, early exploration | Push scope up. "What's the 10x version?" Generate options without evaluating. |
| HOLD | Spec'd out, needs stress-testing | Lock scope exactly as stated. Walk every edge case, every failure mode. |
| REDUCE | Overscoped, need focus | Cut to core value. Remove everything that isn't load-bearing. |
The Commitment Rule: Once a gear is picked, commit fully. No silent drift between gears. If the gear needs to change, call it out explicitly.
/craft-cli:think expand — what if we rebuilt auth from scratch?
/craft-cli:think reduce — we have 2 days, what ships?
/craft-cli:think hold — walk me through every failure mode
Example: Designing a notification system
You: "We need to add notifications."
Claude auto-invokes /think, recommends EXPAND.
/think expand:
"What if notifications were a platform — webhooks,
email digests, Slack, push, in-app, notification
center with filters?"
"What if users could create custom rules?"
You: "That's too much. We have 2 weeks."
Claude: "Drifting from EXPAND to REDUCE. Switch gears?"
You: "Yes, reduce."
/think reduce:
"Core value: users don't miss things that need attention."
"Minimum: bell icon, unread count, 3 event types,
mark as read. No email, no push, no filters."
→ Saves ADR + design.md + knowledge entry
"Stress-test with /challenge? Or jump to /plan?"
Stress-test ideas before committing. Three modes, three angles of attack.
| Mode | Approach |
|---|---|
| Advocate | List every assumption — stated and unstated. Attack each with the strongest counterargument. Produce risk map. |
| Invert | "It's 12 months from now. This failed completely. Why?" Work backwards through the causal chain. |
| Backcast | Write the success announcement first. Then: what must be true? Gaps between "must be true" and "is true today" are your risks. |
The Honesty Rule: No softening language. No false balance. Specific over general. Steel-man before attacking.
/craft-cli:challenge advocate our pricing model
/craft-cli:challenge invert the migration plan
/craft-cli:challenge backcast the v2 launch
Example: Pre-mortem on a migration
You: "Migrating REST to GraphQL. Should take 6 weeks."
Claude auto-invokes /challenge invert.
"It's 12 months from now. The migration failed. What happened?"
Failure Path 1: The Long Tail
"80% migrated in 3 weeks. The remaining 20% had complex
nested relationships — took 4 months. REST couldn't be
deprecated because mobile clients pinned to v2.1."
Early signal: velocity drops below 3 endpoints/day after week 2
Prevention: audit ALL endpoints by complexity before committing
Failure Path 2: The N+1 Apocalypse
"p99 went from 200ms to 3.2s. DataLoader wasn't implemented.
Dev dataset had 50 rows — nobody noticed."
Early signal: no load testing plan in sprint 1
Prevention: production-scale test data before writing resolvers
Verdict: proceed with mitigations
→ Saves challenge.md with structured frontmatter
"/plan with mitigations?"
Turn decisions into ordered, executable steps. Each step specifies: what changes, where, how, and how to verify it worked.
Reads upstream context automatically — scope.md, design.md, challenge.md. If /challenge found mitigations, they become plan steps. If the verdict was reconsider, you get a warning before planning.
/craft-cli:plan add user avatar upload with S3
/craft-cli:plan refactor the payment flow
Example: Planning from upstream context
/plan
Reads design.md: "In-app notifications. Bell icon, 3 events."
Reads challenge.md: "Risk: notification volume overwhelms DB.
Mitigation: batch writes, TTL on old notifications."
## Plan: In-app notification system
1. Create notifications table with TTL
Files: src/db/schema.ts, src/db/migrations/
Verify: pnpm db:migrate succeeds
2. Build notification service with batch writes
Files: src/services/notifications.ts
Verify: unit test with 1000 concurrent notifications
3. Add bell icon component
Files: src/components/NotificationBell.tsx
Verify: component renders with mock data
Risks: batch write timing may need tuning (from /challenge)
Out of scope: email, push, filters
"Start implementing step 1?"
Four-phase protocol. No shortcuts, no guessing.
reproduce → isolate → understand → fix
Each phase has a gate — you don't move forward until the current phase is complete. Root cause is stated in one sentence before any fix is proposed. After the fix, the test-writer agent automatically generates a regression test.
/craft-cli:debug
/craft-cli:debug --postmortem # save incident report to knowledge
Example: Debugging a race condition
You: "Users sometimes see stale data after saving."
Phase 1 — Reproduce:
"Open two tabs. Edit in tab 1, save. Tab 2 shows old data ~30%."
Phase 2 — Isolate:
Checks .craft/knowledge/ for past postmortems
Traces: save → API → DB → revalidate → fetch
"Revalidation returns before DB write commits."
Phase 3 — Understand:
Root cause: "revalidatePath() fires on API response, but
DB uses eventual consistency. Cache reads stale data."
Phase 4 — Fix:
Adds await to ensure write commits before revalidation
Verifies: 50 save-reload cycles, 0 stale reads
→ test-writer agent generates regression test
→ Postmortem saved to context + knowledge
"/review to check for similar patterns?"
Structured two-pass review with confidence scoring.
Pass 1 — Critical (blocks merge): RLS bypass, auth violations, unvalidated mutations, XSS, exposed secrets, SQL injection, race conditions, missing error handling at boundaries.
Pass 2 — Informational (noted): Missing states (error/loading/empty), N+1 queries, dead code, accessibility, performance.
For branches with 5+ changed files, dispatches the code-reviewer agent for parallel analysis. Only surfaces findings at confidence >= 75/100.
/craft-cli:review
Full pipeline from feature branch to PR. Stops on failure at any step.
preflight → sync → build → test → eval gate → review → commit → PR
Reads upstream context: if review.md has status: has_criticals, merge is blocked. If eval.md has has_regressions: true, merge is blocked.
/craft-cli:ship # full pipeline
/craft-cli:ship --dry-run # steps 1-6, no commit/push
/craft-cli:ship --resume # pick up where you left off
/craft-cli:ship --hotfix # skip eval gate for urgent fixes
Chrome DevTools-powered QA with health scoring (0-100) and regression tracking.
| Mode | What it does |
|---|---|
| Full | Lighthouse + console + network + interactive + responsive (375/768/1280px) + accessibility |
| Quick | Lighthouse + console + network errors only |
| Regression | Full test compared against saved baseline |
Every issue includes: screenshot, severity, category, description, repro steps, expected vs actual.
/craft-cli:qa https://myapp.vercel.app
/craft-cli:qa quick https://staging.myapp.com
/craft-cli:qa regression https://myapp.com
Eight modes for rigorous LLM evaluation.
| Mode | What it does |
|---|---|
audit |
Diagnose eval infrastructure across 6 areas |
analyze |
Error analysis on 20-50 traces |
judge <criterion> |
Design a binary Pass/Fail LLM judge |
validate |
Judge vs human labels (target TPR/TNR > 90%) |
run |
Execute judges with confidence intervals |
rag |
Evaluate RAG pipelines (retrieval vs generation) |
synthetic |
Generate dimension-based test data |
dashboard |
Summary of all eval runs with trends |
/craft-cli:eval audit
/craft-cli:eval judge factual_accuracy
/craft-cli:eval dashboard
Fetches fresh documentation via Context7 MCP. Three modes:
| Mode | Example |
|---|---|
| Quick | /craft-cli:docs next.js app router |
| Deep | /craft-cli:docs supabase rls policies |
| Compare | /craft-cli:docs drizzle vs prisma |
Auto-triggers when Claude detects unfamiliar API usage or deprecated patterns.
Save and retrieve insights across sessions.
/craft-cli:remember we chose Resend over SendGrid because of webhook reliability
/craft-cli:remember search auth bugs
/craft-cli:remember search GraphQL decisions
Five knowledge types: postmortem, decision, risk-pattern, convention, insight. Other skills auto-save entries — /remember is for manual saves and retrieval.
Not invoked directly. Baked into every skill.
The rule: No "done", "that should work", or "I've fixed it" without fresh evidence from the current session.
| Claim | Required evidence |
|---|---|
| "Tests pass" | Actual test runner output |
| "Build succeeds" | Actual build output |
| "Bug is fixed" | Reproduction steps now produce correct behavior |
| "No regressions" | Test suite output after changes |
Dispatched by /review for branches with 5+ changed files. Performs parallel analysis with confidence scoring (0-100). Only surfaces findings >= 75. Categories: auth/authz bypass, injection vectors, missing validation, race conditions, accessibility, performance.
Dispatched by /debug after a bug fix. Reads the postmortem, understands the root cause, detects your test framework (jest/vitest/pytest/mocha/cargo/go test), and writes a regression test that would fail if the fix were reverted. One test per bug. Matches your project's test style.
Five quality gates that run automatically. No opt-out.
| Hook | When | What it does |
|---|---|---|
| Pre-commit validator | Before git commit |
TypeScript type checks + lint must pass |
| Secret scanner | Before any file write/edit | Blocks API keys, tokens, passwords (14 patterns) |
| Post-edit test runner | After any file write/edit | Runs project test suite |
| Session context | Every session start | Loads git state + injects workflow rules |
| Completion check | Before session ends | Blocks if modified files have any types, console.log, or TS errors |
You: "We need to add team invitations to the app."
/scope → Building: team invitation system.
Constraints: 2 weeks, medium complexity, depends on existing auth.
Not building: bulk invite, role selection, link invites.
→ Suggests /think reduce (tight timeline)
/think reduce → Core: email invite with accept/decline.
One role (member). 7-day expiry. No bulk.
→ Saves design.md + knowledge entry
/challenge advocate → "Assumption: users have valid email.
Counter: 23% of accounts use OAuth without email scope."
Verdict: proceed with mitigation (add email prompt for OAuth)
→ Saves challenge.md
/plan → 5 steps, incorporating the email mitigation:
1. Add email field to OAuth onboarding
2. Create invitations table
3. Build invite API
4. Email sending via Resend
5. Accept/decline UI
implement → works through each step with verification
/review → 0 critical, 2 informational
/ship → PR created
/qa https://staging.app.com → health score: 94
You: "Users are getting 500 errors on the dashboard."
/debug →
Checks .craft/knowledge/ — finds past postmortem on dashboard queries
reproduce: GET /api/dashboard returns 500
isolate: new deployment 2 hours ago changed the query
understand: "GROUP BY references column removed in migration 047"
fix: restore column reference
→ test-writer generates regression test
→ postmortem saved to knowledge
/review → checks for similar query patterns
/ship --hotfix → PR created (skips eval gate)
/qa https://app.com/dashboard → health score: 98
/eval analyze → reads 40 traces, "hallucinated dates" in 35%
/eval judge date_accuracy → designs binary Pass/Fail judge
/eval validate → TPR 96%, TNR 89% → refine → TNR 93%
/eval run → pass rate 71% (CI: 65-77%)
/eval dashboard → trend across last 5 runs
You: "The submit button doesn't work on mobile."
/debug →
reproduce: tap submit on iPhone Safari, nothing happens
isolate: click handler uses mousedown, not touch events
understand: "Safari doesn't fire mousedown on tap"
fix: change to pointerdown event
→ test-writer generates touch event test
/review → clean
/ship → PR created
Start a new Claude Code session in your project directory. You'll see craft-cli load automatically (the SessionStart hook injects workflow rules).
Try this first:
You: "I want to build a dark mode toggle for the app."
craft-cli will auto-invoke /scope to help you define boundaries. Then it'll suggest /think to design the approach. Follow the chain — each skill tells you what's next.
No. The three you'll use most often:
/think— when you're designing something/debug— when something breaks/ship— when you're ready to merge
Everything else flows naturally from these. As you get comfortable, the other skills will click into place.
Skip whatever you want. The workflow is a suggestion, not a requirement. Working on a quick fix? Jump straight to the code, then /review and /ship. The skills are designed to be useful individually — chaining makes them more powerful, but each one stands alone.
A local directory where craft-cli stores session artifacts:
.craft/
├── context/ # Current session: scope.md, design.md, plan.md, etc.
└── knowledge/ # Cross-session: postmortems, decisions, risk patterns
It's in .gitignore — these files are personal working state, not source code.
| Dependency | Required for | Install |
|---|---|---|
| Claude Code | Everything | npm install -g @anthropic-ai/claude-code |
| Context7 MCP | /docs |
See Context7 repo |
| Chrome DevTools MCP | /qa |
See chrome-devtools-mcp repo |
| gh CLI | /ship PR creation |
brew install gh |
| Node.js | Hooks (type check, lint, tests) | nodejs.org |
All dependencies except Claude Code are optional. Skills that need them will tell you when they're missing.
craft-cli/
├── .claude-plugin/
│ └── plugin.json # Plugin manifest (v3.0.0)
├── CLAUDE.md # Workflow rules template
├── install.sh # Setup script
├── skills/
│ ├── using-craft-cli/SKILL.md # Meta-skill (orchestrator)
│ ├── scope/SKILL.md # Project scoping
│ ├── think/SKILL.md # Design thinking + ADRs
│ ├── challenge/SKILL.md # Adversarial: advocate/invert/backcast
│ ├── plan/SKILL.md # Implementation planning
│ ├── review/SKILL.md # Code review + confidence scoring
│ ├── qa/SKILL.md # Live URL testing
│ ├── ship/SKILL.md # Ship pipeline
│ ├── eval/SKILL.md # Evaluation engineering
│ ├── debug/SKILL.md # Debugging + postmortem
│ ├── docs/SKILL.md # Library docs via Context7
│ ├── remember/SKILL.md # Knowledge persistence
│ └── verification/SKILL.md # Completion gate
├── agents/
│ ├── code-reviewer.md # Parallel review agent
│ └── test-writer.md # Regression test generator
├── hooks/
│ ├── hooks.json # Hook configuration
│ └── scripts/
│ ├── pre-commit-validate.sh # Type check + lint gate
│ ├── secret-scanner.sh # API key blocker
│ ├── post-edit-test.sh # Auto test runner
│ ├── session-context.sh # Git context + rule injection
│ └── completion-check.sh # Quality gate on stop
├── LICENSE
└── README.md
This is the beginning of a journey. I built craft-cli to crystallize how I think about building software — the discipline of scoping before designing, designing before coding, challenging before committing, and verifying before claiming "done."
It's opinionated by design. It reflects my workflow, my standards, my beliefs about quality. Some of these opinions will resonate with you; others won't.
This is not a one-size-fits-all tool. It's a starting point.
Fork it. Rip out the skills that don't match your workflow. Add skills for your domain. Change the auto-trigger rules. Rewrite the hooks. Make it yours.
The plugin architecture makes this easy — each skill is a self-contained SKILL.md file. Add a folder, write a prompt, and it's a new skill. The context passing system means your new skills can read and write the same .craft/context/ artifacts.
If you build something interesting on top of this, I'd love to see it.
This is a personal project, but contributions are welcome. If you have ideas for new skills, better prompt engineering, or workflow improvements:
- Fork the repository
- Create your branch (
git checkout -b feature/amazing-skill) - Make your changes
- Open a PR
MIT — do whatever you want with it.
Quality is the strategy. Momentum is the method. Craft is the practice.