Productize is an engineered distribution and runtime system for building, shipping, operating, and growing AI products inside your coding harness.
Three playbooks: /productize-0-1 (raw idea → shipped product),
/productize-operate (production, incidents, error budgets),
/productize-grow (aha-moment → retention → PLG). Nine gates
that gate every move. One meta-runner. 238 routed skills underneath.
Five host adapters. One workflow runtime. MIT.
Your AI coding agent — Codex, Claude Code, Cursor, OpenCode, Factory,
or the next harness you adopt — is the harness. It writes the code,
runs the tools, edits the files. Productize is the product team
around the harness: thesis-review sharpens the bet,
product-review locks the spec, design-review kills the slop,
eng-review challenges the architecture, qa runs the evals,
release owns the ship gate, docs catches drift, dx-review
protects developer experience, and comms-review writes the narrative.
Without a harness, Productize is inert. Without Productize, the harness is undirected. It will write any code you ask, beautifully, for the wrong product.
Install Productize into the harness you use:
npm install -g @productize/agent-skills
productize --host autoOr install from source:
git clone --single-branch --depth 1 https://github.com/itseffi/productize.git productize
cd productize
./setup --host autoThen ask your harness to build a product:
/productize-0-1
Build a Twitter clone. Make it secure. Have real AI agents use it.
Productize routes the request, fires the right gate at every move, produces the artifacts, edits the code through the harness, runs the evals, applies the fixes, and tells you what to decide next.
You: /productize-0-1
Build a Twitter clone. Make it secure. Have real AI agents use it.
Agent: Before I write anything, I'm tightening the ask.
A Twitter clone is the implementation shape, not the product.
The product is an agent-native social arena: humans and AI
agents can post, reply, follow, report, coordinate, spam,
hit rate limits, and expose the trust and moderation
problems that only appear when autonomous users are active.
[Move 1 — Scope capability + curate data]
[thesis-review + product-review fire — rate the premise, find
three risky assumptions, ask one blocking question about the
wedge]
Beachhead: builders testing real agent behavior in social
products. Wedge: a small social graph where agent accounts are
first-class users and adversarial QA from day one. First
capability: human accounts, agent accounts, posts, replies,
follows, reports, rate limits, audit logs, seeded agents, and
evals that score whether agents can use the product without
breaking trust or safety.
You: Approve. Continue.
Agent: [Move 2a — Design] design-review fires on the feed, composer,
profile, notifications, reports, and
moderation queue.
[Move 2b — Architect] eng-review fires on auth, permissions,
agent identity, event logs, prompt
injection boundaries, content provenance,
rate limits, and abuse controls.
[Move 2c — Spec] product-review fires on acceptance criteria.
[Move 2d — Build] writes the first slice: accounts, agent
accounts, feed, posting, replies, follows,
reports, seeded agent scripts, and
browser-driven evals.
You: /productize-autoplan
Agent: [Runs thesis-review, product-review, design-review, eng-review,
qa, release, docs, dx-review, and comms-review. Auto-applies
safe decisions. Surfaces two product/security calls. You decide.
It proceeds.]
You: Move 3 — design the evals.
Agent: [Move 3 — qa + product-review fire. Eval users defined:
normal poster, reply farmer, spammer, brigader, reporter,
lurker, and moderator. Ship gates cover signup, posting,
following, reporting, moderation, rate limits, and abuse paths.]
[Deploy gate — release fires. One block: no kill switch for
seeded agents. Add it before running live evals.]
[Move 4 — qa runs the evals in a browser. One failure:
agent-generated text leaks unsafe instructions into notifications.]
[Move 5 — eng-review analyzes the trace and finds the boundary:
feed content is escaped, notification previews are not.]
[Move 6 — design-review + qa apply and verify the fix: sanitize
previews, add regression coverage, rerun the abuse-path evals.]
You: Ship.
Agent: [Move 7 — release + docs + comms-review fire together.
Launch checklist, security notes, moderation runbook, seeded
agent scenarios, public narrative, first-week operate loop,
next-lap learning plan.]
You said "build a Twitter clone." Productize turned it into the smallest real social platform where AI agents are users, adversaries, testers, and the first source of product evidence. One builder gets the thesis pressure, product spec, architecture review, security posture, QA, release, docs, and launch narrative of a full product team.
Productize is a process, not a catalog. One playbook drives the work from raw idea to shipped product:
Scope → Design → Architect → Spec → Build → Evals → Deploy → Run → Analyze → Fix → Ship → Learn
Then back to Scope with new evidence. Same moves, every lap. First lap is sparse data and wild evals. Hundredth lap is dense data and tight evals. Same shape.
| Move | Gate | What it does |
|---|---|---|
| 1. Scope capability + curate data | thesis-review + product-review |
Sharpen who, win, why now, why us. Kill the request that isn't the real job. Surface risky assumptions before code. |
| 2a. Design | design-review |
Hierarchy, friction, AI slop detection. Rate each dimension 0-10. Edit the plan to ship a 10. |
| 2b. Architect | eng-review |
Data flow, agent shape, edge cases, build risk. Force hidden assumptions into the open. |
| 2c. Spec | product-review |
Requirements, acceptance criteria, dependency risk, scope cuts. |
| 2d. Build | eng-review + qa |
The harness writes the code. Productize checks build risk, test shape, and verification coverage. |
| 3. Design evals | qa + product-review |
Golden examples, adversarial inputs, behavioral traces, ship gates — defined before you measure. |
| Deploy gate | release |
Check launch readiness, rollback path, kill switch, and release blockers before live evals or production exposure. |
| 4. Run evals | qa |
Real traffic. Pass/fail. Trace evidence. |
| 5. Analyze | eng-review |
Error clusters, root cause, prompt failure modes, UX friction. |
| 6. Apply fixes | design-review + qa |
Atomic fixes. Before/after evidence. Re-review until clean. |
| 7. Ship + Learn | release + docs + comms-review |
Launch readiness, kill switch, comms plan, board narrative, doc drift caught, next-lap learning plan. |
For developer-facing products, dx-review fires at Move 2 and Move 7:
onboarding, API ergonomics, TTHW, error messages.
Three playbooks. Nine gates. One meta-runner. 238 routed skills underneath.
| Layer | Entry points |
|---|---|
| Playbooks | /productize-0-1 (build loop), /productize-operate (production), /productize-grow (PMF → scale) |
| Gates | thesis-review, product-review, design-review, eng-review, qa, release, docs, dx-review, comms-review |
| Meta | /productize-autoplan — runs every relevant gate in parallel, auto-applies safe decisions, surfaces only taste calls |
| Routed skills | 238 skills called internally by the playbooks and gates |
You almost never call a routed skill directly. You enter at a playbook. The playbook walks the moves. The gates fire at decision points. The routed skills do the tactical work. You only see the decisions that matter.
Productize treats HTML as an artifact format, not as a separate playbook. Route by the product job first, then choose the format:
- use Markdown for short notes, repo-native docs, changelog fragments, and artifacts where clean diffs matter
- use self-contained HTML for long reviews, visual plans, diagrams, dashboards, PR explainers, shareable reports, and interactive tuning artifacts
HTML artifacts must be portable by default: one file, embedded CSS/JS, no remote dependencies unless requested, readable without a dev server, responsive, accessible, and organized for fast human review.
The Build With AI lifecycle also includes implementation-notes.
When a user asks the harness to implement a spec and maintain
implementation-notes.md or implementation-notes.html, Productize
records:
- design decisions where the spec was ambiguous
- intentional deviations from the spec and why
- tradeoffs considered and accepted
- open questions for user confirmation
- verification evidence and remaining gaps
This lets the harness make reasonable implementation decisions without hiding the interpretation work from the user.
| Need | Productize routes to |
|---|---|
| "I have an idea" | /productize-0-1 Move 1 — thesis-review + product-review, thesis framing, beachhead, risky assumptions |
| "Messy transcript → PRD" | /productize-0-1 Move 2c — product-review + JTBD synthesis, requirements, acceptance criteria |
| "Design the UX" | /productize-0-1 Move 2a — design-review + flows, hierarchy, edge cases |
| "Architect the agent" | /productize-0-1 Move 2b — eng-review + agent shape, retrieval, edge cases |
| "Evals before shipping" | /productize-0-1 Move 3 — qa + product-review, golden/adversarial examples, ship gates |
| "Production is broken" | /productize-operate — incident response, root cause, comms |
| "Activation is flat" | /productize-grow — aha-moment, PLG diagnostics, lifecycle triggers |
| "Board narrative" | comms-review + executive update, decision logic, tradeoffs |
| "Deal math" | Routed skills: DCF, WACC, CAPM, cap tables, VC modeling |
| "DX audit" | dx-review — onboarding, API ergonomics, error messages |
Every generated skill receives the shared Productize preamble. The preamble forces the harness to classify:
- persona — founder, product leader, AI PM, AI builder, stakeholder, or unknown
- product stage — idea, validation, PMF search, growth, scale, pivot, or unknown
- artifact mode — strategy memo, PRD, research plan, positioning, experiment, deck, roadmap, execution brief, diagnostic, decision record
- artifact format — Markdown for short/diff-sensitive work; HTML for long, visual, shareable, interactive, or explicitly requested artifacts
- evidence state — known facts, assumptions, missing inputs, risky leaps
- decision mode — recommend, ask for a blocking input, or proceed with explicit assumptions
Rules that matter:
- No generic strategy filler.
- Tie recommendations to evidence, stage, and decision pressure.
- Ask only for inputs that materially change the output.
- Produce the artifact, not a description of the method.
- End with a concrete owner, validation step, metric, or handoff.
Productize generates and installs skills for multiple harnesses:
./setup
./setup --host auto
./setup --host codex
./setup --host claude
./setup --host cursor
./setup --host opencode
./setup --host factory
./setup --host allSetup regenerates skills, validates outputs, installs or symlinks
host skills, creates runtime sidecars, checks dependencies, detects
installed harnesses for --host auto, saves prefix preference, uses
restrictive file permissions for local state, and falls back to copy
mode on Windows unless symlinks are explicitly allowed.
Prefix options:
./setup --prefix
./setup --no-prefixTeam mode:
./setup --team
bin/productize-team-init requiredUse optional instead of required when teammates should be nudged
but not blocked.
Productize ships local sidecars for routing, state, context, artifacts, completion, and upgrade workflows:
| Command | Purpose |
|---|---|
productize-workflow |
Start and complete durable product workflows with routing, context, session, artifact, and completion logs |
productize-skill-router |
Resolve a product request to the best skill or skill sequence |
productize-registry-search |
Search the generated skill registry |
productize-session-log |
Record workflow decisions |
productize-artifact-log |
Record produced artifacts |
productize-context-save |
Save durable context |
productize-context-restore |
Restore previous context before restarting work |
productize-config |
Read and write local or team preferences |
productize-update-check |
Check generated distribution freshness and update state |
productize-completion-status |
Log completion, blocked, deferred, or needs-review state |
productize-upgrade |
Upgrade installed Productize skills and runtime sidecars |
productize-team-init |
Bootstrap shared repo-local Productize instructions |
Canonical source stays separate from generated distribution output:
SKILL.md.tmpl root router template
SKILL.md generated root router skill
skills/ canonical one-level skill directories
skills/*/SKILL.md.tmpl skill template source
skills/*/SKILL.md generated canonical skill with shared preamble
skills/*/productize.json explicit metadata and routing fields
hosts/ host adapters (one per harness)
.agents/skills/ generated Codex skills
.claude/skills/ generated Claude skills
.cursor/skills/ generated Cursor skills
.opencode/skills/ generated OpenCode skills
.factory/skills/ generated Factory skills
plugins/ generated plugin bundles
registry/ generated skill, lifecycle, category, plugin, and site indexes
bin/ runtime utilities
test/ parser, generator, registry, setup, release, runtime, routing, and eval tests
Host adapters control frontmatter, description limits, generated output paths, metadata behavior, path rewrites, skipped skills, and runtime sidecars per harness.
npm run build
npm run skill:check
npm run test:freeThe build normalizes metadata, generates canonical and host skills,
audits skills, builds registries, builds plugin bundles, builds the
site index, and runs skill:check.
Validation checks:
- frontmatter and metadata
- lifecycle and category values
- reference paths
- generated host output freshness
- Codex description limits
- plugin manifests
- registry files
- sample router resolution
- metadata quality and artifact specificity
- generated distribution boundaries
Productize has separate CI lanes for freshness, evals, periodic evals, and version gates:
.github/workflows/ci.yml
.github/workflows/skill-docs.yml
.github/workflows/evals.yml
.github/workflows/evals-periodic.yml
.github/workflows/version-gate.yml
Useful commands:
npm run eval
npm run eval:router
npm run eval:e2e
npm run eval:llm:offline
npm run eval:report
npm run install:smoke
npm run release:check
npm run upgrade:checkOffline evals do not require paid model calls. External LLM evals are opt-in only when a command is configured by the runner.
Release metadata lives in:
VERSION
CHANGELOG.md
RELEASE.md
UPGRADE.md
Release commands:
npm run release:check
npm run release:dry-run
npm run release
productize-upgrade --host all --forcerelease:check is the non-mutating gate. release:dry-run must not
mutate files.
Each skill should have:
skills/<skill-name>/
SKILL.md.tmpl
SKILL.md
productize.json
references/ optional
examples/ optional
evals/ optional
The metadata must state:
- when to use the skill
- when not to use it
- the expected artifact
- framework fit, when relevant
- failure modes
- routing signals
- examples or eval cases
Run before proposing changes:
npm run build
npm run test:free
npm run eval