feat: explicit skill declaration for .prose.md programs #59
Closed
rawwerks wants to merge 31 commits into
…aration
Adds docs/superpowers/plans/2026-05-03-skills-section.md describing how to add a skills: frontmatter key and ### Skills section so .prose.md authors can deterministically preload required agent skills, with Forme preflight failing closed when a declared skill is not installed in the user's harness search paths.
Design agreed with user:
- Colon-form names (document-skills:pdf), matches plugin marketplace
- Levenshtein fuzzy fallback for v1; LLM-fuzzy is a follow-up RFC
- BYO harness is sacred: OpenProse never installs or edits user skills
- Service-level skills: is additive, not exclusive
- Resolved canonical name pinned into IR for reproducibility
Plan is execution-ready; subagent dispatch awaits user go-ahead.
Wires the skill resolver into preflightPath so a `.prose.md` author can
declare `skills:` and have preflight fail closed if any are not
installed on the user's machine. Skill resolution mutates each
SkillRefIR in place (canonical_name + resolution + fuzzy_distance) so
subsequent runs of the IR are reproducible across machines. Fuzzy hits
emit an `info` diagnostic nudging the author to pin the canonical name.
Adds a `--skill-search-path` CLI flag (repeatable, comma-/colon-
separated) on `prose preflight` that overrides the default
./skills + ~/.claude/skills + ~/.codex/skills lookup. Tests use it to
point at a tmpdir fixture and never touch the real harness — BYO
harness invariant intact.
Adapted from the plan's pseudocode (`runPreflight(file, opts)` returning
`{ok, diagnostics}`) to the actual exported entrypoint
`preflightPath(file, opts)` returning `{status, diagnostics}`. Fixtures
use `kind: program` instead of `kind: system` so the unrelated
`preflight_not_program` diagnostic does not interfere with assertions
about skill diagnostics; skill iteration walks every component in the
target file regardless, so `kind: system` would still be checked at
runtime — the test just keeps the assertion focused on skill behavior.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
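The flag handling described in the commit above (repeatable, comma-/colon-separated, replacing the defaults when supplied) can be sketched roughly as follows. This is a minimal illustration, not the PR's parser: `parseSkillSearchPaths` and `resolveSearchPaths` are hypothetical names, and de-duplicating entries is an assumption the commit message does not state.

```typescript
// Hypothetical helpers; the real CLI parser in the PR may be shaped differently.

// Flatten repeated flag values, splitting each on "," or ":".
// Note: splitting on ":" would break Windows drive letters such as "C:\skills".
function parseSkillSearchPaths(values: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const value of values) {
    for (const part of value.split(/[,:]/)) {
      const path = part.trim();
      // Skip empty segments; de-duplication is an assumption of this sketch.
      if (path.length === 0 || seen.has(path)) continue;
      seen.add(path);
      out.push(path);
    }
  }
  return out;
}

// When the flag is supplied it replaces the defaults; it does not extend them.
function resolveSearchPaths(flagValues: string[], defaults: string[]): string[] {
  const parsed = parseSkillSearchPaths(flagValues);
  return parsed.length > 0 ? parsed : defaults;
}
```

Replacing rather than extending the defaults is what lets a test point at a tmpdir fixture without ever reading the real harness directories.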
…stub
Adds the on-disk fixture pair (with-pdf.prose.md + a stub document-skills:pdf SKILL.md) and an end-to-end test that runs the real preflight pipeline against them. Both pass scenarios (skill installed) and fail scenarios (empty search path) are covered.
The fixture program declares `kind: program` rather than the plan's `kind: system` so preflightPath has a `main` to anchor on; skill checking iterates all components in the target file regardless, so the declared semantics still hold. Test entrypoint adapted to `preflightPath` (the real exported async function) and `result.status` (the real shape) instead of the plan's hypothetical `runPreflight` / `result.ok`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Walks through happy-path, missing-skill, fuzzy-match, service-scope, docs-as-newcomer, and BYO-invariant scenarios against the feat/skills-section branch. Surfaces one correctness bug (silent fuzzy mis-resolution of short typos like pfd -> claude-skills:xf) and several adoption-blocking rough edges (skills invisible in PASS text output, default search path layout mismatch with real Claude Code installs, docs example uses kind:system but preflight requires kind:program, canonical-name pinning is in-memory only).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Synthetic-user dogfood found that resolveSkill("pfd", ...) silently bound
to claude-skills:xf (a tweet-search skill) when pfd was a typo for pdf and
only xf was installed two-deep. The old threshold formula
Math.max(2, floor(len/3)) lets every 3-char declared name match anything
within distance 2, and the clear-winner check only required second.d to be
strictly greater than best.d. Both failures combined to silently activate
a wholly unrelated skill with exit 0.
Tighten the fuzzy guardrails:
- Short declared names (length <= 4) require best.d <= 1. A 3-char name
with distance 2 can match almost anything; refuse to bind silently.
- Longer names cap distance at min(2, floor(len/3)).
- Require either a margin of victory >= 2 or a shared >=2-char common
prefix/suffix between declared and best.leaf. A bare-name match that is
only 1 edit better than an equally-plausible competitor is too risky.
Tests:
- The original synthetic-user typo (pfd vs only xf installed) now
resolves to unresolved with candidates.
- pdf -> document-skills:pdf still resolves as fuzzy when document-skills:pdf is the only installed skill.
- Longer typo (spreadshet) still fuzzy-resolves to spreadsheet when the
only competitor is far away.
- All four pre-existing resolver tests stay green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
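The tightened guardrails above can be read as a small decision function. The sketch below is illustrative only: `fuzzyResolve`, `Candidate`, and `sharedAffixLength` are hypothetical names, a lone candidate is assumed to have an unbounded margin of victory, and the old "strictly better than the runner-up" check is assumed to be retained.

```typescript
// Sketch only: names and shapes are illustrative, not the PR's actual API.
interface Candidate {
  canonical: string; // e.g. "document-skills:pdf"
  leaf: string;      // e.g. "pdf"
}

type FuzzyResult =
  | { kind: "fuzzy"; canonical: string; distance: number }
  | { kind: "unresolved"; candidates: string[] };

// Classic single-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Longest common prefix or suffix length, whichever is larger.
function sharedAffixLength(a: string, b: string): number {
  const max = Math.min(a.length, b.length);
  let prefix = 0;
  while (prefix < max && a.charAt(prefix) === b.charAt(prefix)) prefix++;
  let suffix = 0;
  while (
    suffix < max &&
    a.charAt(a.length - 1 - suffix) === b.charAt(b.length - 1 - suffix)
  ) suffix++;
  return Math.max(prefix, suffix);
}

function fuzzyResolve(declared: string, installed: Candidate[]): FuzzyResult {
  const scored = installed
    .map((c) => ({ c, d: levenshtein(declared, c.leaf) }))
    .sort((x, y) => x.d - y.d);
  const unresolved: FuzzyResult = {
    kind: "unresolved",
    candidates: scored.map((s) => s.c.canonical),
  };
  if (scored.length === 0) return unresolved;
  const best = scored[0];
  const second = scored.length > 1 ? scored[1] : undefined;

  // Short names (length <= 4) may be at most 1 edit away; longer names are capped.
  const maxDistance =
    declared.length <= 4 ? 1 : Math.min(2, Math.floor(declared.length / 3));
  if (best.d > maxDistance) return unresolved;

  // Assumption: a tie with the runner-up is still ambiguous, as before the fix.
  if (second !== undefined && second.d === best.d) return unresolved;

  // Require a margin of victory >= 2, or a shared prefix/suffix of >= 2 chars.
  const margin = second === undefined ? Infinity : second.d - best.d;
  if (margin < 2 && sharedAffixLength(declared, best.c.leaf) < 2) return unresolved;

  return { kind: "fuzzy", canonical: best.c.canonical, distance: best.d };
}
```

Under these rules `pfd` against a lone `xf` is two edits away, which exceeds the short-name cap of 1, so the call returns `unresolved` with `xf` listed only as a candidate.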
The doc snippets used `kind: system` which made `prose preflight` emit `preflight_not_program` for any user who copy-pasted them verbatim. Switch the examples to `kind: program` and add a minimal `### Ensures` section so they are real, complete, runnable contracts. Adds `test/skills-doc-examples.test.ts` which extracts every prose-shaped fenced block from the docs and runs it through compileSource and preflightPath to prevent this regression.
Synthetic-user dogfood found that resolveSkill only enumerated the two-level <root>/<namespace>/<name>/SKILL.md layout, but stock Claude Code installs are flat one-level: ~/.claude/skills/<name>/SKILL.md. This meant most users would see skill_unresolved for skills they actually had installed, a real adoption blocker.
Teach enumerateInstalledSkills to discover both layouts:
- One-level: <root>/<name>/SKILL.md, canonical = "<name>", no namespace.
- Two-level: <root>/<namespace>/<name>/SKILL.md, canonical = "<ns>:<name>".
Both layouts can coexist under the same root.
Adjust resolveSkill so:
- A bare declared name (e.g. `pdf`) prefers an exact one-level install before falling back to fuzzy leaf-match: `pdf` -> canonical `pdf` is resolution: "exact", not fuzzy.
- A bare declared name with only a two-level install still fuzzy-matches to "namespace:name" (existing behavior preserved).
- A colon-form declared name (`document-skills:pdf`) only matches a two-level install. A flat one-level install whose leaf happens to be `pdf` is NOT a match: the namespace was never declared by the install, so silently binding to it would be wrong.
Tests cover all four corners: bare->one-level exact, colon-form->no one-level match, mixed-layout disambiguation, and the existing two-level-only fuzzy path stays green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
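To make the two layouts and the three resolution rules concrete, here is a rough sketch that walks an in-memory directory tree instead of the real filesystem. Everything in it is an assumption for illustration: the `Dir` type, the `InstalledSkill` shape, and `resolveDeclared` are hypothetical, the Levenshtein step is reduced to an exact leaf comparison, and treating several same-leaf namespaced installs as ambiguous is a choice of this sketch.

```typescript
// In-memory stand-in for a skills root: directories are objects, files are strings.
type Dir = { [name: string]: Dir | string };

interface InstalledSkill {
  canonical: string;        // "<name>" or "<ns>:<name>"
  leaf: string;
  namespace: string | null; // null for the flat one-level layout
}

function isDir(node: Dir | string | undefined): node is Dir {
  return typeof node === "object" && node !== null;
}

// Discover both <root>/<name>/SKILL.md and <root>/<ns>/<name>/SKILL.md.
function enumerateInstalledSkills(root: Dir): InstalledSkill[] {
  const found: InstalledSkill[] = [];
  for (const [name, node] of Object.entries(root)) {
    if (!isDir(node)) continue;
    if (typeof node["SKILL.md"] === "string") {
      found.push({ canonical: name, leaf: name, namespace: null });
    }
    for (const [child, childNode] of Object.entries(node)) {
      if (isDir(childNode) && typeof childNode["SKILL.md"] === "string") {
        found.push({ canonical: `${name}:${child}`, leaf: child, namespace: name });
      }
    }
  }
  return found;
}

type Resolution =
  | { resolution: "exact" | "fuzzy"; canonical: string }
  | { resolution: "unresolved" };

function resolveDeclared(declared: string, installed: InstalledSkill[]): Resolution {
  if (declared.includes(":")) {
    // Colon form only ever binds to a two-level install with that exact name.
    const hit = installed.find((s) => s.namespace !== null && s.canonical === declared);
    return hit ? { resolution: "exact", canonical: hit.canonical } : { resolution: "unresolved" };
  }
  // Bare name: an exact flat install wins first.
  const flat = installed.find((s) => s.namespace === null && s.leaf === declared);
  if (flat) return { resolution: "exact", canonical: flat.canonical };
  // Otherwise fall back to a leaf match on namespaced installs (reported as fuzzy).
  const nested = installed.filter((s) => s.namespace !== null && s.leaf === declared);
  if (nested.length === 1) return { resolution: "fuzzy", canonical: nested[0].canonical };
  return { resolution: "unresolved" };
}
```

The asymmetry is deliberate: a bare name may be widened to a namespaced install, but a namespaced declaration is never narrowed to a flat install that never claimed that namespace.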
When `prose preflight` passes the user previously had no way to confirm their declared skills were actually checked, what they resolved to, or whether the fuzzy-pin nudge fired — the text formatter dropped all info-severity diagnostics and never rendered any skills. The whole fuzzy-resolver UX was invisible outside of `--format json`. Adds a `skills: PreflightSkillCheck[]` field to `PreflightResult` and renders a "Skills:" section listing each declared name, its canonical resolution, and the resolution kind (exact/fuzzy/unresolved) with the component/service scope. Always renders info-severity diagnostics in a "Notices:" section so the pin-canonical nudge reaches the user. Section is omitted entirely when no skills are declared anywhere, to keep the noise floor low.
Adds an Unreleased/Added bullet describing the explicit `skills:` feature (frontmatter list, `### Skills` section, search path stack, fail-closed preflight, fuzzy nudge, BYO-harness invariant). Also drops a single-line comment above `ServiceIR.skills = []` in `parseServices` explaining that inline `## sub-services` carry their own `ComponentIR.skills` so the bare service-reference form intentionally has no skills of its own.
… runs
The skills feature plan promised that resolved canonical names get pinned into the IR so subsequent runs of the same IR are reproducible across machines. Only `prose preflight` was actually running the resolver; the on-disk IR from `prose compile` had `canonical_name: ""` and `resolution: "unresolved"` even when the skill resolved cleanly. Anything downstream consuming the compiled IR (run, deployment, manifest snapshot) saw stale skills metadata.
Fix:
- Extract the per-component resolver pass into a shared `pinSkillsInComponents(components, searchPaths)` helper in `src/skills.ts`. Mutates each `SkillRefIR` in place (canonical_name, resolution, fuzzy_distance) and returns the diagnostics it produced.
- Refactor `src/preflight.ts` to use the shared helper instead of its own private `checkSkills` copy. Behavior is unchanged.
- In the `prose compile` CLI command, after compile, run the resolver against `--skill-search-path` (or `defaultSearchPaths(packageRoot)`) and merge the diagnostics into the IR. Unresolved skills become compile errors and exit non-zero, fail-loud per the design constraint.
- Wire `--skill-search-path` into the compile command (the parser already accepted the flag for preflight; just plumb it through and document it in the help text).
Test:
- `test/skills-compile-pinning.test.ts` shells out to `bun bin/prose.ts compile` end to end, asserts the on-disk IR JSON has `canonical_name: "document-skills:pdf"` / `resolution: "exact"` for an exact match, and asserts that an unresolved declared skill produces a non-zero exit and surfaces the offending skill name in CLI output.
- BYO-harness invariant intact: tests use `tmpdir()` fixtures only, the resolver is read-only.
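A shared pinning pass of the kind described above might look roughly like this. It is a sketch under stated assumptions, not the code in `src/skills.ts`: the real helper takes `searchPaths`, whereas this version takes an injected `resolve` function so it stays self-contained; the `name` field, the `number | null` type of `fuzzy_distance`, the `Diagnostic` shape, and the message wording are all hypothetical.

```typescript
type ResolutionKind = "exact" | "fuzzy" | "unresolved";

interface SkillRefIR {
  name: string;                  // declared name (field name assumed)
  canonical_name: string;
  resolution: ResolutionKind;
  fuzzy_distance: number | null; // type assumed
}

interface ComponentIR {
  name: string;
  skills: SkillRefIR[];
}

interface Diagnostic {
  code: "skill_unresolved" | "skill_fuzzy_resolved";
  severity: "error" | "info";
  message: string;
}

// Stand-in for the real resolver, which looks skills up on the search paths.
type Resolve = (
  declared: string,
) => { canonical: string; resolution: "exact" | "fuzzy"; distance: number } | null;

// Mutates every SkillRefIR in place and returns the diagnostics it produced.
function pinSkillsInComponentsSketch(
  components: ComponentIR[],
  resolve: Resolve,
): Diagnostic[] {
  const diagnostics: Diagnostic[] = [];
  for (const component of components) {
    for (const skill of component.skills) {
      const hit = resolve(skill.name);
      if (hit === null) {
        skill.canonical_name = "";
        skill.resolution = "unresolved";
        skill.fuzzy_distance = null;
        diagnostics.push({
          code: "skill_unresolved",
          severity: "error",
          message: `${component.name}: skill "${skill.name}" is not installed`,
        });
        continue;
      }
      skill.canonical_name = hit.canonical;
      skill.resolution = hit.resolution;
      skill.fuzzy_distance = hit.resolution === "fuzzy" ? hit.distance : null;
      if (hit.resolution === "fuzzy") {
        diagnostics.push({
          code: "skill_fuzzy_resolved",
          severity: "info",
          message: `${component.name}: "${skill.name}" resolved to "${hit.canonical}"; consider pinning it`,
        });
      }
    }
  }
  return diagnostics;
}

// Both preflight and compile can derive their exit status from the same list.
function exitCodeFor(diagnostics: Diagnostic[]): number {
  return diagnostics.some((d) => d.severity === "error") ? 1 : 0;
}
```

Keeping the mutation in one function is what lets preflight and compile agree on the pinned values: both callers see the same `SkillRefIR` objects after the pass.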
# Conflicts: # src/preflight.ts
All five blockers from the prior synthetic-user pass behave correctly: fuzzy mis-resolution fail-closes (pfd no longer silently binds to xf), one-level Claude Code skill layout is discovered, doc examples preflight cleanly under kind: program, Skills section + fuzzy nudge surface in text output, and prose compile pins canonical names with non-zero exit on unresolved skills. BYO-harness invariant still holds: zero mtime changes across 3 preflights + 1 compile against tmp fixtures. GO for branch push.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a north-star example program that turns a prior investor letter PDF and
this quarter's operating notes into a polished .docx investor update. The
contract demonstrates the new skills: declaration on feat/skills-section in
the way it is meant to be used:
- System-level frontmatter declares document-skills:pdf, applying to every
sub-service (the program reads the prior PDF for citations everywhere).
- The investor-letter-formatter sub-service additively declares
document-skills:docx via inline frontmatter on its ## heading,
exercising the additive scope semantics described in
skills/open-prose/contract-markdown.md.
- prose preflight resolves both skills against examples/skills/ and renders a
Skills section listing each canonical name with its scope.
- prose compile pins canonical_name into the on-disk IR for cross-machine
reproducibility.
Stub skills live at examples/skills/document-skills/{pdf,docx}/ so the package
default search root (./skills/) resolves them without --skill-search-path,
keeping CI green without depending on the user's harness skills (BYO-harness
invariant).
Includes a test at test/quarterly-investor-update-example.test.ts covering
compilation (system + service skill scoping), preflight PASS against the
fixture stubs, and preflight FAIL closed against an empty search path. Updates
examples-tour.test.ts and the package-ir snapshot to reflect the new program,
and adds the example to examples/prose.package.json.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
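The additive scope rule exercised by this example (system-level skills apply to every sub-service, and a service's own list adds to them rather than replacing them) reduces to an ordered union. The helper below is a hypothetical illustration, not a function from the PR.

```typescript
// Hypothetical helper: effective skills for one service under additive scoping.
// System-level skills come first; service-level skills are appended, de-duplicated.
function effectiveSkills(systemSkills: string[], serviceSkills: string[]): string[] {
  return Array.from(new Set([...systemSkills, ...serviceSkills]));
}

// The investor-letter-formatter sub-service from the example above:
const formatterSkills = effectiveSkills(
  ["document-skills:pdf"],  // declared in system-level frontmatter
  ["document-skills:docx"], // declared inline on the ## heading
);
// formatterSkills is ["document-skills:pdf", "document-skills:docx"]
```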
…-section Captures every design decision (naming, scope, fuzzy strategy, search paths, layout support, IR pinning, BYO invariant, fail-closed default, Forme placement, single-source-of-truth resolution, doc-example kind) with the trade-off taken for each. Documents the harness-engineering workflow used (plan, TDD, parallel-worktree subagent waves, two-pass dogfooding) and how the change fits the existing OpenProse repo conventions. Includes a 'Scope: out of scope' section making explicit that runtime activation (parent VM telling harness to load pinned skills, sub-agent briefing inheritance) is NOT wired by this PR — the verification half is done; the activation half is a clean follow-up.
The previous Declaring required skills section described how authors WRITE the skills: list and how preflight VERIFIES them, but the open-prose skill itself never told the AI-as-VM what to DO with declared skills at execution time. A controlled experiment (single program declaring document-skills:pdf, real PDF input, subagent told to act as the OpenProse VM) showed the agent read the program, saw the declaration, and then silently fell back to the Read tool's built-in PDF rendering: zero Skill tool invocations, contract violated, output correct only by accident.
Add 'Activating declared skills at runtime' as the new load-bearing section. Tells the agent: when you embody a .prose.md, you ARE the VM; declared skills are runtime requirements, not metadata; activate first, then work; do not silently fall back to built-ins; pass canonical names into sub-agent briefings for child runs.
The OpenProse SKILL.md is the compiler when the AI is the VM. New language features must be reflected here or they remain inert at runtime.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds HandoffSkill to SingleRunHandoff.component and a 'Required Skills'
section to renderSingleRunHandoffMarkdown. The section explicitly tells the
receiving harness to invoke Skill('open-prose-raw:open-prose') first, then
Skill(<each declared canonical>) before producing outputs, with a 'do not
fall back to built-in tools' warning. Programs without skills emit nothing
extra (no noise).
This is the runtime adapter that closes the loop for sub-agent dispatch:
SKILL.md teaches the contract for parent agents already in OpenProse
context; this brief enforces the same contract for delegate harnesses
that don't auto-load the skill. Empirical experiments on 2026-05-04
showed delegate subagents do NOT auto-activate open-prose from a .prose.md
path, so the brief must carry the activation directives explicitly.
TDD: test/skills-handoff.test.ts written first, asserted on the schema,
the rendered markdown content (open-prose + canonical-name Skill calls,
'do not fall back' warning), and the no-noise case for skill-less programs.
Watched all three fail, implemented minimally, watched them pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
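The 'Required Skills' section described in this commit could be rendered along these lines. This is a sketch only: `renderRequiredSkillsSketch` and the `HandoffSkill` field name are hypothetical, and the exact wording of the warning is an assumption; only the section title, the `Skill('open-prose-raw:open-prose')` directive coming first, and the no-output case for skill-less programs come from the commit message.

```typescript
// Assumed shape; the PR only says HandoffSkill is added to the handoff component.
interface HandoffSkill {
  canonical_name: string;
}

// Render the activation directives for a single-run brief.
// Programs without skills produce an empty string (no noise).
function renderRequiredSkillsSketch(skills: HandoffSkill[]): string {
  if (skills.length === 0) return "";
  const lines = [
    "## Required Skills",
    "",
    // Wording is illustrative; the real brief may phrase this differently.
    "Invoke these skills, in order, before producing outputs. Do not fall back to built-in tools.",
    "",
    "1. Skill('open-prose-raw:open-prose')",
    ...skills.map((s, i) => `${i + 2}. Skill('${s.canonical_name}')`),
  ];
  return lines.join("\n") + "\n";
}
```

Listing the open-prose skill first matters because, per the experiments cited above, a delegate harness does not auto-activate it from a `.prose.md` path.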
…tion
Captures four controlled experiments (agent-IDs in the document) showing
the progression from no-fix → docs-only fix → handoff-bridge fix. Run 4
is the load-bearing proof: a fresh general-purpose subagent given only
the unedited 'prose handoff' output invokes Skill('open-prose-raw:open-prose')
then Skill('document-skills:pdf') before doing any work, and uses the
skill's prescribed pdftotext tooling rather than Claude's built-in Read
PDF rendering — activation drove behavior change.
Updates the draft PR doc to reflect that runtime activation is now wired
end-to-end (was previously framed as 'out of scope'). Verification +
runtime-activation are both empirically proven.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tight, copy-paste-ready PR description for openprose/prose maintainers. Distilled from the longer internal design log at docs/superpowers/drafts/2026-05-04-skills-section-pr-draft.md. Includes: summary, motivation, surface table, key design decisions, testing approach (TDD + empirical proof reference), one-shot 'try it locally' block, compatibility notes, branch-base note (RFC dependency), files-for-review pointer list, and out-of-scope follow-ups. raw@raw.works will review and decide where to publish.
Summary
Adds an explicit `skills:` declaration to `.prose.md` so authors can name the agent skills their programs require, plus the verification and activation machinery around it.
`prose preflight` and `prose compile` resolve declared names against the user's installed harness skills and fail closed if anything is missing.
`prose handoff` injects the resolved canonical names as `Skill('<name>')` activation directives into the single-run brief, so a delegate harness deterministically loads the right skills before producing outputs.
Programs that do not declare `skills:` go through every code path unchanged.
Why
An OpenProse program orchestrates sub-agents and currently trusts the harness's skill auto-router to pick the right skill at each step. Different model versions, different context windows, and different routers can pick differently, and authors have no way to say "this program needs `document-skills:pdf`; if it is not loaded, do not run." This PR adds that.
What changed
- `### Skills` section + `skills:` frontmatter, colon form (`document-skills:pdf`)
- `SkillRefIR` on `ComponentIR` and `ServiceIR`; resolved canonical name + resolution kind pinned in place
- `src/skills.ts`: exact match, Levenshtein fuzzy fallback (hardened against silent typo bind), supports both one-level and two-level skill install layouts
- `prose preflight`: `skill_unresolved` (error) and `skill_fuzzy_resolved` (info), surfaces them in text output
- `prose compile`: … `resolution: "unresolved"`
- `prose handoff`: injects a `## Required Skills` section into the rendered brief, listing each canonical as a `Skill('<name>')` activation directive
- `--skill-search-path` on `preflight` and `compile`
- … `skills/open-prose/SKILL.md`
- `contract-markdown.md`, `CHANGELOG.md`, and `examples/north-star/quarterly-investor-update.prose.md` (idiomatic demo with program-scope and service-scope skills)
Key design decisions
- … `/skill` listings. Bare names accepted as a convenience via the fuzzy fallback.
- … `~/.claude/skills/` or `~/.codex/skills/`. The resolver verifies presence; users install missing skills with their normal workflow.
- `pinSkillsInComponents` in `src/skills.ts` is the only place skills get mutated; both preflight and compile call it.
- `SKILL.md` teaches the activation contract. For delegate dispatches (`prose handoff` to a fresh harness), the rendered brief carries the activation directives. Both paths are exercised in the empirical proof below.
Testing
`bun test`: 395 pass / 1 skip / 0 fail. (The skip is a pre-existing network-flake test unrelated to this branch.)
New test files, all TDD (failing test written first, watched red, then implemented):
- `test/skills-section.test.ts`, `skills-resolver.test.ts`, `skills-preflight.test.ts`, `skills-manifest.test.ts`, `skills-e2e.test.ts`, `skills-compile-pinning.test.ts`, `skills-handoff.test.ts`
- `test/skills-doc-examples.test.ts` extracts every fenced example from `SKILL.md` and `contract-markdown.md` and asserts each parses + preflights cleanly, so there are no copy-paste footguns in the docs
Empirical proof of runtime activation
Captured in `docs/superpowers/findings/2026-05-04-skills-runtime-empirical-proof.md`. Four controlled experiments trace the progression from no-fix to docs-only-fix to handoff-bridge fix. The load-bearing run: a fresh `general-purpose` Claude subagent given only the unedited `prose handoff` brief, with no orchestrator priming, invoked `Skill('open-prose-raw:open-prose')` then `Skill('document-skills:pdf')` before producing outputs, and used the skill's prescribed `pdftotext -layout` rather than the harness's built-in PDF rendering. Activation drove behavior change, not just registration.
Try it locally
Compatibility
- Programs that do not declare `skills:` produce an empty `skills` array on the IR and emit no extra output anywhere. No code path changes for them.
- `fixtures/hosted-runtime/*.json` golden snapshots regenerated to track the new `skills` field on `ComponentIR`.
- `--skill-search-path` flag replaces defaults when supplied so test fixtures can run against tmp dirs without touching real harness skills.
Branch base note
This branch is based on `rfc/reactive-openprose`. The compiler surface (`src/markdown.ts`, `src/sections.ts`, `src/compiler.ts`, `src/preflight.ts`, `src/handoff.ts`, etc.) only exists on the RFC branch. Please target the RFC branch when merging, or rebase onto whichever branch carries that surface in your flow.
Files for review
If triaging where to look first:
- `src/skills.ts`: single source of truth for resolution
- `src/handoff.ts`: `renderSingleRunHandoffMarkdown`
- `skills/open-prose/contract-markdown.md`, `skills/open-prose/SKILL.md`
- `docs/superpowers/findings/2026-05-04-skills-runtime-empirical-proof.md`
- `examples/north-star/quarterly-investor-update.prose.md`
Follow-ups (intentionally out of scope)
- `prose run` command that wraps `handoff → dispatch → output-collect` for AI execution. Today the user runs `prose handoff` and feeds the brief to an agent themselves; a future command would automate that bridge using exactly the brief this PR generates.