feat: explicit skill declaration for .prose.md programs by rawwerks · Pull Request #59 · openprose/prose

rawwerks · 2026-05-04T18:32:02Z

Summary

Adds an explicit skills: declaration to .prose.md so authors can name the agent skills their programs require, plus the verification and activation machinery around it. prose preflight and prose compile resolve declared names against the user's installed harness skills and fail closed if anything is missing. prose handoff injects the resolved canonical names as Skill('<name>') activation directives into the single-run brief, so a delegate harness deterministically loads the right skills before producing outputs.

Programs that do not declare skills: go through every code path unchanged.

Why

An OpenProse program orchestrates sub-agents and currently trusts the harness's skill auto-router to pick the right skill at each step. Different model versions, different context windows, and different routers can pick differently — authors have no way to say "this program needs document-skills:pdf; if it is not loaded, do not run." This PR adds that.

What changed

| Area | Change |
| --- | --- |
| Contract markdown | New `### Skills` section + `skills:` frontmatter, colon form (`document-skills:pdf`) |
| IR | `SkillRefIR` on `ComponentIR` and `ServiceIR`; resolved canonical name + resolution kind pinned in place |
| Resolver | New `src/skills.ts`: exact match, Levenshtein fuzzy fallback (hardened against silent typo bind), supports both one-level and two-level skill install layouts |
| `prose preflight` | Walks declared skills, emits `skill_unresolved` (error) and `skill_fuzzy_resolved` (info), surfaces them in text output |
| `prose compile` | Same resolver pass; fails closed on unresolved so the on-disk IR never carries `resolution: "unresolved"` |
| `prose handoff` | Injects a `## Required Skills` section into the rendered brief, listing each canonical as a `Skill('<name>')` activation directive |
| CLI | `--skill-search-path` on `preflight` and `compile` |
| `skills/open-prose/SKILL.md` | New "Activating declared skills at runtime" section instructing the AI-as-VM to invoke the harness Skill tool before doing the work |
| Docs / examples | `contract-markdown.md`, `CHANGELOG.md`, and `examples/north-star/quarterly-investor-update.prose.md` (idiomatic demo with program-scope and service-scope skills) |

Key design decisions

- Colon form for skill names matches the plugin marketplace convention users already see in /skill listings. Bare names accepted as a convenience via the fuzzy fallback.
- BYO harness is read-only. OpenProse never installs, edits, or removes anything in ~/.claude/skills/ or ~/.codex/skills/. The resolver verifies presence; users install missing skills with their normal workflow.
- Service-level skills are additive to system-level, not an exclusive allowlist. A sub-service declaration unions with the program-level set.
- Compile fails loud on unresolved skills. Preflight is more lenient. Compile must produce an IR with every canonical name pinned because that IR may ship to other machines for reruns.
- Single source of truth for resolution. pinSkillsInComponents in src/skills.ts is the only place skills get mutated; both preflight and compile call it.
- Runtime activation has two paths. For the parent VM (the user's own AI session that already has the OpenProse skill loaded), SKILL.md teaches the activation contract. For delegate dispatches (prose handoff to a fresh harness), the rendered brief carries the activation directives. Both paths are exercised in the empirical proof below.
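The additive scope rule can be illustrated with a minimal sketch. The helper name `effectiveSkills` and the plain string arrays are assumptions made for illustration; the PR does not show how the union is computed internally.

```typescript
// Hypothetical helper: the effective skill set for a sub-service is the union
// of the program-level and service-level declarations, in first-seen order.
function effectiveSkills(programSkills: string[], serviceSkills: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const name of [...programSkills, ...serviceSkills]) {
    if (!seen.has(name)) {
      seen.add(name);
      out.push(name);
    }
  }
  return out;
}

// Program declares pdf; one sub-service additionally declares docx.
const programLevel = ["document-skills:pdf"];
const formatterSkills = effectiveSkills(programLevel, ["document-skills:docx"]);
const plainServiceSkills = effectiveSkills(programLevel, []);

console.log(formatterSkills);
console.log(plainServiceSkills);
```

A service that declares nothing still inherits the program-level set, and a service declaration never removes a program-level skill.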

Testing

bun test — 395 pass / 1 skip / 0 fail. (The skip is a pre-existing network-flake test unrelated to this branch.)

New test files, all TDD (failing test written first, watched red, then implemented):

- test/skills-section.test.ts, skills-resolver.test.ts, skills-preflight.test.ts, skills-manifest.test.ts, skills-e2e.test.ts, skills-compile-pinning.test.ts, skills-handoff.test.ts
- test/skills-doc-examples.test.ts extracts every fenced example from SKILL.md and contract-markdown.md and asserts each parses + preflights cleanly, so the docs carry no copy-paste footguns

Empirical proof of runtime activation

Captured in docs/superpowers/findings/2026-05-04-skills-runtime-empirical-proof.md. Four controlled experiments tracing the progression from no-fix to docs-only-fix to handoff-bridge fix. The load-bearing run: a fresh general-purpose Claude subagent given only the unedited prose handoff brief — no orchestrator priming — invoked Skill('open-prose-raw:open-prose') then Skill('document-skills:pdf') before producing outputs, and used the skill's prescribed pdftotext -layout rather than the harness's built-in PDF rendering. Activation drove behavior change, not just registration.

Try it locally

```bash
cat > /tmp/demo.prose.md <<'EOF'
---
name: demo
kind: program
skills:
  - document-skills:pdf
---

### Description
Extract a summary from a PDF.

### Requires
- `pdf_path`: the file to read

### Ensures
- `summary`: a markdown bullet list
EOF

# Verify against your installed skills (defaults to ./skills, ~/.claude/skills, ~/.codex/skills)
bun bin/prose.ts preflight /tmp/demo.prose.md

# Compile with canonical name pinning
bun bin/prose.ts compile /tmp/demo.prose.md --out /tmp/demo.ir.json
jq '.components[0].skills' /tmp/demo.ir.json

# Generate the runtime brief; the Required Skills section is the activation contract
bun bin/prose.ts handoff /tmp/demo.prose.md --input pdf_path=/path/to/file.pdf
```

Compatibility

- Programs that do not declare skills: produce an empty skills array on the IR and emit no extra output anywhere. No code path changes for them.
- Two fixtures/hosted-runtime/*.json golden snapshots regenerated to track the new skills field on ComponentIR.
- No public API removed. The --skill-search-path flag replaces defaults when supplied so test fixtures can run against tmp dirs without touching real harness skills.

Branch base note

This branch is based on rfc/reactive-openprose. The compiler surface (src/markdown.ts, src/sections.ts, src/compiler.ts, src/preflight.ts, src/handoff.ts, etc.) only exists on the RFC branch. Please target the RFC branch when merging, or rebase onto whichever branch carries that surface in your flow.

Files for review

If triaging where to look first:

- Resolver: src/skills.ts (single source of truth for resolution)
- Brief generator (the runtime adapter): src/handoff.ts (renderSingleRunHandoffMarkdown)
- Spec: skills/open-prose/contract-markdown.md, skills/open-prose/SKILL.md
- Empirical proof: docs/superpowers/findings/2026-05-04-skills-runtime-empirical-proof.md
- Demo program: examples/north-star/quarterly-investor-update.prose.md

Follow-ups (intentionally out of scope)

- A prose run command that wraps handoff → dispatch → output-collect for AI execution. Today the user runs prose handoff and feeds the brief to an agent themselves; a future command would automate that bridge using exactly the brief this PR generates.
- LLM-based fuzzy matching as an alternative to Levenshtein, if user demand justifies trading determinism for resilience.
- Per-harness adapters for non-Claude/Codex environments. The brief is harness-agnostic Markdown today.

…aration

Adds docs/superpowers/plans/2026-05-03-skills-section.md describing how to add a skills: frontmatter key and ### Skills section so .prose.md authors can deterministically preload required agent skills, with Forme preflight failing closed when a declared skill is not installed in the user's harness search paths.

Design agreed with user:
- Colon-form names (document-skills:pdf), matches plugin marketplace
- Levenshtein fuzzy fallback for v1; LLM-fuzzy is a follow-up RFC
- BYO harness is sacred: OpenProse never installs or edits user skills
- Service-level skills: is additive, not exclusive
- Resolved canonical name pinned into IR for reproducibility

Plan is execution-ready; subagent dispatch awaits user go-ahead.

Wires the skill resolver into preflightPath so a `.prose.md` author can declare `skills:` and have preflight fail closed if any are not installed on the user's machine. Skill resolution mutates each SkillRefIR in place (canonical_name + resolution + fuzzy_distance) so subsequent runs of the IR are reproducible across machines. Fuzzy hits emit an `info` diagnostic nudging the author to pin the canonical name.

Adds a `--skill-search-path` CLI flag (repeatable, comma-/colon-separated) on `prose preflight` that overrides the default ./skills + ~/.claude/skills + ~/.codex/skills lookup. Tests use it to point at a tmpdir fixture and never touch the real harness, so the BYO harness invariant stays intact.

Adapted from the plan's pseudocode (`runPreflight(file, opts)` returning `{ok, diagnostics}`) to the actual exported entrypoint `preflightPath(file, opts)` returning `{status, diagnostics}`. Fixtures use `kind: program` instead of `kind: system` so the unrelated `preflight_not_program` diagnostic does not interfere with assertions about skill diagnostics; skill iteration walks every component in the target file regardless, so `kind: system` would still be checked at runtime. The test just keeps the assertion focused on skill behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
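The flag semantics described above (repeatable, comma- or colon-separated, replacing the defaults when supplied) can be sketched as follows. The function name `resolveSearchPaths` and the handling of empty segments are assumptions; the real parser may differ, for example in whether it expands `~`.

```typescript
// Default lookup roots as listed in the commit message (left unexpanded here).
const DEFAULT_SEARCH_PATHS = ["./skills", "~/.claude/skills", "~/.codex/skills"];

// Hypothetical helper: flatten every --skill-search-path occurrence into one
// ordered list, splitting each value on commas and colons. When at least one
// path is supplied, the defaults are replaced rather than extended.
function resolveSearchPaths(flagValues: string[]): string[] {
  const supplied: string[] = [];
  for (const value of flagValues) {
    for (const part of value.split(/[,:]/)) {
      const trimmed = part.trim();
      if (trimmed.length > 0) supplied.push(trimmed); // assumption: skip empty segments
    }
  }
  return supplied.length > 0 ? supplied : [...DEFAULT_SEARCH_PATHS];
}

const fromDefaults = resolveSearchPaths([]);
const fromFlags = resolveSearchPaths([
  "/tmp/fixture-a,/tmp/fixture-b",
  "/tmp/fixture-c:/tmp/fixture-d",
]);

console.log(fromDefaults);
console.log(fromFlags);
```

Replacing the defaults instead of appending to them is what lets a test point at a tmpdir fixture with no chance of a real harness skill satisfying the lookup.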

…stub

Adds the on-disk fixture pair (with-pdf.prose.md + a stub document-skills:pdf SKILL.md) and an end-to-end test that runs the real preflight pipeline against them. Both pass scenarios (skill installed) and fail scenarios (empty search path) are covered.

The fixture program declares `kind: program` rather than the plan's `kind: system` so preflightPath has a `main` to anchor on; skill checking iterates all components in the target file regardless, so the declared semantics still hold. Test entrypoint adapted to `preflightPath` (the real exported async function) and `result.status` (the real shape) instead of the plan's hypothetical `runPreflight` / `result.ok`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Walks through happy-path, missing-skill, fuzzy-match, service-scope, docs-as-newcomer, and BYO-invariant scenarios against the feat/skills-section branch. Surfaces one correctness bug (silent fuzzy mis-resolution of short typos like pfd -> claude-skills:xf) and several adoption-blocking rough edges (skills invisible in PASS text output, default search path layout mismatch with real Claude Code installs, docs example uses kind:system but preflight requires kind:program, canonical-name pinning is in-memory only). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Synthetic-user dogfood found that resolveSkill("pfd", ...) silently bound to claude-skills:xf (a tweet-search skill) when pfd was a typo for pdf and only xf was installed two-deep. The old threshold formula Math.max(2, floor(len/3)) lets every 3-char declared name match anything within distance 2, and the clear-winner check only required second.d to be strictly greater than best.d. Both failures combined to silently activate a wholly unrelated skill with exit 0.

Tighten the fuzzy guardrails:
- Short declared names (length <= 4) require best.d <= 1. A 3-char name with distance 2 can match almost anything; refuse to bind silently.
- Longer names cap distance at min(2, floor(len/3)).
- Require either a margin of victory >= 2 or a shared >=2-char common prefix/suffix between declared and best.leaf. A bare-name match that is only 1 edit better than an equally-plausible competitor is too risky.

Tests:
- The original synthetic-user typo (pfd vs only xf installed) now resolves to unresolved with candidates.
- pdf -> document-skills:pdf still fuzzy when only pdf is installed.
- Longer typo (spreadshet) still fuzzy-resolves to spreadsheet when the only competitor is far away.
- All four pre-existing resolver tests stay green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
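One way to read the tightened guardrails is sketched below. This is not the code in `src/skills.ts`; the names `maxDistance`, `sharesAffix`, and `pickFuzzyLeaf` are hypothetical, and the way the three rules compose is an assumption drawn from the commit message.

```typescript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Short names (length <= 4) tolerate one edit; longer names min(2, floor(len/3)).
function maxDistance(declared: string): number {
  return declared.length <= 4 ? 1 : Math.min(2, Math.floor(declared.length / 3));
}

// True when both strings share a common prefix or suffix of at least n chars.
function sharesAffix(a: string, b: string, n: number = 2): boolean {
  if (a.length < n || b.length < n) return false;
  return a.slice(0, n) === b.slice(0, n) || a.slice(-n) === b.slice(-n);
}

// Hypothetical: returns the chosen leaf, or null when binding would be too risky.
function pickFuzzyLeaf(declared: string, leaves: string[]): string | null {
  const ranked = leaves
    .map((leaf) => ({ leaf, d: levenshtein(declared, leaf) }))
    .sort((x, y) => x.d - y.d);
  if (ranked.length === 0) return null;
  const best = ranked[0];
  if (best.d > maxDistance(declared)) return null;
  // With a single candidate there is no competitor, so the margin is unbounded.
  const margin = ranked.length > 1 ? ranked[1].d - best.d : Infinity;
  if (margin < 2 && !sharesAffix(declared, best.leaf)) return null;
  return best.leaf;
}

console.log(pickFuzzyLeaf("pfd", ["xf"]));
console.log(pickFuzzyLeaf("pdf", ["pdf"]));
console.log(pickFuzzyLeaf("spreadshet", ["spreadsheet", "xf"]));
```

Under this reading, `pfd` against an install that only offers `xf` is at distance 2, which exceeds the short-name cap of 1, so the sketch declines to bind.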

The doc snippets used `kind: system` which made `prose preflight` emit `preflight_not_program` for any user who copy-pasted them verbatim. Switch the examples to `kind: program` and add a minimal `### Ensures` section so they are real, complete, runnable contracts. Adds `test/skills-doc-examples.test.ts` which extracts every prose-shaped fenced block from the docs and runs it through compileSource and preflightPath to prevent this regression.
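The extraction half of such a doc-example test might look like the sketch below. The helper names and the "prose-shaped" heuristic (a block that opens with frontmatter and contains a `kind:` key) are assumptions; the real test additionally runs each block through compileSource and preflightPath, which is not reproduced here.

```typescript
// Built at runtime so this example can itself sit inside a fenced block.
const FENCE = "`".repeat(3);

// Collect the body of every fenced block in a markdown document.
function extractFencedBlocks(markdown: string): string[] {
  const blocks: string[] = [];
  let current: string[] | null = null;
  for (const line of markdown.split("\n")) {
    const isFence = line.trim().startsWith(FENCE);
    if (isFence && current === null) {
      current = []; // opening fence: start collecting
    } else if (isFence && current !== null) {
      blocks.push(current.join("\n")); // closing fence: emit the block
      current = null;
    } else if (current !== null) {
      current.push(line);
    }
  }
  return blocks;
}

// Hypothetical heuristic for "prose-shaped": frontmatter with a kind: key.
function looksLikeProseContract(block: string): boolean {
  return block.startsWith("---\n") && /^kind:\s*\S+/m.test(block);
}

const sampleDoc = [
  "Intro text",
  FENCE + "markdown",
  "---",
  "name: demo",
  "kind: program",
  "---",
  "### Ensures",
  "- `summary`: a markdown bullet list",
  FENCE,
  "More text",
  FENCE + "bash",
  "bun bin/prose.ts preflight demo.prose.md",
  FENCE,
].join("\n");

const allBlocks = extractFencedBlocks(sampleDoc);
const proseBlocks = allBlocks.filter(looksLikeProseContract);
console.log(allBlocks.length, proseBlocks.length);
```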

Synthetic-user dogfood found that resolveSkill only enumerated the two-level <root>/<namespace>/<name>/SKILL.md layout, but stock Claude Code installs are flat one-level: ~/.claude/skills/<name>/SKILL.md. This meant most users would see skill_unresolved for skills they actually had installed, which is a real adoption blocker.

Teach enumerateInstalledSkills to discover both layouts:
- One-level: <root>/<name>/SKILL.md, canonical = "<name>", no namespace.
- Two-level: <root>/<namespace>/<name>/SKILL.md, canonical = "<ns>:<name>".

Both layouts can coexist under the same root. Adjust resolveSkill so:
- A bare declared name (e.g. `pdf`) prefers an exact one-level install before falling back to fuzzy leaf-match: `pdf` -> canonical `pdf` is resolution: "exact", not fuzzy.
- A bare declared name with only a two-level install still fuzzy-matches to "namespace:name" (existing behavior preserved).
- A colon-form declared name (`document-skills:pdf`) only matches a two-level install. A flat one-level install whose leaf happens to be `pdf` is NOT a match: the namespace was never declared by the install, so silently binding to it would be wrong.

Tests cover all four corners: bare->one-level exact, colon-form->no one-level match, mixed-layout disambiguation, and the existing two-level-only fuzzy path stays green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
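The layout rules above can be sketched as a pure function over an already-enumerated install list. The `Installed` and `Resolution` types and the name `resolveDeclared` are hypothetical, the filesystem walk is omitted, and the Levenshtein fallback is reduced to an identical-leaf match to keep the sketch short.

```typescript
// namespace === null models a one-level install (<root>/<name>/SKILL.md).
type Installed = { namespace: string | null; name: string };

type Resolution =
  | { kind: "exact"; canonical: string }
  | { kind: "fuzzy"; canonical: string }
  | { kind: "unresolved" };

function canonicalOf(s: Installed): string {
  return s.namespace === null ? s.name : `${s.namespace}:${s.name}`;
}

function resolveDeclared(declared: string, installed: Installed[]): Resolution {
  const colon = declared.indexOf(":");
  if (colon >= 0) {
    // Colon form: only a two-level install with that namespace and name matches.
    const ns = declared.slice(0, colon);
    const name = declared.slice(colon + 1);
    const hit = installed.find((s) => s.namespace === ns && s.name === name);
    return hit ? { kind: "exact", canonical: canonicalOf(hit) } : { kind: "unresolved" };
  }
  // Bare name: an exact one-level install wins first.
  const flat = installed.find((s) => s.namespace === null && s.name === declared);
  if (flat) return { kind: "exact", canonical: flat.name };
  // Simplification: bind to a two-level leaf only when exactly one leaf matches.
  const leafHits = installed.filter((s) => s.namespace !== null && s.name === declared);
  if (leafHits.length === 1) return { kind: "fuzzy", canonical: canonicalOf(leafHits[0]) };
  return { kind: "unresolved" };
}

const flatOnly: Installed[] = [{ namespace: null, name: "pdf" }];
const nestedOnly: Installed[] = [{ namespace: "document-skills", name: "pdf" }];
const mixed: Installed[] = [...flatOnly, ...nestedOnly];

console.log(resolveDeclared("pdf", flatOnly));
console.log(resolveDeclared("document-skills:pdf", flatOnly));
console.log(resolveDeclared("pdf", nestedOnly));
console.log(resolveDeclared("pdf", mixed));
```

The second call is the important corner: a flat `pdf` install never satisfies `document-skills:pdf`, because the install did not declare that namespace.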

When `prose preflight` passes the user previously had no way to confirm their declared skills were actually checked, what they resolved to, or whether the fuzzy-pin nudge fired — the text formatter dropped all info-severity diagnostics and never rendered any skills. The whole fuzzy-resolver UX was invisible outside of `--format json`. Adds a `skills: PreflightSkillCheck[]` field to `PreflightResult` and renders a "Skills:" section listing each declared name, its canonical resolution, and the resolution kind (exact/fuzzy/unresolved) with the component/service scope. Always renders info-severity diagnostics in a "Notices:" section so the pin-canonical nudge reaches the user. Section is omitted entirely when no skills are declared anywhere, to keep the noise floor low.

Adds an Unreleased/Added bullet describing the explicit `skills:` feature (frontmatter list, `### Skills` section, search path stack, fail-closed preflight, fuzzy nudge, BYO-harness invariant). Also drops a single-line comment above `ServiceIR.skills = []` in `parseServices` explaining that inline `## sub-services` carry their own `ComponentIR.skills` so the bare service-reference form intentionally has no skills of its own.

… runs

The skills feature plan promised that resolved canonical names get pinned into the IR so subsequent runs of the same IR are reproducible across machines. Only `prose preflight` was actually running the resolver; the on-disk IR from `prose compile` had `canonical_name: ""` and `resolution: "unresolved"` even when the skill resolved cleanly. Anything downstream consuming the compiled IR (run, deployment, manifest snapshot) saw stale skills metadata.

Fix:
- Extract the per-component resolver pass into a shared `pinSkillsInComponents(components, searchPaths)` helper in `src/skills.ts`. Mutates each `SkillRefIR` in place (canonical_name, resolution, fuzzy_distance) and returns the diagnostics it produced.
- Refactor `src/preflight.ts` to use the shared helper instead of its own private `checkSkills` copy. Behavior is unchanged.
- In the `prose compile` CLI command, after compile, run the resolver against `--skill-search-path` (or `defaultSearchPaths(packageRoot)`) and merge the diagnostics into the IR. Unresolved skills become compile errors and exit non-zero, fail-loud per the design constraint.
- Wire `--skill-search-path` into the compile command (the parser already accepted the flag for preflight; just plumb it through and document it in the help text).

Test:
- `test/skills-compile-pinning.test.ts` shells out to `bun bin/prose.ts compile` end to end, asserts the on-disk IR JSON has `canonical_name: "document-skills:pdf"` / `resolution: "exact"` for an exact match, and asserts that an unresolved declared skill produces a non-zero exit and surfaces the offending skill name in CLI output.
- BYO-harness invariant intact: tests use `tmpdir()` fixtures only, the resolver is read-only.
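The in-place pinning pass could be sketched roughly as below. This is an illustration rather than the real `pinSkillsInComponents`: the sketch takes an injected resolver instead of search paths so it stays filesystem-free, and the `declared` field name, the `ResolveResult` shape, and the message text are assumptions. The diagnostic codes and the `canonical_name`, `resolution`, and `fuzzy_distance` fields come from the PR text.

```typescript
type ResolutionKind = "exact" | "fuzzy" | "unresolved";

interface SkillRef {
  declared: string; // hypothetical name for the author-written skill name
  canonical_name: string;
  resolution: ResolutionKind;
  fuzzy_distance?: number;
}

interface SkillDiagnostic {
  code: "skill_unresolved" | "skill_fuzzy_resolved";
  severity: "error" | "info";
  message: string;
}

interface ResolveResult {
  canonical: string;
  kind: ResolutionKind;
  distance?: number;
}

// Mutates every skill reference in place and returns the diagnostics produced.
function pinSkillsSketch(
  components: { name: string; skills: SkillRef[] }[],
  resolve: (declared: string) => ResolveResult,
): SkillDiagnostic[] {
  const diagnostics: SkillDiagnostic[] = [];
  for (const component of components) {
    for (const ref of component.skills) {
      const result = resolve(ref.declared);
      ref.canonical_name = result.canonical;
      ref.resolution = result.kind;
      if (result.distance !== undefined) ref.fuzzy_distance = result.distance;
      if (result.kind === "unresolved") {
        diagnostics.push({ code: "skill_unresolved", severity: "error", message: `${component.name}: ${ref.declared} is not installed` });
      } else if (result.kind === "fuzzy") {
        diagnostics.push({ code: "skill_fuzzy_resolved", severity: "info", message: `${component.name}: pin ${ref.declared} as ${result.canonical}` });
      }
    }
  }
  return diagnostics;
}

// Stub resolver standing in for the real search-path lookup.
const lookup: Record<string, ResolveResult> = {
  "document-skills:pdf": { canonical: "document-skills:pdf", kind: "exact" },
  docx: { canonical: "document-skills:docx", kind: "fuzzy", distance: 0 },
};
const stubResolve = (declared: string): ResolveResult =>
  lookup[declared] ?? { canonical: "", kind: "unresolved" };

const unpinned = (declared: string): SkillRef => ({ declared, canonical_name: "", resolution: "unresolved" });
const components = [
  { name: "main", skills: [unpinned("document-skills:pdf"), unpinned("docx"), unpinned("missing-skill")] },
];

const diagnostics = pinSkillsSketch(components, stubResolve);
// A compile step in this sketch would exit non-zero when any error is present.
const compileShouldFail = diagnostics.some((d) => d.severity === "error");
console.log(components[0].skills, diagnostics, compileShouldFail);
```

Because both commands share one pass, the only difference between them in this sketch is what the caller does with an error-severity diagnostic.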

# Conflicts: # src/preflight.ts

All five blockers from the prior synthetic-user pass behave correctly: fuzzy mis-resolution fail-closes (pfd no longer silently binds to xf), one-level Claude Code skill layout is discovered, doc examples preflight cleanly under kind: program, Skills section + fuzzy nudge surface in text output, and prose compile pins canonical names with non-zero exit on unresolved skills. BYO-harness invariant still holds: zero mtime changes across 3 preflights + 1 compile against tmp fixtures. GO for branch push. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a north-star example program that turns a prior investor letter PDF and this quarter's operating notes into a polished .docx investor update. The contract demonstrates the new skills: declaration on feat/skills-section in the way it is meant to be used:
- System-level frontmatter declares document-skills:pdf, applying to every sub-service (the program reads the prior PDF for citations everywhere).
- The investor-letter-formatter sub-service additively declares document-skills:docx via inline frontmatter on its ## heading, exercising the additive scope semantics described in skills/open-prose/contract-markdown.md.
- prose preflight resolves both skills against examples/skills/ and renders a Skills section listing each canonical name with its scope.
- prose compile pins canonical_name into the on-disk IR for cross-machine reproducibility.

Stub skills live at examples/skills/document-skills/{pdf,docx}/ so the package default search root (./skills/) resolves them without --skill-search-path, keeping CI green without depending on the user's harness skills (BYO-harness invariant).

Includes a test at test/quarterly-investor-update-example.test.ts covering compilation (system + service skill scoping), preflight PASS against the fixture stubs, and preflight FAIL closed against an empty search path. Updates examples-tour.test.ts and the package-ir snapshot to reflect the new program, and adds the example to examples/prose.package.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-section Captures every design decision (naming, scope, fuzzy strategy, search paths, layout support, IR pinning, BYO invariant, fail-closed default, Forme placement, single-source-of-truth resolution, doc-example kind) with the trade-off taken for each. Documents the harness-engineering workflow used (plan, TDD, parallel-worktree subagent waves, two-pass dogfooding) and how the change fits the existing OpenProse repo conventions. Includes a 'Scope: out of scope' section making explicit that runtime activation (parent VM telling harness to load pinned skills, sub-agent briefing inheritance) is NOT wired by this PR — the verification half is done; the activation half is a clean follow-up.

The previous Declaring required skills section described how authors WRITE the skills: list and how preflight VERIFIES them, but the open-prose skill itself never told the AI-as-VM what to DO with declared skills at execution time. A controlled experiment (single program declaring document-skills:pdf, real PDF input, subagent told to act as the OpenProse VM) showed the agent read the program, saw the declaration, and then silently fell back to the Read tool's built-in PDF rendering — zero Skill tool invocations, contract violated, output correct only by accident. Add 'Activating declared skills at runtime' as the new load-bearing section. Tells the agent: when you embody a .prose.md, you ARE the VM; declared skills are runtime requirements, not metadata; activate first, then work; do not silently fall back to built-ins; pass canonical names into sub-agent briefings for child runs. The OpenProse SKILL.md is the compiler when the AI is the VM. New language features must be reflected here or they remain inert at runtime. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds HandoffSkill to SingleRunHandoff.component and a 'Required Skills' section to renderSingleRunHandoffMarkdown. The section explicitly tells the receiving harness to invoke Skill('open-prose-raw:open-prose') first, then Skill(<each declared canonical>) before producing outputs, with a 'do not fall back to built-in tools' warning. Programs without skills emit nothing extra (no noise). This is the runtime adapter that closes the loop for sub-agent dispatch: SKILL.md teaches the contract for parent agents already in OpenProse context; this brief enforces the same contract for delegate harnesses that don't auto-load the skill. Empirical experiments on 2026-05-04 showed delegate subagents do NOT auto-activate open-prose from a .prose.md path, so the brief must carry the activation directives explicitly. TDD: test/skills-handoff.test.ts written first, asserted on the schema, the rendered markdown content (open-prose + canonical-name Skill calls, 'do not fall back' warning), and the no-noise case for skill-less programs. Watched all three fail, implemented minimally, watched them pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
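A rough sketch of how such a section might be rendered is shown below. The function name `renderRequiredSkills` and the exact sentence wording are assumptions; what is taken from the PR is the `## Required Skills` heading, the `Skill('<name>')` directive form, the open-prose skill going first, the fallback warning, and the empty output for skill-less programs.

```typescript
const OPEN_PROSE_SKILL = "open-prose-raw:open-prose";

// Hypothetical renderer: returns "" when nothing is declared, otherwise a
// markdown section listing activation directives in the order to invoke them.
function renderRequiredSkills(canonicalNames: string[]): string {
  if (canonicalNames.length === 0) return "";
  const ordered = [OPEN_PROSE_SKILL, ...canonicalNames.filter((n) => n !== OPEN_PROSE_SKILL)];
  const lines = [
    "## Required Skills",
    "",
    // Assumed wording; the real brief's phrasing is not quoted in the PR.
    "Invoke each skill below, in order, before producing outputs. Do not fall back to built-in tools.",
    "",
  ];
  ordered.forEach((name, index) => {
    lines.push(`${index + 1}. Skill('${name}')`);
  });
  return lines.join("\n") + "\n";
}

const briefSection = renderRequiredSkills(["document-skills:pdf"]);
const emptySection = renderRequiredSkills([]);
console.log(briefSection);
```

Keeping the directives as an ordered list makes the activation sequence explicit for a delegate harness that has no other OpenProse context.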

…tion Captures four controlled experiments (agent-IDs in the document) showing the progression from no-fix → docs-only fix → handoff-bridge fix. Run 4 is the load-bearing proof: a fresh general-purpose subagent given only the unedited 'prose handoff' output invokes Skill('open-prose-raw:open-prose') then Skill('document-skills:pdf') before doing any work, and uses the skill's prescribed pdftotext tooling rather than Claude's built-in Read PDF rendering — activation drove behavior change. Updates the draft PR doc to reflect that runtime activation is now wired end-to-end (was previously framed as 'out of scope'). Verification + runtime-activation are both empirically proven. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Tight, copy-paste-ready PR description for openprose/prose maintainers. Distilled from the longer internal design log at docs/superpowers/drafts/2026-05-04-skills-section-pr-draft.md. Includes: summary, motivation, surface table, key design decisions, testing approach (TDD + empirical proof reference), one-shot 'try it locally' block, compatibility notes, branch-base note (RFC dependency), files-for-review pointer list, and out-of-scope follow-ups. raw@raw.works will review and decide where to publish.

rawwerks and others added 30 commits May 3, 2026 21:20

spec: declare ### Skills section and skills: frontmatter

038166d

types: add SkillRefIR and skills field to ComponentIR/ServiceIR

cb05c5f

parse: recognize skills: list in component frontmatter

7a6b497

parse: add parseSkills for ### Skills section

0cb816d

ir: populate ComponentIR/ServiceIR.skills from frontmatter and section

136094c

skills: add resolver with exact match + Levenshtein fuzzy fallback

b5fcd9a

manifest: project skills with declared/canonical/resolution

bbc8d97

docs: teach skills declaration in OpenProse SKILL.md

62cd459

merge: T6 skills resolver from worktree

bd50baa

merge: T8 manifest projection + T10 doc true-up from worktree

5f80384

review: final review of feat/skills-section

967d1f4

merge: FIX-A resolver hardening (fuzzy guardrails + one-level layout)

94e0da9

merge: FIX-C compile-time skill pinning

670f89b

# Conflicts: # src/preflight.ts

chore: gitignore docs/superpowers/ (internal agent workflow material)

78e3e94

rawwerks closed this May 4, 2026

rawwerks mentioned this pull request May 4, 2026

feat(spec): declared ### Skills section with fail-closed compile resolution #62

Merged

3 tasks

rawwerks wants to merge 31 commits into rfc/reactive-openprose from feat/skills-section
