feat: explicit skill declaration for .prose.md programs #59

Closed
rawwerks wants to merge 31 commits into rfc/reactive-openprose from feat/skills-section

rawwerks (Contributor) commented May 4, 2026

Summary

Adds an explicit skills: declaration to .prose.md so authors can name the agent skills their programs require, plus the verification and activation machinery around it. prose preflight and prose compile resolve declared names against the user's installed harness skills and fail closed if anything is missing. prose handoff injects the resolved canonical names as Skill('<name>') activation directives into the single-run brief, so a delegate harness deterministically loads the right skills before producing outputs.

Programs that do not declare skills: go through every code path unchanged.

Why

An OpenProse program orchestrates sub-agents and currently trusts the harness's skill auto-router to pick the right skill at each step. Different model versions, different context windows, and different routers can pick differently — authors have no way to say "this program needs document-skills:pdf; if it is not loaded, do not run." This PR adds that.

What changed

| Area | Change |
| --- | --- |
| Contract markdown | New ### Skills section + skills: frontmatter, colon form (document-skills:pdf) |
| IR | SkillRefIR on ComponentIR and ServiceIR; resolved canonical name + resolution kind pinned in place |
| Resolver | New src/skills.ts: exact match, Levenshtein fuzzy fallback (hardened against silent typo bind), supports both one-level and two-level skill install layouts |
| prose preflight | Walks declared skills, emits skill_unresolved (error) and skill_fuzzy_resolved (info), surfaces them in text output |
| prose compile | Same resolver pass; fails closed on unresolved so the on-disk IR never carries resolution: "unresolved" |
| prose handoff | Injects a ## Required Skills section into the rendered brief, listing each canonical as a Skill('<name>') activation directive |
| CLI | --skill-search-path on preflight and compile |
| skills/open-prose/SKILL.md | New "Activating declared skills at runtime" section instructing the AI-as-VM to invoke the harness Skill tool before doing the work |
| Docs / examples | contract-markdown.md, CHANGELOG.md, and examples/north-star/quarterly-investor-update.prose.md (idiomatic demo with program-scope and service-scope skills) |

Key design decisions

  • Colon form for skill names matches the plugin marketplace convention users already see in /skill listings. Bare names accepted as a convenience via the fuzzy fallback.
  • BYO harness is read-only. OpenProse never installs, edits, or removes anything in ~/.claude/skills/ or ~/.codex/skills/. The resolver verifies presence; users install missing skills with their normal workflow.
  • Service-level skills are additive to system-level, not an exclusive allowlist. A sub-service declaration unions with the program-level set.
  • Compile fails loudly on unresolved skills and is stricter than preflight: it must produce an IR with every canonical name pinned, because that IR may ship to other machines for reruns.
  • Single source of truth for resolution. pinSkillsInComponents in src/skills.ts is the only place skills get mutated; both preflight and compile call it.
  • Runtime activation has two paths. For the parent VM (the user's own AI session that already has the OpenProse skill loaded), SKILL.md teaches the activation contract. For delegate dispatches (prose handoff to a fresh harness), the rendered brief carries the activation directives. Both paths are exercised in the empirical proof below.
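The additive scope rule in the list above (service-level skills union with the program-level set) can be illustrated with a minimal sketch. The helper name `effectiveSkills` is hypothetical and is not taken from src/skills.ts; only the union semantics come from this PR.

```typescript
// Hypothetical helper (not from the PR): the effective skill set for a
// service is the program-level set plus whatever the service declares.
// Order is preserved and duplicates are dropped.
function effectiveSkills(programSkills: string[], serviceSkills: string[]): string[] {
  return Array.from(new Set(programSkills.concat(serviceSkills)));
}

// Program-scope declaration applies everywhere; the service adds docx on top.
const programLevel = ["document-skills:pdf"];
const formatterLevel = ["document-skills:docx", "document-skills:pdf"];
console.log(effectiveSkills(programLevel, formatterLevel));
```

Because the result is a union, a service can never narrow the program-level set; it can only add to it.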

Testing

bun test: 395 pass / 1 skip / 0 fail. (The skip is a pre-existing network-flake test unrelated to this branch.)

New test files, all TDD (failing test written first, watched red, then implemented):

  • test/skills-section.test.ts, skills-resolver.test.ts, skills-preflight.test.ts, skills-manifest.test.ts, skills-e2e.test.ts, skills-compile-pinning.test.ts, skills-handoff.test.ts
  • test/skills-doc-examples.test.ts extracts every fenced example from SKILL.md and contract-markdown.md and asserts each parses + preflights cleanly — no copy-paste footguns in the docs
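As a rough illustration of the doc-example extraction described above, here is a minimal sketch of pulling fenced blocks out of a Markdown string. The function names are hypothetical, and the "prose-shaped" check (block starts with a frontmatter delimiter) is an assumption; the real test may select blocks differently.

```typescript
// Hypothetical extractor: returns the body of every fenced block.
// A block opens on a line starting with three or more backticks and closes
// on a line that is only backticks, at least as long as the opener.
function extractFencedBlocks(markdown: string): string[] {
  const blocks: string[] = [];
  let fence: string | null = null;
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    const m = /^\s*(`{3,})/.exec(line);
    if (fence === null) {
      if (m) {
        fence = m[1];
        current = [];
      }
    } else if (m && m[1].length >= fence.length && line.trim() === m[1]) {
      blocks.push(current.join("\n"));
      fence = null;
    } else {
      current.push(line);
    }
  }
  return blocks;
}

// Assumption: a "prose-shaped" block is one that begins with frontmatter.
function looksLikeProseContract(block: string): boolean {
  return block.trimStart().startsWith("---");
}
```

Each block that passes the shape check would then be fed through the compile and preflight entrypoints, so a broken doc snippet fails the suite instead of failing a reader.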

Empirical proof of runtime activation

Captured in docs/superpowers/findings/2026-05-04-skills-runtime-empirical-proof.md. Four controlled experiments tracing the progression from no-fix to docs-only-fix to handoff-bridge fix. The load-bearing run: a fresh general-purpose Claude subagent given only the unedited prose handoff brief — no orchestrator priming — invoked Skill('open-prose-raw:open-prose') then Skill('document-skills:pdf') before producing outputs, and used the skill's prescribed pdftotext -layout rather than the harness's built-in PDF rendering. Activation drove behavior change, not just registration.

Try it locally

```bash
cat > /tmp/demo.prose.md <<'EOF'
---
name: demo
kind: program
skills:
  - document-skills:pdf
---

### Description
Extract a summary from a PDF.

### Requires
- `pdf_path`: the file to read

### Ensures
- `summary`: a markdown bullet list
EOF

# Verify against your installed skills (defaults to ./skills, ~/.claude/skills, ~/.codex/skills)
bun bin/prose.ts preflight /tmp/demo.prose.md

# Compile with canonical name pinning
bun bin/prose.ts compile /tmp/demo.prose.md --out /tmp/demo.ir.json
jq '.components[0].skills' /tmp/demo.ir.json

# Generate the runtime brief: the Required Skills section is the activation contract
bun bin/prose.ts handoff /tmp/demo.prose.md --input pdf_path=/path/to/file.pdf
```

Compatibility

  • Programs that do not declare skills: produce an empty skills array on the IR and emit no extra output anywhere. No code path changes for them.
  • Two fixtures/hosted-runtime/*.json golden snapshots regenerated to track the new skills field on ComponentIR.
  • No public API removed. The --skill-search-path flag replaces defaults when supplied so test fixtures can run against tmp dirs without touching real harness skills.
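The "replaces defaults when supplied" behavior of --skill-search-path can be sketched as follows. This is an illustration only: `resolveSearchPaths` and `DEFAULT_SEARCH_PATHS` are hypothetical names, and the comma/colon splitting follows the commit message that describes the flag as repeatable and comma-/colon-separated.

```typescript
// Hypothetical names; only the flag semantics come from the PR text.
const DEFAULT_SEARCH_PATHS = ["./skills", "~/.claude/skills", "~/.codex/skills"];

// flagValues holds one entry per --skill-search-path occurrence.
function resolveSearchPaths(
  flagValues: string[],
  defaults: string[] = DEFAULT_SEARCH_PATHS,
): string[] {
  const supplied: string[] = [];
  for (const value of flagValues) {
    for (const part of value.split(/[,:]/)) {
      const trimmed = part.trim();
      if (trimmed.length > 0) supplied.push(trimmed);
    }
  }
  // Any supplied path replaces the defaults entirely; nothing is merged,
  // which is what lets tests run without touching real harness skills.
  return supplied.length > 0 ? supplied : defaults.slice();
}
```

Replacing rather than appending is the design choice that matters here: a test that passes a tmp directory can be sure the resolver never looks at ~/.claude/skills or ~/.codex/skills.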

Branch base note

This branch is based on rfc/reactive-openprose. The compiler surface (src/markdown.ts, src/sections.ts, src/compiler.ts, src/preflight.ts, src/handoff.ts, etc.) only exists on the RFC branch. Please target the RFC branch when merging, or rebase onto whichever branch carries that surface in your flow.

Files for review

If triaging where to look first:

  • Resolver: src/skills.ts — single source of truth for resolution
  • Brief generator (the runtime adapter): src/handoff.ts, renderSingleRunHandoffMarkdown
  • Spec: skills/open-prose/contract-markdown.md, skills/open-prose/SKILL.md
  • Empirical proof: docs/superpowers/findings/2026-05-04-skills-runtime-empirical-proof.md
  • Demo program: examples/north-star/quarterly-investor-update.prose.md

Follow-ups (intentionally out of scope)

  • A prose run command that wraps handoff → dispatch → output-collect for AI execution. Today the user runs prose handoff and feeds the brief to an agent themselves; a future command would automate that bridge using exactly the brief this PR generates.
  • LLM-based fuzzy matching as an alternative to Levenshtein, if user demand justifies trading determinism for resilience.
  • Per-harness adapters for non-Claude/Codex environments. The brief is harness-agnostic Markdown today.

rawwerks and others added 30 commits May 3, 2026 21:20
…aration

Adds docs/superpowers/plans/2026-05-03-skills-section.md describing how to
add a skills: frontmatter key and ### Skills section so .prose.md authors
can deterministically preload required agent skills, with Forme preflight
failing closed when a declared skill is not installed in the user's
harness search paths.

Design agreed with user:
- Colon-form names (document-skills:pdf), matches plugin marketplace
- Levenshtein fuzzy fallback for v1; LLM-fuzzy is a follow-up RFC
- BYO harness is sacred: OpenProse never installs or edits user skills
- Service-level skills: is additive, not exclusive
- Resolved canonical name pinned into IR for reproducibility

Plan is execution-ready; subagent dispatch awaits user go-ahead.

Wires the skill resolver into preflightPath so a `.prose.md` author can
declare `skills:` and have preflight fail closed if any are not
installed on the user's machine. Skill resolution mutates each
SkillRefIR in place (canonical_name + resolution + fuzzy_distance) so
subsequent runs of the IR are reproducible across machines. Fuzzy hits
emit an `info` diagnostic nudging the author to pin the canonical name.

Adds a `--skill-search-path` CLI flag (repeatable, comma-/colon-
separated) on `prose preflight` that overrides the default
./skills + ~/.claude/skills + ~/.codex/skills lookup. Tests use it to
point at a tmpdir fixture and never touch the real harness — BYO
harness invariant intact.

Adapted from the plan's pseudocode (`runPreflight(file, opts)` returning
`{ok, diagnostics}`) to the actual exported entrypoint
`preflightPath(file, opts)` returning `{status, diagnostics}`. Fixtures
use `kind: program` instead of `kind: system` so the unrelated
`preflight_not_program` diagnostic does not interfere with assertions
about skill diagnostics; skill iteration walks every component in the
target file regardless, so `kind: system` would still be checked at
runtime — the test just keeps the assertion focused on skill behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…stub

Adds the on-disk fixture pair (with-pdf.prose.md + a stub
document-skills:pdf SKILL.md) and an end-to-end test that runs the real
preflight pipeline against them. Both pass scenarios (skill installed)
and fail scenarios (empty search path) are covered.

The fixture program declares `kind: program` rather than the plan's
`kind: system` so preflightPath has a `main` to anchor on; skill
checking iterates all components in the target file regardless, so the
declared semantics still hold. Test entrypoint adapted to
`preflightPath` (the real exported async function) and `result.status`
(the real shape) instead of the plan's hypothetical
`runPreflight` / `result.ok`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Walks through happy-path, missing-skill, fuzzy-match, service-scope,
docs-as-newcomer, and BYO-invariant scenarios against the
feat/skills-section branch. Surfaces one correctness bug (silent fuzzy
mis-resolution of short typos like pfd -> claude-skills:xf) and several
adoption-blocking rough edges (skills invisible in PASS text output,
default search path layout mismatch with real Claude Code installs,
docs example uses kind:system but preflight requires kind:program,
canonical-name pinning is in-memory only).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Synthetic-user dogfood found that resolveSkill("pfd", ...) silently bound
to claude-skills:xf (a tweet-search skill) when pfd was a typo for pdf and
only xf was installed two-deep. The old threshold formula
Math.max(2, floor(len/3)) lets every 3-char declared name match anything
within distance 2, and the clear-winner check only required second.d to be
strictly greater than best.d. Both failures combined to silently activate
a wholly unrelated skill with exit 0.

Tighten the fuzzy guardrails:

- Short declared names (length <= 4) require best.d <= 1. A 3-char name
  with distance 2 can match almost anything; refuse to bind silently.
- Longer names cap distance at min(2, floor(len/3)).
- Require either a margin of victory >= 2 or a shared >=2-char common
  prefix/suffix between declared and best.leaf. A bare-name match that is
  only 1 edit better than an equally-plausible competitor is too risky.

Tests:

- The original synthetic-user typo (pfd vs only xf installed) now
  resolves to unresolved with candidates.
- pdf -> document-skills:pdf still fuzzy when only pdf is installed.
- Longer typo (spreadshet) still fuzzy-resolves to spreadsheet when the
  only competitor is far away.
- All four pre-existing resolver tests stay green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
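The tightened guardrails in the commit message above can be read as the following minimal sketch. It is one interpretation of the stated rules, not the code in src/skills.ts: `pickFuzzy` and `sharedAffixLength` are hypothetical names, and candidates are compared by leaf name only.

```typescript
// Classic Levenshtein edit distance using a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row[j] = j;
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Length of the longer of the common prefix and common suffix.
function sharedAffixLength(a: string, b: string): number {
  let prefix = 0;
  while (prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) prefix++;
  let suffix = 0;
  while (
    suffix < a.length && suffix < b.length &&
    a[a.length - 1 - suffix] === b[b.length - 1 - suffix]
  ) suffix++;
  return Math.max(prefix, suffix);
}

// Hypothetical: returns the leaf to bind to, or null to fail closed.
function pickFuzzy(declared: string, leaves: string[]): string | null {
  const ranked = leaves
    .map((leaf) => ({ leaf, d: levenshtein(declared, leaf) }))
    .sort((x, y) => x.d - y.d);
  if (ranked.length === 0) return null;
  const best = ranked[0];
  // Short names (length <= 4) allow at most 1 edit; longer names cap at
  // min(2, floor(len / 3)).
  const cap = declared.length <= 4 ? 1 : Math.min(2, Math.floor(declared.length / 3));
  if (best.d > cap) return null;
  // Require a margin of victory >= 2 or a shared 2+ character prefix/suffix.
  const margin = ranked.length > 1 ? ranked[1].d - best.d : Infinity;
  if (margin < 2 && sharedAffixLength(declared, best.leaf) < 2) return null;
  return best.leaf;
}
```

Under these rules the original typo case fails closed: `pfd` against an installed `xf` has distance 2, which exceeds the cap of 1 for a three-character name.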
The doc snippets used `kind: system` which made `prose preflight` emit
`preflight_not_program` for any user who copy-pasted them verbatim. Switch
the examples to `kind: program` and add a minimal `### Ensures` section so
they are real, complete, runnable contracts. Adds
`test/skills-doc-examples.test.ts` which extracts every prose-shaped fenced
block from the docs and runs it through compileSource and preflightPath to
prevent this regression.

Synthetic-user dogfood found that resolveSkill only enumerated the
two-level <root>/<namespace>/<name>/SKILL.md layout, but stock Claude
Code installs are flat one-level: ~/.claude/skills/<name>/SKILL.md. This
meant most users would see skill_unresolved for skills they actually had
installed — a real adoption blocker.

Teach enumerateInstalledSkills to discover both layouts:

- One-level: <root>/<name>/SKILL.md, canonical = "<name>", no namespace.
- Two-level: <root>/<namespace>/<name>/SKILL.md, canonical = "<ns>:<name>".

Both layouts can coexist under the same root. Adjust resolveSkill so:

- A bare declared name (e.g. `pdf`) prefers an exact one-level install
  before falling back to fuzzy leaf-match — `pdf` -> canonical `pdf` is
  resolution: "exact", not fuzzy.
- A bare declared name with only a two-level install still fuzzy-matches
  to "namespace:name" (existing behavior preserved).
- A colon-form declared name (`document-skills:pdf`) only matches a
  two-level install. A flat one-level install whose leaf happens to be
  `pdf` is NOT a match — the namespace was never declared by the
  install, so silently binding to it would be wrong.

Tests cover all four corners: bare->one-level exact, colon-form->no
one-level match, mixed-layout disambiguation, and the existing
two-level-only fuzzy path stays green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
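The two install layouts and the matching rules described in the commit message above can be sketched without touching the filesystem. Both function names are hypothetical, and the bare-name fallback is simplified to an exact leaf match, whereas the commit describes a Levenshtein-based fuzzy match.

```typescript
// Derive a canonical name from a SKILL.md path relative to a search root.
// One-level: <name>/SKILL.md -> "<name>"
// Two-level: <ns>/<name>/SKILL.md -> "<ns>:<name>"
function canonicalFromPath(relPath: string): string | null {
  const parts = relPath.split("/");
  if (parts[parts.length - 1] !== "SKILL.md") return null;
  if (parts.length === 2) return parts[0];
  if (parts.length === 3) return parts[0] + ":" + parts[1];
  return null;
}

type Match = { canonical: string; resolution: "exact" | "fuzzy" } | null;

// Hypothetical matcher over already-enumerated canonical names.
function matchDeclared(declared: string, installed: string[]): Match {
  if (declared.indexOf(":") !== -1) {
    // Colon form: only a two-level install with the same namespace counts.
    // A flat install whose name equals the leaf is deliberately not a match.
    return installed.indexOf(declared) !== -1
      ? { canonical: declared, resolution: "exact" }
      : null;
  }
  // Bare name: an exact one-level install wins.
  if (installed.indexOf(declared) !== -1) {
    return { canonical: declared, resolution: "exact" };
  }
  // Otherwise fall back to the leaf of a two-level install (reported as fuzzy).
  for (const canonical of installed) {
    const segments = canonical.split(":");
    if (segments.length === 2 && segments[1] === declared) {
      return { canonical, resolution: "fuzzy" };
    }
  }
  return null;
}
```

The asymmetry is the point: a bare `pdf` may be promoted to a namespaced install, but `document-skills:pdf` is never demoted to a flat `pdf`, because the install never claimed that namespace.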
When `prose preflight` passes the user previously had no way to confirm
their declared skills were actually checked, what they resolved to, or
whether the fuzzy-pin nudge fired — the text formatter dropped all
info-severity diagnostics and never rendered any skills. The whole
fuzzy-resolver UX was invisible outside of `--format json`.

Adds a `skills: PreflightSkillCheck[]` field to `PreflightResult` and renders
a "Skills:" section listing each declared name, its canonical resolution,
and the resolution kind (exact/fuzzy/unresolved) with the component/service
scope. Always renders info-severity diagnostics in a "Notices:" section so
the pin-canonical nudge reaches the user. Section is omitted entirely when
no skills are declared anywhere, to keep the noise floor low.

Adds an Unreleased/Added bullet describing the explicit `skills:` feature
(frontmatter list, `### Skills` section, search path stack, fail-closed
preflight, fuzzy nudge, BYO-harness invariant). Also drops a single-line
comment above `ServiceIR.skills = []` in `parseServices` explaining that
inline `## sub-services` carry their own `ComponentIR.skills` so the bare
service-reference form intentionally has no skills of its own.

… runs

The skills feature plan promised that resolved canonical names get pinned
into the IR so subsequent runs of the same IR are reproducible across
machines. Only `prose preflight` was actually running the resolver; the
on-disk IR from `prose compile` had `canonical_name: ""` and
`resolution: "unresolved"` even when the skill resolved cleanly. Anything
downstream consuming the compiled IR (run, deployment, manifest snapshot)
saw stale skills metadata.

Fix:

- Extract the per-component resolver pass into a shared
  `pinSkillsInComponents(components, searchPaths)` helper in `src/skills.ts`.
  Mutates each `SkillRefIR` in place (canonical_name, resolution,
  fuzzy_distance) and returns the diagnostics it produced.
- Refactor `src/preflight.ts` to use the shared helper instead of its
  own private `checkSkills` copy. Behavior is unchanged.
- In the `prose compile` CLI command, after compile, run the resolver
  against `--skill-search-path` (or `defaultSearchPaths(packageRoot)`)
  and merge the diagnostics into the IR. Unresolved skills become
  compile errors and exit non-zero — fail-loud per the design constraint.
- Wire `--skill-search-path` into the compile command (the parser
  already accepted the flag for preflight; just plumb it through and
  document it in the help text).

Test:

- `test/skills-compile-pinning.test.ts` shells out to `bun bin/prose.ts
  compile` end to end, asserts the on-disk IR JSON has
  `canonical_name: "document-skills:pdf"` / `resolution: "exact"` for
  an exact match, and asserts that an unresolved declared skill
  produces a non-zero exit and surfaces the offending skill name in
  CLI output.
- BYO-harness invariant intact: tests use `tmpdir()` fixtures only, the
  resolver is read-only.
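The shared pinning pass described in the commit message above might look roughly like this. It is a sketch under assumptions: the real helper takes search paths, whereas this version takes an injected resolver function to stay self-contained, and the `name` field, `Resolver` type, and `pinSkillsSketch` name are hypothetical. The field names canonical_name, resolution, and fuzzy_distance and the two diagnostic codes come from this PR.

```typescript
type Resolution = "exact" | "fuzzy" | "unresolved";

interface SkillRefIR {
  name: string; // declared name (assumed field name)
  canonical_name: string;
  resolution: Resolution;
  fuzzy_distance?: number;
}

interface ComponentIR {
  name: string;
  skills: SkillRefIR[];
}

interface Diagnostic {
  code: "skill_unresolved" | "skill_fuzzy_resolved";
  severity: "error" | "info";
  message: string;
}

// Stand-in for the filesystem-backed resolver (assumption).
type Resolver = (
  declared: string,
) => { canonical: string; resolution: "exact" | "fuzzy"; distance: number } | null;

// Mutates each SkillRefIR in place and returns the diagnostics produced.
function pinSkillsSketch(components: ComponentIR[], resolve: Resolver): Diagnostic[] {
  const diagnostics: Diagnostic[] = [];
  for (const component of components) {
    for (const skill of component.skills) {
      const hit = resolve(skill.name);
      if (hit === null) {
        skill.canonical_name = "";
        skill.resolution = "unresolved";
        diagnostics.push({
          code: "skill_unresolved",
          severity: "error",
          message: component.name + ": skill '" + skill.name + "' is not installed",
        });
        continue;
      }
      skill.canonical_name = hit.canonical;
      skill.resolution = hit.resolution;
      if (hit.resolution === "fuzzy") {
        skill.fuzzy_distance = hit.distance;
        diagnostics.push({
          code: "skill_fuzzy_resolved",
          severity: "info",
          message: "'" + skill.name + "' resolved to '" + hit.canonical + "'; consider pinning it",
        });
      }
    }
  }
  return diagnostics;
}

// Compile fails closed when any error-severity diagnostic is present.
function hasErrors(diagnostics: Diagnostic[]): boolean {
  return diagnostics.some((d) => d.severity === "error");
}
```

Keeping the mutation in one function means preflight and compile cannot drift apart: both see the same pinned fields and the same diagnostics.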
# Conflicts:
#	src/preflight.ts
All five blockers from the prior synthetic-user pass behave correctly:
fuzzy mis-resolution fail-closes (pfd no longer silently binds to xf),
one-level Claude Code skill layout is discovered, doc examples preflight
cleanly under kind: program, Skills section + fuzzy nudge surface in text
output, and prose compile pins canonical names with non-zero exit on
unresolved skills. BYO-harness invariant still holds: zero mtime changes
across 3 preflights + 1 compile against tmp fixtures. GO for branch push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a north-star example program that turns a prior investor letter PDF and
this quarter's operating notes into a polished .docx investor update. The
contract demonstrates the new skills: declaration on feat/skills-section in
the way it is meant to be used:

- System-level frontmatter declares document-skills:pdf, applying to every
  sub-service (the program reads the prior PDF for citations everywhere).
- The investor-letter-formatter sub-service additively declares
  document-skills:docx via inline frontmatter on its ## heading,
  exercising the additive scope semantics described in
  skills/open-prose/contract-markdown.md.
- prose preflight resolves both skills against examples/skills/ and renders a
  Skills section listing each canonical name with its scope.
- prose compile pins canonical_name into the on-disk IR for cross-machine
  reproducibility.

Stub skills live at examples/skills/document-skills/{pdf,docx}/ so the package
default search root (./skills/) resolves them without --skill-search-path,
keeping CI green without depending on the user's harness skills (BYO-harness
invariant).

Includes a test at test/quarterly-investor-update-example.test.ts covering
compilation (system + service skill scoping), preflight PASS against the
fixture stubs, and preflight FAIL closed against an empty search path. Updates
examples-tour.test.ts and the package-ir snapshot to reflect the new program,
and adds the example to examples/prose.package.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-section

Captures every design decision (naming, scope, fuzzy strategy, search paths,
layout support, IR pinning, BYO invariant, fail-closed default, Forme
placement, single-source-of-truth resolution, doc-example kind) with the
trade-off taken for each. Documents the harness-engineering workflow used
(plan, TDD, parallel-worktree subagent waves, two-pass dogfooding) and how
the change fits the existing OpenProse repo conventions.

Includes a 'Scope: out of scope' section making explicit that runtime
activation (parent VM telling harness to load pinned skills, sub-agent
briefing inheritance) is NOT wired by this PR — the verification half is
done; the activation half is a clean follow-up.

The previous Declaring required skills section described how authors WRITE
the skills: list and how preflight VERIFIES them, but the open-prose skill
itself never told the AI-as-VM what to DO with declared skills at execution
time. A controlled experiment (single program declaring document-skills:pdf,
real PDF input, subagent told to act as the OpenProse VM) showed the agent
read the program, saw the declaration, and then silently fell back to the
Read tool's built-in PDF rendering — zero Skill tool invocations, contract
violated, output correct only by accident.

Add 'Activating declared skills at runtime' as the new load-bearing
section. Tells the agent: when you embody a .prose.md, you ARE the VM;
declared skills are runtime requirements, not metadata; activate first,
then work; do not silently fall back to built-ins; pass canonical names
into sub-agent briefings for child runs.

The OpenProse SKILL.md is the compiler when the AI is the VM. New language
features must be reflected here or they remain inert at runtime.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds HandoffSkill to SingleRunHandoff.component and a 'Required Skills'
section to renderSingleRunHandoffMarkdown. The section explicitly tells the
receiving harness to invoke Skill('open-prose-raw:open-prose') first, then
Skill(<each declared canonical>) before producing outputs, with a 'do not
fall back to built-in tools' warning. Programs without skills emit nothing
extra (no noise).

This is the runtime adapter that closes the loop for sub-agent dispatch:
SKILL.md teaches the contract for parent agents already in OpenProse
context; this brief enforces the same contract for delegate harnesses
that don't auto-load the skill. Empirical experiments on 2026-05-04
showed delegate subagents do NOT auto-activate open-prose from a .prose.md
path, so the brief must carry the activation directives explicitly.

TDD: test/skills-handoff.test.ts written first, asserted on the schema,
the rendered markdown content (open-prose + canonical-name Skill calls,
'do not fall back' warning), and the no-noise case for skill-less programs.
Watched all three fail, implemented minimally, watched them pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
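The Required Skills section described in the commit message above could be rendered along these lines. This is a sketch only: `renderRequiredSkills` is a hypothetical name, and the exact wording of the instructions and warning in the real brief is not shown in this PR, so the sentences below are placeholders. The heading, the Skill('<name>') directive form, the open-prose-first ordering, and the no-output case for skill-less programs come from the text.

```typescript
const OPEN_PROSE_SKILL = "open-prose-raw:open-prose";

// Hypothetical renderer for the activation section of a handoff brief.
function renderRequiredSkills(canonicalNames: string[]): string {
  // Programs without declared skills emit nothing extra.
  if (canonicalNames.length === 0) return "";
  const lines: string[] = [
    "## Required Skills",
    "",
    // Placeholder wording (assumption); the real brief may phrase this differently.
    "Activate these skills, in order, before producing any outputs.",
    "Do not fall back to built-in tools.",
    "",
    "- Skill('" + OPEN_PROSE_SKILL + "')",
  ];
  for (const name of canonicalNames) {
    lines.push("- Skill('" + name + "')");
  }
  return lines.join("\n");
}

console.log(renderRequiredSkills(["document-skills:pdf"]));
```

Listing the open-prose skill first reflects the finding that delegate subagents do not auto-activate it from a .prose.md path, so the brief has to carry that directive itself.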
…tion

Captures four controlled experiments (agent-IDs in the document) showing
the progression from no-fix → docs-only fix → handoff-bridge fix. Run 4
is the load-bearing proof: a fresh general-purpose subagent given only
the unedited 'prose handoff' output invokes Skill('open-prose-raw:open-prose')
then Skill('document-skills:pdf') before doing any work, and uses the
skill's prescribed pdftotext tooling rather than Claude's built-in Read
PDF rendering — activation drove behavior change.

Updates the draft PR doc to reflect that runtime activation is now wired
end-to-end (was previously framed as 'out of scope'). Verification +
runtime-activation are both empirically proven.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Tight, copy-paste-ready PR description for openprose/prose maintainers.
Distilled from the longer internal design log at
docs/superpowers/drafts/2026-05-04-skills-section-pr-draft.md.

Includes: summary, motivation, surface table, key design decisions,
testing approach (TDD + empirical proof reference), one-shot 'try it
locally' block, compatibility notes, branch-base note (RFC dependency),
files-for-review pointer list, and out-of-scope follow-ups.

raw@raw.works will review and decide where to publish.