From 4a3f34ab1f603b70ba501a24059a4303ef311db0 Mon Sep 17 00:00:00 2001 From: David Cramer Date: Fri, 24 Apr 2026 12:15:53 -0400 Subject: [PATCH 1/5] docs(prompt-optimizer): clarify tool policy vs. native tool disclosure Rename the `` marker to `` in the portable skeleton, Example 1, and the "good section names" list. Add a note in core-patterns explaining that tool schemas are disclosed by the provider-native tools parameter (Anthropic, OpenAI, Gemini all accept a `tools` field that is surfaced to the model automatically), so the user-authored prompt should carry policy, not enumeration. Add one matching line per provider section in model-family-notes. The old `` label invited authors to restate tool inventories and schemas in the prompt text, which duplicates what the API already passes and bloats the prompt without adding signal. Fixes #120 Co-Authored-By: Claude Opus 4.7 (1M context) --- .../references/core-patterns.md | 27 ++++++++++++++----- .../references/model-family-notes.md | 3 +++ .../references/transformed-examples.md | 4 +-- 3 files changed, 25 insertions(+), 9 deletions(-) diff --git a/skills/prompt-optimizer/references/core-patterns.md b/skills/prompt-optimizer/references/core-patterns.md index 2583f7e..d1712fb 100644 --- a/skills/prompt-optimizer/references/core-patterns.md +++ b/skills/prompt-optimizer/references/core-patterns.md @@ -18,7 +18,7 @@ Good section names are concrete and stable: - `` - `` - `` -- `` +- `` - `` - `` - `` @@ -72,7 +72,20 @@ Do not cargo-cult this ordering into short prompts that do not need it. ## Portable agent prompt skeleton -Use this as a starting point and adapt it: +Use this as a starting point and adapt it. + +Tool schemas are disclosed to the model by the provider-native tools parameter +(Anthropic `tools`, OpenAI `tools`, Gemini `tools`). On Anthropic this is +explicit — the API constructs a special system prompt that injects the tool +definitions from the `tools` parameter alongside the user-authored system +prompt. Well-tuned harnesses (Codex CLI, pi-agent-core) pass tools natively +and keep the prompt text free of schema restatements. + +The prompt text should carry tool *policy* — when to call tools, when to avoid +them, what evidence to gather before acting — not a restated list of tool +names or argument schemas. Naming a specific tool in a policy rule ("prefer +`Read` over a `Bash` cat") is fine; re-enumerating the tool inventory or its +schemas is not. ```text @@ -91,11 +104,11 @@ Available files or documents: Known constraints: - -Available tools: -When to use them: -When to avoid them: - + +When to use tools: +When to avoid tools: +Evidence to gather before acting: + 1. Clarify only if required. diff --git a/skills/prompt-optimizer/references/model-family-notes.md b/skills/prompt-optimizer/references/model-family-notes.md index ada81e3..2e3c260 100644 --- a/skills/prompt-optimizer/references/model-family-notes.md +++ b/skills/prompt-optimizer/references/model-family-notes.md @@ -12,6 +12,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f - Use delimiters such as markdown headings, XML tags, or section titles when the prompt mixes multiple content blocks. - Try zero-shot first. Add few-shot examples only when the output contract or edge cases need them. - Be explicit about constraints, success criteria, and completion conditions. +- Tool schemas are disclosed via the Responses API `tools` parameter. Keep tool policy (when/why/whether to call) in the prompt; do not restate tool names or argument schemas. ### GPT-style non-reasoning models @@ -29,6 +30,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f - For long context, place long documents before the question and put the actual query near the end. - When grounding in long documents, asking for relevant quotes first can improve downstream analysis. - If tool use or progress-update behavior matters, specify it explicitly rather than assuming the model will infer it. +- When you call the Messages API with `tools`, the API injects the tool definitions into a special system prompt automatically. Keep your user-authored system prompt focused on policy; put tool detail in each tool's `description` field rather than re-listing schemas in prose. ## Gemini @@ -39,6 +41,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f - Use system instructions when the target runtime supports them. - Thinking is dynamic by default on modern Gemini thinking models; tune it only when latency or deeper reasoning warrants it. - Gemini long-context workflows can benefit from many-shot in-context learning when you have a large bank of representative examples. +- Tool schemas are disclosed via the Gemini API `tools` (function declarations) parameter. Keep the prompt focused on tool policy; do not re-list function names or parameter schemas. ## Cross-family adapter rules diff --git a/skills/prompt-optimizer/references/transformed-examples.md b/skills/prompt-optimizer/references/transformed-examples.md index ba64d1e..2360fe1 100644 --- a/skills/prompt-optimizer/references/transformed-examples.md +++ b/skills/prompt-optimizer/references/transformed-examples.md @@ -20,10 +20,10 @@ Implement the user's requested change end-to-end when feasible. Do not stop at analysis if you can safely gather facts and act. - + Use tools to inspect the workspace before assuming facts. Read before write. Validate the changed surface before finishing. - + 1. Restate the objective briefly. From 5391c98ba695c777f99f308f0208456fe5b3d748 Mon Sep 17 00:00:00 2001 From: David Cramer Date: Fri, 24 Apr 2026 12:16:36 -0400 Subject: [PATCH 2/5] docs(prompt-optimizer): add layered-prompt architecture guidance MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a "Layered prompts with multiple owners" subsection to core-patterns covering the platform-layer vs. deployer/persona-layer split used by SOUL.md, CLAUDE.md, AGENTS.md, Hermes, OpenClaw, and similar runtimes. The invariant is that the deployer layer is voice-only — every platform behavior rule must still fire if the deployer layer is empty or sparse. Point at the new subsection from SKILL.md Step 3. The existing layering model assumed a single author owns the whole system prompt, which nudged authors to delete platform bullets on the theory that a downstream persona file would cover them. In practice deployer-authored files are often five lines of voice and tone, so load-bearing rules silently drop when they live there. Fixes #121 Co-Authored-By: Claude Opus 4.7 (1M context) --- skills/prompt-optimizer/SKILL.md | 1 + .../references/core-patterns.md | 28 +++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/skills/prompt-optimizer/SKILL.md b/skills/prompt-optimizer/SKILL.md index b3e6fa9..2298cac 100644 --- a/skills/prompt-optimizer/SKILL.md +++ b/skills/prompt-optimizer/SKILL.md @@ -58,6 +58,7 @@ Read `references/core-patterns.md`. 1. Separate durable behavior from task-local context: - stable policy and behavioral defaults belong in `system` or `developer` - variable inputs, retrieved context, and task instances belong in templated user-facing sections +- when the system prompt is assembled at runtime from a platform layer and a deployer-authored persona layer (e.g., `SOUL.md`, `CLAUDE.md`, `AGENTS.md`), see "Layered prompts with multiple owners" in `references/core-patterns.md` — platform behavior rules must not depend on what the deployer layer contains 2. Keep one authoritative instruction per behavior: - if a rule appears in more than one layer, choose one owner for it diff --git a/skills/prompt-optimizer/references/core-patterns.md b/skills/prompt-optimizer/references/core-patterns.md index d1712fb..4532ea0 100644 --- a/skills/prompt-optimizer/references/core-patterns.md +++ b/skills/prompt-optimizer/references/core-patterns.md @@ -70,6 +70,34 @@ When prompts are long, separate policy from evidence explicitly: For long-context prompts, place long evidence before the final query and keep the actual ask in a terminal section. Do not cargo-cult this ordering into short prompts that do not need it. +### Layered prompts with multiple owners + +The layers above assume a single author owns the whole system prompt. Many +runtimes concatenate the system prompt from multiple layers with different +owners at request time: + +- a **platform layer** owned by the product or framework team (harness rules, + tool-use policy, output contract, safety boundaries) +- a **deployer or persona layer** authored by the downstream user or customer + (voice, tone, identity files such as `SOUL.md`, `CLAUDE.md`, `AGENTS.md`) + +When this is the case, treat the deployer layer as **voice-only**: + +- every platform behavior rule — evidence gathering, tool-use policy, narration + rules, output contract, escalation boundaries — must live in the + platform-owned layer and must still fire if the deployer layer is empty, + five lines of voice, or customized in unexpected ways +- do not delete a platform bullet on the assumption that a persona file + "probably covers it"; deployers ship sparse persona files in practice +- if a rule is load-bearing, it belongs in the platform layer by default; + the deployer layer gets voice and domain framing, not policy + +Hermes Agent, OpenClaw, and similar SOUL.md-style frameworks use this split +explicitly: platform behavior is code-level, SOUL.md carries identity and +tone, and the platform falls back to a built-in default identity if SOUL.md +is absent or sparse. Mirror that invariant whenever a prompt is assembled +from more than one authorship layer. + ## Portable agent prompt skeleton Use this as a starting point and adapt it. From 1b65adcb601dbcf92640d45027ac6a0919d5daef Mon Sep 17 00:00:00 2001 From: David Cramer Date: Fri, 24 Apr 2026 12:17:34 -0400 Subject: [PATCH 3/5] docs(prompt-optimizer): warn on directives in descriptive markers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a "Where rules live" subsection to core-patterns noting that descriptive markers (``, ``, ``, ``) read as data and state-conditional directives buried in them can underperform the same directive placed in a canonical rules section (``, ``, ``, ``). Add a matching bullet to "High-value prompt moves" and a new Example 4 showing the before/after (resume-notice case, 0.5 → ≥0.75 on the relevant eval with no other change). The previous guidance said each rule needs one owner but said nothing about *where* that owner should be. Authors following the "collapse duplicates and group related content" advice could land a directive in a state block that reads correctly but underperforms. Fixes #122 Co-Authored-By: Claude Opus 4.7 (1M context) --- .../references/core-patterns.md | 24 ++++++++++++ .../references/transformed-examples.md | 39 +++++++++++++++++++ 2 files changed, 63 insertions(+) diff --git a/skills/prompt-optimizer/references/core-patterns.md b/skills/prompt-optimizer/references/core-patterns.md index 4532ea0..826c7b9 100644 --- a/skills/prompt-optimizer/references/core-patterns.md +++ b/skills/prompt-optimizer/references/core-patterns.md @@ -27,6 +27,29 @@ Do not add markup around every sentence. Markers are useful when they carve the If the target stack or model family responds better to plain markdown, use headings and bullets instead of XML-style tags. The structure matters more than the syntax. +### Where rules live + +Markers signal to the model what kind of content a block carries. Descriptive +or state markers (``, ``, ``, ``, +``) read as facts about the situation — data, not policy. +Canonical rules markers (``, ``, ``, +``) read as directives the model should follow. + +A directive buried in a descriptive block can underperform the same directive +placed in a rules block, especially for state-conditional rules. Observed in +the field: a resume-notice instruction placed inside `resumed` +scored 0.5 on the relevant eval; the identical sentence moved into +`` passed at ≥0.75 with no other change. + +Rules of thumb: + +- keep descriptive markers descriptive — put facts about the situation there, + not directives +- directives live in a canonical rules section +- for state-conditional rules, phrase them in the rules section and reference + the state by name: "When `` is `resumed`, post a brief + continuation notice, then answer." + ## Layer the prompt correctly Keep these layers separate: @@ -171,6 +194,7 @@ Use markdown headings instead of tags if that fits the target stack better. - Keep progress-update style explicit if the user should see it. - Use the shortest wording that preserves the intended behavioral constraint. - Remove persona, motivation, or reminder text that does not change measured behavior. +- Place directives in canonical rules sections (``, ``, ``, ``), not buried inside descriptive markers like ``, ``, or ``. ## Examples diff --git a/skills/prompt-optimizer/references/transformed-examples.md b/skills/prompt-optimizer/references/transformed-examples.md index 2360fe1..807efb9 100644 --- a/skills/prompt-optimizer/references/transformed-examples.md +++ b/skills/prompt-optimizer/references/transformed-examples.md @@ -127,3 +127,42 @@ Why it is better: - removes chain-of-thought demand - replaces absolute slogans with operational rules - turns style goals into specific output behavior + +## Example 4: Directive placement — state marker vs. rules section + +### Before + +```text +resumed +This turn continues from a prior checkpoint. Post a brief continuation +notice (e.g., "Connected — continuing.") and then the resumed answer +as a separate message. +``` + +The directive is buried inside a descriptive state marker. In the field, +this variant scored 0.5 on the relevant LLM-judged eval — the model read +the block as situational data and combined both messages into one. + +### After + +```text +resumed + + +When `` is `resumed`, post a brief continuation notice +("Connected — continuing.") first, then send the resumed answer as a +separate message. + +``` + +Same rule, stronger placement. The state marker stays descriptive; the +directive moves to the canonical rules section and references the state +by name. In the same eval, this variant passed at ≥0.75 with no other +change. + +Why it is better: + +- keeps descriptive markers descriptive — facts, not policy +- places the directive where the model reads directives +- makes the state-conditional nature explicit instead of implicit +- preserves a single authoritative owner for the rule From e53a7922260d36753b6d94544ebe56495b140782 Mon Sep 17 00:00:00 2001 From: David Cramer Date: Fri, 24 Apr 2026 12:18:24 -0400 Subject: [PATCH 4/5] docs(prompt-optimizer): align "Layer the prompt correctly" with tool-policy terminology Fold tool-schema disclosure into the long-prompt separation guidance so the "Layer the prompt correctly" section stays consistent with the new tool-policy language. Previously the bullet read "tool rules and schemas in their own labeled sections" which still nudged authors to restate schemas in the prompt text. Co-Authored-By: Claude Opus 4.7 (1M context) --- skills/prompt-optimizer/references/core-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/prompt-optimizer/references/core-patterns.md b/skills/prompt-optimizer/references/core-patterns.md index 826c7b9..4e1606e 100644 --- a/skills/prompt-optimizer/references/core-patterns.md +++ b/skills/prompt-optimizer/references/core-patterns.md @@ -88,7 +88,7 @@ When prompts are long, separate policy from evidence explicitly: - instructions in one block - retrieved documents in another - examples in another -- tool rules and schemas in their own labeled sections +- tool policy (when/why/whether) in its own labeled section; tool schemas stay in the provider-native tools parameter For long-context prompts, place long evidence before the final query and keep the actual ask in a terminal section. Do not cargo-cult this ordering into short prompts that do not need it. From 70c1980646241dff764ffb76c97ed1d4da9533ad Mon Sep 17 00:00:00 2001 From: David Cramer Date: Fri, 24 Apr 2026 13:10:20 -0400 Subject: [PATCH 5/5] docs(prompt-optimizer): add dedicated tools and skills references MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The skill previously routed all cross-surface guidance through `core-patterns.md`. Prompt authors who are actively shaping a tool-using surface or a skill layer need deeper, dedicated material that the earlier issues (#120-#122) only nudged at. Adds two references plus SKILL.md and SOURCES.md wiring: `references/tools.md` — where tool schemas are actually disclosed (provider-native `tools` parameter, not the prompt), what the prompt *should* carry about tools, progressive/deferred disclosure (Claude Code `ToolSearch`, MCP `tools/list` + `list_changed`), tool-count ceilings, tool-call narration placement, parallel/sequential control, a cross-cutting tool-call error policy, and a symptom-to-fix anti-pattern table. `references/skills.md` — the skill-vs-tool distinction, disclosure patterns (eager inline, lazy index, hybrid) with resumability tradeoffs, invocation conventions (slash, meta-tool, description matching, implicit), platform-vs-deployer authorship (extends `core-patterns.md` "Layered prompts"), skill-bundled tool registration and namespacing, skill routing between adjacent skills, lifecycle across resumable sessions, and a belongs-in-skill vs belongs-in-core-prompt decision test. SKILL.md task table gets two new routing rows, and Step 3 now names both references when the prompt surface includes tools or skills. SOURCES.md captures the Anthropic Messages tool mechanics, MCP spec, Agent Skills spec, Claude Code skill/tool docs, Semantic Kernel plugins, OpenAI function calling, Gemini function declarations, LangGraph ToolNode, and the skills-based routing research. Validation: - uv run ~/.claude/skills/skill-writer/scripts/quick_validate.py \ skills/prompt-optimizer --strict-depth → valid, 0 errors/warnings - uv run skills/skill-scanner/scripts/scan_skill.py \ skills/prompt-optimizer → 0 findings, 0 untrusted URLs Co-Authored-By: Claude Opus 4.7 (1M context) --- skills/prompt-optimizer/SKILL.md | 4 +- skills/prompt-optimizer/SOURCES.md | 29 ++++ skills/prompt-optimizer/references/skills.md | 136 +++++++++++++++++++ skills/prompt-optimizer/references/tools.md | 129 ++++++++++++++++++ 4 files changed, 297 insertions(+), 1 deletion(-) create mode 100644 skills/prompt-optimizer/references/skills.md create mode 100644 skills/prompt-optimizer/references/tools.md diff --git a/skills/prompt-optimizer/SKILL.md b/skills/prompt-optimizer/SKILL.md index 2298cac..c6788f9 100644 --- a/skills/prompt-optimizer/SKILL.md +++ b/skills/prompt-optimizer/SKILL.md @@ -14,6 +14,8 @@ Load only the references you need: |------|------| | Create a new agent prompt | `references/core-patterns.md`, `references/model-family-notes.md`, `references/transformed-examples.md` | | Refine an existing prompt | `references/meta-optimization-loop.md`, `references/core-patterns.md`, `references/model-family-notes.md`, `references/transformed-examples.md` | +| Shape tool disclosure, tool policy, or tool-call narration | `references/tools.md`, `references/core-patterns.md` | +| Shape skill disclosure, invocation, or routing between skills | `references/skills.md`, `references/core-patterns.md` | | Port a prompt between model families | `references/model-family-notes.md`, `references/core-patterns.md` | | Diagnose repeated prompt failures | `references/meta-optimization-loop.md`, `references/core-patterns.md` | | Explain the provenance behind this workflow | `SOURCES.md` | @@ -53,7 +55,7 @@ Read `references/model-family-notes.md`. ## Step 3: Shape the prompt deliberately -Read `references/core-patterns.md`. +Read `references/core-patterns.md`. When the prompt surface includes tools or a skill layer, also read `references/tools.md` or `references/skills.md` respectively. 1. Separate durable behavior from task-local context: - stable policy and behavioral defaults belong in `system` or `developer` diff --git a/skills/prompt-optimizer/SOURCES.md b/skills/prompt-optimizer/SOURCES.md index 6a46391..3bc7249 100644 --- a/skills/prompt-optimizer/SOURCES.md +++ b/skills/prompt-optimizer/SOURCES.md @@ -33,6 +33,16 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco | `https://arxiv.org/abs/2303.17651` | research paper | canonical | 2026-04-18 | high | FEEDBACK -> REFINE loop and test-time improvement | research result, not product guarantee | Supports iterative refinement loop | | `https://arxiv.org/abs/2303.11366` | research paper | canonical | 2026-04-18 | high | Reflection memory across trials for agents | research result, not product guarantee | Supports optimization log and reflection memory | | `https://dspy.ai/` | official project docs | canonical | 2026-04-18 | high | Current prompt optimizers such as GEPA and MIPROv2; score-driven instruction search; composable optimization | framework-specific guidance | Supports modern optimizer framing | +| `https://platform.claude.com/docs/en/api/messages` | official docs | canonical | 2026-04-24 | high | Native `tools` parameter mechanics on Anthropic Messages; system-prompt auto-injection of tool definitions | provider-specific | Used in `tools.md` | +| `https://modelcontextprotocol.io/specification/2025-11-25` | standards spec | canonical | 2026-04-24 | high | MCP tool/capability/prompt disclosure; `tools/list` and `tools/list_changed` notifications | spec version-sensitive | Used in `tools.md` and `skills.md` for capability-negotiation pattern | +| `https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview` | official docs | canonical | 2026-04-24 | high | Agent Skills primitive, `SKILL.md` format, frontmatter fields, progressive disclosure | product surface may evolve | Used in `skills.md` | +| `https://agentskills.io/specification` | standards spec | canonical | 2026-04-24 | high | Cross-framework SKILL.md specification, frontmatter schema, portability contract | early-stage spec | Used in `skills.md` | +| `https://code.claude.com/docs/en/skills` | official docs | canonical | 2026-04-24 | medium | Claude Code skill-tool pattern and deferred tool mechanism (`ToolSearch`) | product surface may evolve | Used in `tools.md` and `skills.md` for deferred-disclosure pattern | +| `https://learn.microsoft.com/en-us/semantic-kernel/concepts/plugins/` | official docs | canonical | 2026-04-24 | medium | Plugin/skill primitive; function-calling-native disclosure; plugin-per-folder structure | provider-specific | Used in `skills.md` | +| `https://developers.openai.com/docs/guides/function-calling` | official docs | canonical | 2026-04-24 | high | OpenAI native tool-array disclosure, parallel tool calls, tool search for large suites | product surface may evolve | Used in `tools.md` | +| `https://ai.google.dev/docs/function_calling` | official docs | canonical | 2026-04-24 | high | Gemini native function declarations and tool-array disclosure | product surface may evolve | Used in `tools.md` | +| `https://langchain-ai.github.io/langgraphjs/reference/classes/langgraph_prebuilt.ToolNode.html` | official docs | canonical | 2026-04-24 | medium | LangGraph ToolNode concurrent tool execution; tools-first binding pattern | framework-specific | Used in `tools.md` | +| `https://arxiv.org/abs/2601.04748` | research paper | canonical | 2026-04-24 | medium | Skills-based routing at scale; semantic confusability dominates library-size effects | research result, not product guarantee | Used in `skills.md` for routing guidance | ## Decisions @@ -68,6 +78,22 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco Status: adopted Why: Gemini and Anthropic both document that long-context prompts perform better when evidence comes before the final query. +9. Treat tool schemas as disclosed by the provider-native tool array; keep the prompt free of tool-schema restatement. + Status: adopted + Why: Anthropic, OpenAI, Gemini, Semantic Kernel, LangGraph, and MCP all expose tools via a native parameter or protocol handshake. Restating schemas in the system prompt is redundant and costs tokens every turn. + +10. Document progressive and deferred tool disclosure as the scaling pattern beyond ~20 tools. + Status: adopted + Why: Claude Code (`ToolSearch`) and MCP (`tools/list` + `list_changed`) converge on deferred/progressive disclosure as the scaling strategy. Prompt-level enumeration degrades routing and context budget. + +11. Require explicit handling of layered prompts where part of the system prompt is deployer-authored. + Status: adopted + Why: `SOUL.md`/`WORLD.md`-style layered runtimes risk platform rules drifting into deployer-authored files that may be sparse or absent. Platform behavior must survive independent of the deployer layer. + +12. Treat skills as a distinct prompt surface from tools, with their own disclosure, invocation, and lifecycle decisions. + Status: adopted + Why: Skills carry procedural instructions that must reach the model at the right time; tools do not. Disclosure (eager vs lazy vs hybrid), invocation (slash vs meta-tool vs description-match vs implicit), and resumability (does a loaded skill survive a pause?) are all distinct design axes not covered by tool guidance. + ## Coverage matrix | Dimension | Coverage status | Evidence | @@ -81,6 +107,8 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco | Safety and escalation boundaries | complete | provider docs plus repo workflow conventions | | Output and acceptance checks | complete | OpenAI prompting and optimizer docs, skill-writer output patterns | | Transformed example artifacts | complete | `references/transformed-examples.md` | +| Tool disclosure and policy | complete | `references/tools.md` | +| Skill disclosure, invocation, and lifecycle | complete | `references/skills.md` | | Future-family coverage beyond OpenAI/Claude/Gemini | partial | currently deferred until there is a concrete repo need | ## Description QA @@ -123,3 +151,4 @@ Further retrieval is currently low-yield for this first version. The source pack - 2026-04-18: Created the initial `prompt-optimizer` skill, references, and provenance record. - 2026-04-18: Added an explicit prompt learnings pass covering compaction, deduplication, and context ordering. - 2026-04-18: Folded the prompt learnings back into the core shaping and iteration guidance to keep the workflow compact. +- 2026-04-24: Added `references/tools.md` (tool disclosure, policy, deferred/progressive disclosure, tool-count ceilings, narration, error policy) and `references/skills.md` (skill vs tool, eager/lazy/hybrid disclosure, invocation conventions, platform-vs-deployer layering, skill-bundled tools, routing, resumable-session lifecycle). diff --git a/skills/prompt-optimizer/references/skills.md b/skills/prompt-optimizer/references/skills.md new file mode 100644 index 0000000..9586082 --- /dev/null +++ b/skills/prompt-optimizer/references/skills.md @@ -0,0 +1,136 @@ +# Skills + +Use this file when designing how an agent prompt discloses, invokes, and routes between skills. A skill here means a named capability bundle that carries procedural instructions — not a bare function-calling tool. Examples: Anthropic Agent Skills and SKILL.md files, Semantic Kernel plugins, MCP prompts, CrewAI agent specializations. + +## Contents + +- Skill vs tool +- Disclosure: eager, lazy, or hybrid +- Invocation conventions +- Platform-authored vs deployer-authored skills +- Skill-bundled tools +- Routing between adjacent skills +- Lifecycle across resumable sessions +- What belongs in a skill vs the core prompt +- Anti-patterns + +## Skill vs tool + +A tool is a single callable function with a schema. A skill is a named bundle that typically includes: + +- a trigger description (used for routing and discovery) +- when-to-use guidance +- procedural steps or reference material +- optionally bundled tools (e.g., an MCP sub-server, per-skill CLI commands) +- optionally declared configuration keys (`uses-config`) +- optionally an `allowed-tools` restriction + +If the capability is "call this function with these arguments," it is a tool. If the capability is "run this multi-step procedure, which may involve several tools and references," it is a skill. + +For prompt-author decisions the main difference is procedural content: skill bodies must reach the model at the right time. Tools don't carry procedural content — the schema is enough. + +## Disclosure: eager, lazy, or hybrid + +Three viable disclosure patterns for skills. Each has clear tradeoffs; pick one per product and state it in the prompt. + +1. **Eager inline.** Every loaded skill's body is embedded in the system prompt. Best when skill bodies are small, the runtime knows which skills are relevant per turn, and turns may need to resume across pauses (the skill body must survive resumption without additional tool calls). + +2. **Lazy index.** The system prompt carries only a short index of available skills (name, description, path). The model calls a meta-tool (`loadSkill`, MCP `prompts/get`, or equivalent) to fetch a body on demand. Best when there are many skills and only a few are relevant per turn. + +3. **Hybrid.** An always-emitted `` index of discoverable skills plus an always-emitted `` section with the bodies of skills activated this turn. Best when both routing and execution are needed in one prompt, and the runtime can pre-activate skills based on user intent or explicit invocation. + +Rule of thumb: if the turn pauses and resumes (OAuth, long-running tool, timeout retry), skill bodies must still be present after the resume. A pure lazy model forces the runtime to re-trigger loading, which is brittle under interruption. + +Lazy disclosure saves tokens but costs latency and adds a failure mode (the model may not realize it needs to load). Eager costs tokens but is robust. Hybrid balances both. + +Token shape for reference: roughly 100 tokens of metadata per available skill, plus the full body per loaded skill (often 300–2000 tokens). + +## Invocation conventions + +Four invocation styles, in roughly declining specificity: + +- **Slash command** (`/skillname args`). Explicit, easy to parse, unambiguous. Use when users know which skill they want. Junior and Claude Code both recognize this form. +- **Meta-tool by name** (`loadSkill({ name: "..." })`, MCP `prompts/get`). The model has the index and decides based on task description. Good when selection is the model's job. +- **Description matching** (no special syntax; model reads the index and picks by semantic match). Unreliable above a dozen adjacent skills; failure mode is silent mis-routing. +- **Implicit / always-on** (`.cursorrules`, per-directory instruction files, persistent persona). No explicit invocation; applies whenever the scope matches. Cannot be declined by the model. + +Pick one primary convention per product. Mixing (`/candidate-brief` and natural-language matching both supported) is fine if the prompt states both. Do not require the user to memorize which skills are slash-triggered and which are natural-language. + +## Platform-authored vs deployer-authored skills + +Layered runtimes ship two kinds of skill content: + +- **Platform skills**, authored by the framework team. Stable contract. Platform behavior rules may depend on these. +- **Deployer skills**, authored by the install owner (per-customer `SKILL.md`, persona files, `WORLD.md`, `SOUL.md`). Content varies across installs. + +Platform behavior rules must never depend on deployer skills being present or containing anything specific. A platform rule like "gather evidence before answering" belongs in the platform prompt; do not assume any deployer-authored file covers it. See `core-patterns.md` / "Layered prompts with multiple owners" for the general principle. + +Conversely, deployer skills should not encode platform behavior. They are voice, domain context, and organization-specific procedures. + +Litmus test: if you replace the deployer layer with a five-line minimal persona, does every platform-level rule still fire? If not, a platform rule has drifted into the deployer layer and needs to move back. + +## Skill-bundled tools + +Many skills expose their own tools (MCP sub-server functions, shell helpers, auth-gated provider calls). Two requirements for the prompt and the runtime: + +1. **Registration timing.** Tools bundled with a skill reach the native tool array only after the skill loads. The prompt must either (a) only disclose those tools after skill activation, or (b) note that the tools exist but are dormant until activation. +2. **Namespacing.** Skill-bundled tools should be namespaced (e.g., `mcp__github__search_issues`) so the model can distinguish skill-tools from platform-tools. Otherwise natural-language references to "the X tool" become ambiguous. + +Do not eagerly register every skill's bundled tools at harness startup; that inflates the tool inventory and contradicts whatever disclosure pattern you picked above. + +See `references/tools.md` for broader tool-disclosure guidance. + +## Routing between adjacent skills + +When two skills match a request, the model must pick one. Three routing hints that work: + +- **Action-over-noun descriptions.** Skills described by operation route better than skills described by surface-level noun or product. `"Creates pull requests from a branch"` routes better than `"GitHub integration"`. +- **Explicit disambiguation in the prompt.** One bullet, not many: "Pick the skill whose description matches the requested operation, not incidental nouns or product names." +- **Hierarchical categories (for large skill sets).** Partition the index into categories so the model routes category-then-skill rather than picking among dozens of flat entries. Research on skills-based routing shows semantic confusability, not library size, is the bottleneck; flat selection with many similar descriptions degrades before a hierarchy of the same size does. + +If adjacent skills routinely mis-route, the fix is usually in the descriptions, not the prompt body. + +## Lifecycle across resumable sessions + +When a turn pauses (OAuth flow, long-running tool, timeout-and-resume) and later resumes, the prompt is rebuilt from scratch. Three things must hold: + +1. Skills loaded before the pause are still listed as loaded after resumption. +2. The model has an explicit "this is a resume" signal — and the rule about how to behave on resume lives in the canonical rules section, not only in a state marker (see `core-patterns.md` / "Where rules live"). +3. Skill-bundled tools are re-registered to the native tool array on resume. + +Without all three, the model may act as if starting fresh and repeat work, or fail to use a tool that was available pre-pause. No cross-framework standard yet exists for serializing "which skills were loaded" across resumption; each runtime solves this in its checkpoint format. + +## What belongs in a skill vs the core prompt + +**Belongs in a skill:** + +- multi-step procedures with branching logic +- reference material (syntax, schemas, command lists) only needed when the skill is active +- tool restrictions specific to this procedure (`allowed-tools`) +- configuration keys only this skill reads (`uses-config`) +- domain-specific guardrails ("require explicit confirmation before destructive merges") + +**Belongs in the core prompt:** + +- cross-cutting behavior (evidence first, never claim success without a tool result) +- output contract (format, length, markdown style) +- identity, role, tone (with deployer-authored personality blocks extending but not replacing platform behavior) +- skill routing and disclosure rules themselves +- resumption signals and session-state markers + +Test: if you removed a piece of content from every skill, would the platform still be correct? If yes, that content probably belongs in a skill. If no, it belongs in the core prompt. + +Meta-skills whose value is their body (prompt-engineering guidelines, style guides, reference atlases) break the lazy disclosure pattern — they need to be read, not called. Some frameworks support an eager flag in frontmatter for exactly this case. + +## Anti-patterns + +| Symptom | Fix | +|---------|-----| +| Platform prompt relies on a deployer persona file to encode behavior rules | Move the rules into the platform prompt; treat deployer files as voice-only | +| A skill body is copy-pasted into the platform prompt as permanent content | Treat it as a skill; disclose eagerly only when loaded for this turn | +| Two skills match a request; the model picks inconsistently | Tighten descriptions to name the operation, not the noun or domain | +| Resumed turn forgets a skill was loaded | Ensure the skill loader re-runs on resume; surface a turn-state marker *and* the behavior rule about resumes | +| Skill-bundled tools leak into the native tool array before the skill loads | Gate tool registration on skill activation; do not pre-load | +| Model claims to have "used" a skill it never loaded | Add a rule: "Never apply skill-specific behavior unless the skill is in `` or `loadSkill` succeeded this turn" | +| Skill index has 50+ entries with similar descriptions | Partition into categories or combine near-duplicates; routing accuracy degrades with confusable descriptions | +| Meta-skill (style guide, glossary) is lazy-loaded and rarely called | Mark it eager or inline it; meta-skills are unusable under pure lazy disclosure | diff --git a/skills/prompt-optimizer/references/tools.md b/skills/prompt-optimizer/references/tools.md new file mode 100644 index 0000000..c2b9fa7 --- /dev/null +++ b/skills/prompt-optimizer/references/tools.md @@ -0,0 +1,129 @@ +# Tools + +Use this file when designing how an agent prompt refers to tools, when to suppress tool enumeration, and how to write tool policy. Covers what belongs in the prompt vs. the provider-native tool array, progressive and deferred disclosure, tool-count ceilings, and tool-call narration. + +## Contents + +- Where tools are actually disclosed +- What the prompt should say about tools +- Progressive and deferred disclosure +- Tool count and context budget +- Tool-call narration +- Parallel and sequential tool calls +- Tool-call error policy +- Skill-bundled and runtime-gated tools +- Anti-patterns + +## Where tools are actually disclosed + +Modern provider APIs (Anthropic Messages, OpenAI Responses and Chat Completions, Gemini function calling, Bedrock Converse) transmit tool schemas to the model via a native `tools` parameter on the request. Each entry carries the tool's name, description, and a JSON Schema for arguments. The provider presents these to the model as first-class callable functions; no system-prompt text is needed to make a tool visible. + +On Anthropic, the API assembles a system prompt that automatically injects the definitions from `tools` alongside the caller's system text. Pi-agent-core, Codex CLI, and other well-tuned harnesses pass tools natively and leave schema restatement out of the prompt body. + +Two consequences: + +1. Re-listing tool names or descriptions inside the system prompt is redundant and costs tokens every turn. +2. The `description` field in the native schema is the primary place to shape when the model reaches for each tool. Prose about a tool in the system prompt is a second-order effect on top of the description. + +Exceptions where prompt-level tool mention is warranted: + +- **Deferred tools**. The tool name appears in a short index (often a system reminder), but the schema loads only when the model calls a meta-tool such as `ToolSearch`. The prompt must surface both the fact that deferred tools exist and the fetch mechanism; otherwise the model will never call it. +- **Just-activated tools**. When the native tool array changes mid-session (skill load, auth completion), a one-line "these tools are now registered" reminder may be justified. Skip it when the change is already obvious from conversation history. + +## What the prompt should say about tools + +Reserve prompt-level text for behavior the tool `description` cannot encode: + +- Ordering and preconditions: "Run `searchIndex` before `editFile`." +- Cross-tool policy: "Gather evidence with tools before answering factual questions." +- Negative rules: "Never call side-effecting tools when the user asked only for analysis." +- Escalation: "When a destructive tool returns an ambiguous result, ask before retrying." + +Do not put in the prompt: + +- Tool names, signatures, or argument schemas. +- Generic "use tools when appropriate" without a real trigger. +- "When to use tool X" — that belongs in tool X's `description`. + +If the same rule applies to every tool, write it once as a general behavior rule. If the rule is tool-specific, push it into the tool's native `description`. + +## Progressive and deferred disclosure + +When a runtime can expose more than ~20 tools, per-turn token cost and routing reliability both degrade. Two patterns scale better: + +1. **Lazy schema via meta-tool.** Tool names appear in a short index (often delivered as a system reminder), but full schemas load only on request. Claude Code's `ToolSearch` is the reference implementation: a deferred tool is visible by name, and calling the meta-tool fetches its schema as a callable tool for the rest of the session. The prompt must explicitly say that deferred tools exist and how to fetch them. + +2. **Capability negotiation.** MCP clients list available tools at connection time via `tools/list`, and servers advertise changes via `notifications/tools/list_changed`. The harness can filter which tools are exposed per turn (by skill, scope, or auth state). The prompt does not need to re-describe the inventory. + +Do not combine both mechanisms for the same tool surface in one prompt. Pick one model and state it. Progressive disclosure with deferred fetch only helps when the model knows to fetch; if the prompt implies all tools are already loaded, the model will never call the meta-tool. + +## Tool count and context budget + +No provider publishes a hard ceiling; plan for your model family and measure. Rough shape: + +- ≤10 tools: routing is reliable without extra structure. +- 10–20 tools: ensure descriptions are high-signal (not just names) and disambiguate overlap. +- Above ~20: expect routing noise. Use deferred disclosure, scope tools per skill, or gate by auth state. +- Per-tool schema overhead: ~100 tokens. 100 tools eagerly disclosed ≈ ~10k tokens of prompt budget repeated every turn. + +When a user reports "the model picked the wrong tool," the fix is almost always in the description of the intended tool and the competitors, not in the system prompt. + +## Tool-call narration + +Streaming, assistant-status, and Slack-style UIs surface tool-call events as UI. Narration in the assistant's text reply restates what the UI already shows and pushes the first useful message later. Even without streaming UI, narration inflates output tokens and hurts judged output quality. + +Write these rules in the canonical behavior section: + +- Do not narrate in advance ("Let me check...", "Fetching...", "Looking this up..."). +- Do not restate what the tool is about to do. +- Send an interim reply only when blocked or waiting on user input. + +Placement matters. A "no narration" rule buried inside a descriptive marker (``, ``, ``) is reliably weaker than the same rule in `` or ``. See `core-patterns.md` / "Where rules live." + +## Parallel and sequential tool calls + +Parallel tool use is a provider capability, not a prompt feature. Anthropic defaults to parallel and exposes `disable_parallel_tool_use`; OpenAI auto-parallels unless `parallel_tool_calls: false`; LangGraph's `ToolNode` runs concurrent calls. The prompt cannot enable or disable parallelism; it can only influence choice. + +When the tool flow is order-dependent, say so explicitly: + +- "Run `readFile` for every referenced path before writing `edit`." +- "Do not call `postMessage` until the analysis tools have resolved." + +When parallelism is safe and desired, do not write any rule about it. Providers default to parallel; silent defaults are cheaper than restating them. + +## Tool-call error policy + +Tool errors fall into a small set, each handled in a different place: + +- **Transient** (network, rate limit, timeout) — the runtime retries; the prompt should not tell the model to retry. +- **Auth / permission** — usually a pause-and-resume handled by the runtime; the prompt may state how the model should behave after auth completes. +- **Malformed input** — the model produced bad arguments; it should re-attempt with corrected input, not narrate the retry. +- **Real failure** — external state disallows the action; the model should report plainly and not pretend success. + +Write one cross-cutting rule that protects against hallucinated outcomes: + +> Do not claim a tool call succeeded unless the tool returned a success result this turn. When it did and the tool returned a link or identifier, include it. + +This is more robust than enumerating error codes. It also doubles as a honesty rule around attachments, canvases, and channel posts. + +## Skill-bundled and runtime-gated tools + +Skills often bundle their own tools (MCP sub-servers, skill-specific shell commands, provider SDKs). Two clarity requirements: + +1. **Registration timing.** Skill-bundled tools should reach the native tool array only after the skill loads. Pre-loading every skill's tools inflates the tool inventory and breaks whatever routing ceiling you were planning for. +2. **Namespacing.** Skill-bundled tools should be namespaced (e.g., `mcp__sentry__search_issues`) so the model can distinguish skill-tools from platform-tools at a glance. + +For broader guidance on how skills disclose themselves and persist across resumable turns, see `references/skills.md`. + +## Anti-patterns + +| Symptom | Fix | +|---------|-----| +| System prompt re-lists tool names and descriptions that already appear in the native tool array | Delete the listing; promote any tool-selection hints into the tool's `description` field | +| "Use tool X when Y" appears in both prompt and tool description | Keep the rule in the `description`; remove it from the prompt | +| Prompt has a rule about a specific tool that no longer exists | Delete it; tool-specific rules drift silently when tools are renamed | +| Model narrates tool calls despite a "no narration" rule | Check placement — the rule must live in the rules section, not inside a context or state block | +| Model never discovers a deferred tool | The prompt is missing the "deferred tools exist; fetch via ``" disclosure | +| 50+ tools, slow or confused routing | Scope tools per skill, per auth state, or gate behind a meta-tool; do not flatten everything into one array | +| Assistant claims an attachment or canvas succeeded when the tool failed | Add the cross-cutting success-only rule under Tool-call error policy above | +| Tool flow needs strict ordering but runs in parallel | State the ordering explicitly in the rules section; do not rely on provider defaults |