getsentry · dcramer · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026
diff --git a/skills/prompt-optimizer/SKILL.md b/skills/prompt-optimizer/SKILL.md
@@ -14,6 +14,8 @@ Load only the references you need:
 |------|------|
 | Create a new agent prompt | `references/core-patterns.md`, `references/model-family-notes.md`, `references/transformed-examples.md` |
 | Refine an existing prompt | `references/meta-optimization-loop.md`, `references/core-patterns.md`, `references/model-family-notes.md`, `references/transformed-examples.md` |
+| Shape tool disclosure, tool policy, or tool-call narration | `references/tools.md`, `references/core-patterns.md` |
+| Shape skill disclosure, invocation, or routing between skills | `references/skills.md`, `references/core-patterns.md` |
 | Port a prompt between model families | `references/model-family-notes.md`, `references/core-patterns.md` |
 | Diagnose repeated prompt failures | `references/meta-optimization-loop.md`, `references/core-patterns.md` |
 | Explain the provenance behind this workflow | `SOURCES.md` |
@@ -53,11 +55,12 @@ Read `references/model-family-notes.md`.
 
 ## Step 3: Shape the prompt deliberately
 
-Read `references/core-patterns.md`.
+Read `references/core-patterns.md`. When the prompt surface includes tools or a skill layer, also read `references/tools.md` or `references/skills.md` respectively.
 
 1. Separate durable behavior from task-local context:
 - stable policy and behavioral defaults belong in `system` or `developer`
 - variable inputs, retrieved context, and task instances belong in templated user-facing sections
+- when the system prompt is assembled at runtime from a platform layer and a deployer-authored persona layer (e.g., `SOUL.md`, `CLAUDE.md`, `AGENTS.md`), see "Layered prompts with multiple owners" in `references/core-patterns.md` — platform behavior rules must not depend on what the deployer layer contains
 
 2. Keep one authoritative instruction per behavior:
 - if a rule appears in more than one layer, choose one owner for it

diff --git a/skills/prompt-optimizer/SOURCES.md b/skills/prompt-optimizer/SOURCES.md
@@ -33,6 +33,16 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
 | `https://arxiv.org/abs/2303.17651` | research paper | canonical | 2026-04-18 | high | FEEDBACK -> REFINE loop and test-time improvement | research result, not product guarantee | Supports iterative refinement loop |
 | `https://arxiv.org/abs/2303.11366` | research paper | canonical | 2026-04-18 | high | Reflection memory across trials for agents | research result, not product guarantee | Supports optimization log and reflection memory |
 | `https://dspy.ai/` | official project docs | canonical | 2026-04-18 | high | Current prompt optimizers such as GEPA and MIPROv2; score-driven instruction search; composable optimization | framework-specific guidance | Supports modern optimizer framing |
+| `https://platform.claude.com/docs/en/api/messages` | official docs | canonical | 2026-04-24 | high | Native `tools` parameter mechanics on Anthropic Messages; system-prompt auto-injection of tool definitions | provider-specific | Used in `tools.md` |
+| `https://modelcontextprotocol.io/specification/2025-11-25` | standards spec | canonical | 2026-04-24 | high | MCP tool/capability/prompt disclosure; `tools/list` and `tools/list_changed` notifications | spec version-sensitive | Used in `tools.md` and `skills.md` for capability-negotiation pattern |
+| `https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview` | official docs | canonical | 2026-04-24 | high | Agent Skills primitive, `SKILL.md` format, frontmatter fields, progressive disclosure | product surface may evolve | Used in `skills.md` |
+| `https://agentskills.io/specification` | standards spec | canonical | 2026-04-24 | high | Cross-framework SKILL.md specification, frontmatter schema, portability contract | early-stage spec | Used in `skills.md` |
+| `https://code.claude.com/docs/en/skills` | official docs | canonical | 2026-04-24 | medium | Claude Code skill-tool pattern and deferred tool mechanism (`ToolSearch`) | product surface may evolve | Used in `tools.md` and `skills.md` for deferred-disclosure pattern |
+| `https://learn.microsoft.com/en-us/semantic-kernel/concepts/plugins/` | official docs | canonical | 2026-04-24 | medium | Plugin/skill primitive; function-calling-native disclosure; plugin-per-folder structure | provider-specific | Used in `skills.md` |
+| `https://developers.openai.com/docs/guides/function-calling` | official docs | canonical | 2026-04-24 | high | OpenAI native tool-array disclosure, parallel tool calls, tool search for large suites | product surface may evolve | Used in `tools.md` |
+| `https://ai.google.dev/docs/function_calling` | official docs | canonical | 2026-04-24 | high | Gemini native function declarations and tool-array disclosure | product surface may evolve | Used in `tools.md` |
+| `https://langchain-ai.github.io/langgraphjs/reference/classes/langgraph_prebuilt.ToolNode.html` | official docs | canonical | 2026-04-24 | medium | LangGraph ToolNode concurrent tool execution; tools-first binding pattern | framework-specific | Used in `tools.md` |
+| `https://arxiv.org/abs/2601.04748` | research paper | canonical | 2026-04-24 | medium | Skills-based routing at scale; semantic confusability dominates library-size effects | research result, not product guarantee | Used in `skills.md` for routing guidance |
 
 ## Decisions
 
@@ -68,6 +78,22 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
    Status: adopted
    Why: Gemini and Anthropic both document that long-context prompts perform better when evidence comes before the final query.
 
+9. Treat tool schemas as disclosed by the provider-native tool array; keep the prompt free of tool-schema restatement.
+   Status: adopted
+   Why: Anthropic, OpenAI, Gemini, Semantic Kernel, LangGraph, and MCP all expose tools via a native parameter or protocol handshake. Restating schemas in the system prompt is redundant and costs tokens every turn.
+
+10. Document progressive and deferred tool disclosure as the scaling pattern beyond ~20 tools.
+    Status: adopted
+    Why: Claude Code (`ToolSearch`) and MCP (`tools/list` + `list_changed`) converge on deferred/progressive disclosure as the scaling strategy. Prompt-level enumeration degrades routing and context budget.
+
+11. Require explicit handling of layered prompts where part of the system prompt is deployer-authored.
+    Status: adopted
+    Why: `SOUL.md`/`WORLD.md`-style layered runtimes risk platform rules drifting into deployer-authored files that may be sparse or absent. Platform behavior must survive independent of the deployer layer.
+
+12. Treat skills as a distinct prompt surface from tools, with their own disclosure, invocation, and lifecycle decisions.
+    Status: adopted
+    Why: Skills carry procedural instructions that must reach the model at the right time; tools do not. Disclosure (eager vs lazy vs hybrid), invocation (slash vs meta-tool vs description-match vs implicit), and resumability (does a loaded skill survive a pause?) are all distinct design axes not covered by tool guidance.
+
 ## Coverage matrix
 
 | Dimension | Coverage status | Evidence |
@@ -81,6 +107,8 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
 | Safety and escalation boundaries | complete | provider docs plus repo workflow conventions |
 | Output and acceptance checks | complete | OpenAI prompting and optimizer docs, skill-writer output patterns |
 | Transformed example artifacts | complete | `references/transformed-examples.md` |
+| Tool disclosure and policy | complete | `references/tools.md` |
+| Skill disclosure, invocation, and lifecycle | complete | `references/skills.md` |
 | Future-family coverage beyond OpenAI/Claude/Gemini | partial | currently deferred until there is a concrete repo need |
 
 ## Description QA
@@ -123,3 +151,4 @@ Further retrieval is currently low-yield for this first version. The source pack
 - 2026-04-18: Created the initial `prompt-optimizer` skill, references, and provenance record.
 - 2026-04-18: Added an explicit prompt learnings pass covering compaction, deduplication, and context ordering.
 - 2026-04-18: Folded the prompt learnings back into the core shaping and iteration guidance to keep the workflow compact.
+- 2026-04-24: Added `references/tools.md` (tool disclosure, policy, deferred/progressive disclosure, tool-count ceilings, narration, error policy) and `references/skills.md` (skill vs tool, eager/lazy/hybrid disclosure, invocation conventions, platform-vs-deployer layering, skill-bundled tools, routing, resumable-session lifecycle).
diff --git a/skills/prompt-optimizer/references/core-patterns.md b/skills/prompt-optimizer/references/core-patterns.md
@@ -18,7 +18,7 @@ Good section names are concrete and stable:
 - `<role>`
 - `<goal>`
 - `<context>`
-- `<tools>`
+- `<tool_policy>`
 - `<workflow>`
 - `<constraints>`
 - `<output_format>`
@@ -27,6 +27,29 @@ Do not add markup around every sentence. Markers are useful when they carve the
 
 If the target stack or model family responds better to plain markdown, use headings and bullets instead of XML-style tags. The structure matters more than the syntax.
 
+### Where rules live
+
+Markers signal to the model what kind of content a block carries. Descriptive
+or state markers (`<context>`, `<state>`, `<turn-state>`, `<environment>`,
+`<artifact-state>`) read as facts about the situation — data, not policy.
+Canonical rules markers (`<behavior>`, `<constraints>`, `<tool_policy>`,
+`<workflow>`) read as directives the model should follow.
+
+A directive buried in a descriptive block can underperform the same directive
+placed in a rules block, especially for state-conditional rules. Observed in
+the field: a resume-notice instruction placed inside `<turn-state>resumed</turn-state>`
+scored 0.5 on the relevant eval; the identical sentence moved into
+`<behavior>` passed at ≥0.75 with no other change.
+
+Rules of thumb:
+
+- keep descriptive markers descriptive — put facts about the situation there,
+  not directives
+- directives live in a canonical rules section
+- for state-conditional rules, phrase them in the rules section and reference
+  the state by name: "When `<turn-state>` is `resumed`, post a brief
+  continuation notice, then answer."
+
 ## Layer the prompt correctly
 
 Keep these layers separate:
@@ -65,14 +88,55 @@ When prompts are long, separate policy from evidence explicitly:
 - instructions in one block
 - retrieved documents in another
 - examples in another
-- tool rules and schemas in their own labeled sections
+- tool policy (when/why/whether) in its own labeled section; tool schemas stay in the provider-native tools parameter
 
 For long-context prompts, place long evidence before the final query and keep the actual ask in a terminal section.
 Do not cargo-cult this ordering into short prompts that do not need it.
 
+### Layered prompts with multiple owners
+
+The layers above assume a single author owns the whole system prompt. Many
+runtimes concatenate the system prompt from multiple layers with different
+owners at request time:
+
+- a **platform layer** owned by the product or framework team (harness rules,
+  tool-use policy, output contract, safety boundaries)
+- a **deployer or persona layer** authored by the downstream user or customer
+  (voice, tone, identity files such as `SOUL.md`, `CLAUDE.md`, `AGENTS.md`)
+
+When this is the case, treat the deployer layer as **voice-only**:
+
+- every platform behavior rule — evidence gathering, tool-use policy, narration
+  rules, output contract, escalation boundaries — must live in the
+  platform-owned layer and must still fire if the deployer layer is empty,
+  five lines of voice, or customized in unexpected ways
+- do not delete a platform bullet on the assumption that a persona file
+  "probably covers it"; deployers ship sparse persona files in practice
+- if a rule is load-bearing, it belongs in the platform layer by default;
+  the deployer layer gets voice and domain framing, not policy
+
+Hermes Agent, OpenClaw, and similar SOUL.md-style frameworks use this split
+explicitly: platform behavior is code-level, SOUL.md carries identity and
+tone, and the platform falls back to a built-in default identity if SOUL.md
+is absent or sparse. Mirror that invariant whenever a prompt is assembled
+from more than one authorship layer.
+
 ## Portable agent prompt skeleton
 
-Use this as a starting point and adapt it:
+Use this as a starting point and adapt it.
+
+Tool schemas are disclosed to the model by the provider-native tools parameter
+(Anthropic `tools`, OpenAI `tools`, Gemini `tools`). On Anthropic this is
+explicit — the API constructs a special system prompt that injects the tool
+definitions from the `tools` parameter alongside the user-authored system
+prompt. Well-tuned harnesses (Codex CLI, pi-agent-core) pass tools natively
+and keep the prompt text free of schema restatements.
+
+The prompt text should carry tool *policy* — when to call tools, when to avoid
+them, what evidence to gather before acting — not a restated list of tool
+names or argument schemas. Naming a specific tool in a policy rule ("prefer
+`Read` over a `Bash` cat") is fine; re-enumerating the tool inventory or its
+schemas is not.
 
 ```text
 <role>
@@ -91,11 +155,11 @@ Available files or documents:
 Known constraints:
 </context>
 
-<tools>
-Available tools:
-When to use them:
-When to avoid them:
-</tools>
+<tool_policy>
+When to use tools:
+When to avoid tools:
+Evidence to gather before acting:
+</tool_policy>
 
 <workflow>
 1. Clarify only if required.
@@ -130,6 +194,7 @@ Use markdown headings instead of tags if that fits the target stack better.
 - Keep progress-update style explicit if the user should see it.
 - Use the shortest wording that preserves the intended behavioral constraint.
 - Remove persona, motivation, or reminder text that does not change measured behavior.
+- Place directives in canonical rules sections (`<behavior>`, `<constraints>`, `<tool_policy>`, `<workflow>`), not buried inside descriptive markers like `<context>`, `<state>`, or `<turn-state>`.
 
 ## Examples
 

diff --git a/skills/prompt-optimizer/references/model-family-notes.md b/skills/prompt-optimizer/references/model-family-notes.md
@@ -12,6 +12,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f
 - Use delimiters such as markdown headings, XML tags, or section titles when the prompt mixes multiple content blocks.
 - Try zero-shot first. Add few-shot examples only when the output contract or edge cases need them.
 - Be explicit about constraints, success criteria, and completion conditions.
+- Tool schemas are disclosed via the Responses API `tools` parameter. Keep tool policy (when/why/whether to call) in the prompt; do not restate tool names or argument schemas.
 
 ### GPT-style non-reasoning models
 
@@ -29,6 +30,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f
 - For long context, place long documents before the question and put the actual query near the end.
 - When grounding in long documents, asking for relevant quotes first can improve downstream analysis.
 - If tool use or progress-update behavior matters, specify it explicitly rather than assuming the model will infer it.
+- When you call the Messages API with `tools`, the API injects the tool definitions into a special system prompt automatically. Keep your user-authored system prompt focused on policy; put tool detail in each tool's `description` field rather than re-listing schemas in prose.
 
 ## Gemini
 
@@ -39,6 +41,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f
 - Use system instructions when the target runtime supports them.
 - Thinking is dynamic by default on modern Gemini thinking models; tune it only when latency or deeper reasoning warrants it.
 - Gemini long-context workflows can benefit from many-shot in-context learning when you have a large bank of representative examples.
+- Tool schemas are disclosed via the Gemini API `tools` (function declarations) parameter. Keep the prompt focused on tool policy; do not re-list function names or parameter schemas.
 
 ## Cross-family adapter rules