Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion skills/prompt-optimizer/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ Load only the references you need:
|------|------|
| Create a new agent prompt | `references/core-patterns.md`, `references/model-family-notes.md`, `references/transformed-examples.md` |
| Refine an existing prompt | `references/meta-optimization-loop.md`, `references/core-patterns.md`, `references/model-family-notes.md`, `references/transformed-examples.md` |
| Shape tool disclosure, tool policy, or tool-call narration | `references/tools.md`, `references/core-patterns.md` |
| Shape skill disclosure, invocation, or routing between skills | `references/skills.md`, `references/core-patterns.md` |
| Port a prompt between model families | `references/model-family-notes.md`, `references/core-patterns.md` |
| Diagnose repeated prompt failures | `references/meta-optimization-loop.md`, `references/core-patterns.md` |
| Explain the provenance behind this workflow | `SOURCES.md` |
Expand Down Expand Up @@ -53,11 +55,12 @@ Read `references/model-family-notes.md`.

## Step 3: Shape the prompt deliberately

Read `references/core-patterns.md`.
Read `references/core-patterns.md`. When the prompt surface includes tools or a skill layer, also read `references/tools.md` or `references/skills.md` respectively.

1. Separate durable behavior from task-local context:
- stable policy and behavioral defaults belong in `system` or `developer`
- variable inputs, retrieved context, and task instances belong in templated user-facing sections
- when the system prompt is assembled at runtime from a platform layer and a deployer-authored persona layer (e.g., `SOUL.md`, `CLAUDE.md`, `AGENTS.md`), see "Layered prompts with multiple owners" in `references/core-patterns.md` — platform behavior rules must not depend on what the deployer layer contains

2. Keep one authoritative instruction per behavior:
- if a rule appears in more than one layer, choose one owner for it
Expand Down
29 changes: 29 additions & 0 deletions skills/prompt-optimizer/SOURCES.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,16 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
| `https://arxiv.org/abs/2303.17651` | research paper | canonical | 2026-04-18 | high | FEEDBACK -> REFINE loop and test-time improvement | research result, not product guarantee | Supports iterative refinement loop |
| `https://arxiv.org/abs/2303.11366` | research paper | canonical | 2026-04-18 | high | Reflection memory across trials for agents | research result, not product guarantee | Supports optimization log and reflection memory |
| `https://dspy.ai/` | official project docs | canonical | 2026-04-18 | high | Current prompt optimizers such as GEPA and MIPROv2; score-driven instruction search; composable optimization | framework-specific guidance | Supports modern optimizer framing |
| `https://platform.claude.com/docs/en/api/messages` | official docs | canonical | 2026-04-24 | high | Native `tools` parameter mechanics on Anthropic Messages; system-prompt auto-injection of tool definitions | provider-specific | Used in `tools.md` |
| `https://modelcontextprotocol.io/specification/2025-11-25` | standards spec | canonical | 2026-04-24 | high | MCP tool/capability/prompt disclosure; `tools/list` and `tools/list_changed` notifications | spec version-sensitive | Used in `tools.md` and `skills.md` for capability-negotiation pattern |
| `https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview` | official docs | canonical | 2026-04-24 | high | Agent Skills primitive, `SKILL.md` format, frontmatter fields, progressive disclosure | product surface may evolve | Used in `skills.md` |
| `https://agentskills.io/specification` | standards spec | canonical | 2026-04-24 | high | Cross-framework SKILL.md specification, frontmatter schema, portability contract | early-stage spec | Used in `skills.md` |
| `https://code.claude.com/docs/en/skills` | official docs | canonical | 2026-04-24 | medium | Claude Code skill-tool pattern and deferred tool mechanism (`ToolSearch`) | product surface may evolve | Used in `tools.md` and `skills.md` for deferred-disclosure pattern |
| `https://learn.microsoft.com/en-us/semantic-kernel/concepts/plugins/` | official docs | canonical | 2026-04-24 | medium | Plugin/skill primitive; function-calling-native disclosure; plugin-per-folder structure | provider-specific | Used in `skills.md` |
| `https://developers.openai.com/docs/guides/function-calling` | official docs | canonical | 2026-04-24 | high | OpenAI native tool-array disclosure, parallel tool calls, tool search for large suites | product surface may evolve | Used in `tools.md` |
| `https://ai.google.dev/docs/function_calling` | official docs | canonical | 2026-04-24 | high | Gemini native function declarations and tool-array disclosure | product surface may evolve | Used in `tools.md` |
| `https://langchain-ai.github.io/langgraphjs/reference/classes/langgraph_prebuilt.ToolNode.html` | official docs | canonical | 2026-04-24 | medium | LangGraph ToolNode concurrent tool execution; tools-first binding pattern | framework-specific | Used in `tools.md` |
| `https://arxiv.org/abs/2601.04748` | research paper | canonical | 2026-04-24 | medium | Skills-based routing at scale; semantic confusability dominates library-size effects | research result, not product guarantee | Used in `skills.md` for routing guidance |

## Decisions

Expand Down Expand Up @@ -68,6 +78,22 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
Status: adopted
Why: Gemini and Anthropic both document that long-context prompts perform better when evidence comes before the final query.

9. Treat tool schemas as disclosed by the provider-native tool array; keep the prompt free of tool-schema restatement.
Status: adopted
Why: Anthropic, OpenAI, Gemini, Semantic Kernel, LangGraph, and MCP all expose tools via a native parameter or protocol handshake. Restating schemas in the system prompt is redundant and costs tokens every turn.

10. Document progressive and deferred tool disclosure as the scaling pattern beyond ~20 tools.
Status: adopted
Why: Claude Code (`ToolSearch`) and MCP (`tools/list` + `list_changed`) converge on deferred/progressive disclosure as the scaling strategy. Prompt-level enumeration degrades routing and context budget.

11. Require explicit handling of layered prompts where part of the system prompt is deployer-authored.
Status: adopted
Why: `SOUL.md`/`WORLD.md`-style layered runtimes risk platform rules drifting into deployer-authored files that may be sparse or absent. Platform behavior must survive independent of the deployer layer.

12. Treat skills as a distinct prompt surface from tools, with their own disclosure, invocation, and lifecycle decisions.
Status: adopted
Why: Skills carry procedural instructions that must reach the model at the right time; tools do not. Disclosure (eager vs lazy vs hybrid), invocation (slash vs meta-tool vs description-match vs implicit), and resumability (does a loaded skill survive a pause?) are all distinct design axes not covered by tool guidance.

## Coverage matrix

| Dimension | Coverage status | Evidence |
Expand All @@ -81,6 +107,8 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
| Safety and escalation boundaries | complete | provider docs plus repo workflow conventions |
| Output and acceptance checks | complete | OpenAI prompting and optimizer docs, skill-writer output patterns |
| Transformed example artifacts | complete | `references/transformed-examples.md` |
| Tool disclosure and policy | complete | `references/tools.md` |
| Skill disclosure, invocation, and lifecycle | complete | `references/skills.md` |
| Future-family coverage beyond OpenAI/Claude/Gemini | partial | currently deferred until there is a concrete repo need |

## Description QA
Expand Down Expand Up @@ -123,3 +151,4 @@ Further retrieval is currently low-yield for this first version. The source pack
- 2026-04-18: Created the initial `prompt-optimizer` skill, references, and provenance record.
- 2026-04-18: Added an explicit prompt learnings pass covering compaction, deduplication, and context ordering.
- 2026-04-18: Folded the prompt learnings back into the core shaping and iteration guidance to keep the workflow compact.
- 2026-04-24: Added `references/tools.md` (tool disclosure, policy, deferred/progressive disclosure, tool-count ceilings, narration, error policy) and `references/skills.md` (skill vs tool, eager/lazy/hybrid disclosure, invocation conventions, platform-vs-deployer layering, skill-bundled tools, routing, resumable-session lifecycle).
81 changes: 73 additions & 8 deletions skills/prompt-optimizer/references/core-patterns.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Good section names are concrete and stable:
- `<role>`
- `<goal>`
- `<context>`
- `<tools>`
- `<tool_policy>`
- `<workflow>`
- `<constraints>`
- `<output_format>`
Expand All @@ -27,6 +27,29 @@ Do not add markup around every sentence. Markers are useful when they carve the

If the target stack or model family responds better to plain markdown, use headings and bullets instead of XML-style tags. The structure matters more than the syntax.

### Where rules live

Markers signal to the model what kind of content a block carries. Descriptive
or state markers (`<context>`, `<state>`, `<turn-state>`, `<environment>`,
`<artifact-state>`) read as facts about the situation — data, not policy.
Canonical rules markers (`<behavior>`, `<constraints>`, `<tool_policy>`,
`<workflow>`) read as directives the model should follow.

A directive buried in a descriptive block can underperform the same directive
placed in a rules block, especially for state-conditional rules. Observed in
the field: a resume-notice instruction placed inside `<turn-state>resumed</turn-state>`
scored 0.5 on the relevant eval; the identical sentence moved into
`<behavior>` passed at ≥0.75 with no other change.

Rules of thumb:

- keep descriptive markers descriptive — put facts about the situation there,
not directives
- directives live in a canonical rules section
- for state-conditional rules, phrase them in the rules section and reference
the state by name: "When `<turn-state>` is `resumed`, post a brief
continuation notice, then answer."

## Layer the prompt correctly

Keep these layers separate:
Expand Down Expand Up @@ -65,14 +88,55 @@ When prompts are long, separate policy from evidence explicitly:
- instructions in one block
- retrieved documents in another
- examples in another
- tool rules and schemas in their own labeled sections
- tool policy (when/why/whether) in its own labeled section; tool schemas stay in the provider-native tools parameter

For long-context prompts, place long evidence before the final query and keep the actual ask in a terminal section.
Do not cargo-cult this ordering into short prompts that do not need it.

### Layered prompts with multiple owners

The layers above assume a single author owns the whole system prompt. Many
runtimes concatenate the system prompt from multiple layers with different
owners at request time:

- a **platform layer** owned by the product or framework team (harness rules,
tool-use policy, output contract, safety boundaries)
- a **deployer or persona layer** authored by the downstream user or customer
(voice, tone, identity files such as `SOUL.md`, `CLAUDE.md`, `AGENTS.md`)

When this is the case, treat the deployer layer as **voice-only**:

- every platform behavior rule — evidence gathering, tool-use policy, narration
rules, output contract, escalation boundaries — must live in the
platform-owned layer and must still fire if the deployer layer is empty,
five lines of voice, or customized in unexpected ways
- do not delete a platform bullet on the assumption that a persona file
"probably covers it"; deployers ship sparse persona files in practice
- if a rule is load-bearing, it belongs in the platform layer by default;
the deployer layer gets voice and domain framing, not policy

Hermes Agent, OpenClaw, and similar SOUL.md-style frameworks use this split
explicitly: platform behavior is code-level, SOUL.md carries identity and
tone, and the platform falls back to a built-in default identity if SOUL.md
is absent or sparse. Mirror that invariant whenever a prompt is assembled
from more than one authorship layer.

## Portable agent prompt skeleton

Use this as a starting point and adapt it:
Use this as a starting point and adapt it.

Tool schemas are disclosed to the model by the provider-native tools parameter
(Anthropic `tools`, OpenAI `tools`, Gemini `tools`). On Anthropic this is
explicit — the API constructs a special system prompt that injects the tool
definitions from the `tools` parameter alongside the user-authored system
prompt. Well-tuned harnesses (Codex CLI, pi-agent-core) pass tools natively
and keep the prompt text free of schema restatements.

The prompt text should carry tool *policy* — when to call tools, when to avoid
them, what evidence to gather before acting — not a restated list of tool
names or argument schemas. Naming a specific tool in a policy rule ("prefer
`Read` over a `Bash` cat") is fine; re-enumerating the tool inventory or its
schemas is not.

```text
<role>
Expand All @@ -91,11 +155,11 @@ Available files or documents:
Known constraints:
</context>

<tools>
Available tools:
When to use them:
When to avoid them:
</tools>
<tool_policy>
When to use tools:
When to avoid tools:
Evidence to gather before acting:
</tool_policy>

<workflow>
1. Clarify only if required.
Expand Down Expand Up @@ -130,6 +194,7 @@ Use markdown headings instead of tags if that fits the target stack better.
- Keep progress-update style explicit if the user should see it.
- Use the shortest wording that preserves the intended behavioral constraint.
- Remove persona, motivation, or reminder text that does not change measured behavior.
- Place directives in canonical rules sections (`<behavior>`, `<constraints>`, `<tool_policy>`, `<workflow>`), not buried inside descriptive markers like `<context>`, `<state>`, or `<turn-state>`.

## Examples

Expand Down
3 changes: 3 additions & 0 deletions skills/prompt-optimizer/references/model-family-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f
- Use delimiters such as markdown headings, XML tags, or section titles when the prompt mixes multiple content blocks.
- Try zero-shot first. Add few-shot examples only when the output contract or edge cases need them.
- Be explicit about constraints, success criteria, and completion conditions.
- Tool schemas are disclosed via the Responses API `tools` parameter. Keep tool policy (when/why/whether to call) in the prompt; do not restate tool names or argument schemas.

### GPT-style non-reasoning models

Expand All @@ -29,6 +30,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f
- For long context, place long documents before the question and put the actual query near the end.
- When grounding in long documents, asking for relevant quotes first can improve downstream analysis.
- If tool use or progress-update behavior matters, specify it explicitly rather than assuming the model will infer it.
- When you call the Messages API with `tools`, the API injects the tool definitions into a special system prompt automatically. Keep your user-authored system prompt focused on policy; put tool detail in each tool's `description` field rather than re-listing schemas in prose.

## Gemini

Expand All @@ -39,6 +41,7 @@ Use this file to adapt prompts to model behavior instead of assuming all model f
- Use system instructions when the target runtime supports them.
- Thinking is dynamic by default on modern Gemini thinking models; tune it only when latency or deeper reasoning warrants it.
- Gemini long-context workflows can benefit from many-shot in-context learning when you have a large bank of representative examples.
- Tool schemas are disclosed via the Gemini API `tools` (function declarations) parameter. Keep the prompt focused on tool policy; do not re-list function names or parameter schemas.

## Cross-family adapter rules

Expand Down
Loading
Loading