Discovery method: skill_directories=[...] on Session(...)
Question
What is the matching algorithm / threshold for description-based skill body injection? Natural user prompts with moderate keyword overlap don't trigger it, but heavily keyword-loaded prompts do. Is there guidance on how to write description fields that activate reliably for the kinds of prompts real users type?
What we observed
We have 6 skills with keyword-rich description fields (all under the 1024-char cap, all visible in skills.list()). We tested 12 sessions across 5 domains and found that description matching works, but the activation threshold is high:
| Prompt style | Body injected? | SKILL_INVOKED fires? | Sessions |
| --- | --- | --- | --- |
| Explicit /skill-name ... | ✅ Yes | ✅ Yes | 2 |
| Named in natural language (Use the X skill to…) | ✅ Yes | ✅ Yes | 1 |
| Natural question, ~3–6 keyword overlaps with description | ❌ No | ❌ No | 7 |
| Same question + appended clause stuffed with 12+ description keywords | ✅ Yes | ✅ Yes | 2 |
We confirmed injection / non-injection by crafting prompts targeting content that exists only in the SKILL.md body (not in our base instructions, tool definitions, or model training data). When the body was injected, responses contained our unique workflow guidance and tool-specific parameter values. When it wasn't, responses gave generic advice from training data, and context token usage matched a bare prompt.
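For anyone who wants to reproduce the check, here is a minimal sketch of the marker test in plain Python. It does not call the SDK; the marker strings and sample responses are made up for illustration, and `check_body_injection` is a hypothetical helper name, not something from our codebase or the SDK.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    injected: bool
    matched_markers: list[str]


def check_body_injection(
    response: str, body_only_markers: list[str], min_hits: int = 1
) -> ProbeResult:
    """Treat a response as 'body injected' if it contains body-only markers."""
    lowered = response.lower()
    # Case-insensitive substring match against each marker.
    hits = [m for m in body_only_markers if m.lower() in lowered]
    return ProbeResult(injected=len(hits) >= min_hits, matched_markers=hits)


# Hypothetical markers: strings that appear only in the SKILL.md body.
markers = ["--retry-budget=7", "run the preflight checklist"]

with_body = check_body_injection(
    "First, run the preflight checklist, then pass --retry-budget=7.", markers
)
without_body = check_body_injection(
    "Generally you should retry failed requests with backoff.", markers
)
print(with_body.injected, without_body.injected)
```

The markers only work as evidence if they are specific enough that the model could not produce them without the body, which is why we used parameter values and workflow phrasing unique to our SKILL.md files.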
The gap
The natural-language prompts that real users type (row 3) sit well below the activation threshold. These prompts contain relevant keywords that appear in the description field, but the overlap isn't dense enough to trigger injection. Users only get the benefit of our skill content if they either:
- Know the skill name and invoke it explicitly, or
- Happen to write an unusually keyword-dense prompt
This makes the description field hard to tune. We don't know whether the matching is cosine similarity, keyword count, TF-IDF, or something else — so we can't optimize our descriptions for natural activation.
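To make the ambiguity concrete, here is a rough sketch of two of the candidate scorers named above: a distinct-keyword count and a bag-of-words cosine similarity. This is not the SDK's algorithm, which is exactly what we don't know. The description and prompts are invented, and the tokenizer is a naive assumption.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_count(prompt: str, description: str) -> int:
    """Number of distinct description words that also occur in the prompt."""
    return len(set(tokens(prompt)) & set(tokens(description)))


def cosine(prompt: str, description: str) -> float:
    """Cosine similarity between raw term-frequency vectors."""
    p, d = Counter(tokens(prompt)), Counter(tokens(description))
    dot = sum(p[t] * d[t] for t in p.keys() & d.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


# Invented example data, not one of our real skills.
description = "terraform plan apply state drift module provider backend workspace lock import"
natural = "why does my terraform plan show drift after apply"
stuffed = natural + " state module provider backend workspace lock import"

for label, prompt in (("natural", natural), ("stuffed", stuffed)):
    print(label, keyword_count(prompt, description), round(cosine(prompt, description), 3))
```

Both scorers rank the stuffed prompt higher, so our black-box observations alone can't tell them apart. Knowing which family the matcher belongs to would tell us whether to shorten descriptions (which helps cosine-style scoring) or add synonyms (which helps count-style scoring).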
What would help
Any of the following would make skills more useful for natural prompts:
- Document the matching algorithm. Even a high-level description (e.g. "semantic similarity above threshold X" or "N keyword matches required") would let skill authors tune their descriptions.
- Lower the threshold for description-based injection, so natural prompts with moderate keyword overlap trigger it.
- Expose a confidence score, e.g. on skills.list() or a new skills.match(prompt) RPC, so hosts can see how close a prompt came to triggering a skill and surface that in their UI.
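As a sketch of how a host might use such a score: the `SkillMatch` shape below is entirely hypothetical (it is what we are asking for, not an existing SDK type), and the score scale, threshold field, and skill names are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SkillMatch:
    # Hypothetical result shape for the proposed skills.match(prompt) RPC.
    skill_name: str
    score: float      # assumed to be normalized to [0, 1]
    threshold: float  # assumed activation cutoff on the same scale

    @property
    def activated(self) -> bool:
        return self.score >= self.threshold


def near_misses(matches: list[SkillMatch], margin: float = 0.15) -> list[SkillMatch]:
    """Skills that did not activate but scored within `margin` of the threshold."""
    return [m for m in matches if not m.activated and m.threshold - m.score <= margin]


# Invented scores for three invented skills.
matches = [
    SkillMatch("deploy-helper", score=0.82, threshold=0.7),
    SkillMatch("log-triage", score=0.6, threshold=0.7),
    SkillMatch("db-migrate", score=0.2, threshold=0.7),
]
for m in near_misses(matches):
    print(f"Did you mean to use /{m.skill_name}?")
```

With something like this, a host UI could suggest the explicit /skill-name invocation when a prompt comes close to activating a skill, instead of silently falling back to generic answers.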
What we expected
We expected that a natural prompt sharing several keywords with a skill's description (e.g. 3–6 matches across a description that contains 50+ domain keywords) would be sufficient to trigger body injection. The current threshold seems to require near-exhaustive keyword overlap, which users won't naturally produce.
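For scale, here is the arithmetic behind those numbers as a tiny sketch. It takes 50 as the description keyword count and 12 as the stuffed-prompt match count, although the post only states lower bounds ("50+" and "12+"), so the percentages are approximate.

```python
def coverage(matched: int, description_keywords: int) -> float:
    """Fraction of the description's keywords that the prompt hits."""
    return matched / description_keywords


# 3 and 6 bracket the natural prompts; 12 is the stuffed-prompt lower bound.
for matched in (3, 6, 12):
    print(f"{matched:>2} matches -> {coverage(matched, 50):.0%} of description keywords")
```

Under those assumptions, natural prompts cover roughly 6 to 12 percent of the description keywords, while the prompts that did activate covered about 24 percent or more.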
Environment
1.0.36-0
github_copilot_sdk==0.3.0
Thanks!