fix(skills): include descriptions in skill index for better auto-loading #712
Merged
…ing (#696)

The skill index was only showing file paths without context, so the model had no information about WHEN to load each skill. This caused eval failures for skill discovery tests.

Changes:
- SkillRegistry.GenerateIndex() now includes skill descriptions on each line
- SOUL.template.md adds explicit identity grounding instruction
- AGENTS.template.md skill reference table uses action-oriented language
- Skill descriptions enhanced with more user-language keywords
- Tests updated for new index format
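The index change described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `SkillRegistry` code: the `SkillEntry` record, the `SkillIndexSketch` class, the `- path: description` line format, and the ordinal sort are all assumptions made for the example. Only the idea (each index line carries the skill's description alongside its path) comes from the commit message.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a registered skill; the real Netclaw type may differ.
public sealed record SkillEntry(string Path, string Description);

public static class SkillIndexSketch
{
    // Old behaviour as described in the PR: one line per skill, path only,
    // so the model has nothing to go on when deciding what to load.
    public static string GeneratePathOnlyIndex(IEnumerable<SkillEntry> skills) =>
        string.Join("\n", skills
            .OrderBy(s => s.Path, StringComparer.Ordinal)
            .Select(s => $"- {s.Path}"));

    // New behaviour as described in the PR: each line also carries the
    // description. The "path: description" layout is an assumed format.
    public static string GenerateIndex(IEnumerable<SkillEntry> skills) =>
        string.Join("\n", skills
            .OrderBy(s => s.Path, StringComparer.Ordinal)
            .Select(s => string.IsNullOrWhiteSpace(s.Description)
                ? $"- {s.Path}" // fall back to the bare path when no description exists
                : $"- {s.Path}: {s.Description.Trim()}"));
}
```

With an entry such as `new SkillEntry("netclaw-operations/SKILL.md", "example description text")`, the path-only index yields `- netclaw-operations/SKILL.md`, while the sketched `GenerateIndex` yields `- netclaw-operations/SKILL.md: example description text`. The description string here is placeholder text, not the wording used in the real skill file.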
…ategory filtering

Skill Discovery evals previously checked for `[tool:call] file_read` with a skill name, which tested a proxy (explicit skill loading) rather than an outcome (does the model have the knowledge?). Most prompts could be answered without loading the skill, making the tests measure the wrong thing.

Redesigned tests:
- skill_scheduling_knowledge: asks about schedule types (cron/interval/once) which are only documented in the skill file
- skill_memory_knowledge: asks about memory classes (durable/evidence/trace) which are only documented in the skill file
- skill_operations_diagnostics: verifies the model takes diagnostic action
- skill_citation_search: verifies the model actually calls web_search
- skill_web_content_knowledge: asks about browser needs for JS-heavy sites

Added NETCLAW_EVAL_CATEGORY and NETCLAW_EVAL_CASE env vars for running specific categories or cases without the full suite.
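One way the category and case filtering could work is sketched below. Only the two environment variable names come from the commit message. The `EvalCase` record, the `EvalFilterSketch` class, the case-insensitive exact match, and the rule that an unset variable means "no filter" are all assumptions made for illustration, and the real eval runner may behave differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical eval case shape; not taken from the Netclaw source.
public sealed record EvalCase(string Category, string Name);

public static class EvalFilterSketch
{
    // Keeps the cases that match both filters. A null or blank filter matches
    // everything, so with neither filter set the full suite is returned.
    public static IReadOnlyList<EvalCase> Select(
        IEnumerable<EvalCase> all, string? category, string? caseName)
    {
        static bool Matches(string? filter, string value) =>
            string.IsNullOrWhiteSpace(filter) ||
            string.Equals(filter.Trim(), value, StringComparison.OrdinalIgnoreCase);

        return all
            .Where(c => Matches(category, c.Category) && Matches(caseName, c.Name))
            .ToList();
    }

    // Reads the two variables named in the commit message and applies them.
    public static IReadOnlyList<EvalCase> SelectFromEnvironment(IEnumerable<EvalCase> all) =>
        Select(
            all,
            Environment.GetEnvironmentVariable("NETCLAW_EVAL_CATEGORY"),
            Environment.GetEnvironmentVariable("NETCLAW_EVAL_CASE"));
}
```

Under these assumptions, setting `NETCLAW_EVAL_CASE=skill_memory_knowledge` before invoking the runner would narrow the run to that single case, and leaving both variables unset would run everything.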
Summary
Fixes #696 - Eval failures for Skill Discovery and Identity tests.
The root cause was a context assembly problem: the skill index only showed file paths (`netclaw-operations/SKILL.md`) without any context about WHEN to load each skill. The model had no information to decide which skills were relevant.

`GenerateIndex()` now includes skill descriptions on each line.

Changes
- `src/Netclaw.Actors/Skills/SkillRegistry.cs`
- `src/Netclaw.Cli/Resources/identity/SOUL.template.md`
- `src/Netclaw.Cli/Resources/identity/AGENTS.template.md`
- `feeds/skills/.system/files/*/SKILL.md`
- `src/Netclaw.Actors.Tests/Skills/SkillRegistryTests.cs`

Test plan
- `dotnet build` - compiles
- `dotnet test src/Netclaw.Actors.Tests` - 1116 tests pass
- `dotnet test src/Netclaw.Cli.Tests` - 429 tests pass
- `dotnet slopwatch analyze` - no violations