fix(skills): include descriptions in skill index for better auto-loading #712

Merged: Aaronontheweb merged 3 commits into dev from claude-wt-20260421-174515 on Apr 21, 2026

@Aaronontheweb

Collaborator

Summary

Fixes #696 - Eval failures for Skill Discovery and Identity tests.

The root cause was a context assembly problem: the skill index only showed file paths (netclaw-operations/SKILL.md) without any context about WHEN to load each skill. The model had no information to decide which skills were relevant.

  • Identity fix: Added explicit grounding instruction in SOUL.template.md
  • Skill index fix: GenerateIndex() now includes skill descriptions on each line
  • Guidance fix: AGENTS.md skill reference table uses action-oriented "BEFORE you..." language
  • Description fix: Enhanced skill descriptions with more user-language keywords (reminders, recall, buy, shop, etc.)
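
To illustrate the skill index fix, here is a minimal sketch of an index that carries a description on each line. It is written in Python for brevity and is not the actual `SkillRegistry.GenerateIndex()` code, which is C#. The `Skill` type, the `generate_index` function, the `- path: description` line format, and the `netclaw-scheduling` skill name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record; the real registry type may differ."""
    name: str          # directory name, e.g. "netclaw-operations"
    description: str   # short text saying when the skill should be loaded


def generate_index(skills):
    """Render one index line per skill: the file path plus its description.

    The exact line format is an assumption; the PR only states that each
    line now includes the skill description.
    """
    lines = []
    for skill in sorted(skills, key=lambda s: s.name):
        path = f"{skill.name}/SKILL.md"
        # Collapse newlines and repeated spaces so each skill stays on one line.
        desc = " ".join(skill.description.split())
        lines.append(f"- {path}: {desc}" if desc else f"- {path}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_index([
        Skill("netclaw-scheduling", "Load BEFORE you create reminders"),
        Skill("netclaw-operations", ""),
    ]))
```

A skill with an empty description falls back to the old path-only line, which is the format that gave the model nothing to decide on.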

Changes

| File | Change |
| --- | --- |
| `src/Netclaw.Actors/Skills/SkillRegistry.cs` | Include descriptions in skill index |
| `src/Netclaw.Cli/Resources/identity/SOUL.template.md` | Add identity grounding instruction |
| `src/Netclaw.Cli/Resources/identity/AGENTS.template.md` | Action-oriented skill reference |
| `feeds/skills/.system/files/*/SKILL.md` | Enhanced descriptions with keywords |
| `src/Netclaw.Actors.Tests/Skills/SkillRegistryTests.cs` | Updated for new index format |

Test plan

  • dotnet build - compiles
  • dotnet test src/Netclaw.Actors.Tests - 1116 tests pass
  • dotnet test src/Netclaw.Cli.Tests - 429 tests pass
  • dotnet slopwatch analyze - no violations
  • Run eval suite to verify skill discovery tests improve

…ing (#696)

The skill index was only showing file paths without context, so the model
had no information about WHEN to load each skill. This caused eval failures
for skill discovery tests.

Changes:
- SkillRegistry.GenerateIndex() now includes skill descriptions on each line
- SOUL.template.md adds explicit identity grounding instruction
- AGENTS.template.md skill reference table uses action-oriented language
- Skill descriptions enhanced with more user-language keywords
- Tests updated for new index format
…ategory filtering

Skill Discovery evals previously checked for `[tool:call] file_read` with a
skill name, which tested a proxy (explicit skill loading) rather than an
outcome (does the model have the knowledge?). Most prompts could be answered
without loading the skill, making the tests measure the wrong thing.

Redesigned tests:
- skill_scheduling_knowledge: asks about schedule types (cron/interval/once)
  which are only documented in the skill file
- skill_memory_knowledge: asks about memory classes (durable/evidence/trace)
  which are only documented in the skill file
- skill_operations_diagnostics: verifies the model takes diagnostic action
- skill_citation_search: verifies the model actually calls web_search
- skill_web_content_knowledge: asks about browser needs for JS-heavy sites
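
The difference between the old proxy check and the redesigned outcome check can be sketched as follows. This is an illustrative Python sketch, not the eval harness code: the function names, the transcript shape, and the substring matching are assumptions. Only the `[tool:call] file_read` marker and the schedule types (cron/interval/once) come from the description above.

```python
def proxy_check(transcript: str, skill_name: str) -> bool:
    """Old style: pass only if the transcript shows a file_read tool call
    that mentions the skill, regardless of what the model answered."""
    return any(
        "[tool:call] file_read" in line and skill_name in line
        for line in transcript.splitlines()
    )


def outcome_check(answer: str, required_terms) -> bool:
    """New style: pass if the final answer contains the knowledge that is
    only documented in the skill file (simple case-insensitive match)."""
    lowered = answer.lower()
    return all(term.lower() in lowered for term in required_terms)


if __name__ == "__main__":
    answer = "Schedules can be cron, interval, or once."
    # The answer is correct, yet the proxy check fails because no explicit
    # file_read call appears in the (empty) transcript.
    print(proxy_check("", "netclaw-scheduling"))
    print(outcome_check(answer, ["cron", "interval", "once"]))
```
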

Added NETCLAW_EVAL_CATEGORY and NETCLAW_EVAL_CASE env vars for running
specific categories or cases without the full suite.
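
A minimal sketch of how such filtering might work, in Python for illustration only. The variable names NETCLAW_EVAL_CATEGORY and NETCLAW_EVAL_CASE come from this commit; the `EvalCase` type, the `select_cases` function, and the matching rules (case-insensitive exact match, both filters combined with AND, unset meaning no filter) are assumptions, not the harness's confirmed behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval case record."""
    category: str
    name: str


def select_cases(cases, env=None):
    """Keep only the cases matching the optional category and case filters."""
    env = os.environ if env is None else env
    category = env.get("NETCLAW_EVAL_CATEGORY", "").strip().lower()
    case = env.get("NETCLAW_EVAL_CASE", "").strip().lower()
    return [
        c for c in cases
        if (not category or c.category.lower() == category)
        and (not case or c.name.lower() == case)
    ]


if __name__ == "__main__":
    suite = [
        EvalCase("Skill Discovery", "skill_scheduling_knowledge"),
        EvalCase("Skill Discovery", "skill_memory_knowledge"),
        EvalCase("Identity", "identity_grounding"),  # hypothetical case name
    ]
    for c in select_cases(suite, {"NETCLAW_EVAL_CATEGORY": "skill discovery"}):
        print(c.name)
```
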
@Aaronontheweb Aaronontheweb merged commit afaafc8 into dev Apr 21, 2026
4 checks passed
@Aaronontheweb Aaronontheweb deleted the claude-wt-20260421-174515 branch April 21, 2026 20:15
Development

Successfully merging this pull request may close these issues.

Eval failures: Skill Discovery and Identity tests need investigation