Create evals for the structured-context agent skill #38

@mindsocket

Context

The agent skill (skills/structured-context/) needs automated evals to verify it remains accurate as the tool and schemas evolve. Without evals, regressions in skill quality are invisible.

Use cases to cover

  1. Validate a space — agent runs validate on a space and correctly interprets errors
  2. Schema design and authoring — agent writes or modifies a schema file using correct field names and structure
  3. Troubleshoot a validation error — agent diagnoses a common error (e.g. missing field, broken wikilink) from validate output
  4. Qualitative content assessment — agent retrieves rules via schemas show and applies them to review content (depends on #37, "Add qualitative content assessment to the agent skill")
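
One way to make the use cases above concrete is to express each eval as data plus a simple grader. The sketch below is only an illustration: the EvalCase shape, the gradeAnswer function, and the sample case are hypothetical names that do not exist in the repo, and case-insensitive substring matching stands in for whatever grading method is eventually chosen.

```typescript
// Hypothetical eval-case shape; none of these names exist in the repo.
type UseCase = "validate" | "schema-authoring" | "troubleshoot" | "qualitative";

interface EvalCase {
  id: string;
  useCase: UseCase;
  prompt: string;           // task given to the agent
  mustMention: string[];    // phrases a correct answer should contain
  mustNotMention: string[]; // phrases that indicate a wrong diagnosis
}

interface EvalResult {
  id: string;
  passed: boolean;
  missing: string[];   // expected phrases that were absent
  forbidden: string[]; // disallowed phrases that were present
}

// Grade one agent answer using case-insensitive substring checks.
function gradeAnswer(evalCase: EvalCase, answer: string): EvalResult {
  const text = answer.toLowerCase();
  const missing = evalCase.mustMention.filter(
    (phrase) => !text.includes(phrase.toLowerCase()),
  );
  const forbidden = evalCase.mustNotMention.filter((phrase) =>
    text.includes(phrase.toLowerCase()),
  );
  return {
    id: evalCase.id,
    passed: missing.length === 0 && forbidden.length === 0,
    missing,
    forbidden,
  };
}

// Illustrative case for use case 3 (troubleshooting a broken wikilink).
const brokenLinkCase: EvalCase = {
  id: "troubleshoot-broken-wikilink",
  useCase: "troubleshoot",
  prompt: "The validate output reports a problem in this note. What is wrong?",
  mustMention: ["wikilink"],
  mustNotMention: ["missing field"],
};
```

Substring grading is cheap and deterministic, which suits CI, but it is probably too coarse for the qualitative assessment use case, where a rubric or human review may be needed.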

Notes

  • Evals should run against real or fixture spaces/schemas where practical
  • Consider using bun run test infrastructure or a separate evals/ directory
  • This tracks ongoing skill health, not just initial correctness
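
Because the goal is ongoing health rather than a one-off check, an eval run could be reduced to per-use-case pass rates and compared against a stored baseline. The following is a minimal sketch under that assumption; CaseOutcome, passRates, regressions, and the 0.05 tolerance are all hypothetical and not part of the existing tool.

```typescript
// Hypothetical outcome record produced by one eval run.
interface CaseOutcome {
  useCase: string;
  passed: boolean;
}

// Compute the fraction of passing cases for each use case.
function passRates(outcomes: CaseOutcome[]): Map<string, number> {
  const totals = new Map<string, { passed: number; total: number }>();
  for (const outcome of outcomes) {
    const tally = totals.get(outcome.useCase) ?? { passed: 0, total: 0 };
    tally.total += 1;
    if (outcome.passed) tally.passed += 1;
    totals.set(outcome.useCase, tally);
  }
  const rates = new Map<string, number>();
  totals.forEach((tally, useCase) => {
    rates.set(useCase, tally.passed / tally.total);
  });
  return rates;
}

// List use cases whose pass rate fell below baseline by more than `tolerance`.
// A use case missing from the current run counts as a pass rate of 0.
function regressions(
  rates: Map<string, number>,
  baseline: Map<string, number>,
  tolerance = 0.05, // assumed slack for run-to-run noise
): string[] {
  const regressed: string[] = [];
  baseline.forEach((baseRate, useCase) => {
    const current = rates.get(useCase) ?? 0;
    if (current < baseRate - tolerance) regressed.push(useCase);
  });
  return regressed;
}
```

A CI job could fail when regressions returns a non-empty list, while a manual process could simply record the rates alongside the tool version.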

Close this issue once the eval suite runs in CI or is documented as a manual process.
