feat(cli): add skill quality evaluator (#119) #154

Merged
luongnv89 merged 1 commit into main from feat/119-skill-quality-evaluator
Apr 18, 2026

@luongnv89
Owner

Summary

  • Adds asm eval <skill-path> which scores a skill's SKILL.md against seven best-practice categories (structure, description, prompt engineering, context efficiency, safety, testability, naming & conventions) and emits a structured report with an overall 0–100 score plus the top three actionable suggestions.
  • --fix applies deterministic frontmatter fixes (add missing version, infer effort from body size, canonical key ordering, CRLF normalisation, trailing-whitespace stripping, creator from git config user.name) and creates a SKILL.md.bak backup before writing. --fix --dry-run previews a unified diff without touching disk.
  • Supports --json and --machine (v1 envelope) outputs for programmatic consumers, matching the shape used by doctor/publish.
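
As a rough illustration of how per-category scores could roll up into the overall 0–100 score and the top three suggestions, here is a minimal sketch. The `CategoryResult` and `Report` shapes, the `aggregate` function, and the "weakest category first" ordering are all assumptions made for this example; the real types and weighting in src/evaluator.ts may differ.

```typescript
// Hypothetical shapes for illustration; not the actual types in src/evaluator.ts.
interface CategoryResult {
  name: string;
  score: number; // points earned in this category
  max: number; // points available in this category
  suggestions: string[];
}

interface Report {
  overall: number; // 0-100
  topSuggestions: string[];
}

// Fraction of available points earned; an empty category counts as perfect.
function ratio(c: CategoryResult): number {
  return c.max === 0 ? 1 : c.score / c.max;
}

function aggregate(categories: CategoryResult[]): Report {
  const earned = categories.reduce((sum, c) => sum + c.score, 0);
  const available = categories.reduce((sum, c) => sum + c.max, 0);
  const overall = available === 0 ? 0 : Math.round((earned / available) * 100);

  // Assumed ordering: surface suggestions from the weakest categories first.
  const ranked = [...categories].sort((a, b) => ratio(a) - ratio(b));
  const topSuggestions: string[] = [];
  for (const c of ranked) {
    for (const s of c.suggestions) {
      if (topSuggestions.length < 3) topSuggestions.push(s);
    }
  }
  return { overall, topSuggestions };
}
```

Keeping the aggregator a pure function of the category results makes it straightforward to unit-test separately from file I/O.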

Changes

  • New module src/evaluator.ts — category scorers, report aggregator, auto-fix planner, unified diff helper, formatters, machine-envelope helper.
  • New test file src/evaluator.test.ts — 37 unit tests covering every category scorer, each auto-fixable item in isolation, dry-run, backup creation, idempotency, format helpers, and end-to-end evaluateSkill.
  • CLI wiring src/cli.ts — new --fix flag in ParsedArgs, help text, cmdEval dispatcher, switch case, and eval added to the commands array in isCLIMode. doctor is also added to the array; it previously relied on the fallback branch.
  • CLI integration tests in src/cli.test.ts — eval --help, missing-path error, --json, --machine, --fix --dry-run non-writing behaviour, and --fix backup creation. isCLIMode tests for eval and doctor.
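
To make the "deterministic and idempotent" property of the auto-fix planner concrete, here is a minimal sketch of two of the text-level fixes (CRLF normalisation and trailing-whitespace stripping). The names `normaliseText` and `planTextFix` are hypothetical and are not taken from src/evaluator.ts; the frontmatter-level fixes (version, effort, key ordering, creator) are not shown.

```typescript
// Hypothetical helper: normalise line endings and strip trailing spaces/tabs.
function normaliseText(input: string): string {
  return input
    .replace(/\r\n/g, "\n") // CRLF -> LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // drop trailing whitespace
    .join("\n");
}

// Hypothetical planner result: the caller decides whether to write or only diff.
interface TextFixPlan {
  changed: boolean;
  next: string;
}

function planTextFix(original: string): TextFixPlan {
  const next = normaliseText(original);
  return { changed: next !== original, next };
}

// A second pass over already-fixed text reports no change (idempotency).
const first = planTextFix("name: demo  \r\nversion: 1.0.0\r\n");
const second = planTextFix(first.next);
console.log(first.changed, second.changed);
```

Separating planning from writing is one way to let `--fix --dry-run` reuse the same logic while only printing a diff.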

Schema alignment (intentional)

The issue uses author / type / XS/S/M/L/XL, which don't match the existing SKILL.md schema described in the README and parsed by src/utils/frontmatter.ts. Rather than silently introduce a new schema, the evaluator maps issue terminology to existing conventions:

| Issue wording | Codebase convention |
| --- | --- |
| author | creator (or metadata.creator) |
| top-level version | metadata.version preferred, version fallback (via resolveVersion) |
| XS/S/M/L/XL | low/medium/high/max (README table) |
| type | no existing field; deferred, can be added in a follow-up |

The decision is documented in the module docstring at the top of src/evaluator.ts.
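
For readers unfamiliar with the existing schema, the fallback rules in the table can be sketched roughly as follows. This is an illustration only: the `Frontmatter` shape and the `*Sketch` helpers are hypothetical stand-ins, the precedence between creator and metadata.creator is an assumption, and the real resolveVersion in the codebase may behave differently.

```typescript
// Hypothetical subset of the SKILL.md frontmatter, for illustration only.
interface Frontmatter {
  version?: string;
  creator?: string;
  effort?: string;
  metadata?: { version?: string; creator?: string };
}

// The four effort levels named in the README table.
const EFFORT_LEVELS = ["low", "medium", "high", "max"] as const;
type Effort = (typeof EFFORT_LEVELS)[number];

// metadata.version is preferred; top-level version is the fallback.
function resolveVersionSketch(fm: Frontmatter): string | undefined {
  return fm.metadata?.version ?? fm.version;
}

// Assumed precedence: top-level creator first, then metadata.creator.
function resolveCreatorSketch(fm: Frontmatter): string | undefined {
  return fm.creator ?? fm.metadata?.creator;
}

// Type guard that accepts only the existing effort vocabulary.
function isEffort(value: unknown): value is Effort {
  return typeof value === "string" && (EFFORT_LEVELS as readonly string[]).includes(value);
}
```

Under this sketch, an issue-style value such as "XL" would be flagged as outside the existing vocabulary instead of being accepted as a new schema value.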

Scope note

The optional "Can be integrated into asm publish as pre-publish quality gate (medium)" item from the issue is deferred. Hooking into the existing publish pipeline would widen the blast radius of this PR (existing publish tests, a behavior change for an already-shipped command). This PR ships eval standalone; publish integration is a clean follow-up.

Testing

  • bun test src/evaluator.test.ts — 37 pass, 0 fail, 70 expect() calls.
  • bun test src/cli.test.ts --test-name-pattern "eval" — 7 new CLI integration tests pass.
  • bun run typecheck — clean.
  • bunx prettier --check src/evaluator.ts src/evaluator.test.ts src/cli.ts src/cli.test.ts — clean.
  • bun run build — succeeds (run by the pre-push hook).
  • bun test tests/e2e/bun-e2e.test.ts — passes (run by the pre-push hook).
  • Manual smoke tests: ran asm eval against ./skills/hello-world (scored 40/F) and ./skills/skill-index-updater (scored higher) to confirm scoring differentiates real skills rather than being degenerate.

Note on pre-existing test failures

Five unit tests fail locally on both main and this branch due to local environment state (4 publishSkill > ... tests that depend on git / gh CLI state, and 1 CLI integration: import > import existing skills are skipped test that collides with the user's globally-installed skills). These failures exist on main prior to this PR — verified via git stash + bun test. CI on main is green, so the CI sandbox is not affected. This PR does not add or touch any of those tests.

Test plan

  • CI passes on the branch (unit, typecheck, e2e, build)
  • asm eval ./skills/hello-world produces a scored report
  • asm eval ./skills/hello-world --json emits parseable JSON with 7 categories
  • asm eval <tempdir>/skill --fix --dry-run prints a diff and does not modify SKILL.md
  • asm eval <tempdir>/skill --fix creates SKILL.md.bak and rewrites the original
  • asm eval <bogus> exits with code 1 and a helpful error
  • asm eval with no path prints the usage error and exits with code 2
  • asm eval --help prints help
  • asm eval ./skills/hello-world --machine emits a v1 envelope with command: "eval"
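
The exit-code expectations in the checklist above (2 for a usage error, 1 for a bad path) can be summarised in a small sketch. `resolveExitCode` and its injected `pathExists` callback are hypothetical and exist only to illustrate the convention; the actual cmdEval dispatcher in src/cli.ts may be structured differently.

```typescript
// Hypothetical helper illustrating the exit-code convention from the test plan.
// `pathExists` is injected so the logic can be tested without touching disk.
function resolveExitCode(
  skillPath: string | undefined,
  pathExists: (p: string) => boolean,
): number {
  if (skillPath === undefined || skillPath === "") {
    return 2; // usage error: no path supplied
  }
  if (!pathExists(skillPath)) {
    return 1; // runtime error: path does not exist
  }
  return 0; // evaluation can proceed
}

// Example with a fake filesystem that only knows one skill directory.
const known = new Set(["./skills/hello-world"]);
const exists = (p: string): boolean => known.has(p);
console.log(resolveExitCode(undefined, exists)); // 2
console.log(resolveExitCode("./bogus", exists)); // 1
console.log(resolveExitCode("./skills/hello-world", exists)); // 0
```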

Closes #119

Adds `asm eval <skill-path>` which scores a skill's SKILL.md against
seven best-practice categories (structure, description, prompt
engineering, context efficiency, safety, testability, naming) and emits
a structured report with an overall 0-100 score plus the top three
actionable suggestions. `--fix` applies deterministic frontmatter
fixes (missing version, inferred effort, canonical key ordering, CRLF
normalisation, trailing-whitespace stripping, creator from git). `--fix
--dry-run` previews a unified diff without writing, and `--fix` on a
real run creates a `SKILL.md.bak` before modifying. Supports `--json`
and `--machine` for programmatic consumers.

Scope choices documented inline in `src/evaluator.ts`: the issue uses
`author`/`type`/`XS/S/M/L/XL` terminology which does not match the
existing SKILL.md schema described in the README and `utils/frontmatter.ts`
(`creator`, `metadata.version`, `low/medium/high/max`). The evaluator
maps to existing conventions instead of silently introducing a schema
change, and defers `type` since no codebase field uses it. The optional
"integrate with asm publish" bullet from the issue is deferred for a
follow-up — this PR ships eval standalone.

Five pre-existing test failures (4 publishSkill gh-CLI flows,
1 import-integration) are environment-specific on `main` and unrelated
to this change; CI on main is green.

Closes #119
@luongnv89 luongnv89 merged commit d106bc6 into main Apr 18, 2026
10 checks passed
@luongnv89 luongnv89 deleted the feat/119-skill-quality-evaluator branch April 18, 2026 08:24

Development

Successfully merging this pull request may close these issues.

Add skill quality evaluator based on best practices

1 participant