SkillScore


The universal quality standard for AI agent skills.
Evaluate any SKILL.md β€” from skills.sh, ClawHub, GitHub, or your local machine.


✨ Features

  • 🎯 Comprehensive Evaluation: 8 scoring categories with weighted importance
  • 🎨 Multiple Output Formats: Terminal (colorful), JSON, and Markdown reports
  • πŸ” Deterministic Analysis: Reliable, reproducible scoring without requiring API keys
  • πŸ“‹ Detailed Feedback: Specific findings and actionable recommendations
  • ⚡ Fast & Reliable: Built with TypeScript; no network access needed for local skills
  • 🌍 Cross-Platform: Works on Windows, macOS, and Linux
  • πŸ™ GitHub Integration: Score skills directly from GitHub repositories
  • πŸ“Š Batch Mode: Compare multiple skills with a summary table
  • πŸ—£οΈ Verbose Mode: See all findings, not just truncated summaries

πŸ“¦ Installation

Global Installation (Recommended)

npm install -g skillscore

Local Installation

npm install skillscore
npx skillscore ./my-skill/

From Source

git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link

πŸš€ Quick Start

Evaluate a skill directory:

skillscore ./my-skill/

πŸ“– Usage Examples

Basic Usage

# Evaluate a skill
skillscore ./skills/my-skill/

# Evaluate with verbose output (shows all findings)
skillscore ./skills/my-skill/ --verbose

GitHub Integration

# Full GitHub URL (always recognized)
skillscore https://github.com/vercel-labs/skills/tree/main/skills/find-skills

# GitHub shorthand (requires -g/--github flag)
skillscore -g vercel-labs/skills/find-skills

# Anthropic skills
skillscore -g anthropics/skills/skill-creator

Output Formats

# JSON output
skillscore ./skills/my-skill/ --json

# Markdown report
skillscore ./skills/my-skill/ --markdown

# Save to file
skillscore ./skills/my-skill/ --output report.md
skillscore ./skills/my-skill/ --json --output score.json

Batch Mode

# Compare multiple skills (auto-enters batch mode)
skillscore ./skill1 ./skill2 ./skill3

# Explicit batch mode flag
skillscore ./skill1 ./skill2 --batch

# Compare GitHub skills
skillscore -g user/repo1/skill1 user/repo2/skill2 --json
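
JSON output also makes it easy to gate CI on a minimum score. A minimal sketch, assuming the report exposes an overall percentage (the `overall.percent` field name here is hypothetical; inspect your own `--json` output for the real schema):

```typescript
import { execFileSync } from 'node:child_process';

// Run skillscore in JSON mode and fail the build below a score threshold.
// NOTE: `overall.percent` is a hypothetical field name used for illustration;
// check the actual --json output of your skillscore version.
const raw = execFileSync('skillscore', ['./skills/my-skill/', '--json'], {
  encoding: 'utf8',
});
const report = JSON.parse(raw);
const percent: number = report.overall?.percent ?? 0;

if (percent < 80) {
  console.error(`Skill scored ${percent}%, below the 80% gate`);
  process.exit(1);
}
```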

Utility Commands

# Show version
skillscore --version

# Get help
skillscore --help

πŸ“Š Example Output

Terminal Output

πŸ“Š SKILLSCORE EVALUATION REPORT
============================================================

πŸ“‹ Skill: Weather Information Fetcher
   Fetches current weather data for any city using OpenWeatherMap API
   Path: ./weather-skill

🎯 OVERALL SCORE
   A- - 92.0% (9.2/10.0 points)

πŸ“ CATEGORY BREAKDOWN
------------------------------------------------------------
Structure β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 100.0%
   SKILL.md exists, clear name/description, follows conventions
   Score: 10/10 (weight: 15%)
   βœ“ SKILL.md file exists (+3)
   βœ“ Clear skill name: "Weather Information Fetcher" (+2)
   βœ“ Clear description provided (+2)
   ... 2 more findings

Clarity β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘ 90.0%
   Specific actionable instructions, no ambiguity, logical order
   Score: 9/10 (weight: 20%)
   βœ“ Contains specific step-by-step instructions with commands (+3)
   βœ“ No ambiguous language detected (+3)
   βœ“ Instructions follow logical order (+2)
   ... 1 more finding (use --verbose to see all)

Safety β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘ 70.0%
   No destructive commands, respects permissions
   Score: 7/10 (weight: 20%)
   βœ“ No dangerous destructive commands found (+3)
   βœ“ No obvious secret exfiltration risks (+3)
   βœ— Some potential security concerns detected

πŸ“ˆ SUMMARY
------------------------------------------------------------
βœ… Strengths: Structure, Clarity, Dependencies, Documentation
❌ Areas for improvement: Safety

Generated: 2/11/2026, 3:15:49 PM

Batch Mode Output

πŸ“Š BATCH SKILL EVALUATION
Evaluating 3 skill(s)...

[1/3] Processing: ./weather-skill
βœ… Completed

[2/3] Processing: ./file-backup
βœ… Completed

[3/3] Processing: user/repo/skill
βœ… Completed

πŸ“‹ COMPARISON SUMMARY

Skill                          Grade  Score    Structure Clarity Safety Status
Weather Information Fetcher    A-     92.0%    100%      90%     70%    OK
File Backup Tool               B+     87.0%    95%       85%     90%    OK
Advanced Data Processor        A      94.0%    100%      95%     85%    OK

πŸ“ˆ BATCH SUMMARY
βœ… Successful: 3
πŸ“Š Average Score: 91.0%

πŸ† Scoring System

SkillScore evaluates skills across 8 weighted categories:

| Category | Weight | Description |
|----------|--------|-------------|
| Structure | 15% | SKILL.md exists, clear name/description, file organization, artifact output spec |
| Clarity | 20% | Specific actionable instructions, no ambiguity, logical order |
| Safety | 20% | No destructive commands, respects permissions, network containment |
| Dependencies | 10% | Lists required tools/APIs, install instructions, env vars |
| Error Handling | 10% | Failure instructions, fallbacks, no silent failures |
| Scope | 10% | Single responsibility, routing quality, negative examples |
| Documentation | 10% | Usage examples, embedded templates, expected I/O |
| Portability | 5% | Cross-platform, no hardcoded paths, relative paths |

Scoring Methodology

Each category is scored from 0-10 points based on specific criteria:

  • Structure: Checks for SKILL.md existence, clear naming, proper organization, and whether outputs/artifacts are defined
  • Clarity: Analyzes instruction specificity, ambiguity, logical flow
  • Safety: Scans for destructive commands, security risks, permission issues, and network containment (does the skill scope network access when using HTTP/APIs?)
  • Dependencies: Validates tool listings, installation instructions, environment setup
  • Error Handling: Reviews error scenarios, fallback strategies, validation
  • Scope: Assesses single responsibility, trigger clarity, conflict potential, negative routing examples ("don't use when..."), and routing quality (concrete signals vs vague descriptions)
  • Documentation: Evaluates examples, I/O documentation, troubleshooting guides, and embedded templates/worked examples with expected output
  • Portability: Checks cross-platform compatibility, path handling, limitations
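
To make the weighting concrete, here is a sketch of the arithmetic behind the Weather Information Fetcher's 92.0% overall score from the example output above (the categories not shown in that excerpt are assumed to have scored 10/10):

```typescript
// Overall score = Σ weight · categoryScore, with each category on a 0-10 scale.
// Weights mirror the category table above; the per-category scores are taken
// from (or, where not shown, assumed for) the example report.
const weights: Record<string, number> = {
  structure: 0.15, clarity: 0.20, safety: 0.20, dependencies: 0.10,
  errorHandling: 0.10, scope: 0.10, documentation: 0.10, portability: 0.05,
};

const scores: Record<string, number> = {
  structure: 10, clarity: 9, safety: 7, dependencies: 10,
  errorHandling: 10, scope: 10, documentation: 10, portability: 10,
};

const overall = Object.keys(weights)
  .reduce((sum, cat) => sum + weights[cat] * scores[cat], 0);

console.log(`${overall.toFixed(1)}/10.0 (${(overall * 10).toFixed(1)}%)`);
// => 9.2/10.0 (92.0%)
```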

v1.1.0: Production-Validated Checks

Five new sub-criteria were added in v1.1.0, inspired by OpenAI's Skills + Shell + Compaction blog post and production data from Glean:

| Check | Category | Points | Why It Matters |
|-------|----------|--------|----------------|
| Negative routing examples | Scope | 2 | Skills that say when NOT to use them trigger ~20% more accurately (Glean data) |
| Routing quality | Scope | 1 | Descriptions with concrete tool names, I/O, and "use when" patterns route better than marketing copy |
| Embedded templates | Documentation | 2 | Real output templates inside the skill drove the biggest quality + latency gains in production |
| Network containment | Safety | 1 | Skills combining tools + open network access are a data exfiltration risk without scoping |
| Artifact output spec | Structure | 1 | Skills that define where outputs go create clean review boundaries |
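
As an illustration of how such sub-criteria can be checked deterministically, here is a sketch of a negative-routing check (not SkillScore's actual implementation; the phrasing patterns are assumptions):

```typescript
import { readFileSync } from 'node:fs';

// Award the 2 Scope points for negative routing when SKILL.md contains a
// "When NOT to Use" section or similar "don't use this skill" phrasing.
// The regex is illustrative, not the tool's real heuristic.
function negativeRoutingPoints(skillMd: string): number {
  return /when\s+not\s+to\s+use|don'?t\s+use\s+this\s+skill/i.test(skillMd) ? 2 : 0;
}

const body = readFileSync('./my-skill/SKILL.md', 'utf8');
console.log(negativeRoutingPoints(body)); // => 2 if the section is present
```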

Grade Scale

| Grade | Score Range | Description |
|-------|-------------|-------------|
| A+ | 97-100% | Exceptional quality |
| A | 93-96% | Excellent |
| A- | 90-92% | Very good |
| B+ | 87-89% | Good |
| B | 83-86% | Above average |
| B- | 80-82% | Satisfactory |
| C+ | 77-79% | Acceptable |
| C | 73-76% | Fair |
| C- | 70-72% | Needs improvement |
| D+ | 67-69% | Poor |
| D | 65-66% | Very poor |
| D- | 60-64% | Failing |
| F | 0-59% | Unacceptable |
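
The bands translate directly into a threshold lookup; a minimal sketch derived from the table above:

```typescript
// Letter grade lookup: each entry is the lower bound of a band from the
// Grade Scale table, checked from highest to lowest; below 60% is an F.
const GRADE_BANDS: Array<[number, string]> = [
  [97, 'A+'], [93, 'A'], [90, 'A-'], [87, 'B+'], [83, 'B'], [80, 'B-'],
  [77, 'C+'], [73, 'C'], [70, 'C-'], [67, 'D+'], [65, 'D'], [60, 'D-'],
];

function toGrade(percent: number): string {
  for (const [min, grade] of GRADE_BANDS) {
    if (percent >= min) return grade;
  }
  return 'F';
}

console.log(toGrade(92.0)); // => A-
```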

πŸ“ What Makes a Good Skill?

Required Structure

my-skill/
β”œβ”€β”€ SKILL.md           # Main skill definition (REQUIRED)
β”œβ”€β”€ README.md          # Documentation (recommended)
β”œβ”€β”€ package.json       # Dependencies (if applicable)
β”œβ”€β”€ scripts/           # Executable scripts
β”‚   β”œβ”€β”€ setup.sh
β”‚   └── main.py
└── examples/          # Usage examples
    └── example.md

SKILL.md Template

# My Awesome Skill

Brief description of what this skill does and when to use it.

## When to Use

Use this skill when you need to [specific task] with [specific tools/inputs].

## When NOT to Use

Don't use this skill when:
- The task is [alternative scenario] β€” use [other skill] instead
- You need [different capability]

## Dependencies

- Tool 1: Installation instructions
- API Key: How to obtain and configure
- Environment: OS requirements

## Usage

1. Step-by-step instructions
2. Specific commands to run
3. Expected outputs

## Output

Results are written to `./output/` as JSON files.

## Error Handling

- Common issues and solutions
- Fallback strategies
- Validation steps

## Examples

### Example Output

```json
{
  "status": "success",
  "result": "Example of what the skill produces"
}
```

```bash
# Working example
./scripts/main.py --input "test data"
```

## Limitations

- Known constraints
- Platform-specific notes
- Edge cases

## πŸ”§ API Usage

Use SkillScore programmatically in your Node.js projects:

```typescript
import { SkillParser, SkillScorer, TerminalReporter } from 'skillscore';
import type { Reporter, SkillScore } from 'skillscore';

const parser = new SkillParser();
const scorer = new SkillScorer();
const reporter: Reporter = new TerminalReporter();

async function evaluateSkill(skillPath: string): Promise<SkillScore> {
  const skill = await parser.parseSkill(skillPath);
  const score = await scorer.scoreSkill(skill);
  const report = reporter.generateReport(score);

  console.log(report);
  return score;
}
```

All three reporters (TerminalReporter, JsonReporter, MarkdownReporter) implement the Reporter interface.
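
The same parse → score → report pipeline works with any of them; a brief sketch that persists a JSON report to disk (assuming an ESM context for top-level await, and that generateReport returns a string as in the terminal example):

```typescript
import { writeFile } from 'node:fs/promises';
import { SkillParser, SkillScorer, JsonReporter } from 'skillscore';

// Identical flow to the example above, but emitting JSON and saving it.
const skill = await new SkillParser().parseSkill('./my-skill/');
const score = await new SkillScorer().scoreSkill(skill);
await writeFile('score.json', new JsonReporter().generateReport(score));
```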

πŸ› οΈ CLI Options

Usage: skillscore [options] <path...>

Arguments:
  path                   Path(s) to skill directory, GitHub URL, or shorthand

Options:
  -V, --version         Output the version number
  -j, --json            Output in JSON format
  -m, --markdown        Output in Markdown format
  -o, --output <file>   Write output to file
  -v, --verbose         Show ALL findings (not just truncated)
  -b, --batch           Batch mode for comparing multiple skills
  -g, --github          Treat shorthand paths as GitHub repos (user/repo/path)
  -h, --help            Display help for command

πŸ§ͺ Testing

# Run all tests
npm test

# Run tests with the interactive UI
npm run test:ui

# Run tests once
npm run test:run

# Lint code
npm run lint

# Build project
npm run build

🀝 Contributing

We welcome contributions! Here's how to get started:

Development Setup

git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link

# Run in development mode
npm run dev ./test-skill/

# Build for production
npm run build

Running Tests

npm test

Submitting Changes

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass (npm test)
  6. Lint your code (npm run lint)
  7. Commit your changes (git commit -m 'Add amazing feature')
  8. Push to the branch (git push origin feature/amazing-feature)
  9. Open a Pull Request

Coding Standards

  • Use TypeScript for all new code
  • Follow existing code style (enforced by ESLint)
  • Add tests for new features
  • Update documentation for API changes
  • Keep commits focused and descriptive

πŸ› Troubleshooting

Common Issues

Error: "Path does not exist"

  • Check for typos in the path
  • Ensure you have permission to read the directory
  • Verify the path points to a directory, not a file

Error: "No SKILL.md file found"

  • Skills must contain a SKILL.md file
  • Check if you're pointing to the right directory
  • The file must be named exactly "SKILL.md"

Error: "Git is not available"

  • Install Git to clone GitHub repositories
  • macOS: xcode-select --install
  • Ubuntu: sudo apt-get install git
  • Windows: Download from git-scm.com

Scores seem too high/low

  • Scoring is calibrated against real-world skills
  • See the scoring methodology above
  • Consider the specific criteria for each category

Getting Help

  • Open an issue: https://github.com/joeynyc/skillscore/issues

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Inspired by the need for quality assessment in AI agent skills
  • Built for the OpenClaw and Claude Code communities
  • Thanks to all contributors and skill creators
  • Scoring methodology informed by software engineering best practices and OpenAI's production skill patterns

πŸ“Š Example Scores

Here are some real-world examples of how different skills score:

  • Vercel find-skills: 85% (B) - Well-structured, good documentation
  • Anthropic frontend-design: 87% (B+) - Excellent clarity, minor dependency issues
  • Anthropic skill-creator: 92% (A-) - Outstanding overall, minor safety concerns

Made with ❀️ for the AI agent community
Help us improve AI agent skills, one evaluation at a time
