Sensei

"A true master teaches not by telling, but by refining." - The Skill Sensei

Sensei automates frontmatter compliance improvements for Agent Skills using the Ralph loop pattern: it iterates on each skill until the skill reaches Medium-High compliance with all tests passing.

Overview

The Problem

Skills without proper frontmatter lead to skill collision - agents invoking the wrong skill for a given prompt. Common issues include:

No triggers - Agent doesn't know when to activate the skill
No anti-triggers - Agent doesn't know when NOT to use the skill
Brief descriptions - Not enough context for accurate matching
Token bloat - Oversized skills waste context window

The Solution

Sensei implements the "Ralph Wiggum" technique:

Read - Load the skill's current state and token count
Score - Evaluate frontmatter compliance
Improve - Add triggers, anti-triggers, compatibility
Verify - Run tests to ensure changes work
Check Tokens - Analyze token usage, gather suggestions
Summary - Display before/after with suggestions
Prompt - Ask user: Commit, Create Issue, or Skip?
Repeat - Until the target score is reached or the iteration limit is hit

Quick Start

Using with Copilot CLI

Single Skill

Run sensei on my-skill-name

Single Skill (Fast Mode)

Run sensei on my-skill-name --fast

Multiple Skills

Run sensei on skill-a, skill-b, skill-c

All Low-Adherence Skills

Run sensei on all Low-adherence skills

All Skills

Run sensei on all skills

Using Scripts Directly

# Count tokens in all markdown files
npm run tokens -- count

# Count tokens in specific files
npm run tokens -- count SKILL.md references/*.md

# Check files against token limits
npm run tokens -- check

# Check with strict mode (exits 1 if limits exceeded)
npm run tokens -- check --strict

# Get optimization suggestions
npm run tokens -- suggest

# Compare with previous commit
npm run tokens -- compare HEAD~1

Flags

Flag	Description
`--fast`	Skip tests for faster iteration
`--skip-integration`	Skip integration tests (unit + trigger tests only)

⚠️ Note: `--fast` speeds up the loop significantly by skipping tests, so it may miss issues. Run the full test suite before the final commit.

Prerequisites

Required

Node.js 18+ - For running token management scripts
```
node --version
```
Git - For commits and comparisons
```
git --version
```

Optional

Test Framework - Jest, pytest, or similar for trigger tests

Installation

Option 1: Install as Copilot CLI Skill (Recommended)

# Clone to your skills directory
git clone https://github.com/spboyer/sensei.git ~/.copilot/skills/sensei

# Install token CLI dependencies
cd ~/.copilot/skills/sensei/scripts && npm install

The skill is now available in Copilot CLI. Invoke with:

Run sensei on my-skill-name

Option 2: Install in Project Skills Folder

For project-specific installation:

# From your project root
mkdir -p .github/skills
git clone https://github.com/spboyer/sensei.git .github/skills/sensei

# Install dependencies
cd .github/skills/sensei/scripts && npm install

Verify Installation

# Test the token CLI
cd ~/.copilot/skills/sensei  # or your install path
npm run tokens -- check

# Should output token counts for all markdown files

How It Works

The Ralph Loop

┌─────────────────────────────────────────────────────────┐
│  START: User invokes "Run sensei on {skill-name}"       │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  1. READ: Load skills/{skill-name}/SKILL.md             │
│           Load tests/{skill-name}/ (if exists)          │
│           Count tokens (baseline for comparison)        │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  2. SCORE: Run rule-based compliance check              │
│     • Check description length (> 150 chars?)           │
│     • Check for trigger phrases ("USE FOR:")            │
│     • Check for anti-triggers ("DO NOT USE FOR:")       │
│     • Check for compatibility field                     │
└─────────────────────┬───────────────────────────────────┘
                      ▼
              ┌───────────────┐
              │ Score >= M-H  │──YES──▶ COMPLETE ✓
              │ AND tests pass│        (next skill)
              └───────┬───────┘
                      │ NO
                      ▼
┌─────────────────────────────────────────────────────────┐
│  3. SCAFFOLD: If tests/{skill-name}/ missing:           │
│     Create tests from references/test-templates/        │
│     Creates prompts.md and framework-specific tests     │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  4. IMPROVE FRONTMATTER:                                │
│     • Add "USE FOR:" with trigger phrases               │
│     • Add "DO NOT USE FOR:" with anti-triggers          │
│     • Add compatibility if applicable                   │
│     • Keep description under 1024 chars                 │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  5. IMPROVE TESTS:                                      │
│     • Update shouldTriggerPrompts (5+ prompts)          │
│     • Update shouldNotTriggerPrompts (5+ prompts)       │
│     • Match prompts to new frontmatter triggers         │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  6. VERIFY: Run tests for the skill                     │
│     • If tests fail → fix and retry                     │
│     • If tests pass → continue                          │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  7. CHECK TOKENS:                                       │
│     npm run tokens -- count {skill}/SKILL.md            │
│     Verify under 500 token soft limit                   │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  8. SUMMARY: Display before/after comparison            │
│     • Score change (Low → Medium-High)                  │
│     • Token delta (+/- tokens)                          │
│     • Unimplemented suggestions                         │
└─────────────────────┬───────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────────────┐
│  9. PROMPT USER: Choose action                          │
│     [C] Commit changes                                  │
│     [I] Create GitHub issue with suggestions            │
│     [S] Skip (discard changes)                          │
└─────────────────────┬───────────────────────────────────┘
                      ▼
              ┌───────────────┐
              │ Iteration < 5 │──YES──▶ Go to step 2
              └───────┬───────┘
                      │ NO
                      ▼
               TIMEOUT (move to next skill)
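
The control flow in the diagram can be sketched in a few lines of JavaScript. This is a minimal illustration, not Sensei's actual implementation: `ralphLoop`, `meetsTarget`, and the `steps` object (with `score`, `runTests`, and `improve`) are hypothetical names, and the sketch leaves out test scaffolding, token checks, and the user prompt.

```javascript
// Illustrative sketch only: names and structure are assumptions, not Sensei's code.
const MAX_ITERATIONS = 5; // per-skill iteration limit from this README
const LEVELS = ['Low', 'Medium', 'Medium-High', 'High'];

// True when `level` is at or above the target adherence level.
function meetsTarget(level, target = 'Medium-High') {
  return LEVELS.indexOf(level) >= LEVELS.indexOf(target);
}

// `steps` is a hypothetical object supplying the real work:
//   score(skill) -> level, runTests(skill) -> boolean, improve(skill) -> skill
function ralphLoop(skill, steps) {
  let current = skill;
  let iterations = 0; // number of improvement passes so far
  while (true) {
    const level = steps.score(current);
    if (meetsTarget(level) && steps.runTests(current)) {
      return { status: 'complete', iterations, level };
    }
    if (iterations >= MAX_ITERATIONS) {
      return { status: 'timeout', iterations, level };
    }
    current = steps.improve(current);
    iterations += 1;
  }
}

// Toy stand-ins so the sketch runs end to end.
const toySteps = {
  score: (s) => (s.triggers && s.antiTriggers ? 'Medium-High' : s.triggers ? 'Medium' : 'Low'),
  runTests: () => true,
  improve: (s) => (s.triggers ? { ...s, antiTriggers: true } : { ...s, triggers: true }),
};

const outcome = ralphLoop({ triggers: false, antiTriggers: false }, toySteps);
console.log(outcome.status, outcome.iterations, outcome.level);
```

Passing the step functions in as an argument keeps the loop itself independent of how scoring, testing, and editing are actually done.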

Batch Processing

When running on multiple skills:

Skills are processed sequentially
Each skill goes through the full loop
User prompted after each skill: Commit, Create Issue, or Skip
Summary report at the end shows all results
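
As a rough illustration of the batch behaviour described above, the following JavaScript sketch processes skills one at a time, keeps going when one fails (matching the "Continue on failure" default listed under Configuration), and builds a summary at the end. `processBatch`, `summarize`, and the `processSkill` callback are hypothetical names used only for this example.

```javascript
// Illustrative sketch only: `processSkill` stands in for one full loop run on a skill.
function processBatch(skillNames, processSkill) {
  const results = [];
  for (const name of skillNames) { // sequential: one skill at a time
    try {
      results.push({ name, ...processSkill(name) });
    } catch (err) {
      // Record the failure and continue with the remaining skills.
      results.push({ name, status: 'failed', error: err.message });
    }
  }
  return results;
}

// One line per skill for the end-of-run summary report.
function summarize(results) {
  return results.map((r) => `${r.name}: ${r.status}`).join('\n');
}

const results = processBatch(['skill-a', 'skill-b', 'skill-c'], (name) => {
  if (name === 'skill-b') throw new Error('tests failed');
  return { status: 'complete' };
});
console.log(summarize(results));
```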

Configuration

Setting	Default	Description
Skills directory	`skills/` or `.github/skills/`	Where SKILL.md files live
Tests directory	`tests/`	Where test files live
Max iterations	5	Per-skill iteration limit before moving on
Target score	Medium-High	Minimum compliance level
Token soft limit	500	SKILL.md target token count
Token hard limit	5000	SKILL.md maximum token count
User prompt	After each skill	Commit, Create Issue, or Skip
Continue on failure	Yes	Process remaining skills if one fails

Custom Paths

Override defaults in your prompt:

Run sensei on my-skill with skills in src/ai/skills/ and tests in spec/

Scoring Criteria

Adherence Levels

Level	Description	Criteria
Low	Basic description	No explicit triggers, no anti-triggers, often < 150 chars
Medium	Has trigger keywords	Description > 150 chars, implicit or explicit trigger phrases
Medium-High	Has triggers + anti-triggers	"USE FOR:" present AND "DO NOT USE FOR:" present
High	Full compliance	Medium-High + routing clarity (INVOKES/FOR SINGLE OPERATIONS)
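
The table can be read as a small decision procedure. The JavaScript sketch below is one possible reading, not Sensei's real scorer: `adherenceLevel` and `hasTriggers` are hypothetical names, the marker strings are the ones listed in this README, and because implicit trigger phrases cannot be detected by plain string matching, the sketch assumes any description over 150 characters counts as at least Medium.

```javascript
// Illustrative sketch only; the marker strings come from this README.
const TRIGGER_MARKERS = ['USE FOR:', 'TRIGGERS:', 'Use this skill when'];
const ANTI_TRIGGER_MARKERS = ['DO NOT USE FOR:', 'NOT FOR:'];

function hasTriggers(description) {
  // Strip anti-trigger markers first: "DO NOT USE FOR:" contains "USE FOR:".
  let rest = description;
  for (const marker of ANTI_TRIGGER_MARKERS) {
    rest = rest.split(marker).join('');
  }
  return TRIGGER_MARKERS.some((marker) => rest.includes(marker));
}

function adherenceLevel(description) {
  const longEnough = description.length > 150;
  const triggers = hasTriggers(description);
  const antiTriggers = ANTI_TRIGGER_MARKERS.some((m) => description.includes(m));
  const routing =
    /\*\*(WORKFLOW|UTILITY|ANALYSIS) SKILL\*\*/.test(description) &&
    description.includes('INVOKES:') &&
    description.includes('FOR SINGLE OPERATIONS:');

  if (longEnough && triggers && antiTriggers) {
    return routing ? 'High' : 'Medium-High';
  }
  // Assumption: a long description without explicit markers is treated as Medium.
  return longEnough ? 'Medium' : 'Low';
}

console.log(adherenceLevel('Process PDF files for various tasks'));
```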

Rule-Based Checks

Name validation
- Lowercase + hyphens only
- Matches directory name
- ≤ 64 characters
Description length
- Minimum: 150 characters (practical minimum for reliable matching)
- Maximum: 1024 characters (spec limit)
Trigger phrases
- Contains "USE FOR:", "TRIGGERS:", or "Use this skill when"
- Lists specific keywords and phrases
Anti-triggers
- Contains "DO NOT USE FOR:" or "NOT FOR:"
- Lists scenarios that should use other skills
Routing clarity (for High score)
- Skill type prefix: **WORKFLOW SKILL**, **UTILITY SKILL**, or **ANALYSIS SKILL**
- INVOKES: lists tools/MCP servers the skill calls
- FOR SINGLE OPERATIONS: guidance for when to bypass skill
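
The name and description-length rules above are simple enough to express directly. The following JavaScript is a minimal sketch under stated assumptions: `validateFrontmatter` is a hypothetical helper, and the pattern assumes digits are acceptable alongside lowercase letters and hyphens (tighten it if they are not).

```javascript
// Illustrative sketch only; returns a list of human-readable problems.
function validateFrontmatter(frontmatter, directoryName) {
  const { name, description } = frontmatter;
  const problems = [];

  // Assumption: digits are allowed in addition to lowercase letters and hyphens.
  if (!/^[a-z0-9-]+$/.test(name)) {
    problems.push('name must use lowercase letters and hyphens only');
  }
  if (name !== directoryName) {
    problems.push('name must match the directory name');
  }
  if (name.length > 64) {
    problems.push('name must be 64 characters or fewer');
  }
  if (description.length < 150) {
    problems.push('description is shorter than 150 characters');
  }
  if (description.length > 1024) {
    problems.push('description exceeds the 1024-character spec limit');
  }
  return problems;
}

console.log(
  validateFrontmatter(
    { name: 'pdf-processor', description: 'Process PDF files for various tasks' },
    'pdf-processor'
  )
);
```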

Target: Medium-High

To reach Medium-High, a skill must have:

✅ Description > 150 characters
✅ Explicit trigger phrases ("USE FOR:" or equivalent)
✅ Anti-triggers ("DO NOT USE FOR:" or clear scope limitation)
✅ SKILL.md < 500 tokens (soft limit, monitored)

Target: High (with routing)

To reach High, add routing clarity:

✅ All Medium-High criteria
✅ Skill type prefix (**WORKFLOW SKILL**, etc.)
✅ INVOKES: listing tools/MCP servers used
✅ FOR SINGLE OPERATIONS: bypass guidance

MCP Integration Checks

When a skill's description contains INVOKES:, Sensei performs additional checks based on the Skills, Tools & MCP Development Guide:

Check	Purpose
MCP Tools Used table	Documents tool dependencies in skill body
Prerequisites section	Lists required tools and permissions
CLI fallback pattern	Provides fallback when MCP unavailable
Name collision detection	Warns when skill name matches MCP tool

MCP Integration Score (0-4 points):

4/4 = Excellent MCP integration
3/4 = Good (minor gaps)
2/4 = Fair (needs improvement)
0-1/4 = Poor (missing key patterns)
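
The score-to-rating mapping above can be sketched as follows. This assumes one point per check in the table, which the 0-4 range suggests but the README does not state outright; `mcpIntegrationScore` and the check keys are hypothetical names.

```javascript
// Illustrative sketch only: one point per passing check (an assumption).
const MCP_CHECKS = ['toolsTable', 'prerequisites', 'cliFallback', 'noNameCollision'];

function mcpIntegrationScore(checkResults) {
  const points = MCP_CHECKS.filter((check) => checkResults[check] === true).length;
  let rating;
  if (points === 4) rating = 'Excellent';
  else if (points === 3) rating = 'Good';
  else if (points === 2) rating = 'Fair';
  else rating = 'Poor';
  return { points, rating };
}

console.log(
  mcpIntegrationScore({
    toolsTable: true,
    prerequisites: true,
    cliFallback: false,
    noNameCollision: true,
  })
);
```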

See references/mcp-integration.md for detailed patterns.

Token Budget

SKILL.md: < 500 tokens (soft), < 5000 (hard)
references/*.md: < 2000 tokens each
Check with: npm run tokens -- check
Get suggestions: npm run tokens -- suggest
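
To make the budget concrete, here is a small JavaScript sketch that classifies a file's token count against the limits above. It is not the token CLI: `checkBudget` and `estimateTokens` are hypothetical, and the four-characters-per-token estimate is a rough heuristic assumed for illustration rather than the method the real scripts use.

```javascript
// Illustrative sketch only; limits are the ones listed in this README.
const SKILL_SOFT_LIMIT = 500;
const SKILL_HARD_LIMIT = 5000;
const REFERENCE_LIMIT = 2000;

// Assumption: roughly 4 characters per token. Real tokenizers differ.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function checkBudget(path, tokens) {
  if (path.endsWith('SKILL.md')) {
    if (tokens >= SKILL_HARD_LIMIT) return 'over hard limit';
    if (tokens >= SKILL_SOFT_LIMIT) return 'over soft limit';
    return 'ok';
  }
  if (/(^|\/)references\/[^/]+\.md$/.test(path)) {
    return tokens >= REFERENCE_LIMIT ? 'over limit' : 'ok';
  }
  return 'no limit'; // files outside the budget described here
}

console.log(checkBudget('SKILL.md', estimateTokens('x'.repeat(2400))));
```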

Examples

Before: Low Adherence

---
name: pdf-processor
description: 'Process PDF files for various tasks'
---

Problems:

Only 35 characters
No trigger phrases
No anti-triggers
Agent doesn't know when to activate

After: Medium-High Adherence

---
name: pdf-processor
description: |
  Process PDF files including text extraction, rotation, and merging.
  USE FOR: "extract PDF text", "rotate PDF", "merge PDFs", "split PDF",
  "PDF to text", "combine PDF files".
  DO NOT USE FOR: creating new PDFs (use document-creator), extracting
  images (use image-extractor), or OCR on scanned documents (use ocr-processor).
---

Improvements:

~350 characters (informative but under limit)
Clear description of purpose
Explicit trigger phrases
Anti-triggers prevent collision with related skills

After: High Adherence (with routing)

---
name: azure-deploy
description: |
  **WORKFLOW SKILL** - Orchestrates deployment through preparation, validation,
  and execution phases for Azure applications.
  USE FOR: "deploy to Azure", "azd up", "push to Azure", "publish to Azure".
  DO NOT USE FOR: preparing new apps (use azure-prepare), validating before
  deploy (use azure-validate), Azure Functions specifically (use azure-functions).
  INVOKES: azure-azd MCP (up, deploy, provision), azure-deploy MCP (plan_get).
  FOR SINGLE OPERATIONS: Use azure-azd MCP directly for single azd commands.
---

High score achieved with:

Skill type prefix (**WORKFLOW SKILL**)
INVOKES: lists MCP tools used
FOR SINGLE OPERATIONS: guides when to bypass skill

Test Updates

Before (empty):

const shouldTriggerPrompts = [];
const shouldNotTriggerPrompts = [];

After:

const shouldTriggerPrompts = [
  'Extract text from this PDF',
  'Rotate this PDF 90 degrees',
  'Merge these PDF files together',
  'Split this PDF into pages',
  'Convert PDF to text',
];

const shouldNotTriggerPrompts = [
  'Create a new PDF document',
  'Extract images from this PDF',
  'OCR this scanned document',
  'What is the weather today?',
  'Help me with AWS S3',
];
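
A quick sanity check on the two prompt lists can catch obvious problems before the tests run. The JavaScript below is a hedged sketch: `checkPromptSets` is a hypothetical helper, the minimum of five prompts per list comes from the loop diagram, and the overlap check is an added assumption rather than something Sensei is documented to do.

```javascript
// Illustrative sketch only; not part of Sensei's test templates.
const MIN_PROMPTS = 5; // "5+ prompts" per list

function checkPromptSets(shouldTrigger, shouldNotTrigger) {
  const problems = [];
  if (shouldTrigger.length < MIN_PROMPTS) {
    problems.push(`shouldTriggerPrompts has ${shouldTrigger.length} prompts, needs ${MIN_PROMPTS}+`);
  }
  if (shouldNotTrigger.length < MIN_PROMPTS) {
    problems.push(`shouldNotTriggerPrompts has ${shouldNotTrigger.length} prompts, needs ${MIN_PROMPTS}+`);
  }
  // Assumption: the same prompt should never appear in both lists.
  const normalize = (prompt) => prompt.trim().toLowerCase();
  const negatives = new Set(shouldNotTrigger.map(normalize));
  for (const prompt of shouldTrigger) {
    if (negatives.has(normalize(prompt))) {
      problems.push(`prompt appears in both lists: ${prompt}`);
    }
  }
  return problems;
}

console.log(checkPromptSets([], []));
```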

Troubleshooting

Tests Failing After Improvement

Ensure shouldTriggerPrompts match "USE FOR:" phrases and shouldNotTriggerPrompts match "DO NOT USE FOR:" scenarios.

Skill Not Reaching Target Score

Common causes: description > 1024 chars, anti-triggers not using "DO NOT USE FOR:" format, or conflicting triggers with other skills.

Rolling Back Changes

git reset --soft HEAD~1  # Undo the last commit; its changes stay staged

Contributing

Improving the Sensei Skill

Edit SKILL.md for instruction changes
Edit references/*.md for documentation changes
Test tokens: npm run tokens -- check
Test on a sample skill before committing

Adding New Scoring Rules

Document the rule in references/scoring.md
Add examples in references/examples.md
Update scoring criteria in SKILL.md

Adding Test Framework Support

Create template in references/test-templates/{framework}.md
Document usage in references/configuration.md

Waza Trigger Tests

Sensei supports Waza for trigger accuracy testing. See references/test-templates/waza.md.

Reporting Issues

Open an issue that includes the skill name, its starting state, and the output of `git log --oneline -10`.

References

Ralph Loop Pattern - Original Ralph loop implementation
Anthropic Skills Documentation - Writing guidance
Skills, Tools & MCP Development Guide - MCP integration best practices
Waza Testing Framework - Skill trigger accuracy testing
skill-creator - For creating new skills from scratch

Sensei - "The path to compliance begins with a single trigger." 🥋
