A clean-architecture toolkit for crafting high-quality Claude Code skills.
Writing a Claude Code skill is easy. Writing a good one is not. Skills that trigger unreliably, consume too much context, or try to do everything at once make Claude less effective, not more.
skills-forge applies software engineering principles (SRP, OCP, DIP) to the skill authoring process. It gives you a CLI to scaffold, lint, and install skills — with built-in validators that catch common anti-patterns before they reach production.
New here? Read docs/getting-started.md for a step-by-step walkthrough of the full authoring loop.
pip install -e ".[dev]"
# Initialize a workspace
skills-forge init
# 1. Scaffold a skill
skills-forge create \
--name python-tdd \
--category development \
--description "Use for TDD with Python. Triggers: pytest, test-first, red-green-refactor, .py files." \
--emoji 🔴
# 2. Open the generated SKILL.md and write the actual content
# (principles, workflow, constraints, hints, references...)
$EDITOR output_skills/development/python-tdd/SKILL.md
# 3. Lint until clean (fix every error, then warnings)
skills-forge lint output_skills/development/python-tdd
# 4. Install — default (Claude Code, global)
skills-forge install output_skills/development/python-tdd
# Or: universal project install — works with Gemini CLI, Codex, VS Code Copilot too
skills-forge install output_skills/development/python-tdd --target agents --scope project
# Or: write to every supported tool at once
skills-forge install output_skills/development/python-tdd --target all
# 5. Iterate: edit → re-lint → Claude picks up changes instantly
# 6. Bundle and share with your team
skills-forge pack output_skills/development/python-tdd
skills-forge publish ./python-tdd-0.1.0.skillpack \
--registry ~/code/skill-registry \
--base-url https://raw.githubusercontent.com/ficiverson/skill-registry/main \
--push
# Or try installing a real pack from the live registry right now:
skills-forge install https://raw.githubusercontent.com/ficiverson/skill-registry/main/packs/evaluation/ai-eng-evaluator-1.0.0.skillpack \
--sha256 10d16ba0db7b768219d0adb6c3dd8ea68b62e9f719a0132fdcd2bcf10271c0e6
The install command creates a symlink from your skill directory into the target tool's skills directory. Because it's a symlink, you edit the source in output_skills/, re-lint, and the installed version updates without reinstalling.
All targets share the same SKILL.md format — the agentskills.io open standard adopted by Claude Code, Gemini CLI, OpenAI Codex, VS Code Copilot, and 20+ other tools.
| --target | Global path | Project path |
|---|---|---|
| claude (default) | ~/.claude/skills/ | .claude/skills/ |
| gemini | ~/.gemini/skills/ | .gemini/skills/ |
| codex | ~/.codex/skills/ | .codex/skills/ |
| vscode | (no global) | .github/skills/ |
| agents | ~/.agents/skills/ | .agents/skills/ |
| all | all of the above | all of the above |
agents is the recommended target for shared repos — every conforming tool scans .agents/skills/ at project scope, so teammates on Gemini CLI, Codex, or VS Code Copilot all pick up the same skills without any per-tool setup.
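To make the target table concrete, here is a minimal lookup sketch. It is not the skills-forge implementation: the names `SKILL_DIRS` and `resolve_skills_dirs` are hypothetical, only the paths come from the table, and returning an empty list for a global `vscode` install is an assumption about how "(no global)" might be handled.

```python
# Paths copied from the target table; None marks "no global directory".
# SKILL_DIRS and resolve_skills_dirs are illustrative names, not skills-forge API.
SKILL_DIRS = {
    "claude": ("~/.claude/skills/", ".claude/skills/"),
    "gemini": ("~/.gemini/skills/", ".gemini/skills/"),
    "codex": ("~/.codex/skills/", ".codex/skills/"),
    "vscode": (None, ".github/skills/"),
    "agents": ("~/.agents/skills/", ".agents/skills/"),
}


def resolve_skills_dirs(target: str = "claude", scope: str = "global") -> list[str]:
    """Return the destination directories for a target and scope."""
    if scope not in ("global", "project"):
        raise ValueError(f"unknown scope: {scope}")
    # "all" expands to every known target.
    names = list(SKILL_DIRS) if target == "all" else [target]
    index = 0 if scope == "global" else 1
    dirs = []
    for name in names:
        if name not in SKILL_DIRS:
            raise ValueError(f"unknown target: {name}")
        path = SKILL_DIRS[name][index]
        if path is not None:  # assumption: skip targets without a global directory
            dirs.append(path)
    return dirs
```

For example, `resolve_skills_dirs("agents", "project")` yields the single universal project path, while `resolve_skills_dirs("all", "project")` yields all five project paths.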
The authoring loop has five steps. Step 2 (the actual writing) is the one that matters most.
1. Scaffold with skills-forge create. This writes a starter SKILL.md plus empty companion directories (references/, examples/, assets/, scripts/) when the relevant fields are present.
2. Author the content. Open SKILL.md and fill in the description, principles, workflow, and constraints. Drop reference docs into references/, sample outputs into examples/, and static files into assets/.
3. Lint with skills-forge lint <path>. Fix every error and warning. The linter checks both the SKILL.md content (description length, vague language, token budget) and the filesystem (do all linked files actually exist?).
4. Install with skills-forge install <path>. Use --scope project for a project-local install, or omit it for global. Add --target agents to install into .agents/skills/, the universal cross-vendor path that works for every agentskills.io-conforming tool (Gemini CLI, Codex, VS Code Copilot, etc.). Use --target all to write to every supported tool at once.
5. Test and iterate. Open Claude Code, trigger the skill with a realistic prompt, and watch how it activates. Tweak the description and triggers until activation is reliable.
This is what you should aim for when authoring a basic skill — copy this as a starting point:
---
name: python-tdd
description: |
Use this skill when writing Python code with a test-first workflow.
Triggers on: pytest, unittest, TDD, red-green-refactor, test-driven,
.py files, "write tests first", "add a failing test".
---
STARTER_CHARACTER = 🔴
## Principles
- Write a failing test before any production code
- One assertion per test where possible
- Keep the red-green-refactor cycle under 5 minutes
## Workflow
1. Write the smallest failing test that captures the next behavior
2. Run pytest and confirm it fails for the right reason
3. Write the minimum code to make it pass
4. Refactor with the test as a safety net
5. Commit on green
## Constraints
- Never write production code without a failing test
- Never commit on red
- Never skip the refactor step
The description is the skill's interface: Claude uses it to decide whether to activate. Aim for 30–150 tokens (≈ 20–100 words) and follow this formula:
Use this skill when [situation]. Triggers on: [keywords, file extensions, action verbs].
Good:
Use this skill when reviewing pull requests for security issues. Triggers on: PR review, security audit, OWASP, SQL injection, XSS, .py, .js, .ts files.
Bad:
A skill for code stuff. Use it for any kind of code review or whatever.
The bad example trips four validators at once: too short, vague language ("stuff", "whatever"), overly broad ("any kind"), and missing triggers (no extensions or action verbs).
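The four checks named above can be approximated in a few lines. This is a rough sketch under stated assumptions, not the linter's actual logic: the token estimate uses about 1.5 tokens per word (derived from "30–150 tokens ≈ 20–100 words"), the word and phrase lists contain only the examples this README mentions, and trigger detection is simplified to "has a file extension or the phrase 'Triggers on'". The function names are hypothetical; the rule names match the linter tables in this README.

```python
import re

# Only the examples named in this README; the real lists are likely longer.
VAGUE_WORDS = {"stuff", "things", "whatever"}
BROAD_PHRASES = ("any task", "any kind", "everything")
# A dot followed by 1-5 letters that is not glued to a preceding word (".py", ".ts").
FILE_EXTENSION = re.compile(r"(?<!\w)\.[a-z]{1,5}\b")


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about 1.5 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1.5)


def check_description(description: str) -> list[str]:
    """Return the names of the rules the description appears to trip."""
    issues = []
    lowered = description.lower()
    tokens = estimate_tokens(description)
    if tokens < 30:
        issues.append("description-too-short")
    if tokens > 150:
        issues.append("description-too-long")
    if set(re.findall(r"[a-z]+", lowered)) & VAGUE_WORDS:
        issues.append("description-vague-language")
    if any(phrase in lowered for phrase in BROAD_PHRASES):
        issues.append("description-overly-broad")
    # Simplification: action-verb detection is not attempted here.
    if "triggers on" not in lowered and not FILE_EXTENSION.search(lowered):
        issues.append("description-missing-triggers")
    return issues
```

Run against the two examples above, the "good" description returns an empty list and the "bad" one returns all four rule names.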
If you are an AI agent (e.g. Claude) creating a skill on a user's behalf, follow this deterministic playbook:
- Clarify the skill's single responsibility. Ask the user what trigger condition should activate the skill and what concrete action it should produce. If the answer covers more than one responsibility, split it into multiple skills.
- Pick a category and a kebab-case name. Names must contain no spaces. Skills live under output_skills/<category>/<name>/.
- Run skills-forge create with --name, --category, --description, and --emoji. The description must follow the formula above (30–150 tokens, includes triggers).
- Write the SKILL.md body using the minimal example as a structural template. Required sections: Principles, Workflow (or Instructions), Constraints. Optional but recommended: Hints, References, Examples, Assets.
- Add companion files when referenced from SKILL.md. Every link in ## References, ## Examples, ## Assets, and ## Scripts must resolve to a real file on disk; the linter will fail otherwise.
- Run skills-forge lint <path> and fix every issue. Iterate until the linter reports clean. Do not stop on the first warning; fix all of them.
- Stop and report. Hand the skill back to the user with the path and a one-line summary. Do not install it without explicit user permission.
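For the naming step of the playbook, a small helper can validate the name before anything is written. This is an illustrative sketch: `skill_directory` is a hypothetical function, and the exact kebab-case rule (lowercase letters and digits joined by single hyphens) is an assumption, since this README only states that names are kebab-case and contain no spaces.

```python
import re
from pathlib import PurePosixPath

# Assumed kebab-case rule: lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def skill_directory(category: str, name: str, root: str = "output_skills") -> PurePosixPath:
    """Build <root>/<category>/<name> after validating the skill name."""
    if not KEBAB_CASE.match(name):
        raise ValueError(f"skill name must be kebab-case: {name!r}")
    return PurePosixPath(root) / category / name
```

`skill_directory("development", "python-tdd")` gives `output_skills/development/python-tdd`, while a name such as `"Python TDD"` is rejected.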
A skill is ready when all of these are true:
- skills-forge lint <path> reports clean (zero errors, zero warnings)
- Description is 30–150 tokens and follows the Use when... Triggers on... formula
- At least three concrete trigger keywords (file extensions, action verbs, or domain terms)
- Principles section has 3–7 imperative bullets
- Workflow has numbered steps that a junior could follow without guessing
- Constraints lists what the skill must never do
- Every link in References/Examples/Assets/Scripts resolves to a real file
- Total token estimate is under 1200 (warning threshold)
- Principles — universal rules that always apply ("write tests first")
- Workflow — the ordered steps Claude should follow when the skill activates
- Constraints — hard "never" rules ("never commit on red")
- Hints — situational branching ("if no tests directory exists, score code_quality ≤ 4"). Use hints, not principles, when guidance is conditional.
- References — long-form docs Claude reads only when a workflow step needs them. Use this to keep SKILL.md under the token budget.
- Examples — sample outputs that calibrate Claude's quality bar. The single highest-leverage thing you can add.
- Assets — static files (CSVs, templates, configs) that scripts or Claude reference at runtime.
- depends_on — declare another skill that should also be loaded when this one activates.
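A `depends_on` value can be read as a comma-separated list of `name (reason)` entries, which is the form used in this README's frontmatter examples. The parser below is a sketch under that assumption; `parse_depends_on` is a hypothetical name, and treating the reason as optional and the name as kebab-case are guesses rather than documented rules.

```python
import re

# Assumed entry shape: kebab-case name, optionally followed by "(reason)".
ENTRY = re.compile(r"^(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)\s*(?:\((?P<reason>[^()]*)\))?$")


def parse_depends_on(value: str) -> list[tuple[str, str]]:
    """Parse 'pdf (report output), xlsx (data export)' into (name, reason) pairs."""
    pairs = []
    # Split on commas that are not inside parentheses, so reasons may contain commas.
    for raw in re.split(r",\s*(?![^()]*\))", value):
        entry = raw.strip()
        match = ENTRY.match(entry)
        if not match:
            raise ValueError(f"malformed depends_on entry: {entry!r}")
        pairs.append((match.group("name"), (match.group("reason") or "").strip()))
    return pairs
```

A malformed entry raises `ValueError`, which roughly corresponds to what an invalid-dependency-name check would need to detect.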
The project follows clean architecture with four layers:
src/skill_forge/
├── domain/ # Core: models, validators, ports (zero dependencies)
├── application/ # Use cases: create, lint, install, pack/unpack, publish, install-from-url
├── infrastructure/ # Adapters: filesystem, markdown, symlinks, zip packer, git registry, http fetcher
└── cli/ # Entry point: typer CLI + composition root
Dependency rule: dependencies point inward. Domain knows nothing about infrastructure. Use cases depend on ports (abstractions), not adapters (implementations). The CLI's factory.py is the composition root that wires everything together.
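The dependency rule is easier to see in miniature. The following sketch shows one way the layers could relate; every class and function name here (`Skill`, `SkillRepository`, `CreateSkill`, `InMemorySkillRepository`, `build_create_skill`) is hypothetical and chosen for illustration, not taken from the skill_forge source.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Skill:
    """Domain model: plain data, no I/O, no outward dependencies."""
    name: str
    description: str


class SkillRepository(Protocol):
    """Port: the abstraction a use case depends on."""

    def save(self, skill: Skill) -> str: ...


class CreateSkill:
    """Use case: knows the port, never a concrete adapter."""

    def __init__(self, repository: SkillRepository) -> None:
        self._repository = repository

    def execute(self, name: str, description: str) -> str:
        return self._repository.save(Skill(name, description))


class InMemorySkillRepository:
    """Adapter: one possible implementation, convenient for tests."""

    def __init__(self) -> None:
        self.saved: dict[str, Skill] = {}

    def save(self, skill: Skill) -> str:
        self.saved[skill.name] = skill
        return f"memory://{skill.name}"


def build_create_skill() -> CreateSkill:
    """Composition root: the only place that picks a concrete adapter."""
    return CreateSkill(InMemorySkillRepository())
```

Swapping the in-memory adapter for a filesystem one would change only the composition root, which is the point of keeping dependencies pointing inward.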
A skill is a directory with this structure:
my-skill/
├── SKILL.md # Core: description, principles, workflow, constraints
├── references/ # On-demand docs loaded only when a step needs them
├── scripts/ # Executable automation (generators, validators)
├── examples/ # Sample outputs showing expected format and quality
└── assets/ # Static files (CSVs, templates, images, config)
The SKILL.md frontmatter supports these fields:
---
name: my-skill
description: |
What this skill does and when to trigger it.
depends_on: other-skill (reason for dependency)
---
The body supports these sections: Principles, Workflow/Instructions, Constraints, Hints, References, Examples, Assets.
The ## Hints section contains conditional guidance that Claude applies only when relevant. Unlike principles (always apply) or constraints (hard rules), hints are situational branching logic:
## Hints
- If the repo has no tests directory, score code_quality <= 4
- If the project uses TypeScript, look for tsconfig.json instead of mypy
- If this is a monorepo, evaluate each service separately
Skills can declare dependencies on other skills. This tells Claude to also read the dependency's SKILL.md when the skill activates:
depends_on: pdf (PDF generation for the final report)
Multiple dependencies use comma separation:
depends_on: pdf (report output), xlsx (data export)
The ## Examples section links to sample outputs in examples/. These are the single most useful calibration tool for Claude; an example output is worth more than a page of specification:
## Examples
- [Sample evaluation JSON](examples/example-eval.json)
- [Example test report](examples/example-report.md)
The ## Assets section links to static files that scripts or Claude can reference at runtime:
## Assets
- [Level thresholds](assets/level-thresholds.csv)
- [Docker reference](assets/docker-cypress.md)
The key insight: the same principles that make code maintainable also make skills effective.
| Principle | Applied to skills |
|---|---|
| SRP | One skill, one responsibility. Split broad skills. |
| OCP | Extend via references/, don't bloat SKILL.md. |
| ISP | Keep the description lean — it's the skill's interface. |
| DIP | Skills define principles, not tool-specific commands. |
See docs/clean-principles-for-skills.md for the full guide, and docs/getting-started.md for a step-by-step walkthrough of the complete workflow (create → validate → install → test → iterate).
The linter runs two types of validators:
Pure validators (check the Skill object):
| Rule | Severity | What it checks |
|---|---|---|
| description-too-short | warning | Description < 30 tokens |
| description-too-long | error | Description > 150 tokens |
| description-vague-language | warning | Vague words like "stuff", "things" |
| description-overly-broad | error | Phrases like "any task", "everything" |
| description-missing-triggers | warning | No file extensions or action verbs |
| missing-principles | warning | No principles section |
| context-budget-exceeded | error | Total tokens > 2000 |
| context-budget-high | warning | Total tokens > 1200 |
| reference-too-deep | warning | Nested references (> 1 level) |
| possible-srp-violation | info | Instructions > 800 words |
| missing-starter-character | info | No STARTER_CHARACTER defined |
| missing-examples | info | Has scripts but no example outputs |
| invalid-dependency-name | error | Malformed depends_on entry |
Path-aware validators (check the filesystem):
| Rule | Severity | What it checks |
|---|---|---|
| broken-reference-link | error | Reference file doesn't exist on disk |
| broken-example-link | error | Example file doesn't exist on disk |
| broken-asset-link | error | Asset file doesn't exist on disk |
| broken-script-link | error | Script file doesn't exist on disk |
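A path-aware check boils down to resolving each Markdown link against the skill directory. The sketch below illustrates the idea; it is not the linter's code. `broken_links` is a hypothetical helper, and as a simplification it scans every relative link in SKILL.md rather than only those under the References, Examples, Assets, and Scripts sections.

```python
import re
from pathlib import Path

# Captures the target of a Markdown link: [label](target)
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(skill_dir: Path) -> list[str]:
    """Return relative link targets in SKILL.md that do not exist on disk."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    missing = []
    for target in MARKDOWN_LINK.findall(text):
        if "://" in target:  # external URLs are out of scope for a filesystem check
            continue
        if not (skill_dir / target).exists():
            missing.append(target)
    return missing
```

Calling `broken_links(Path("output_skills/development/python-tdd"))` would list any linked companion files that still need to be created.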
Once a skill is good, you'll want to share it with your team. skills-forge ships with a .skillpack format — a single zip file containing one or more skills plus a JSON manifest — so you can distribute skills via Slack, Notion, email, GitHub releases, or any other channel that can move a file.
Each skill carries its own semantic version in frontmatter:
---
name: ai-eng-evaluator
version: 1.0.0
description: |
...
---
The pack command auto-derives its version from the skill itself, so you don't need to pass --version for single-skill packs. Bump the skill's frontmatter version when you ship a change, and the next pack will use the new value. Use skills-forge create --version 0.1.0 to set an initial version when scaffolding.
# Bundle a single skill — version auto-derived from frontmatter
skills-forge pack output_skills/evaluation/ai-eng-evaluator \
--output ./packs/
# → ./packs/ai-eng-evaluator-1.0.0.skillpack
# Bundle multiple skills together (explicit pack version recommended)
skills-forge pack \
output_skills/evaluation/ai-eng-evaluator \
output_skills/evaluation/user-story-test-cases \
--name evaluation-bundle \
--version 1.0.0 \
--output evaluation-bundle.skillpack
# A teammate receives the file and unpacks it
skills-forge unpack evaluation-bundle.skillpack --output output_skills/
# Then lints and installs as usual
skills-forge lint output_skills/evaluation/ai-eng-evaluator
skills-forge install output_skills/evaluation/ai-eng-evaluator
Pack version precedence: an explicit --version always wins; otherwise a single-skill pack takes the skill's own version; multi-skill bundles without --version fall back to the default.
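The precedence rule above amounts to a three-way decision. Here is a minimal sketch of it; `resolve_pack_version` is a hypothetical function, and because this README does not say what "the default" is, the `0.1.0` placeholder is purely an assumption.

```python
from typing import Optional, Sequence

# Placeholder only: the README does not state the actual default pack version.
ASSUMED_DEFAULT_VERSION = "0.1.0"


def resolve_pack_version(
    explicit: Optional[str],
    skill_versions: Sequence[str],
    default: str = ASSUMED_DEFAULT_VERSION,
) -> str:
    """Pick the pack version: explicit flag, then the lone skill's version, then a default."""
    if explicit:
        return explicit  # an explicit --version always wins
    if len(skill_versions) == 1:
        return skill_versions[0]  # single-skill pack inherits the skill's version
    return default  # multi-skill bundle without --version
```

This is why passing an explicit version is recommended for bundles: without it, the pack version says nothing about the skills inside.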
A .skillpack is just a zip you can inspect with any zip tool. The manifest at the root looks like:
{
"format_version": "1",
"name": "evaluation-bundle",
"version": "1.0.0",
"author": "me@fernandosouto.dev",
"created_at": "2026-04-06T13:43:36+00:00",
"description": "AI engineering evaluator + user-story test-case generator",
"tags": ["evaluation", "ai-engineering", "test-cases"],
"owner": {"name": "Fernando Souto", "email": "me@fernandosouto.dev"},
"skills": [
{"category": "evaluation", "name": "ai-eng-evaluator", "version": "1.0.0"},
{"category": "evaluation", "name": "user-story-test-cases", "version": "0.1.0"}
]
}
Each skill records its own version in the manifest, so a multi-skill bundle can mix and match. The optional description, tags, owner, and deprecated fields travel with the pack and become the defaults when you publish it to a registry; passing the same flags on publish overrides them. Older packs without these fields still unpack and install fine; the codec fills in safe defaults on read.
The packer excludes __pycache__/, .DS_Store, .git/, and *.pyc files by default. Unpack rejects archives with ../ paths to defend against zip-slip attacks.
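Both behaviours can be illustrated with the standard library. This is a sketch of the general technique, not the packer's actual code: `is_excluded` and `assert_safe_members` are hypothetical names, and rejecting absolute paths in addition to `../` segments is an extra precaution assumed here.

```python
import zipfile
from pathlib import PurePosixPath


def is_excluded(path: str) -> bool:
    """True for files the packer is described as skipping by default."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    return (
        "__pycache__" in parts
        or ".git" in parts
        or parts[-1] == ".DS_Store"
        or parts[-1].endswith(".pyc")
    )


def assert_safe_members(archive: zipfile.ZipFile) -> None:
    """Raise ValueError if any member could escape the extraction directory (zip-slip)."""
    for name in archive.namelist():
        path = PurePosixPath(name.replace("\\", "/"))
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe path in archive: {name!r}")
```

Running the safety check before extracting anything means a malicious archive is rejected as a whole rather than partially unpacked.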
For broader distribution, drop the .skillpack into a shared Git repo with CI running skills-forge lint on every PR. That gives you version control, code review, and rollback for free without standing up any extra infrastructure.
skills-forge publish turns any git repo into a free, CDN-backed skill registry. No GitHub Actions, no releases, no API server — just a normal repo where each pack lives at a stable raw URL. Teammates install directly from that URL.
A live example registry built with skills-forge lives at github.com/ficiverson/skill-registry — every URL in this section points at it, so you can curl the index, install a real pack, and see exactly what your own registry will look like.
The registry repo layout is fixed:
skill-registry/ ← any git repo (GitHub, GitLab, self-hosted)
├── index.json ← machine catalog (auto-maintained)
└── packs/
└── <category>/
└── <name>-<version>.skillpack
One-time setup — create the registry repo and clone it locally:
git clone git@github.com:ficiverson/skill-registry.git
Publish a pack — point at the local clone and the public base URL:
skills-forge pack output_skills/evaluation/ai-eng-evaluator
# → ./ai-eng-evaluator-1.0.0.skillpack
skills-forge publish ./ai-eng-evaluator-1.0.0.skillpack \
--registry ~/code/skill-registry \
--base-url https://raw.githubusercontent.com/ficiverson/skill-registry/main \
--message "ai-eng-evaluator 1.0.0" \
--push
Output:
✔ Published ai-eng-evaluator v1.0.0
path: packs/evaluation/ai-eng-evaluator-1.0.0.skillpack
sha256: 10d16ba0…
git: committed
git: pushed
Install URL:
https://raw.githubusercontent.com/ficiverson/skill-registry/main/packs/evaluation/ai-eng-evaluator-1.0.0.skillpack
Teammates can install with:
skills-forge install https://raw.githubusercontent.com/ficiverson/skill-registry/main/packs/evaluation/ai-eng-evaluator-1.0.0.skillpack --sha256 10d16ba0db7b768219d0adb6c3dd8ea68b62e9f719a0132fdcd2bcf10271c0e6
The publisher copies the pack into packs/<category>/<name>-<version>.skillpack, regenerates index.json, commits, and (with --push) pushes. Drop --push if you'd rather review the diff first; the commit is already on your local branch.
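Because the layout is fixed, the install URL follows directly from the base URL and the pack's category, name, and version. The helpers below sketch that derivation; `registry_pack_path` and `install_url` are hypothetical names, and the only facts used are the `packs/<category>/<name>-<version>.skillpack` layout and the example URL shown above.

```python
def registry_pack_path(category: str, name: str, version: str) -> str:
    """Relative path of a pack inside the registry repo."""
    return f"packs/{category}/{name}-{version}.skillpack"


def install_url(base_url: str, category: str, name: str, version: str) -> str:
    """Join the public base URL with the pack's registry path."""
    return f"{base_url.rstrip('/')}/{registry_pack_path(category, name, version)}"
```

With the base URL `https://raw.githubusercontent.com/ficiverson/skill-registry/main` and the pack `evaluation/ai-eng-evaluator` at `1.0.0`, this reproduces the install URL printed in the publish output.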
Install from a URL — skills-forge install accepts a URL alongside the existing local-path form:
# Direct URL — works for any https:// pointing at a .skillpack
skills-forge install https://raw.githubusercontent.com/ficiverson/skill-registry/main/packs/evaluation/ai-eng-evaluator-1.0.0.skillpack
# With sha256 verification (recommended — digest from index.json)
skills-forge install https://raw.githubusercontent.com/ficiverson/skill-registry/main/packs/evaluation/ai-eng-evaluator-1.0.0.skillpack \
--sha256 10d16ba0db7b768219d0adb6c3dd8ea68b62e9f719a0132fdcd2bcf10271c0e6
# Local install still works exactly as before
skills-forge install output_skills/evaluation/ai-eng-evaluator
Behind the scenes the URL form fetches the pack to a temp file, verifies the sha256 if you supplied one, unpacks it via the existing unpack flow, and then installs each contained skill into ~/.claude/skills/ (or .claude/skills/ with --scope project).
Multi-platform install (--target) — skills-forge supports every major agent-CLI tool. All targets use the same SKILL.md format; only the destination path differs:
# Install into Gemini CLI (global)
skills-forge install output_skills/development/python-tdd --target gemini
# Install into OpenAI Codex (global)
skills-forge install output_skills/development/python-tdd --target codex
# Universal project path — works for all tools (recommended for shared repos)
skills-forge install output_skills/development/python-tdd --target agents --scope project
# VS Code Copilot (project-only — no global skills dir in VS Code)
skills-forge install output_skills/development/python-tdd --target vscode --scope project
# Write to every supported tool at once
skills-forge install output_skills/development/python-tdd --target all
Export to chatbot / API platforms (export --format): agent-CLI tools (Claude Code, Gemini CLI, Codex, VS Code) load SKILL.md natively via install --target. For platforms that have no file-system skill directory, use export to render the skill in their native format:
| --format | Output file | Target platform |
|---|---|---|
| system-prompt (default) | <name>.system-prompt.md | Any chat UI system-prompt field |
| gpt-json | <name>.gpt.json | OpenAI Custom GPT / Assistants API |
| gem-txt | <name>.gem.txt | Google Gemini Gems |
| bedrock-xml | <name>.bedrock.xml | AWS Bedrock agent prompt template |
| mcp-server | <name>-mcp-server.py | Any MCP-capable host (Claude Desktop, Cursor, …) |
# Plain system prompt — paste into any chat UI
skills-forge export ./packs/productivity-1.0.0.skillpack
# OpenAI Custom GPT config JSON
skills-forge export ./packs/productivity-1.0.0.skillpack --format gpt-json
# Gemini Gem instructions
skills-forge export ./packs/productivity-1.0.0.skillpack --format gem-txt
# AWS Bedrock XML prompt template
skills-forge export ./packs/productivity-1.0.0.skillpack --format bedrock-xml
# Self-contained Python MCP Prompts server
skills-forge export ./packs/productivity-1.0.0.skillpack --format mcp-server -o ./exports/
# → Run with: uvx --from "mcp[cli]" mcp run ./exports/productivity-mcp-server.py
The MCP server format deserves special mention: it generates a single runnable Python file that exposes the skill as an MCP Prompts primitive. Any MCP-compatible host (Claude Desktop, Cursor, VS Code, OpenAI desktop app) can connect via stdio and inject the skill at inference time, with no installation step on the end user's machine. The generated file includes a ready-to-paste mcpServers configuration for Claude Desktop using uvx.
Private repos — set GITHUB_TOKEN in your environment and the fetcher will pass it as a token Authorization header on raw.githubusercontent.com requests, so private registries work without any extra configuration.
Integrity — every published pack records its sha256 in index.json, and install --sha256 ... verifies the download against that digest before unpacking. The fetcher also caps downloads at 50 MB by default to refuse runaway responses.
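The integrity check and the size cap can be combined in one streaming pass. The following is a sketch of that technique using only `hashlib`; `read_verified` is a hypothetical function, the fetcher's real structure is not shown in this README, and interpreting "50 MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
import hashlib
from typing import Iterable, Optional

# Assumption: "50 MB" is taken here as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024


def read_verified(
    chunks: Iterable[bytes],
    expected_sha256: Optional[str] = None,
    max_bytes: int = MAX_BYTES,
) -> bytes:
    """Collect streamed chunks, enforcing a size cap and an optional sha256 digest."""
    digest = hashlib.sha256()
    data = bytearray()
    for chunk in chunks:
        data.extend(chunk)
        if len(data) > max_bytes:
            raise ValueError("download exceeds size cap")
        digest.update(chunk)
    if expected_sha256 is not None and digest.hexdigest() != expected_sha256.lower():
        raise ValueError("sha256 mismatch")
    return bytes(data)
```

Checking the cap while streaming, rather than after the download finishes, is what lets a fetcher refuse a runaway response early.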
Why this beats the alternatives
- vs gh release: no per-version release noise, plain file URLs are simpler to share, and the registry is one browsable folder.
- vs S3 / R2: no AWS account, no IAM, no boto3 dependency. Free for public registries.
- vs Slack uploads: discoverable. New teammates find every published skill in one place instead of digging through channel history.
- vs synced folders: works across orgs and works for open-source distribution, not just intra-team.
Four templates in templates/:
- minimal — Just frontmatter, principles, instructions, constraints
- with-references — Adds references, examples, hints, and depends_on
- with-scripts — Adds scripts, examples, assets, and validation
- full-featured — All sections: workflow steps, references, examples, assets, hints, depends_on, validation scripts
# Install dev dependencies
pip install -e ".[dev]"
# Run tests (178 tests)
pytest
# Lint code
ruff check src/ tests/
# Type check
mypy src/
MIT
I decided to create this repo after reading this eferro post on encoding experience into AI skills.
