SkillProof is an open-source coding-agent skill for verifying whether another coding-agent skill helps in a real repository on a real task slice with acceptable risk.
Instead of trusting README copy or prompt vibes, SkillProof is designed to make an agent:
- inspect the candidate skill
- inspect the target repo
- normalize a task suite and deterministic validators
- run baseline without the skill
- run treatment with the skill
- compare outcomes fairly
- analyze compatibility and risk
- emit an auditable report inside `.skillproof/`
A skill is just a directory with a SKILL.md entrypoint. The fastest path is to symlink this repo into your agent's skill folder.
Set this once:

```bash
export SKILLPROOF_DIR="/absolute/path/to/SkillProof"
```

Or skip manual symlinks and use the installer script:

```bash
python "$SKILLPROOF_DIR/scripts/install_skillproof.py" --agent codex
```

Codex-style personal skills install under `$CODEX_HOME/skills` by convention. The default is `~/.codex/skills`.
Fastest path:

```bash
python "$SKILLPROOF_DIR/scripts/install_skillproof.py" --agent codex
```

Or manually:

```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
mkdir -p "$CODEX_HOME/skills"
ln -sfn "$SKILLPROOF_DIR" "$CODEX_HOME/skills/skillproof"
test -f "$CODEX_HOME/skills/skillproof/SKILL.md"
```

Then use it explicitly in a prompt:

```
Use $skillproof to verify whether this candidate skill helps in this repo on these tasks with acceptable risk.
```
Claude Code personal skills live at `~/.claude/skills/<skill-name>/SKILL.md`.
Fastest path:

```bash
python "$SKILLPROOF_DIR/scripts/install_skillproof.py" --agent claude-personal
```

Or manually:

```bash
mkdir -p "$HOME/.claude/skills"
ln -sfn "$SKILLPROOF_DIR" "$HOME/.claude/skills/skillproof"
test -f "$HOME/.claude/skills/skillproof/SKILL.md"
```

Then invoke it directly:

```
/skillproof Verify whether this candidate skill helps in this repo on these tasks with acceptable risk.
```

Or ask naturally:

```
Verify whether this candidate skill helps in this repo on these tasks with acceptable risk.
```
If you want the skill available only in one repository:
Fastest path:

```bash
python "$SKILLPROOF_DIR/scripts/install_skillproof.py" --agent claude-project --project-dir /path/to/repo
```

Or manually, from the repo root:

```bash
mkdir -p .claude/skills
ln -sfn "$SKILLPROOF_DIR" .claude/skills/skillproof
test -f .claude/skills/skillproof/SKILL.md
```

This is the cleanest option when you want SkillProof committed or shared as a project-specific workflow.
Once the skill is installed, the shortest path to a real report is:

- Scaffold `.skillproof/` into the target repo:

  ```bash
  python "$SKILLPROOF_DIR/scripts/bootstrap_skillproof.py" /path/to/target-repo
  ```

- Edit `/path/to/target-repo/.skillproof/verification.yaml`.
- Edit `/path/to/target-repo/.skillproof/tasks.yaml`.
- Run the verification loop:

  ```bash
  python "$SKILLPROOF_DIR/scripts/run_skillproof.py" --repo /path/to/target-repo
  ```

- Open the report:

  ```bash
  open /path/to/target-repo/.skillproof/reports/skill-report.md
  ```

If you only want to try SkillProof once, you can skip agent installation entirely and run the two bundled scripts directly.
Most users do not need to understand the internals first.
They need to know:
- where to install the skill
- how to invoke it
- how to get the first report
- where the outputs land
That is the whole adoption path.
SkillProof is a skill repository first.
The primary interface is SKILL.md. The bundled Python scripts are just reliability tooling that help the skill scaffold and execute the verification loop.
The recommended user path is:
- install or symlink the repo into the agent's skill folder
- invoke `skillproof`
- let the bundled scripts create and run `.skillproof/`
```
.
├── SKILL.md
├── PRD.md
├── README.md
├── LICENSE
├── .gitignore
├── agents/
│   └── openai.yaml
├── references/
│   ├── artifact-contract.md
│   ├── compatibility-model.md
│   ├── risk-model.md
│   └── verification-method.md
├── scripts/
│   ├── bootstrap_skillproof.py
│   ├── run_skillproof.py
│   └── skillproof_core.py
└── tests/
```
```
Use $skillproof to verify whether this candidate skill helps in this repo on these tasks with acceptable risk.
```

For Claude Code, direct invocation via `/skillproof` should also work because the skill name is `skillproof`.
SkillProof leaves durable evidence inside the target repo:
- `.skillproof/reports/skill-report.md`
- `.skillproof/reports/skill-report.json`
- `.skillproof/reports/run-manifest.json`
- `.skillproof/reports/task-results.json`
- `.skillproof/risk/risk-summary.json`
- `.skillproof/compatibility/compatibility-summary.json`
If the user can find those files, the run succeeded.
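That check is easy to make programmatic. A minimal sketch, assuming the artifact paths listed above; the `missing_artifacts` helper is illustrative, not part of SkillProof:

```python
from pathlib import Path

# Artifact paths from the list above, relative to the target repo root.
ARTIFACTS = [
    ".skillproof/reports/skill-report.md",
    ".skillproof/reports/skill-report.json",
    ".skillproof/reports/run-manifest.json",
    ".skillproof/reports/task-results.json",
    ".skillproof/risk/risk-summary.json",
    ".skillproof/compatibility/compatibility-summary.json",
]

def missing_artifacts(repo_root: Path) -> list[str]:
    """Return the expected artifacts that are absent under repo_root."""
    return [p for p in ARTIFACTS if not (repo_root / p).is_file()]
```

An empty return value means the run left a complete evidence trail.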
For v1, the `.yaml` files are JSON-compatible YAML so they can be parsed without third-party dependencies.
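Because every config is valid JSON, the standard library is enough to read one. A minimal sketch; `load_config` is an illustrative name, not SkillProof's actual API:

```python
import json
from pathlib import Path

def load_config(path: str) -> dict:
    """Parse a JSON-compatible .yaml file using only the stdlib json module."""
    return json.loads(Path(path).read_text())
```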
Example `.skillproof/verification.yaml`:

```json
{
  "adapter": "shell-command-template",
  "candidate_skill": "../skills/my-skill",
  "baseline_command_template": "codex exec --repo {repo} --task {task_prompt}",
  "treatment_command_template": "codex exec --repo {repo} --task {task_prompt} --skill {candidate_skill}",
  "timeout_seconds": 600,
  "repeats": 1,
  "artifact_root": ".skillproof"
}
```

Example `.skillproof/tasks.yaml`:
```json
{
  "tasks": [
    {
      "id": "bugfix-001",
      "prompt": "Fix the bug and preserve the existing contract.",
      "tags": ["backend", "bugfix"],
      "scope": ["src/"],
      "validation_commands": [
        "pytest tests/test_contract.py -q"
      ]
    }
  ]
}
```

SkillProof v1 supports one adapter: `shell-command-template`.
The command templates can interpolate:

- `{repo}`
- `{task_id}`
- `{task_prompt}`
- `{task_scope}`
- `{task_tags}`
- `{candidate_skill}`
- `{python}`
- `{condition}`
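Placeholder substitution of this kind maps naturally onto Python's `str.format`. A hedged sketch; the function name and the shell-quoting choice are assumptions, not SkillProof's actual adapter code:

```python
import shlex

def render_command(template: str, **values: str) -> str:
    # Quote each value so prompts containing spaces survive the shell.
    quoted = {key: shlex.quote(val) for key, val in values.items()}
    return template.format(**quoted)

cmd = render_command(
    "codex exec --repo {repo} --task {task_prompt}",
    repo="/path/to/target-repo",
    task_prompt="Fix the bug and preserve the existing contract.",
)
```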
Example baseline and treatment templates for Codex:

```
codex exec --repo {repo} --task {task_prompt}
codex exec --repo {repo} --task {task_prompt} --skill {candidate_skill}
```

And for Claude Code:

```
claude-code run --cwd {repo} --prompt {task_prompt}
claude-code run --cwd {repo} --prompt {task_prompt} --append-skill {candidate_skill}
```
SkillProof does not ship hard-coded agent integrations in v1. The shell template is the portability layer.
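One way such a portability layer can execute a rendered template is a plain subprocess call bounded by the configured `timeout_seconds`. A sketch under that assumption, not SkillProof's actual runner:

```python
import subprocess

def run_condition(cmd: str, timeout_seconds: int = 600):
    """Run one baseline or treatment command; returns (exit_code, stdout).

    exit_code is None if the command exceeded the timeout.
    """
    try:
        proc = subprocess.run(
            cmd, shell=True, capture_output=True, text=True,
            timeout=timeout_seconds,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""
```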
- Verify the symlink or copied directory ends in `skillproof`.
- Verify the target path contains `SKILL.md`.
- Re-run the installer script if the symlink target moved.
- For Codex, invoke it explicitly with `Use $skillproof ...`.
- For Claude Code, invoke it explicitly with `/skillproof ...`.
- Check that `verification.yaml` points to a real candidate skill path.
- Check that the baseline and treatment command templates actually run in your environment.
- Check that each task has at least one deterministic `validation_command`.
- Run `python "$SKILLPROOF_DIR/scripts/run_skillproof.py" --repo /path/to/target-repo` directly to isolate install issues from runtime issues.
Tell them to do exactly this:
- symlink the repo into the agent's skill folder
- run `bootstrap_skillproof.py`
- fill in `verification.yaml` and `tasks.yaml`
- run `run_skillproof.py`
- open `skill-report.md`
SkillProof makes its judgment along three axes described in the PRD:
- utility
- compatibility
- risk
The recommendation is derived from those axes, but the underlying evidence remains visible.
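A toy decision rule illustrates how the three axes could combine into a recommendation; the thresholds, labels, and function below are illustrative assumptions, not SkillProof's actual policy:

```python
def recommend(utility_delta: float, compatible: bool, risk_level: str) -> str:
    # Illustrative only: SkillProof's real policy lives in its scripts and PRD.
    if not compatible or risk_level == "high":
        return "reject"       # incompatible or risky skills lose outright
    if utility_delta > 0:
        return "adopt"        # the skill measurably helped on the task suite
    return "no-benefit"       # safe, but no measured improvement
```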
The current repository includes:
- a publishable skill definition
- reference docs for the verification method
- bootstrap and run tooling
- tests for bootstrapping, risk analysis, compatibility analysis, and end-to-end reporting