Evaluate, judge, and optimize agent skills (SKILL.md files) in pull requests.
A single composite GitHub Action that runs static checks, a security scan, an
LLM-as-judge rubric, or an LLM rewrite — and posts the results as a PR comment.
Skill-Lab is the evaluation framework; this action is the GitHub front door to it.
- Zero-config. The default mode calls
api.skill-lab.dev, which absorbs the LLM cost. No API key, no installation, no billing to wire up. - Escape hatch for privacy. Supply an
api-keyand the action switches to BYOK mode: it installs the OSSskill-labCLI on the runner and uses your key / model of choice. - Three modes.
check(static + security),judge(adds LLM rubric),optimize(adds a rewrite proposal). - Slash-command optimize flow.
/optimizetriggers a rewrite; reviewers can tweak the proposed content inside the PR comment;/apply-optimizecommits it. - Single PR comment. Upserted via an HTML marker — no comment spam.
# .github/workflows/skill-review.yml
name: Skill Review
on:
pull_request:
paths: ['**/SKILL.md']
permissions:
pull-requests: write
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: skilllabdev/skill-lab-action@v1
with:
mode: judgeThat's the whole thing — drop it in, open a PR that touches a SKILL.md file,
and you'll get an evaluation comment.
| Mode | What runs | LLM? | Trigger pattern |
|---|---|---|---|
check |
static checks + security scan | no | pull_request on **/SKILL.md |
judge |
static + security + 8-criterion LLM rubric | yes | pull_request on **/SKILL.md |
optimize |
judge + LLM rewrite proposal | yes | /optimize comment |
apply |
writes optimized content from last comment, commits, pushes | no | /apply-optimize comment |
In BYOK mode the action shells out to the sklab CLI. The mapping is:
Action mode |
Equivalent BYOK command | Backend endpoint |
|---|---|---|
check |
sklab evaluate --skip-review (full 37-check static + security, no LLM) |
GET /v1/repos/:o/:r/evaluate |
judge |
sklab evaluate --model <model> (static + security + LLM rubric in one call) |
GET /evaluate + POST /judge |
optimize |
sklab evaluate --model <model> + sklab optimize --model <model> |
GET /evaluate + POST /judge + POST /optimize |
apply |
no CLI equivalent — parses the last sklab-action comment and commits | no backend call |
Note: mode: check is not sklab check — the CLI's check subcommand runs only HIGH-severity checks as a fast pre-flight. The action's check mode runs the full static + security pipeline, matching sklab evaluate --skip-review.
| Name | Default | Description |
|---|---|---|
mode |
check |
check, judge, optimize, or apply. See mode → CLI mapping below. |
fail-threshold |
0 |
Minimum static quality score (0–100) to pass. 0 = informational only. |
security-gate |
true |
Fail the check when the security scan returns BLOCK. |
judge-threshold |
0 |
Minimum judge score (0–100) to pass. 0 = informational only. |
spec-only |
false |
Skip quality-suggestion checks; run only spec-required ones. |
api-key |
"" |
If set, switches to BYOK — installs the CLI and uses your key. If empty, calls api.skill-lab.dev (free tier). |
model |
claude-haiku-4-5-20251001 |
LLM model (BYOK only). Prefix determines which provider env var is set: claude-* → ANTHROPIC_API_KEY, gpt-*/o1-*/o3-* → OPENAI_API_KEY, gemini-* → GEMINI_API_KEY. |
comment |
true |
Post / update a PR comment with results. |
sklab-version |
latest |
Pin the CLI version (BYOK only). |
max-skills |
10 |
Cap the number of skills processed per run. |
path |
"" |
Evaluate this skill explicitly (e.g. skills/my-skill). Overrides auto-detection — used by the slash-command workflows below. |
github-token |
${{ github.token }} |
Token used for gh api calls and the apply commit. |
| Name | Type | Description |
|---|---|---|
results |
JSON string | Array of per-skill results (see schema below). optimize.optimized_content is stripped to fit GitHub's step-output size limit — use results-file for the full payload. |
results-file |
path | Path on the runner to the full, untrimmed results JSON (under $RUNNER_TEMP). Valid for the duration of the job. |
any-failed |
'true'/'false' |
Any skill failed any gate. |
skills-count |
integer | Number of skills processed. |
total-cost |
string | "free (backend tier)" or "~$0.0043 (1200 tokens)" for BYOK. |
applied |
'true'/'false' |
apply mode only — did a commit actually land on the PR branch. |
Fields that weren't computed for the mode (e.g. judge in check mode) are {}.
name: Skill Check
on:
pull_request:
paths: ['**/SKILL.md']
permissions:
pull-requests: write
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: skilllabdev/skill-lab-action@v1
with:
mode: check
fail-threshold: 75
security-gate: truename: Skill Optimize
on:
issue_comment:
types: [created]
permissions:
pull-requests: write
contents: read
jobs:
optimize:
if: >-
github.event.issue.pull_request &&
startsWith(github.event.comment.body, '/optimize')
runs-on: ubuntu-latest
steps:
- name: Extract path argument (optional)
id: args
env:
BODY: ${{ github.event.comment.body }}
run: |
# "/optimize skills/my-skill" → skills/my-skill. Empty → evaluate every changed SKILL.md.
path="$(printf '%s' "$BODY" | awk '{print $2}')"
echo "path=${path:-}" >> "$GITHUB_OUTPUT"
- name: Check out PR head
uses: actions/checkout@v4
with:
ref: refs/pull/${{ github.event.issue.number }}/head
- uses: skilllabdev/skill-lab-action@v1
with:
mode: optimize
path: ${{ steps.args.outputs.path }}The action posts a collapsible diff in the PR comment. Reviewers can edit the proposed content inside the markdown code block before applying.
name: Skill Apply Optimized
on:
issue_comment:
types: [created]
permissions:
pull-requests: write
contents: write
jobs:
apply:
if: >-
github.event.issue.pull_request &&
startsWith(github.event.comment.body, '/apply-optimize')
runs-on: ubuntu-latest
steps:
- name: Get PR head ref
id: pr
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
ref=$(gh api "repos/$GITHUB_REPOSITORY/pulls/${{ github.event.issue.number }}" --jq '.head.ref')
echo "ref=$ref" >> "$GITHUB_OUTPUT"
- uses: actions/checkout@v4
with:
ref: ${{ steps.pr.outputs.ref }}
token: ${{ secrets.GITHUB_TOKEN }}
- uses: skilllabdev/skill-lab-action@v1
with:
mode: applyjobs:
judge:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: skilllabdev/skill-lab-action@v1
with:
mode: judge
api-key: ${{ secrets.ANTHROPIC_API_KEY }}
model: claude-sonnet-4-6Combine workflows 1 + 2 + 3 in the same repo. The review workflow gates the PR
merge; /optimize generates a rewrite proposal when requested; /apply-optimize
lands it on the branch.
- uses: skilllabdev/skill-lab-action@v1
with:
mode: check
comment: false
fail-threshold: 70- Fork PRs can't post comments.
GITHUB_TOKENis read-only on fork PRs, so the action falls back to$GITHUB_STEP_SUMMARY. The evaluation still runs; reviewers see it in the job summary panel. - Backend 429 / unreachable. The action prints the error and suggests
adding
api-key: ${{ secrets.ANTHROPIC_API_KEY }}for BYOK fallback. Pair the two patterns withcontinue-on-error+ a second step if you want automatic failover. /apply-optimizesays "no blocks found". The latest<!-- sklab-action -->comment on the PR doesn't contain<!-- sklab-optimized:... -->markers — run/optimizefirst.- Apply doesn't commit. Ensure the caller workflow has
permissions: { contents: write }and checks out the PR head branch (not the detached ref). See example 3 above. - Monorepo with lots of skills. Bump
max-skills(default 10) and setpaths-ignoreon the workflow to skip non-skill file changes.
Scripts live in scripts/ and are composed by action.yml — no Docker, no
Node, just bash + jq + gh (preinstalled on GitHub runners) + python3
(for the apply-mode regex).
Self-tests run in .github/workflows/test.yml against tests/fixtures/:
good-skill/— expected to pass all gatesbad-skill/— low static score (informational)malicious-skill/— expected to trigger a securityBLOCK
To debug locally, export the env vars each script requires and run it directly, or invoke the action against a branch in your own repo:
- uses: skilllabdev/skill-lab-action@main # or any branch / SHA
with:
mode: check
path: tests/fixtures/good-skillApache-2.0 — see LICENSE.
[ { "path": "skills/my-skill/SKILL.md", "skill_dir": "skills/my-skill", "mode": "judge", "source": "backend", "static": { "quality_score": 87.5, "passed": 30, "total": 37, "dimensions": [...], "failed_checks": [...] }, "security": { "scan_status": "ALLOW", "findings": [] }, "judge": { "judge_score": 78.5, "activation_score": 82, "instruction_score": 75, "verdict": "...", "criteria": [...], "suggestions": [...], "usage": {...} }, "optimize": { "original_score": 78.5, "optimized_score": 88.5, "original_failures": 4, "optimized_failures": 1, "optimized_content": "...", "usage": {...} }, "gates": { "static_pass": true, "security_pass": true, "judge_pass": true, "overall_pass": true }, "error": null } ]