A sample AI agent skill for systematic web application penetration testing. Built on a Burp Suite-inspired CLI toolkit that an AI agent can execute autonomously.
This repo demonstrates the skill architecture used at Strobes to give AI agents structured, methodology-driven cybersecurity capabilities.
A skill is more than a collection of scripts. It's a four-layer system that encodes domain expertise into a format an AI agent can follow:
┌─────────────────────────────────────────────┐
│ SKILL.md │ Methodology Layer
│ (phases, playbooks, decision trees, rules) │ "What to do and when"
├─────────────────────────────────────────────┤
│ scripts/ │ Scripts Layer
│ (CLI tools the agent executes) │ "How to do it"
├─────────────────────────────────────────────┤
│ scripts/lib/ │ Shared Library Layer
│ (db, output, http, parsing) │ "Reusable foundations"
├─────────────────────────────────────────────┤
│ project.db │ Data Layer
│ (SQLite: state, results, evidence) │ "Persistent memory"
└─────────────────────────────────────────────┘
The SKILL.md is the most important file. It teaches the agent when to use each tool, how to sequence phases, and what decisions to make. Without it, the scripts are just a toolbox with no instructions.
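To make the layer split concrete, here is a minimal sketch of what a shared Data Layer could look like. The schema and helper names (`get_db`, `add_finding`) are hypothetical illustrations, not the repo's actual code; the point is that every script reads and writes one SQLite file, so state survives between agent steps.

```python
import sqlite3

def get_db(path="project.db"):
    """Open (or create) the shared project database used by every script."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    # Hypothetical schema: one table per concern (scope, sitemap, findings, ...)
    conn.execute("""CREATE TABLE IF NOT EXISTS findings (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        severity TEXT,
        cwe      TEXT)""")
    return conn

def add_finding(conn, title, severity, cwe):
    """One script writes a result; any later script (or session) can read it."""
    conn.execute("INSERT INTO findings (title, severity, cwe) VALUES (?, ?, ?)",
                 (title, severity, cwe))
    conn.commit()

conn = get_db(":memory:")  # in-memory here; the real skill persists to disk
add_finding(conn, "Reflected XSS in search", "high", "CWE-79")
rows = conn.execute("SELECT title, severity FROM findings").fetchall()
print(rows[0]["title"])  # → Reflected XSS in search
```

Because state lives in the database rather than in the agent's context window, a long engagement can span many tool invocations without losing results.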
| Script | Subcommands | Purpose | Phase |
|---|---|---|---|
| `scope.py` | init, add, remove, list, check, import, export, info | Project setup, target scoping | 1 |
| `testcase.py` | import-owasp, add, list, start, complete, skip, progress, coverage | OWASP test tracking | 1, 5, 6 |
| `wordlist.py` | list, info, get, create, append, merge, delete | Manage wordlists for fuzzing | 1 |
| `discover.py` | dirs, params, vhosts, subdomains, results | Content & parameter discovery | 2 |
| `sitemap.py` | add, add-bulk, list, tree, params, search, stats, mark, export | Endpoint tracking | 2, 3, 6 |
| `repeater.py` | send, history, replay, show, export | Send/inspect HTTP requests | 3 |
| `decoder.py` | encode, decode, smart, hash, jwt-decode, jwt-tamper, chain | Encoding, hashing, JWT ops | 3, 4 |
| `intruder.py` | run, results, list-runs | Position-based payload fuzzing | 4 |
| `turbo.py` | race, flood, results | High-concurrency / race testing | 4 |
| `comparer.py` | diff | HTTP response diffing | 4 |
| `sequencer.py` | collect, analyze | Token entropy analysis | 4 |
| `finding.py` | add, evidence, list, get, update, delete, stats, report | Findings & reporting | 5, 6 |
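The core idea behind `intruder.py`'s position-based fuzzing is simple: substitute each payload at a marked position (the `§FUZZ§` marker shown in the quick start below) and flag responses that match a pattern. A self-contained sketch of that loop, where `fake_send` is a hypothetical stand-in for a real HTTP request so nothing is actually sent:

```python
import re
from urllib.parse import urlsplit, parse_qs

MARKER = "\u00a7FUZZ\u00a7"  # the §FUZZ§ position marker

def fake_send(url):
    """Stand-in for an HTTP request: simulates a page that reflects the
    'q' parameter without encoding, i.e. a vulnerable endpoint."""
    q = parse_qs(urlsplit(url).query).get("q", [""])[0]
    return f"<html>Results for {q}</html>"

def run_intruder(template, payloads, match_regex, send=fake_send):
    """Substitute each payload at the marker, send, and keep payloads
    whose response matches the grep pattern."""
    pattern = re.compile(match_regex)
    hits = []
    for p in payloads:
        body = send(template.replace(MARKER, p))
        if pattern.search(body):
            hits.append(p)
    return hits

hits = run_intruder("https://example.com/search?q=" + MARKER,
                    ["hello", "<script>alert(1)</script>"],
                    r"<script>")
print(hits)  # only the XSS payload matches the grep pattern
```

The real script layers run tracking, result storage, and payload sets on top, but the marker-substitution loop is the essential mechanic.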
1. Setup & Scope — Initialize project, define targets, import OWASP testcases
2. Discovery — Directory brute-forcing, parameter discovery, subdomain enumeration
3. Analysis & Mapping — Build sitemap, inspect requests, decode tokens
4. Testing — Fuzz parameters, race condition testing, response diffing, entropy analysis
5. Documentation — Record findings with severity, CWE, evidence, and remediation
6. Reporting — Generate report, verify coverage, export results
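The entropy analysis in the Testing phase rests on a standard idea: tokens intended to be unpredictable should have high Shannon entropy. A rough stdlib-only sketch of that measurement (a simplification of what a real sequencer does, which also looks at positional and bit-level statistics):

```python
import math
import secrets
from collections import Counter

def shannon_entropy(tokens):
    """Per-character Shannon entropy in bits over a sample of tokens."""
    chars = "".join(tokens)
    counts = Counter(chars)
    n = len(chars)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

weak = [f"{i:08d}" for i in range(100)]              # sequential 8-digit IDs
strong = [secrets.token_hex(8) for _ in range(100)]  # 16 random hex chars each

# Sequential tokens are dominated by repeated leading zeros, so their
# entropy sits far below the ~4 bits/char ceiling of random hex output.
print(shannon_entropy(weak) < shannon_entropy(strong))  # True
```

A low score on session tokens or password-reset codes is a signal to investigate predictable generation, not proof of a vulnerability on its own.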
See SKILL.md for the full methodology with exact commands for each step.
# 1. Initialize a project
python3 skill/scripts/scope.py init --project "Example Corp" --tester "yourname"
# 2. Add targets
python3 skill/scripts/scope.py add --type domain --value "example.com"
python3 skill/scripts/scope.py add --type url --value "https://example.com"
# 3. Import OWASP testcases
python3 skill/scripts/testcase.py import-owasp
# 4. Discover directories
python3 skill/scripts/discover.py dirs --url "https://example.com" --wordlist common
# 5. Fuzz a parameter
python3 skill/scripts/intruder.py run --url "https://example.com/search?q=§FUZZ§" \
--payloads xss-basic --match-regex "<script>"
# 6. Record a finding
python3 skill/scripts/finding.py add --title "Reflected XSS in search" \
--severity high --cwe "CWE-79" --location "/search?q=" \
--description "User input reflected without encoding" \
--remediation "Implement output encoding"
# 7. Generate report
python3 skill/scripts/finding.py report --format markdown

- Python 3.8+
- No external dependencies (stdlib only)
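The stdlib-only constraint is workable even for tasks like JWT inspection. As an illustration, here is a sketch of what a jwt-decode style operation involves; the helper names are made up for this example, and the signature is deliberately not verified (that is the point when tampering during a test):

```python
import base64
import json

def b64url(data):
    """Base64url-encode with the padding stripped, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(seg):
    """Reverse of b64url: restore stripped padding, then decode."""
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def jwt_decode(token):
    """Decode a JWT's header and payload WITHOUT verifying the signature."""
    header_b64, payload_b64, _sig = token.split(".")
    return (json.loads(b64url_decode(header_b64)),
            json.loads(b64url_decode(payload_b64)))

# Build an unsigned sample token, then decode it.
token = ".".join([
    b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url(json.dumps({"sub": "alice", "admin": False}).encode()),
    "sig",
])
header, payload = jwt_decode(token)
print(payload["sub"])  # → alice
```

Keeping everything in the standard library means the agent can run the toolkit on any Python 3.8+ host with no setup step.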
See BUILDING_SKILLS.md for a complete guide on building methodology-driven agent skills for any domain.
MIT