CTFagent

An AI-powered penetration testing framework built on Claude Code. It turns Claude into a methodical pentesting partner that handles recon, exploitation, enumeration, privilege escalation, and professional report generation through a structured skill-based workflow. Originally developed against shared lab platforms (e.g. HTB Pro Labs, OffSec, TryHackMe) and now generalised to broader pentest engagements (web/API/AD/cloud) via the bundled yaklang/hack-skills catalog.

The repo ships target-clean: no engagement data is committed. Each /pentest-start <ip> creates a fresh boxes/<ip>/ for the target, and per-box artefacts (replay scripts, screenshots, reports) stay local to your clone.

Full documentation: the docs/ tree is the in-repo guide — getting started, workflow, skills reference (all 30), scripts reference (all 38), engagement profiles & ROE, architecture, and developing the framework. This README is the quick-start; docs/ goes deep.

What it does

You give it a target IP and optional hints. It runs the full pentest lifecycle:

/pentest-start 10.14.1.5 BoxName --hints -

From there, Claude works through a defined phase sequence — recon, foothold, local enum, cracking, privesc, documentation — using specialized skills and helper scripts. Each phase produces structured artifacts (summaries, exploit chains, reports) that feed into the next.

The framework enforces:

Cognitive discipline — hypothesis-before-action, confidence tracking, explicit pivot thresholds, verification gates at every step
Direct-First principle — always try credentials directly (SSH/su) before cracking them
Token efficiency — raw scan output never enters the AI context; only digested summaries do
Honesty auditing — exploit chains are verified against actual proof artifacts before reports are generated
Screenshot proof — mandatory screenshot capture at every proof moment via LiveAuditShell
Structured logging — every finding, decision, and pivot is logged with timestamps and visibility levels
Shared-lab safety defaults: cleanup is mandatory, no persistence, no network-wide scans, no destructive exploits
Parallel work — background cracking with dead-time probes, concurrent web enumeration, parallel loot parsing

Setup on a new host

Just want to start a box? After install, jump to STARTING.md — the beginner-friendly walkthrough with a worked example, glossary, and FAQ.

This section gets the framework running on a fresh Kali (or compatible Linux) host in about 5 minutes of wall-clock time. Each step explains what it does and why, so you can troubleshoot if something doesn't match.

Prerequisites checklist

You need each of these before running the installer. The installer does NOT install these — they're external dependencies.

Kali Linux (tested on 2026.1+ / kernel 6.x). Other Debian-based distros (Ubuntu 22.04+, Parrot OS) work for most flows. The framework leans on Kali's pre-bundled pentest toolset; on a non-Kali distro you'll need to install the tools below manually. The full inventory of which tool each phase uses is in lib/mcp_tools.md.
Claude Code CLI installed and authenticated — install guide. Verify with claude --version (any recent version works). This is the program you'll run inside the repo to start an engagement.
git with submodule support (any modern git ≥ 2.20). Check with git --version. The framework includes a 102-skill pentest-knowledge catalog as a git submodule, so submodule support is required.
Lab VPN connection if you're running against lab platform targets. After connecting, your VPN interface will be vpn0 and you'll have an IP in the 172.16.x.x range. The framework reads this dynamically via bin/pentest-vpnip; never hardcode the address, because it changes across reconnects. Skip this if your targets are not on that lab platform (HTB, real engagements, etc.).
Standard Kali tools — nmap, dirb, gobuster, ffuf, nikto, john, searchsploit, wpscan, whatweb, enum4linux, smbclient, smbmap, hashcat, linpeas. On Kali these come pre-installed. On Ubuntu/Debian: sudo apt install -y nmap dirb gobuster ffuf nikto john smbclient smbmap hashcat; the rest you'll need to install individually (searchsploit from the exploitdb package, wpscan via gem install, linpeas from the PEASS-ng releases).
Optional but recommended: ttyd + Playwright. These power the LiveAuditShell, a real bash terminal hosted via ttyd that Claude can drive and screenshot during the engagement. Without them, the framework still works but you lose automatic proof-screenshot capture. Install:

pip install playwright && playwright install chromium
wget https://github.com/tsl0922/ttyd/releases/latest/download/ttyd.x86_64 \
  -O ~/.local/bin/ttyd && chmod +x ~/.local/bin/ttyd
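The VPN prerequisite above says the address must be looked up at run time rather than hardcoded. The following is a minimal sketch of how such a lookup could work. It is not the implementation of bin/pentest-vpnip, which this README does not show; the function name `iface_ipv4` is hypothetical, and the sketch assumes the one-line output format of the iproute2 `ip` tool.

```bash
# Hypothetical sketch; the real bin/pentest-vpnip may work differently.
# Reads `ip -o -4 addr show` output on stdin and prints the IPv4 address(es)
# of the named interface, one per line. Input lines look like:
#   5: vpn0    inet 172.16.1.5/24 scope global vpn0\ ...
iface_ipv4() {
  local iface="$1"
  awk -v dev="$iface" '$2 == dev && $3 == "inet" { split($4, a, "/"); print a[1] }'
}

# Usage on a live host (vpn0 is the interface name given above):
#   ip -o -4 addr show | iface_ipv4 vpn0
```

Parsing stdin instead of calling `ip` inside the function keeps the helper easy to check against captured output.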

Install (3 commands)

# 1. Clone with submodule. --recurse-submodules ensures the
#    yaklang/hack-skills catalog (102 entries, ~2.5MB) gets populated
#    at lib/hack-skills/ during the clone.
git clone --recurse-submodules https://github.com/nickatnight96/CTFagent.git ~/CTFagent
cd ~/CTFagent

# 2. Run the installer. This:
#      a) Confirms the submodule is initialised (re-runs init if not)
#      b) Copies all 27 slash-command definitions to ~/.claude/skills/
#      c) Copies all 5 subagent definitions to ~/.claude/agents/
#      d) Creates ~/.claude/CLAUDE.md referencing this repo's CLAUDE.md
#         (auto-loaded by Claude Code in every session)
#      e) Marks bin/pentest-* scripts executable
#    Total: ~5 seconds.
./install.sh

# 3. Add bin/ to PATH so pentest-* scripts are available outside the repo.
#    Adding to both ~/.zshrc and ~/.bashrc covers the common shells.
#    The final 'export' activates the change in your current shell without
#    sourcing an rc file that may have been written for a different shell.
echo 'export PATH="$HOME/CTFagent/bin:$PATH"' >> ~/.zshrc
echo 'export PATH="$HOME/CTFagent/bin:$PATH"' >> ~/.bashrc
export PATH="$HOME/CTFagent/bin:$PATH"

If you cloned without --recurse-submodules — common mistake — run this once so the catalog gets populated:

git submodule update --init --recursive

You'll see "Submodule path 'lib/hack-skills': checked out '8b8e567...'". After that, ls lib/hack-skills/skills/ | wc -l should print 102.

Verify the install

Run these checks. Each tells you something specific went right or wrong:

# Check 1: did all 27 slash commands install?
ls ~/.claude/skills/ | wc -l
# Expect: 27
# If you see < 27: re-run ./install.sh, watch for "[+] skill installed" lines.

# Check 2: did all 5 subagents install?
ls ~/.claude/agents/ | wc -l
# Expect: 5
# If you see < 5: same fix — re-run ./install.sh.

# Check 3: are bin/ scripts on PATH?
which pentest-vpnip
# Expect: <repo-dir>/bin/pentest-vpnip (or wherever you cloned)
# If you see "pentest-vpnip not found": your shell didn't pick up the PATH
# change. Open a new terminal, or run:
#   export PATH="$HOME/CTFagent/bin:$PATH"

# Check 4: is the hack-skills catalog populated and consistent?
pentest-validate-hack-skills
# Expect: [+] All 99 referenced skill IDs resolve (catalog: 102 entries)
# If you see "Missing bridge index" or 0 entries: the submodule didn't
# init. Run: git submodule update --init --recursive

If all four checks pass, you're ready. If any fail, re-run ./install.sh from the repo root and re-check; the installer is idempotent (safe to run multiple times).
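Checks 1 and 2 are the same operation with different arguments, so they can be wrapped in one helper. This is a sketch only; `check_count` is a hypothetical name and is not shipped with the framework.

```bash
# Hypothetical helper (not part of the framework): compare the number of
# entries in a directory against an expected count, as Checks 1 and 2 do by hand.
check_count() {
  local dir="$1" expected="$2" actual
  if [ ! -d "$dir" ]; then
    echo "[-] $dir is missing"
    return 1
  fi
  # tr strips the padding some wc implementations add around the number
  actual=$(ls "$dir" | wc -l | tr -d ' ')
  if [ "$actual" -eq "$expected" ]; then
    echo "[+] $dir: $actual entries"
  else
    echo "[-] $dir: expected $expected, found $actual"
    return 1
  fi
}

# Usage, with the counts this README expects:
#   check_count "$HOME/.claude/skills" 27
#   check_count "$HOME/.claude/agents" 5
```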

Your first engagement

Now you're ready to attack a target. The full beginner-friendly walkthrough — including a worked example, what each phase does, what output to expect, and how to read the results — is in STARTING.md.

The 30-second version:

# 1. Open Claude Code from the repo directory.
#    (Important: launching from inside ~/CTFagent ensures CLAUDE.md gets
#    auto-loaded, so the agent has the framework's rules and context.)
cd ~/CTFagent && claude

Inside Claude Code, first declare what kind of engagement this is so the framework knows the ROE it should enforce:

/engagement-init 10.14.1.5

This walks a short questionnaire — engagement type (bug-bounty / red-team / lab-ctf / custom), scope, authorization, allowed and forbidden techniques, cleanup obligations — and writes boxes/10.14.1.5/PROFILE.md. Every other phase skill consults this file before active steps.

For a shared-lab engagement, /engagement-init will offer the lab-ctf template (which mirrors the existing shared-lab safety defaults as a structured profile). For bug bounty, red team, or other lab CTFs, pick the matching template.

Then start the engagement:

/pentest-start 10.14.1.5 TargetName --hints -

Replace 10.14.1.5 with your actual target IP and TargetName with a label (anything readable — it shows up in reports). After you press Enter, Claude waits for you to paste any hints about the box (from your lab course page, HTB writeup notes, or scope document). Press Ctrl-D when done. If you have no hints, omit --hints -:

/pentest-start 10.14.1.5 TargetName

The framework then runs the full phase loop automatically:

recon → anon sweep → foothold → enum → crack → privesc → document

When done, you'll have:

boxes/10.14.1.5/EXPLOIT_CHAIN.md — the operational notes (commands you ran, with proofs)
boxes/10.14.1.5/REPORT.md — the polished, client-ready report
boxes/10.14.1.5/audit/step_NN_*.png — proof screenshots
A flag (if the target had one) noted in both files

For an end-to-end narrative including what you'll actually see at each phase, see STARTING.md § 11 — A complete worked example.

What `install.sh` actually does

git submodule update --init --recursive — populates lib/hack-skills/ (102 deep skills + 6 category routers + master)
Copies all slash-command skill definitions (27, the count Check 1 above expects) to ~/.claude/skills/<name>/SKILL.md
Copies all 5 agent definitions to ~/.claude/agents/<name>.md
Creates ~/.claude/CLAUDE.md referencing this repo's CLAUDE.md (the master instruction set, auto-loaded every session)
Marks bin/pentest-* scripts executable

Updating

cd ~/CTFagent
git pull
./install.sh                              # re-syncs skills / agents

# Optional: bump the upstream hack-skills catalog (read the docstring
# in CLAUDE.md before doing this — it requires re-running the validation
# script to confirm no upstream rename / delete drift)
pentest-validate-hack-skills                  # baseline
git submodule update --remote lib/hack-skills
pentest-validate-hack-skills                  # confirm no drift
git add lib/hack-skills && git commit -m "bump hack-skills"

Workflow

Each engagement follows this phase sequence. Invoke skills with /skill-name inside Claude Code.

Phase	Command	What happens
Kickoff	`/pentest-start <ip> <name> --hints -`	Creates target directory, captures hints, starts timer, launches LiveAuditShell, runs recon
Recon	`/pentest-recon <ip>`	Parallel TCP/UDP nmap, service fingerprinting, vhost brute, protocol-specific probes. Produces confidence-scored attack surface
Anon sweep	`/pentest-anonsweep <ip>`	Hammers all no-auth surfaces (FTP anon, SMB guest, Redis, Mongo, Tomcat defaults, etc.)
Foothold	`/pentest-foothold <ip>`	Vuln research via subagent, web enum, exploit execution. Logs every finding and decision
Local enum	`/pentest-enum <ip>`	Pushes enumeration script to target, GTFOBins cross-reference, parallel loot parsing
Cracking	`/pentest-crack <hashfile>`	Backgrounded john with auto-detected budget. Spawns sidequest probes during dead time
Privesc	`/pentest-privesc <ip>`	Decision tree over enum findings with pivot thresholds
Document	`/pentest-document <ip>`	Honesty audit + cognitive discipline audit + formal report generation

If the foothold lands as root, the framework skips enum/crack/privesc and goes straight to documentation.
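The phase order and the root shortcut can be expressed as a small lookup. This is an illustrative sketch, not the framework's own sequencing logic. The function name `next_phase` and the convention of passing the shell's uid as a second argument are assumptions.

```bash
# Hypothetical sketch of the phase sequence described above.
# $1 = current phase, $2 = uid of the shell obtained so far (optional).
next_phase() {
  local current="$1" uid="${2:-}"
  # Root at foothold: skip enum/crack/privesc and go straight to documentation.
  if [ "$current" = "foothold" ] && [ "$uid" = "0" ]; then
    echo document
    return 0
  fi
  case "$current" in
    recon)     echo anonsweep ;;
    anonsweep) echo foothold ;;
    foothold)  echo enum ;;
    enum)      echo crack ;;
    crack)     echo privesc ;;
    privesc)   echo document ;;
    *)         return 1 ;;   # "document" or an unknown phase has no successor
  esac
}

# Usage:
#   next_phase foothold 33   # prints "enum"
#   next_phase foothold 0    # prints "document"
```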

Cognitive discipline

The framework embeds behavioral patterns (backported from Cyber-AutoAgent) that prevent common pentest failure modes:

Pattern	What it prevents
Hypothesis-Test-Validate	Shotgun exploitation without reasoning
Confidence tracking (0-100%)	Persisting on dead vectors
Direct-First	Wasting hours cracking hashes that could be tried as passwords
STORE-ON-OBSERVE	Losing findings to context compaction
Pivot thresholds (3/5 failures)	Technique fixation
Checkpoints (every ~10 calls)	Missing the forest for the trees
Verification gates	Assuming success from absence of errors
False positive awareness	Escalating version disclosures as vulnerabilities

Observability

Engagement logging (`pentest-log`)

Every significant event is logged to boxes/<ip>/engagement.log with two visibility levels:

USER — printed to stdout with color-coded symbols ([+] finding, [>] decision, [↻] pivot, [✓] proof)
AGENT — silent, written to log only. Available for checkpoints, resume, and audit queries

pentest-log <ip> foothold FINDING USER "RCE confirmed: uid=33(www-data)"
pentest-log <ip> foothold PIVOT USER "SQLi exhausted → switching to file upload"
pentest-log <ip> findings          # query all findings
pentest-log <ip> stats             # action/finding/pivot counts per phase
pentest-log <ip> user-summary      # everything the user saw
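To make the two visibility levels concrete, here is a minimal sketch of a logger that always writes to the log file but echoes only USER events. It is not the pentest-log implementation. The tab-separated line layout, the function name `log_event`, and the event type names DECISION and PROOF are assumptions; the README confirms only FINDING and PIVOT as type names, plus the four symbols.

```bash
# Hypothetical sketch; the real pentest-log format is not documented here.
# Args: logfile phase type visibility message
log_event() {
  local logfile="$1" phase="$2" type="$3" visibility="$4" message="$5" symbol
  case "$type" in
    FINDING)  symbol='[+]' ;;
    DECISION) symbol='[>]' ;;   # assumed type name
    PIVOT)    symbol='[↻]' ;;
    PROOF)    symbol='[✓]' ;;   # assumed type name
    *)        symbol='[*]' ;;   # assumed fallback symbol
  esac
  # Every event is written to the log with a UTC timestamp.
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$phase" "$type" "$visibility" "$message" >> "$logfile"
  # Only USER events are shown on stdout; AGENT events stay silent.
  if [ "$visibility" = "USER" ]; then
    printf '%s %s\n' "$symbol" "$message"
  fi
}

# Usage:
#   log_event engagement.log foothold FINDING USER "RCE confirmed: uid=33(www-data)"
```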

Engagement timer (`pentest-timer`)

Tracks wall-clock time with phase transitions and checkpoint advice:

pentest-timer <ip> start           # at kickoff
pentest-timer <ip> phase foothold  # record phase transition
pentest-timer <ip> check           # elapsed + "are you stuck?" advice at 30/60/90/120m
pentest-timer <ip> stop            # final summary with phase timeline
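The 30/60/90/120-minute checkpoints amount to finding the highest threshold the elapsed time has passed. A minimal sketch follows; `last_checkpoint` is a hypothetical name, and how pentest-timer stores its start time or words its advice is not shown in this README.

```bash
# Hypothetical sketch: given elapsed minutes, print the most recent
# checkpoint crossed (30, 60, 90 or 120), or 0 if none has been reached.
last_checkpoint() {
  local elapsed_min="$1" hit=0 t
  for t in 30 60 90 120; do
    if [ "$elapsed_min" -ge "$t" ]; then
      hit="$t"
    fi
  done
  echo "$hit"
}

# Usage:
#   last_checkpoint 45    # prints 30
#   last_checkpoint 10    # prints 0
```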

Screenshot proof (`pentest-liveshell-session` + `pentest-sh`)

LiveAuditShell captures screenshots at every proof moment via ttyd + Playwright:

pentest-liveshell-session --ip <ip> &          # start at kickoff
pentest-sh --ip <ip> --phase foothold --screenshot "id"  # captured to audit/

Auto-triggers on patterns: uid=0, root, SYSTEM, key.txt, flag{.
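A trigger of this kind can be approximated with a fixed-string match over command output. The sketch below is an illustration only. `is_proof_moment` is a hypothetical name, and case-sensitive substring matching is an assumption about how the real trigger behaves.

```bash
# Hypothetical sketch: succeed if stdin contains any of the trigger strings
# listed above. -F treats each pattern as a literal string, so "flag{" and
# "key.txt" need no escaping.
is_proof_moment() {
  grep -qF -e 'uid=0' -e 'root' -e 'SYSTEM' -e 'key.txt' -e 'flag{'
}

# Usage:
#   id | is_proof_moment && echo "capture a screenshot"
```

Broad patterns such as `root` will also match unrelated output (for example a path containing the word), so a trigger like this errs on the side of taking too many screenshots.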

Post-engagement audit (`pentest-audit-discipline`)

9-check automated discipline audit:

pentest-audit-discipline <ip>

Checks include technique fixation, Direct-First violations, STORE-ON-OBSERVE compliance, verification gates, completeness, screenshot proof, and honesty markers.

Repository structure

For a full greppable inventory with cross-references, see INDEX.md. Quick map:

CTFagent/
├── INDEX.md                    # Canonical inventory (read first when scoping)
├── CLAUDE.md                   # Master instruction set (loaded every session)
├── install.sh                  # Installer
├── claude/
│   ├── skills/                 # 24 slash commands (claude/skills/<name>/SKILL.md)
│   │   ├── pentest-start, pentest-recon, pentest-anonsweep, pentest-foothold,
│   │   ├── pentest-enum, pentest-crack, pentest-privesc, pentest-document,
│   │   ├── pentest-reexploit, pentest-resume,
│   │   ├── hack                # Pentest knowledge router
│   │   ├── pentest-ad          # AD attack workflow
│   │   ├── pentest-pivot       # Tunneling / lateral movement
│   │   ├── pentest-api         # REST / GraphQL / OpenAPI
│   │   ├── pentest-cloud       # AWS / GCP / Azure / K8s
│   │   └── pentest-source      # .git / .svn recovery + secret scan
│   └── agents/                 # 5 subagents
│       ├── pentest-vulnresearch    # CVE / exploit research
│       ├── pentest-sidequest       # Dead-time target probes
│       └── pentest-lootparse       # Parallel loot file parser
├── bin/                        # 29 helper scripts (add to PATH)
│   ├── pentest-init / pentest-recon / pentest-webenum / pentest-webenum-all
│   ├── pentest-anonsweep / pentest-crack / pentest-revsh / pentest-serve
│   ├── pentest-exfil / pentest-runs / pentest-flags / pentest-vpnip
│   ├── pentest-log / pentest-timer / pentest-state / pentest-campaign
│   ├── pentest-build-reports / pentest-docx / pentest-proof-check
│   ├── pentest-audit-report / pentest-audit-discipline
│   ├── pentest-liveshell-session / pentest-sh
│   ├── pentest-screenshots / pentest-screenshots-live / pentest-screenshots-liveshell
│   ├── pentest-browser-shot / pentest-replay / pentest-replay-common.py
│   └── pentest-validate-hack-skills      # bridge-index drift detector
├── lib/                        # Templates, payloads, references
│   ├── cognitive_rules.md      # Cross-skill discipline rules
│   ├── exploit_chain_template.md / report_template.md / report_example.md
│   ├── lessons.md              # Cross-engagement lessons (fresh-start)
│   ├── localnum.sh             # Target-side enumeration script
│   ├── gtfobins.tsv            # SUID / sudo / cap privesc lookup
│   ├── mcp_tools.md            # Tool inventory
│   ├── encoding_helpers.py     # HFS / PowerShell / Lua / VPN-IP helpers
│   ├── vhosts.txt              # Vhost wordlist
│   ├── log4shell/              # Log4Shell kit + portable JDK 11
│   ├── hack-skills/            # SUBMODULE — yaklang/hack-skills catalog (102 entries)
│   └── hack-skills-bridge/     # engagement-context routing on top of catalog
│       ├── INDEX.md            # phase + symptom → upstream skill ID
│       ├── safety_defaults.md  # ROE flags (banned, passive-only, cleanup)
│       └── README.md
└── boxes/                      # Per-target engagement data — created on `/pentest-start`
                                # NOT committed; the repo ships target-clean.

Per-target directory layout

When you run /pentest-start, it creates this structure for each target:

10.14.1.X/
├── EXPLOIT_CHAIN.md      # Operational notes (terse, technical)
├── REPORT.md             # Deliverable report (narrative, verbatim commands)
├── STATE.md              # Cognitive state tracker (confidence, vectors, findings)
├── engagement.log        # Structured event log (pentest-log)
├── .timer                # Engagement wall-clock timer (pentest-timer)
├── hints.md              # Lab hints captured at kickoff
├── recon/
│   ├── SUMMARY.md        # Distilled recon findings (the only file Claude reads)
│   ├── anonsweep.txt     # No-auth sweep results (tag-sectioned)
│   └── raw/              # Raw nmap output (never loaded into AI context)
├── web/
│   └── <port>/SUMMARY.md # Per-port web enumeration results
├── loot/
│   ├── hashes/           # Extracted hashes by type
│   ├── cracked.txt       # John results
│   └── enum_SUMMARY.md   # Local enumeration digest
├── audit/                # LiveAuditShell output
│   ├── audit_trail.jsonl # Structured command log
│   └── *.png             # Auto-captured screenshots at proof moments
└── notes/                # Scratch space
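The directory part of this layout can be reproduced with a single mkdir call. This sketch only creates the empty directories from the tree above. The function name `make_box_skeleton` is hypothetical, and the files (EXPLOIT_CHAIN.md, STATE.md, and so on) are left to the framework.

```bash
# Hypothetical sketch: create the per-target directory skeleton shown above.
# $1 = target directory, e.g. boxes/10.14.1.5
make_box_skeleton() {
  local root="$1"
  mkdir -p "$root/recon/raw" "$root/web" "$root/loot/hashes" \
           "$root/audit" "$root/notes"
}

# Usage:
#   make_box_skeleton boxes/10.14.1.5
```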

How it works under the hood

Skills

Skills are markdown instruction files that Claude Code loads when you type /skill-name. Each skill is a complete playbook for one phase of the engagement — it knows what files to read, what tools to run, what artifacts to produce, what to log, and what comes next.

Every skill includes:

Cognitive discipline rules — hypothesis-validate loops, pivot thresholds, verification gates
Engagement logging — USER-level events shown to the user, AGENT-level events stored silently
STATE.md updates — cognitive state preserved across context compactions
Screenshot proof — mandatory capture at proof moments

Skills live at ~/.claude/skills/<name>/SKILL.md. The installer copies them from claude/skills/ in the repo.

Agents (subagents)

Specialized subagents handle work that would bloat the main context. Of the five the installer copies, these three are documented here:

Agent	Purpose	When used
`pentest-vulnresearch`	CVE lookup, exploit-db search, default creds. Returns confidence-scored findings with structured HANDOFF block	During foothold — keeps noisy research out of main context
`pentest-sidequest`	Quick target-side probes (5 commands max, read-only)	During dead time — auto-spawned by pentest-crack while john runs
`pentest-lootparse`	Parse exfiltrated files into structured output (shadow, passwd, sudoers, SSH keys, cron)	After exfil — auto-spawned by pentest-enum in parallel with grep

Agents live at ~/.claude/agents/<name>.md. The installer copies them from claude/agents/ in the repo.

Cross-skill state passing

STATE.md tracks cognitive state across context compactions:

phase: foothold
confidence: 75
active-vector: CVE-2015-3306 mod_copy
tried-vectors: default creds (FAIL: 403), SQLi on search (FAIL: no injection point)
pending-findings: DB creds from config.php (Direct-First: try on SSH)
liveshell: alive

Written by recon, foothold, enum, and privesc. Read by resume for instant state recovery.
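Since the excerpt above is plain `key: value` lines, a single field can be read back with sed. This is a sketch under that assumption. `state_get` is a hypothetical helper, and a real STATE.md may contain additional markdown that this does not handle.

```bash
# Hypothetical sketch: print the value of one "key: value" line from a
# STATE.md-style file. Prints nothing if the key is absent.
# $1 = file, $2 = key (e.g. confidence, active-vector)
state_get() {
  local file="$1" key="$2"
  sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1
}

# Usage:
#   state_get boxes/10.14.1.5/STATE.md confidence
```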

Tool escalation ladder

lib/mcp_tools.md provides a 5-level escalation guide for when you're stuck:

1. Standard toolkit: nmap, dirb, nikto, wpscan (first 15 min)
2. Specialized tools: sqlmap, searchsploit, smbmap (after Level 1 exhausted)
3. Manual investigation: curl, python, source grep (Level 2 produced nothing)
4. Research delegation: vulnresearch subagent, cve-search MCP, Metasploit (all local analysis exhausted)
5. Rethink: stop escalating, re-read everything, check for forgotten ports/creds/dependencies

Token efficiency

Raw tool output (nmap, dirb, john, linpeas) is never loaded into the AI context. Instead, helper scripts produce SUMMARY.md files with just the actionable findings. This keeps the context window focused on decision-making rather than parsing.
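As an illustration of the digest step, the sketch below reduces nmap's normal-format output to one line per open port. It is not the framework's summariser. `summarize_nmap` is a hypothetical name, and a real SUMMARY.md presumably carries more than port and service.

```bash
# Hypothetical sketch: keep only open-port rows from nmap normal output
# and print "port/proto service", dropping banners, headers and closed ports.
# $1 = path to a raw nmap output file
summarize_nmap() {
  awk '$2 == "open" && $1 ~ /^[0-9]+\/(tcp|udp)$/ { print $1, $3 }' "$1"
}

# Usage:
#   summarize_nmap boxes/10.14.1.5/recon/raw/tcp.nmap
# (the file name tcp.nmap is an assumption; the README only names recon/raw/)
```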

Lessons index

lib/lessons.md is a grep-friendly cross-box index of reusable findings. Each line follows the format [box] [phase] [category] lesson. Claude reads relevant lessons before each phase to avoid repeating past mistakes. Includes a ## Cognitive Discipline section linking structural rules to the specific engagement evidence that motivated them.
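Because every lesson line carries its phase as the second bracketed field, lessons for one phase can be pulled out with awk. This is a sketch only. `lessons_for_phase` is a hypothetical helper, and it assumes box names contain no spaces.

```bash
# Hypothetical sketch: print the lessons recorded for one phase.
# Lines are expected in the form: [box] [phase] [category] lesson
# $1 = lessons file, $2 = phase name (e.g. foothold)
lessons_for_phase() {
  awk -v tag="[$2]" '$2 == tag' "$1"
}

# Usage:
#   lessons_for_phase lib/lessons.md privesc
```

Matching on the whole second field, not a substring, avoids hits on lesson text that merely mentions the phase name.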

Configuration

Cracking budget

pentest-crack auto-detects hash type and selects an appropriate time budget:

Hash type	Budget	Rationale
MD5, SHA1, NTLM	`deep`	Fast hashes — KoreLogic rules feasible
md5crypt, apr1	`standard`	Medium — jumbo rules OK
sha512crypt, bcrypt	`quick`	Slow — beyond best64 takes hours on CPU

Override with --budget quick|standard|deep. The Direct-First gate requires you to try the hash as a literal password before cracking.
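The table above maps directly onto a case statement. The sketch below mirrors that mapping. It is not the code inside pentest-crack; the function name `budget_for` and the choice to reject unknown hash types are assumptions.

```bash
# Hypothetical sketch of the hash-type to budget mapping in the table above.
# $1 = hash type name (case-insensitive)
budget_for() {
  local hash_type
  hash_type=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$hash_type" in
    md5|sha1|ntlm)      echo deep ;;      # fast hashes
    md5crypt|apr1)      echo standard ;;  # medium
    sha512crypt|bcrypt) echo quick ;;     # slow on CPU
    *)
      # Assumed behaviour: refuse to guess a budget for unlisted types.
      echo "unknown hash type: $1" >&2
      return 1
      ;;
  esac
}

# Usage:
#   budget_for NTLM      # prints "deep"
#   budget_for bcrypt    # prints "quick"
```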

Reverse shells

pentest-revsh generates payloads and spawns listeners:

pentest-revsh 4444 --type bash       # bash /dev/tcp reverse shell
pentest-revsh 4444 --type python     # python pty reverse shell
pentest-revsh 4444 --type php        # php reverse shell

Background task management

pentest-runs tracks all background jobs (cracks, listeners, HTTP servers):

pentest-runs list                    # show active tasks
pentest-runs view <id>               # check output
pentest-runs kill <id>               # stop a task
pentest-runs gc                      # clean up finished tasks

Customization

Adding your own lessons

Edit lib/lessons.md and add lines in the format:

[boxname] [phase] [category] One-line lesson learned

Modifying skills

Edit the skill files in claude/skills/ and re-run ./install.sh to update the installed copies.

Adding new helper scripts

Drop scripts into bin/ and they'll be available after adding bin/ to PATH. Register them in CLAUDE.md (bin/ table) and lib/mcp_tools.md so Claude knows they exist.

Known limitations

Linux-focused — localnum.sh and most privesc logic targets Linux. Windows boxes require manual enumeration (systeminfo, whoami /priv, etc.). The framework handles this with OS-specific guidance in pentest-enum and pentest-privesc.
CPU cracking only — pentest-crack uses john on CPU. GPU acceleration (hashcat) may not be available on lab Kali instances.
Engagement-specific: Rules of engagement, cleanup procedures, and the shared-lab constraints are specific to shared lab platforms. Adapting to other lab environments (HackTheBox, PG Practice) would require modifying CLAUDE.md and the cleanup sections of /pentest-document.
Requires Anthropic API access — Claude Code requires an Anthropic API key or Claude Pro/Max subscription.
LiveAuditShell optional — Screenshot capture requires ttyd and Playwright. The framework degrades gracefully to text-only proof files when unavailable.

License

This framework is provided for authorized security testing and educational purposes only. Use it only on systems you have explicit permission to test.
