An autonomous Windows POC developer. Reads patchwatch per-CVE reports and drives a Claude agent through writing, building, deploying, running, and verifying a Proof-of-Concept against a pre-patch Windows VM under a remote kernel debugger.
pocsmith wires a handful of MCP servers and an LLM into one workflow:
- patchwatch — produces the per-CVE reports (description, ranked binaries, ghidriff output) that pocsmith consumes.
- hyperv-mcp — Hyper-V VM lifecycle: snapshots, KD configuration, PowerShell-Direct guest exec.
- kd-mcp — remote kernel debugger wrapper (breakpoints, register/memory inspection, !analyze -v).
- pyghidra-mcp — Ghidra running over the pre-patch binary with PDB symbols applied.
- pocsmith-mcp — driver tools the agent uses to compile, record attempts, declare success, and end phases. Implemented in this repo.
- Anthropic API — the agent that drives the loop. Configurable model.
It is designed to run locally against your own infrastructure: your Hyper-V host, your VMs, your ISOs. The only outbound traffic is to the LLM endpoint you configure.
pocsmith generates and runs exploit code. The system prompt restricts execution to the target Hyper-V VM via hyperv-mcp; payloads never run on the host. Treat any artifacts produced (POC sources, repro scripts) as authorized-research output and handle them accordingly. Only use against systems you control or are explicitly authorized to test.
- Windows 11 host with Hyper-V enabled (32 GB RAM recommended).
- At least one Windows ISO matching the CVE's patch KB (e.g. a 24H2 build).
| Dependency | Where to get it | Notes |
|---|---|---|
| Python 3.12+ | https://python.org | Must be on PATH. |
| Visual Studio 2022 | https://visualstudio.com | Install the Desktop Development with C++ workload. |
| Windows SDK (Debugging Tools) | https://developer.microsoft.com/windows/downloads/windows-sdk | Needed for kd.exe. |
| Docker Desktop | https://docker.com | For ghidra.mode: docker (recommended). |
| Java 21+ | https://adoptium.net | Only for ghidra.mode: local. |
| Ghidra 11.x | https://github.com/NationalSecurityAgency/ghidra | Only for ghidra.mode: local; set GHIDRA_INSTALL_DIR. |
| pyghidra-mcp | pip install pyghidra-mcp | Only for ghidra.mode: local. |
| patchwatch | https://github.com/originsec/patchwatch | Produces the CVE reports pocsmith consumes. |
| hyperv-mcp | https://github.com/originsec/hyperv-mcp | Installed editable by setup.ps1; invoked as python -m hyperv_mcp. |
| kd-mcp | https://github.com/originsec/kd-mcp | Installed editable by setup.ps1; invoked as python -m kd_mcp. |
Set these in a .env file at the workspace root (copy .env.example to start):
ANTHROPIC_API_KEY your Anthropic API key
HYPERV_GUEST_USERNAME guest VM admin username (e.g. Administrator)
HYPERV_GUEST_PASSWORD guest VM admin password
HYPERV_GUEST_VICTIM_USERNAME optional: unprivileged account for EoP scenarios
HYPERV_GUEST_VICTIM_PASSWORD optional: password for the victim account
GHIDRA_INSTALL_DIR e.g. C:\Tools\ghidra_11.3 (only for ghidra.mode=local)
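A quick pre-flight check for the variables listed above can be sketched in Python. This is an illustrative helper, not part of pocsmith: the function name `missing_env` is hypothetical, and treating the first three variables as mandatory is an assumption drawn from the "optional" labels in the list.

```python
import os

# Names come from the list above; which ones are mandatory is an assumption.
REQUIRED = ("ANTHROPIC_API_KEY", "HYPERV_GUEST_USERNAME", "HYPERV_GUEST_PASSWORD")
VICTIM_PAIR = ("HYPERV_GUEST_VICTIM_USERNAME", "HYPERV_GUEST_VICTIM_PASSWORD")


def missing_env(env, ghidra_mode="docker"):
    """Return the variable names that still need a value, in a stable order."""
    required = list(REQUIRED)
    if ghidra_mode == "local":
        # GHIDRA_INSTALL_DIR is only needed for ghidra.mode=local.
        required.append("GHIDRA_INSTALL_DIR")
    missing = [name for name in required if not env.get(name)]
    # The victim account is optional, but a half-configured pair is likely a mistake.
    configured = [name for name in VICTIM_PAIR if env.get(name)]
    if len(configured) == 1:
        missing.extend(name for name in VICTIM_PAIR if not env.get(name))
    return missing
```

Calling `missing_env(os.environ)` after loading `.env` returns an empty list when everything needed for Docker mode is set.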
# 1. Clone and enter the project
git clone https://github.com/originsec/pocsmith.git
cd pocsmith
# 2. Run the setup script. Creates a venv, installs deps, generates a config
# template, and pulls the pyghidra-mcp Docker image.
.\scripts\setup.ps1
# 3. Activate the venv
.\.venv\Scripts\Activate.ps1
# 4. Copy and edit the env file
copy .env.example .env
notepad .env
# 5. Copy and edit the config file
copy pocsmith.example.yaml pocsmith.yaml
notepad pocsmith.yaml
# 6. Export a CVE from patchwatch
patchwatch export-poc-context CVE-2026-XXXXX --out C:\Research\pocsmith-workspaces
# 7. Run pocsmith
pocsmith run --cve CVE-2026-XXXXX --config pocsmith.yaml

To check prerequisites without installing anything:
.\scripts\check-prereqs.ps1

# Start a fresh run on an exported CVE
pocsmith run --cve CVE-2026-XXXXX --config pocsmith.yaml
# Steer the agent with a hint injected into the first phase kickoff
pocsmith run --cve CVE-2026-XXXXX --config pocsmith.yaml `
--hint "The bug is in the pool allocation path; try heap spray with large IRPs first."
# Resume an interrupted run (re-uses notes.md and attempt history)
pocsmith resume --cve CVE-2026-XXXXX --config pocsmith.yaml
# List CVE workspaces under the configured workspace root
pocsmith inspect --workspace-root C:\Research\pocsmith-workspaces
# Regenerate the report.md for a workspace that already reached a success status
pocsmith report --cve CVE-2026-XXXXX --config pocsmith.yaml
# Live-tail the active session transcript in human-readable form
pocsmith tail --cve CVE-2026-XXXXX --config pocsmith.yaml --tail
# ...or point it at any session.jsonl directly:
pocsmith tail --file C:\Research\pocsmith-workspaces\CVE-2026-XXXXX\session.jsonl --thinking

Optional flags on run and resume:
| Flag | Default | Description |
|---|---|---|
| --level A/B/C | A | A = crash repro, B = controlled primitive, C = full exploit. |
| --config | — | Path to pocsmith.yaml. |
| --workspace-root | from config | Override workspace root. |
| --vm-name | from config | Hyper-V VM name. |
| --hint TEXT | (none) | Hints injected into the agent's first kickoff message. |
| --model | claude-opus-4-7 | Anthropic model id. |
| --skip-build-check | off | Skip verifying that the VM's build matches context.json's patched_build. |
pocsmith.example.yaml is the canonical example. The fields most worth knowing about:
vm:
  backend: hyperv                   # only supported backend
  vm_root: C:\VMs\pocsmith          # where Hyper-V VHDXs live
  default_profile: win11-24h2       # VM used when no --vm-name given
  mcp_module: hyperv_mcp            # python -m hyperv_mcp (installed in venv)
kd:
  module: kd_mcp                    # python -m kd_mcp (installed in venv)
hyperv_guest:
  username_env: HYPERV_GUEST_USERNAME   # env var holding the admin username
  password_env: HYPERV_GUEST_PASSWORD
  victim_username_env: HYPERV_GUEST_VICTIM_USERNAME   # optional unprivileged account
  victim_password_env: HYPERV_GUEST_VICTIM_PASSWORD
ghidra:
  mode: docker                      # docker | local
  image: ghcr.io/clearbluejar/pyghidra-mcp
  port: 8000
compile:
  vcvarsall: C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat
  arch: x64
attacker_py:
  venv: C:\Research\pocsmith-workspaces\attacker-venv
  sysinternals_dir: C:\Research\pocsmith-workspaces\sysinternals   # optional
  packages: [impacket]
llm:
  model: claude-opus-4-7
  api_key_env: ANTHROPIC_API_KEY
  context_threshold_pct: 70
ceilings:
  level_a: { wall_min: 60, iterations: 40, dollars: 10.0, phases: 8 }
  level_b: { wall_min: 240, iterations: 80, dollars: 50.0, phases: 16 }
  level_c: { wall_min: 240, iterations: 80, dollars: 50.0, phases: 16 }
paths:
  patchwatch_bin: C:\Tools\patchwatch\patchwatch.exe
  workspace_root: C:\Research\pocsmith-workspaces

To use a local Ghidra install instead of Docker:
ghidra:
  mode: local
  pyghidra_mcp_cmd: pyghidra-mcp
  ghidra_install_dir: C:\Tools\ghidra_11.3

patchwatch report --> pocsmith run --> Claude agent (Agent SDK)
                                             |
                  +--------------------------+--------------------------+
                  v                          v                          v
             hyperv-mcp                   kd-mcp                  pyghidra-mcp
           (VM lifecycle,            (kernel debugger)          (pre-patch binary
            guest exec,                                          + PDB analysis)
            KD setup)
                  |                          |
                  +----------+---------------+
                             |
                       pocsmith-mcp
                 (compile_c, attacker_py,
                  record_attempt, end_phase,
                  report_outcome, cve_context)
                             |
                       pre-patch VM
                     (kernel-debugged)
Each run is a sequence of phases, each one a bounded Claude Agent SDK session.
The agent iterates: edit POC source, compile, deploy to VM, trigger the bug,
capture KD output, record the attempt, revert, repeat. On report_outcome,
pocsmith replays the attempt on a fresh revert to verify the signal before
promoting artifacts.
A phase is a coherent stretch of work — typically 3–6 iterations chasing one
hypothesis. A run is typically 3–8 phases. A phase ends when the agent calls end_phase
(voluntary, on changing hypothesis or hitting a wall) or when the driver's
input-token threshold (default 70% of the model's context window) is reached.
On phase end the full transcript is flushed to transcripts/phase-N.jsonl, the
session closes, and the next phase starts fresh — reading notes.md and a
compact summary of attempts/*/status.json instead of a transcript replay.
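The token-threshold rule described above can be sketched as a single comparison. This is a minimal illustration, not pocsmith's driver code: the function name `should_end_phase` and the 200,000-token window in the usage note are assumptions; only the 70% default comes from the text and from `context_threshold_pct`.

```python
def should_end_phase(input_tokens: int, context_window: int, threshold_pct: int = 70) -> bool:
    """Return True once input tokens reach threshold_pct percent of the context window."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    # Integer arithmetic avoids float rounding exactly at the boundary.
    return input_tokens * 100 >= threshold_pct * context_window
```

With a hypothetical 200,000-token window, `should_end_phase(140_000, 200_000)` is true and `should_end_phase(139_999, 200_000)` is false.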
report_outcome is the terminal call. On a success status, pocsmith:
- Reverts the VM to a clean snapshot.
- Re-attaches kd, re-deploys the recorded POC artifact, re-runs the recorded invocation.
- Evaluates the agent-declared signal against the replay's kd output.
Signals are one of five typed kinds: bugcheck, usermode_exception,
kd_breakpoint_hit, service_crash, assertion. Anything outside this set is
recorded as unverified_claim and not promoted to an artifact.
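One way to model the closed set of signal kinds is an enum with a fallback. The sketch below is illustrative only: `SignalKind` and `classify_signal` are hypothetical names, and the real schema in pocsmith may carry more fields per signal.

```python
from enum import Enum


class SignalKind(str, Enum):
    """The five typed signal kinds named in the text."""
    BUGCHECK = "bugcheck"
    USERMODE_EXCEPTION = "usermode_exception"
    KD_BREAKPOINT_HIT = "kd_breakpoint_hit"
    SERVICE_CRASH = "service_crash"
    ASSERTION = "assertion"


UNVERIFIED_CLAIM = "unverified_claim"


def classify_signal(kind: str) -> tuple[str, bool]:
    """Return (recorded_kind, eligible_for_verification).

    Known kinds can go on to replay verification; anything else is recorded
    as an unverified claim and is never promoted.
    """
    try:
        return SignalKind(kind).value, True
    except ValueError:
        return UNVERIFIED_CLAIM, False
```

For example, `classify_signal("bugcheck")` yields `("bugcheck", True)`, while an ad-hoc kind such as `"hang"` yields `("unverified_claim", False)`.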
The register_predicate DSL on kd_breakpoint_hit signals supports register
reads, dereferences with displacement, integer comparisons, and AND/OR.
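To make those four capabilities concrete, here is a small evaluator over a nested-tuple expression tree. The tuple encoding, the function names, and the `read_u64` callback are all assumptions made for illustration; the actual DSL syntax is specified in docs/design.md and may look quite different.

```python
import operator

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}
MASK64 = 0xFFFFFFFFFFFFFFFF


def eval_value(node, regs, read_u64):
    """Evaluate a value node: ("const", n), ("reg", name) or ("deref", base, disp)."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "reg":
        return regs[node[1]]
    if kind == "deref":
        # Read 8 bytes at base + displacement, wrapping like a 64-bit address.
        base = eval_value(node[1], regs, read_u64)
        return read_u64((base + node[2]) & MASK64)
    raise ValueError(f"unknown value node: {kind!r}")


def eval_predicate(node, regs, read_u64):
    """Evaluate ("and", ...), ("or", ...) or ("cmp", op, left, right) to a bool."""
    kind = node[0]
    if kind == "and":
        return all(eval_predicate(child, regs, read_u64) for child in node[1:])
    if kind == "or":
        return any(eval_predicate(child, regs, read_u64) for child in node[1:])
    if kind == "cmp":
        _, op, left, right = node
        return COMPARATORS[op](
            eval_value(left, regs, read_u64),
            eval_value(right, regs, read_u64),
        )
    raise ValueError(f"unknown predicate node: {kind!r}")


# Usage with made-up register and memory values: rax == 0 AND [rcx+0x18] == marker.
regs = {"rax": 0, "rcx": 0x1000}
memory = {0x1018: 0x4141414141414141}
predicate = (
    "and",
    ("cmp", "==", ("reg", "rax"), ("const", 0)),
    ("cmp", "==", ("deref", ("reg", "rcx"), 0x18), ("const", 0x4141414141414141)),
)
matched = eval_predicate(predicate, regs, memory.__getitem__)
```

In a real run the register values and memory reader would presumably come from the replay's kd session rather than from in-memory dictionaries.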
| Level | Wall-clock | Iterations | Dollars | Phases |
|---|---|---|---|---|
| A | 60 min | 40 | $10 | 8 |
| B | 4 h | 80 | $50 | 16 |
| C | 4 h | 80 | $50 | 16 |
At 75% of any ceiling pocsmith injects a one-line reminder before the next
iteration. At 100% it forces the agent to call report_outcome and refuses
further tool calls.
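The 75% and 100% thresholds can be expressed as a check over the most-consumed budget. This is a sketch under assumptions: the `Ceiling` dataclass and `ceiling_state` function are hypothetical, with field names borrowed from the `ceilings` block in the example config.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Ceiling:
    wall_min: float
    iterations: int
    dollars: float
    phases: int


# Level A limits from the table above.
LEVEL_A = Ceiling(wall_min=60, iterations=40, dollars=10.0, phases=8)


def ceiling_state(used: Ceiling, limit: Ceiling, warn_fraction: float = 0.75) -> str:
    """Return "ok", "warn" (reminder due) or "stop" (force report_outcome)."""
    # The state is driven by whichever budget is closest to exhaustion.
    worst = max(
        getattr(used, f.name) / getattr(limit, f.name) for f in fields(Ceiling)
    )
    if worst >= 1.0:
        return "stop"
    if worst >= warn_fraction:
        return "warn"
    return "ok"
```

For instance, 45 minutes into a Level A run with the other budgets barely touched, `ceiling_state(Ceiling(45, 5, 1.0, 1), LEVEL_A)` returns `"warn"`.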
paths.workspace_root is the root for all pocsmith runtime data: the shared
attacker venv, the Sysinternals tools cache, and one isolated subdirectory per
CVE.
<workspace-root>\
attacker-venv\ shared Python venv with impacket etc. (setup.ps1)
sysinternals\ Sysinternals Suite, host-side stage (setup.ps1)
CVE-XXXX-NNNNN\
context.json static CVE context from patchwatch
pre-patch\ pre-patch binaries (hardlinked from patchwatch cache)
post-patch\ post-patch binaries
ghidriff\ ghidriff diff outputs
symbols\ _NT_SYMBOL_PATH cache
ghidra-project\ pyghidra .gpr (cached by pre-patch SHA)
poc\ agent's POC sources and builds
notes.md agent exobrain - survives phase boundaries
attempts\NNN\ per-iteration: status.json, kd.log, target.log
transcripts\phase-N.jsonl full session transcript per phase
artifacts\ written on verified success:
poc\ the verified POC
repro.md reproduction steps
verification.json signal match record
summary.md run summary
report.md LLM-written narrative report
.mcp.json auto-generated MCP server config
pocsmith-run.lock prevents concurrent runs on this workspace
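A per-workspace lock file like `pocsmith-run.lock` is commonly implemented with exclusive file creation. The sketch below shows that general technique, not pocsmith's own locking code: the `workspace_lock` helper is hypothetical, and writing the PID into the file is an assumption.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def workspace_lock(workspace: Path, name: str = "pocsmith-run.lock"):
    """Hold an exclusive lock file in the workspace for the duration of a run."""
    lock_path = workspace / name
    try:
        # O_EXCL makes creation fail atomically if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another run appears to hold {lock_path}") from None
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # breadcrumb for debugging stale locks
        yield lock_path
    finally:
        lock_path.unlink(missing_ok=True)
```

A second `with workspace_lock(path):` on the same workspace raises `RuntimeError` until the first block exits. A lock left behind by a crashed process would need manual cleanup in this simplified version.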
The agent receives POCSMITH_SYSINTERNALS as an env var on the pocsmith MCP
when attacker_py.sysinternals_dir is set, and is instructed to deploy those
binaries into the guest via hyperv_guest_put rather than executing them on
the host.
- Phase-scoped sessions: each phase is a fresh Agent SDK session. Persistent state lives in notes.md (agent-curated) and attempts/*/status.json (driver-written). Transcripts are flushed to disk but not replayed.
- Subagents for expensive reads: the system prompt directs the agent to route large-token reads (full decompilations, kd dumps, ghidriff JSON) through Task subagents that return short structured summaries, the single biggest token-cost lever for Ghidra-heavy CVEs.
- VM-only exploit execution: the system prompt forbids running exploit code on the host. attacker_py is for network-side tooling (e.g. impacket) targeting the VM, not host-side exploitation.
- Idempotent resume: pocsmith resume re-uses the existing workspace, notes.md, and attempt history. It starts a new phase, not a full transcript replay.
- Driver-managed MCP supervision: .mcp.json is generated per workspace; kd-mcp and pyghidra-mcp are crash-restarted; hyperv-mcp failures abort the run.
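As a rough picture of what a generated `.mcp.json` might contain, the sketch below builds entries for the two servers that the dependency table says are launched with `python -m`. The `mcpServers` layout follows the convention used by Claude tooling and is an assumption here; `build_mcp_config` is a hypothetical helper, and the entries for pyghidra-mcp and pocsmith-mcp are omitted because their launch commands are not stated.

```python
import json
import sys


def build_mcp_config(python_exe: str = sys.executable) -> dict:
    """Build a minimal MCP server map for the module-launched servers."""
    # Server name -> Python module, as listed in the dependency table.
    modules = {"hyperv-mcp": "hyperv_mcp", "kd-mcp": "kd_mcp"}
    return {
        "mcpServers": {
            name: {"command": python_exe, "args": ["-m", module]}
            for name, module in modules.items()
        }
    }


# Serialize the way a driver might before writing the file into the workspace.
rendered = json.dumps(build_mcp_config("python"), indent=2)
```

Passing the venv's interpreter path as `python_exe` would keep the servers on the same environment that setup.ps1 installed them into.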
See docs/design.md for the complete design spec, including signal-predicate types, context-window management, and MCP server contracts.
# Unit tests (no VM, no Anthropic, no Ghidra)
pytest
# Live smoke tests are gated by RUN_LIVE=1
$env:RUN_LIVE = "1"; pytest -k smoke

Issues and PRs welcome. This is a research tool, not a product; expect rough edges and breaking changes between versions.
Apache 2.0 — see LICENSE and NOTICE
Built by Origin for security research and red team operations.