Ares is a penetration testing profile for Hermes Agent. It turns a bare server or local workstation into a fully autonomous pentest platform — operated via a web UI, Discord, or both simultaneously.
You send a target URL and credentials. The agent runs a full OWASP WSTG assessment, validates every finding with a working proof of concept, and delivers a Markdown/HTML report with PoC scripts. No manual steps during the engagement.
- Full OWASP WSTG coverage — 13 testing phases, 80+ test cases across recon, auth, authz, injection, business logic, client-side, API, and mobile
- Validated findings only — every finding requires a working PoC before it lands in the report. No theoretical vulnerabilities.
- Per-engagement isolation — each run creates
/pentest-output/{target}_{timestamp}/so parallel engagements never conflict - Attack chains — correlates individual findings into exploitable end-to-end scenarios with CVSS 4.0 scoring
- Detection engineering — Sigma rules and MITRE ATT&CK Navigator layers for every validated finding
- Mobile testing — static analysis (MoBSF), dynamic instrumentation (Frida SSL unpinning, crypto hooks), device control (ADB)
- Burp Pro integration — optional bridge that exposes all 28 Burp MCP tools (proxy history, repeater, scanner, intruder, collaborator) to the agent via
burp-start.sh - White-box support — clone target repos into
~/ares-workspace/; the agent reads source at/workspace/inside all containers - Memory across engagements — Hindsight extracts findings and patterns after each session, building institutional knowledge over time
graph TB
subgraph UI["User Interfaces"]
WEBUI["Open WebUI<br/>http://localhost:3000"]
DISCORD["Discord<br/>Forum Thread"]
end
subgraph HERMES["ares-hermes container"]
GW["Hermes Gateway<br/>HTTP API :8643 · Dashboard :9119"]
subgraph MODELS["Model Routing"]
M1["Sonnet 4.6<br/>orchestrator"]
M2["Haiku 4.5<br/>trivial turns"]
M3["Opus 4.6<br/>deep analysis"]
end
subgraph MCPS["MCP Servers"]
MCP1["playwright<br/>headless Chromium"]
MCP2["pentest-ai<br/>27 scan/exploit tools"]
MCP3["gitnexus<br/>GitHub source analysis"]
MCP4["mobsf MCP"]
MCP5["adb MCP"]
MCP6["frida MCP"]
MCP7["burp MCP<br/>(optional)"]
end
end
subgraph BURP["Burp Pro (macOS host, optional)"]
BURP_EXT["MCP extension<br/>SSE :9876"]
BURP_PROXY["Proxy listener<br/>:8091"]
BRIDGE["burp-proxy.py<br/>HTTP bridge :9877"]
end
subgraph TOOLS["ares-tools container (spawned per session)"]
T1["nmap · nuclei · nikto"]
T2["sqlmap · dalfox · ffuf"]
T3["subfinder · httpx · gitleaks"]
T4["testssl · sslyze · jwt_tool"]
end
subgraph SERVICES["Always-on Services"]
MOBSF["MoBSF :8100<br/>mobile static analysis"]
ZAP["OWASP ZAP<br/>spawned per engagement"]
end
subgraph HOST["macOS / Linux Host"]
PO["~/ares-pentest-output<br/>→ /pentest-output"]
WS["~/ares-workspace<br/>→ /workspace"]
end
WEBUI -->|"OpenAI-compat API"| GW
DISCORD -->|"bot gateway"| GW
GW --> MODELS
M1 -->|"MCP tool calls"| MCPS
M1 -->|"terminal()"| TOOLS
MCP4 -->|"REST"| MOBSF
M1 -->|"REST API via $ZAP_URL"| ZAP
MCP7 -->|"Streamable HTTP"| BRIDGE
BRIDGE -->|"SSE POST /?sessionId=..."| BURP_EXT
HERMES -.->|"bind mount"| PO
HERMES -.->|"bind mount"| WS
TOOLS -.->|"bind mount"| PO
TOOLS -.->|"bind mount"| WS
Model routing is tiered by cost and reasoning requirement:
| Model | Role | Triggers |
|---|---|---|
| Sonnet 4.6 | Default orchestrator | All phases — coordination, tool execution, parsing, report writing |
| Opus 4.6 | Deep analysis | Exploit confirmation, attack chain building, CVSS scoring |
| Haiku 4.5 | Trivial turns | Auto-routed for JSON extraction, format conversions, simple lookups |
This routing cuts cost ~43% vs full Opus without quality loss on tool execution or report writing.
sequenceDiagram
actor User
participant H as Hermes (Sonnet 4.6)
participant T as ares-tools terminal
participant Z as ZAP container
participant O as Opus 4.6
participant P as /pentest-output
User->>H: target + credentials + scope
H->>Z: Phase 0 — spawn per-engagement ZAP
H->>T: Phase 1 — nmap, nuclei, subfinder, ffuf
T-->>H: open ports, tech stack, endpoints
H->>Z: spider + passive scan
Z-->>H: alerts, security headers, cookies
H->>T: Phases 3-5 — auth, injection, business logic
loop per finding
T-->>H: raw response
H->>H: generate minimal PoC
H->>T: execute PoC
T-->>H: pass / fail
H->>O: confirm finding, score CVSS 4.0
O-->>H: validated finding + severity
H->>P: write evidence + PoC script
end
H->>T: Phase 7 — assemble report
T->>P: final-report.html + .tar.gz
H-->>User: report delivered
H->>Z: tear down ZAP container
All three paths — ares-hermes, ares-tools terminal containers, and the host — share the same two bind-mounted directories. No docker cp needed; files appear instantly on both sides.
graph LR
subgraph HOST["macOS / Linux Host"]
HO["~/ares-pentest-output/\n(reports, PoCs, evidence)"]
HW["~/ares-workspace/\n(APKs, repos, inputs)"]
end
subgraph HERMES["ares-hermes"]
HP["/pentest-output/"]
HWC["/workspace/"]
SOCK["/var/run/docker.sock"]
end
subgraph TOOLS["ares-tools (per session)"]
TP["/pentest-output/"]
TWC["/workspace/"]
end
HO <-->|"bind mount"| HP
HW <-->|"bind mount"| HWC
HO <-->|"bind mount"| TP
HW <-->|"bind mount"| TWC
SOCK -->|"Docker API\n(spawns ares-tools)"| TOOLS
| Directory | Purpose |
|---|---|
~/ares-pentest-output/ |
Output — reports, PoC scripts, screenshots, evidence tarballs. Everything the agent writes during an engagement. |
~/ares-workspace/ |
Input / scratch — drop APKs, clone repos, or place any files the agent needs to read. Also used for agent-to-host handoffs of intermediate files. |
| Component | Purpose | Port |
|---|---|---|
| Open WebUI | Web UI — multi-engagement chat, model selector | 3000 |
| Hermes Agent v0.9.0+ | Orchestrator, HTTP API, Discord gateway, skill engine | 9119 (dashboard), 8643 (API) |
| MoBSF | Mobile static analysis (APK/IPA/APPX) | 8100 |
| OWASP ZAP | Web scanner, spider, passive scanner | dynamic (per engagement) |
| pentest-ai | MCP server — 27 scan/exploit tools | stdio |
| Playwright MCP | Headless Chromium — DOM testing, auth crawl | stdio |
| MCP: mobsf / adb / frida | Mobile testing MCP servers (vendored in mcp/) |
stdio |
ares-tools image |
Terminal container — all security tools pre-installed | spawned per session |
Burp Pro + burp-proxy.py |
Optional — 28 MCP tools: proxy history, repeater, scanner, intruder, collaborator | host :9876 (Burp), :9877 (bridge) |
Minimum hardware: 4 cores, 8GB RAM, 50GB disk, macOS (Docker Desktop) or Linux
Recommended: 8+ cores, 16GB+ RAM, 200GB SSD (for ZAP + MoBSF + concurrent sessions)
skills/pentest-orchestrate/ is the master orchestrator. It runs all 13 phases and delegates to sub-skills:
| Sub-skill | Coverage |
|---|---|
pentest-race-condition |
TOCTOU, concurrent request attacks |
pentest-websocket |
Auth bypass, CSWSH, frame injection, subscriptions |
pentest-graphql |
Introspection, depth bombs, batching bypass, IDOR via resolvers |
pentest-supply-chain |
Exposed manifests, .git exposure, CI/CD files, SRI gaps |
pentest-cloud-ssrf |
IMDSv1/v2, GCP metadata, Azure IMDS, Kubernetes SA token escalation |
pentest-llm-platform-attacks |
OWASP LLM Top 10, prompt injection, cross-tenant data leakage |
pentest-creative-edge-cases |
Novel attack surface — polyglots, race+IDOR combos, parser differentials |
pentest-detection-engineering |
Sigma rules per finding, MITRE ATT&CK mapping |
pentest-framework-enrichment |
ATT&CK techniques, D3FEND countermeasures, NIST CSF subcategories |
pentest-attack-navigator |
MITRE ATT&CK Navigator JSON layer export |
pentest-attack-flow |
Attack flow diagram generation |
pentest-d3fend-advisor |
Defensive roadmap per finding |
pentest-report-merge-and-regenerate |
Consolidated Markdown + HTML report assembly |
| Phase | Coverage |
|---|---|
| 0 | Scope lock, per-engagement ZAP container spawn, output dir setup |
| 1 | Recon — nmap, nuclei, subfinder, ZAP spider + AJAX spider, authenticated Playwright crawl, ffuf content discovery |
| 2 | Passive analysis — ZAP alerts, security headers, JS source analysis, cookie flags, client-side storage |
| 3 | Auth & authz — IDOR across all ID parameters, broken function-level auth, mass assignment, JWT attacks, CSRF |
| 4 | Injection — SQLi (sqlmap), XSS (dalfox), DOM XSS (Playwright), SSRF, SSTI, command injection, XXE, file upload abuse |
| 5 | Business logic — race conditions, rate limiting, workflow bypass, WebSocket auth, CORS |
| 6 | Validation + chain building — Opus confirms every finding, builds attack chains, scores CVSS 4.0 |
| 7 | Report — executive summary, per-finding detail with PoC + evidence, Sigma rules, ATT&CK layer, delivery |
| 8 | Error handling — stack traces, framework debug pages |
| 9 | Cryptography — TLS config (testssl.sh), weak hashing, padding oracle |
| 10 | Client-side — DOM XSS sinks, postMessage, XSSI, clickjacking |
| 11 | API — REST rate limiting, BOLA, GraphQL introspection, WebSocket |
| 12 | Business logic deep dive — multi-step workflow bypass, negative quantities, TOCTOU |
| 13 | Mobile — MoBSF static analysis, Frida dynamic (SSL unpinning, crypto hooks), ADB device testing |
Every engagement writes to an isolated directory bind-mounted to the host:
~/ares-pentest-output/
{target-slug}_{YYYYMMDD_HHMMSS}/
final-report.html — full technical report, browser-ready
evidence/
F-01_sqli_response.txt — raw request/response for each finding
phase-2-summary.md — phase summaries (compaction resilience)
screenshots/
F-05_xss.png — browser evidence for client-side findings
pocs/
F-01_sqli.sh — standalone PoC script per finding
{target-slug}_{timestamp}-FINAL.tar.gz
The report and tarball are delivered as Discord attachments (if Discord is enabled) or accessible via the Open WebUI session.
git clone https://github.com/your-org/ares.git
cd ares/docker
./setup.shsetup.sh prompts for your Anthropic token, optionally Discord, builds images, starts the stack, and verifies everything. First run takes ~5 minutes.
Open http://localhost:3000 → select a model → start a new chat → send your engagement brief.
Prerequisites: Docker Engine + Docker Compose v2, your user in the docker group. No root required.
Answer y to the Discord prompt during setup.sh, or add to .env after setup:
# docker/.env
DISCORD_BOT_TOKEN=your-bot-token
DISCORD_ALLOWED_USERS=your-discord-user-id
DISCORD_FREE_RESPONSE_CHANNELS=your-forum-channel-id
docker compose --project-name ares restart hermesEach thread in your pentest forum channel becomes an isolated engagement session.
Via Open WebUI: Open http://localhost:3000 → new chat → paste your brief:
Full web app assessment on https://staging.target.com
Scope: staging.target.com only
Auth: admin@target.com / password123
Test accounts: user1@target.com / pass1, user2@target.com / pass2
Destructive: yes
Go.
Via Discord: Create a thread in your pentest forum channel and send the same brief.
Drop anything into ~/ares-workspace/ on the host — it's immediately visible at /workspace/ inside hermes and every terminal container it spawns, with no restart required.
# White-box pentest — clone repo locally
cd ~/ares-workspace && git clone https://github.com/your-org/target-app
# → tell Hermes: "review /workspace/target-app/ for auth issues"
# Mobile testing — drop APK
cp MyApp.apk ~/ares-workspace/
# → tell Hermes: "analyze /workspace/MyApp.apk with MoBSF"graph TB
subgraph DEVICE["Android Device / Emulator"]
FS["frida-server\n0.0.0.0:27042"]
ADBD["adbd"]
end
subgraph MACOS["macOS Host"]
ADBS["ADB server\n127.0.0.1:5037"]
FWD["adb forward\ntcp:27042 → tcp:27042"]
SC1["socat\n192.168.64.1:5038 → :5037"]
SC2["socat\n192.168.64.1:27042 → :27042"]
end
subgraph CONTAINER["ares-hermes container"]
ADBC["ADB client\n-H 192.168.64.1 -P 5038\n-s emulator-5554"]
FRIDAC["Frida Python\nTCP 192.168.64.1:27042"]
MADB["adb MCP"]
MFRIDA["frida MCP"]
end
ADBD <-->|USB/network| ADBS
ADBS --> FWD
FWD <-->|tcp| FS
ADBS <--> SC1
FWD --> SC2
SC1 <-->|Docker Desktop bridge\n192.168.64.1| ADBC
SC2 <-->|Docker Desktop bridge\n192.168.64.1| FRIDAC
ADBC --> MADB
FRIDAC --> MFRIDA
The ADB and Frida MCP servers run inside ares-hermes and reach the device via socat bridges on the Docker Desktop VM interface (192.168.64.1). No USB passthrough into Docker required.
Setup (one-time per machine):
# Install prerequisites
brew install socat android-platform-tools
# Install Android Studio → SDK Manager → Android 14 (API 34) + Emulator
# Configure Android + start stack
cd ares/docker && ./setup.sh --androidStart bridges each session:
cd ares/docker && bash mobile-start.sh # emulator
cd ares/docker && bash mobile-start.sh --usb # USB phonemobile-start.sh starts the emulator (if needed), pushes frida-server, starts socat bridges, and verifies connectivity from inside hermes. Keep the terminal open — bridges stop when it closes.
Device scenarios:
| Scenario | ADB_SERIAL |
Notes |
|---|---|---|
| macOS Android Studio AVD | emulator-5554 |
Serial the ADB server assigns to the emulator |
| Linux Docker emulator | localhost:5555 |
Docker emulator exposes ADB on port 5555 |
| USB phone (host) | serial from adb devices (e.g. R58N12345) |
|
| Wi-Fi / Tailscale phone | <device-ip>:5555 |
macOS note:
host.docker.internalresolves to the Docker VM bridge, not the Mac.setup.sh --androiddetects the correct IP (192.168.64.1) automatically.
Linux (Docker emulator, requires /dev/kvm):
docker compose --project-name ares --profile android up -d
# View emulator screen at http://localhost:6080Ares can use Burp Pro's official MCP extension to give the agent direct access to proxy history, Repeater, Intruder, the active scanner, and Collaborator. This is optional — the stack works without Burp.
Transport bridge: Burp's MCP extension uses an SSE-based transport, while Hermes uses Streamable HTTP. docker/burp-proxy.py is a lightweight stdlib-only Python bridge between the two — no extra dependencies required.
Prerequisites:
- Burp Pro running with the MCP extension (BApp Store) on
127.0.0.1:9876 - A Burp proxy listener on
192.168.64.1:8091(for traffic interception — configure inProxy > Listener)
Start the integration:
cd ares/docker
./burp-start.shburp-start.sh does:
- Verifies Burp MCP is reachable on
127.0.0.1:9876 - Optionally checks the proxy listener on
192.168.64.1:8091 - Starts
burp-proxy.py(SSE↔HTTP bridge on192.168.64.1:9877) - Restarts
ares-hermesso it picks up the Burp MCP server
On the next agent session, 28 Burp tools are registered automatically:
| Tool group | Tools |
|---|---|
| HTTP sending | send_http1_request, send_http2_request |
| Traffic review | get_proxy_http_history, get_proxy_http_history_regex, get_proxy_websocket_history |
| Active tools | create_repeater_tab, send_to_intruder |
| Scanner | get_scanner_issues |
| Collaborator | generate_collaborator_payload, get_collaborator_interactions |
| Config | output_project_options, set_project_options, set_proxy_intercept_state |
| Encoding | url_encode, url_decode, base64_encode, base64_decode |
Stop:
./burp-start.sh --stopWhy a custom bridge?
Hermes uses Streamable HTTP (POST /) for MCP connections. Burp's MCP extension uses the older SSE transport: GET / returns a session endpoint, then POST /?sessionId=xxx carries JSON-RPC, and responses arrive back over the SSE stream. The two protocols are incompatible. burp-proxy.py maintains a persistent SSE connection to Burp, correlates responses by JSON-RPC id, and presents a plain HTTP interface to Hermes.
For production deployments — full Hindsight memory, physical device testing, no Docker Desktop overhead.
Open Claude Code and run:
Deploy the Ares pentest stack following CLAUDE.md at https://github.com/your-org/ares
Target machine: USER@HOST (password: PASSWORD)
Claude Code reads CLAUDE.md and executes the full install remotely via SSH — tools, containers, Hermes profile, Discord gateway — stopping only at human action points (OAuth login, API keys, Discord bot setup).
All tools are pre-installed in the terminal container that Hermes spawns per session:
| Category | Tools |
|---|---|
| Web scanning | nmap, nuclei, nikto, whatweb, wafw00f |
| Fuzzing | ffuf, sqlmap |
| XSS / injection | dalfox, commix |
| TLS/SSL | sslyze, testssl.sh |
| Recon | subfinder, httpx |
| Secrets | gitleaks, jwt_tool |
| Mobile | adb, frida (via MCP) |
| Wordlists | SecLists common.txt, raft-medium-dirs.txt, api-endpoints.txt |
Why Open WebUI?
Clean multi-conversation UI without requiring a Discord bot. Connects to Hermes's OpenAI-compatible API endpoint (/v1), supports multiple simultaneous sessions, and has a model selector for switching between profiles. No forks, no custom frontend code.
Why Hermes? Multi-model orchestrator with persistent sessions, skill routing, HTTP API server, and MCP server management. A 100+ turn pentest session exhausts a single Claude conversation context — Hermes handles compression, model routing, and resumption automatically.
Why not ZAP MCP?
ZAP MCP addon v0.0.1 alpha hardcodes 127.0.0.1 as its bind address and enforces HTTPS. Both -config flags to override these are ignored at runtime. Ares uses ZAP via its REST API directly instead.
Why per-engagement ZAP containers? A shared ZAP instance accumulates state across engagements — old scan trees, stale alerts, mixed session data. Per-engagement containers give each run a clean ZAP state, then are torn down after report delivery.
Why delegate CVSS to Opus? In controlled testing, Sonnet consistently downgraded severity by one tier on auth bypass, mass assignment, and JWT storage findings compared to Opus. CVSS scoring is delegated to Opus with explicit calibration guidance to prevent systematic under-rating.
Why a custom bridge for Burp MCP instead of using Burp's SSE endpoint directly?
Hermes's MCP client (streamable_http_client from mcp 1.27+) speaks Streamable HTTP only — it POSTs JSON-RPC to / and expects a JSON response body. Burp's MCP extension speaks the legacy SSE transport — GET / establishes a stream that delivers a sessionId, then POST /?sessionId=xxx carries requests, and responses come back over the SSE channel asynchronously. The two are wire-incompatible. burp-proxy.py handles the SSE channel management and per-request Queue-based correlation entirely in ~150 lines of stdlib Python.
Why socat on macOS instead of adb -a?
On macOS, Android Studio's ADB daemon respawns immediately and re-locks itself to 127.0.0.1. The -a flag is silently ignored. socat bridges the Docker Desktop VM network to the Mac's localhost ADB, which works regardless of which process owns the ADB server.
docker/config.yaml is baked into ares-hermes at build time. Key settings:
model:
default: claude-sonnet-4-6
provider: anthropic
smart_model_routing:
enabled: true
cheap_model:
model: claude-haiku-4-5-20251001
compression:
threshold: 0.50 # compress when context reaches 50% of window
protect_last_n: 30 # keep last 30 turns verbatim
terminal:
backend: docker
docker_image: ares-tools:latest
container_persistent: true
docker_volumes:
- "${PENTEST_OUTPUT}:/pentest-output"
- "${WORKSPACE_DIR}:/workspace"
- "/var/run/docker.sock:/var/run/docker.sock"Important:
${VAR:-default}bash fallback syntax inconfig.yamlis NOT expanded by Hermes — it's passed as a literal string. Use plain${VAR}and set all defaults in.env.
- Authorization required. The profile will not proceed without explicit scope confirmation.
- No WAF bypass. Ares uses standard tool flags. Evasion against enterprise WAFs requires manual tuning.
- Android emulator (
--profile android) requires/dev/kvm— bare metal only. - GitNexus is PolyForm Noncommercial — personal/internal use only.
- Open WebUI cannot directly accept binary file uploads (APKs, binaries) — use
~/ares-workspace/for files Hermes needs to access.
MIT. Component licenses: pentest-ai (MIT), GitNexus (PolyForm Noncommercial), Hermes Agent (Apache 2.0), ZAP (Apache 2.0), MoBSF (GPL-3.0), Frida (wxWindows Library License), Open WebUI (MIT).