A personal fork of Nous Research's Hermes Agent — local Qwen3.5-9B vision inference on Apple Silicon, browser-based WebGPU inference, clipboard image paste for VLMs, Rust-accelerated prompt scanning, evaluation testbed, and RL research tooling.
109 commits ahead of upstream | +4,900 lines | 15+ new files | Full upstream compatibility preserved
Everything below is unique to this fork. The upstream repo has none of it.
| Feature | Upstream | This Fork |
|---|---|---|
| Local inference | Cloud-only (OpenRouter, Nous) | Qwen3.5-9B on Apple Silicon GPU via MLX-VLM |
| Browser inference | No | WebGPU client-side via WebLLM bridge |
| Image input | No | Cmd+V clipboard paste with VLM vision support |
| Prompt security | Python regex | Rust RegexSet (17x faster, PyO3) |
| Think blocks | Raw `<think>` tags shown | Styled dim italic with `~ thinking ~` header |
| Terminal rendering | Hardcoded 200-char rules | Dynamic shutil.get_terminal_size() |
| Color themes | Fixed colors | Cyber (green/blue) or Synthwave (pink/purple) |
| Provider switching | Restart required | /provider hot-swap mid-session |
| Skill import | Manual | /copycode imports from Claude Code + Anthropic repo |
| Server lifecycle | Orphaned processes | Server tied to hermes session (atexit + SIGTERM) |
| Evaluation | No | Testbed with REPL, eval runner, task definitions |
| RL training | No | Full pipeline: batch runner → trajectory compressor → GRPO |
| Context compression | Basic | Fallback client chain, provider-aware token handling |
Run Qwen3.5-9B locally via MLX-VLM — no cloud, no API keys. Vision + text, 4-bit quantized, unified GPU memory.
```
hermes --provider local --model local/qwen3.5-9b
```

- Auto-starts server on port 8800, polls up to 60s for model load
- Serves OpenAI-compatible `/v1/chat/completions` with full vision support
- Server dies when hermes exits — atexit hook + SIGTERM handler, no orphan processes
- Manual start: `python3 -m local_models.serve qwen`

New: local_models/serve.py | Modified: hermes_cli/runtime_provider.py, agent/model_metadata.py
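The "polls up to 60s for model load" step amounts to a readiness loop. A minimal sketch of that pattern — the function name and injectable `probe`/`clock`/`sleep` parameters are ours for illustration, not the fork's API:

```python
import time

def wait_until(probe, timeout=60.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or timeout seconds elapse.

    Returns True on success, False on timeout. clock and sleep are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Example: a probe that succeeds on its third call (no real network).
attempts = {"n": 0}

def fake_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_until(fake_probe, timeout=5.0, interval=0.0)
```

In the real auto-start path the probe would be an HTTP health check against port 8800.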
Run models on the user's GPU through their browser — zero server compute, nothing leaves the machine. Uses WebLLM (MLC) for quantized model loading via WebGPU.
```
hermes --provider webgpu
```

```
Hermes CLI ──HTTP POST──> Bridge (port 8801) ──WebSocket──> Browser (WebGPU/WebLLM)
     /v1/chat/completions <───────────────── inference results ──┘
```
The bridge (web_client/bridge.py) is a Starlette app: serves the web UI, accepts a WebSocket from the browser, exposes OpenAI-compatible /v1/chat/completions for Hermes, streams SSE responses back.
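The "streams SSE responses back" half of the bridge boils down to wrapping each generated chunk in the OpenAI streaming wire format. A sketch of that framing — field names follow the public OpenAI `chat.completion.chunk` shape, and the helper name is ours, not `bridge.py`'s:

```python
import json

def sse_events(chunks, model="webllm"):
    """Yield OpenAI-style SSE lines for a sequence of text chunks.

    Each chunk becomes one `data: {...}` event shaped like a
    chat.completion.chunk, followed by the `data: [DONE]` sentinel.
    """
    for text in chunks:
        payload = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": {"content": text}}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"

events = list(sse_events(["Hel", "lo"]))
```

Hermes consumes these exactly as it would a cloud provider's stream, which is what makes the browser look like just another OpenAI-compatible endpoint.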
| Model | VRAM | Context |
|---|---|---|
| Qwen3 4B | ~2.5 GB | 8K |
| Qwen2.5 3B Instruct | ~1.8 GB | 8K |
| Llama 3.1 8B Instruct | ~4.5 GB | 8K |
| Mistral 7B v0.3 | ~4 GB | 8K |
| SmolLM2 1.7B | ~1 GB | 4K |
Requires Chrome 113+ or Edge 113+. Works on macOS, Windows, Linux.
New: web_client/bridge.py, web_client/index.html | Modified: hermes_cli/runtime_provider.py, hermes_cli/auth.py
Paste screenshots directly into the chat. Cmd+V on macOS, Ctrl+V on Linux. The image shows as [Image #N] above the input, gets base64-encoded and sent as an OpenAI image_url content part. Works with any vision model — local Qwen3.5-9B, OpenRouter multimodal models, etc.
```
● what word is this?
  📎 attached clip_20260305_171559_1.png (7KB)
```
Terminal apps can't natively receive image data from the clipboard. When you press Cmd+V:
- The terminal emulator intercepts it, not your app. It reads the clipboard, extracts any text, and sends it to stdin wrapped in bracketed paste escape sequences (`\e[200~...text...\e[201~`).
- Terminals only paste text. If the clipboard has an image with no text, the terminal sends an empty bracketed paste — your app gets `data=''` and has no idea an image exists.
- You can't just call the clipboard API. PyObjC's `NSPasteboard` requires a running CFRunLoop for XPC communication with the macOS pasteboard server. But prompt_toolkit's asyncio event loop uses kqueue, not CFRunLoop. So `NSPasteboard.generalPasteboard().dataForType_()` silently returns nil when called in-process — no error, no exception, just nothing. This is the bug that made us think the clipboard was empty when it wasn't.
The clipboard extraction runs in a separate Python subprocess. A fresh process gets its own AppKit runtime with a working CFRunLoop, so NSPasteboard works correctly. The subprocess writes the image to disk, the main process picks it up.
Extraction chain (tried in order on macOS):

1. `pngpaste` — native Obj-C binary, fastest (`brew install pngpaste`)
2. PyObjC `NSPasteboard` in a subprocess — reliable, no extra deps
3. `osascript` fallback — AppleScript `clipboard info` + `«class PNGf»`

Linux: `xclip -selection clipboard -t image/png -o`
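The try-in-order fallback can be expressed as a list of extractor commands where the first non-empty result wins. A minimal sketch, with function names ours and only the tools named above assumed:

```python
import subprocess

def run_extractor(cmd):
    """Run one clipboard extractor command; return bytes or None on failure."""
    try:
        out = subprocess.run(cmd, capture_output=True, timeout=5).stdout
        return out or None
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, or a tool that hung — fall through to the next one.
        return None

def grab_clipboard_image(extractors):
    """Try each extractor in order; first non-empty result wins."""
    for cmd in extractors:
        data = run_extractor(cmd)
        if data:
            return data
    return None

# The chain from the section above; on a machine without these tools
# every step simply returns None and the paste is treated as text-only.
CHAIN = [
    ["pngpaste", "-"],
    ["xclip", "-selection", "clipboard", "-t", "image/png", "-o"],
]
```

The subprocess boundary is doing double duty here: it isolates the CFRunLoop problem described above, and it makes a missing tool a soft failure rather than an exception.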
The BracketedPaste handler in prompt_toolkit fires on Cmd+V (even with empty data), triggers the subprocess clipboard check, and attaches any found image. Images survive through the interrupt queue, multipart content assembly, and the full run_conversation pipeline to the API call.
Modified: cli.py, run_agent.py (multipart content support through the entire agent pipeline)
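The "base64-encoded and sent as an OpenAI `image_url` content part" step can be sketched as follows — the content-part shape is the public OpenAI multipart format; the helper names are ours, not the fork's:

```python
import base64

def image_content_part(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an OpenAI image_url content part (data URI)."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    }

def user_message(text: str, images: list) -> dict:
    """Multipart user message: one text part plus one part per pasted image."""
    parts = [{"type": "text", "text": text}]
    parts += [image_content_part(img) for img in images]
    return {"role": "user", "content": parts}

msg = user_message("what word is this?", [b"\x89PNG..."])
```

Because `content` becomes a list instead of a string, every stage between the REPL and the API call has to tolerate multipart content — which is why `cli.py` and `run_agent.py` both needed changes.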
hermes_rs — PyO3 native module. Compiled Rust RegexSet replaces Python regex for injection detection. 17x faster on real context files. Falls back to pure Python if not installed.
Scans for 10 threat patterns: prompt injection, instruction override, system prompt extraction, HTML comment injection, hidden divs, translate-and-execute, curl exfiltration, secret reading, invisible unicode (U+200B/C/D, U+2060, U+FEFF, U+202A-E).
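The pure-Python fallback is essentially "run every pattern, collect the indices that matched" — the same contract as Rust's `RegexSet::matches()`. A sketch with two illustrative patterns (the real scanner's ten patterns live in `hermes_rs`, and these regexes are our approximations, not the fork's):

```python
import re

# Two of the threat classes named above, as illustrative patterns only.
PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]"),  # invisible unicode
]

def scan(text: str) -> list:
    """Return indices of patterns that matched, mimicking RegexSet.matches()."""
    return [i for i, pat in enumerate(PATTERNS) if pat.search(text)]

hits = scan("Please Ignore previous instructions\u200b")
```

The 17x speedup comes from `RegexSet` compiling all patterns into a single automaton and scanning the input once, where the Python version walks the text once per pattern.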
Includes truncate_content() — smart 70% head / 20% tail truncation preserving UTF-8 boundaries.
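The 70% head / 20% tail split can be sketched in a few lines. This is a character-budget illustration with names of our choosing — the Rust version budgets bytes, which is why it must check UTF-8 boundaries explicitly, a problem Python `str` slicing sidesteps:

```python
def truncate_content(text: str, max_chars: int,
                     marker: str = "\n…[truncated]…\n") -> str:
    """Keep ~70% of the budget from the head and ~20% from the tail."""
    if len(text) <= max_chars:
        return text
    head = int(max_chars * 0.7)
    tail = int(max_chars * 0.2)
    return text[:head] + marker + text[len(text) - tail:]

kept = truncate_content("abc", 10)          # under budget: unchanged
cut = truncate_content("x" * 100, 10)       # over budget: head + marker + tail
```

Biasing toward the head preserves the part of a context file most likely to carry instructions and structure, while the tail keeps any trailing summary or result.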
```
cd hermes_rs && maturin develop --release
```

New: hermes_rs/ (Cargo.toml, src/lib.rs, prompt_scanner.rs, token_estimate.rs) | Modified: agent/prompt_builder.py
- Dynamic terminal width — rules and boxes use `shutil.get_terminal_size()`, no more overflow in VS Code
- Think block styling — `<think>` tags render as dim italic gray with a `~ thinking ~` header, toggleable with `/thinkon` / `/thinkoff`
- Color schemes — "cyber" (green/blue) or "synthwave" (pink/purple), chosen on first launch
- `/provider` hot-swap — switch `openrouter|local|webgpu|custom` mid-session
- `/copycode` — imports skills from `~/.claude/commands/`, local `SKILL.md` files, and the anthropics/skills repo
- Large paste collapse — pastes >20 lines saved to a temp file, expanded on submit
- Server lifecycle — local model server is a child process, killed on exit

New: hermes_cli/color_scheme.py | Modified: cli.py, agent/display.py, hermes_cli/banner.py
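The "child process, killed on exit" guarantee combines an atexit hook with a SIGTERM handler so the server dies whether Hermes exits cleanly or is killed. A sketch of that tie-down — helper names are ours, and the real implementation in the fork may differ in detail:

```python
import atexit
import signal
import subprocess

def spawn_child(cmd):
    """Start a child process and arrange for it to die with this process."""
    proc = subprocess.Popen(cmd)

    def cleanup(*_args):
        if proc.poll() is None:       # still running?
            proc.terminate()          # polite SIGTERM first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()           # then force it

    atexit.register(cleanup)                 # normal interpreter exit
    signal.signal(signal.SIGTERM, cleanup)   # Hermes itself killed from outside
    return proc, cleanup
```

Note a real SIGTERM handler would also re-raise or exit after cleanup; this sketch only shows the reaping of the child.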
```
python3 -m testbed.repl --query "list files in /tmp"   # single query
python3 -m testbed.eval_runner                         # run eval suite
```

- `testbed/repl.py` — REPL or single-query (`--query`, `--toolsets`, `--model`, `--unsafe`)
- `testbed/eval_runner.py` — runs tasks from `tasks.yaml`, scores results
- `testbed/harness.py` — programmatic AIAgent wrapper
- Default: `google/gemini-2.0-flash`, file toolset only. `--unsafe` enables all tools.
Full pipeline for reinforcement learning with tool-calling agents. See RL_RESEARCH_WITH_HERMES.md.
Dataset → Batch Runner → Raw Trajectories → Compressor → GRPO Training → Fine-tuned Model
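The Compressor stage trims raw trajectories to a token budget before training. A toy sketch of the idea — the chars/4 token estimate and the "elide oldest tool outputs first" policy are our illustrative assumptions, not the fork's actual algorithm:

```python
def estimate_tokens(msg: dict) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(msg.get("content") or "") // 4)

def compress(messages: list, max_tokens: int) -> list:
    """Elide the oldest tool outputs until the trajectory fits the budget."""
    msgs = list(messages)

    def total():
        return sum(estimate_tokens(m) for m in msgs)

    i = 0
    while total() > max_tokens and i < len(msgs):
        if msgs[i]["role"] == "tool":
            msgs[i] = {**msgs[i], "content": "[tool output elided]"}
        i += 1
    return msgs

traj = [
    {"role": "user", "content": "list files"},
    {"role": "tool", "content": "x" * 4000},   # ~1000 estimated tokens
    {"role": "assistant", "content": "done"},
]
compressed = compress(traj, max_tokens=50)
```

The point of compressing before GRPO is to keep the reward-relevant turns (user intent, final answer) while shedding bulky intermediate tool output that would otherwise blow the training context.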
```
python3 batch_runner.py --dataset prompts.jsonl --run-name my-run --workers 4
python3 trajectory_compressor.py --input data/my-run/trajectories.jsonl \
    --output data/my-run/compressed.jsonl --max-tokens 15000
```

| Provider | Flag | How It Works |
|---|---|---|
| OpenRouter | `--provider openrouter` | Cloud API (default) |
| Nous | `--provider nous` | Nous Research portal with OAuth |
| OpenAI Codex | `--provider openai-codex` | Codex Responses API |
| Local (MLX) | `--provider local` | Qwen3.5-9B on Apple Silicon GPU |
| WebGPU | `--provider webgpu` | Browser-side inference via WebLLM |
Switch mid-session: `/provider <name>` | Set permanently: `hermes model`
| Variable | Purpose |
|---|---|
| `HERMES_INFERENCE_PROVIDER` | Override provider |
| `OPENROUTER_API_KEY` | OpenRouter key |
| `OPENAI_API_KEY` / `OPENAI_BASE_URL` | Custom endpoint |
| `HERMES_WEBGPU_PORT` | Bridge port (default: 8801) |
```
hermes-agent/
├── agent/                    # Core agent modules
│   ├── context_compressor.py # Auto context window compression
│   ├── display.py            # Spinner, think-block formatting
│   ├── model_metadata.py     # Token estimation, context lengths
│   ├── prompt_builder.py     # System prompt + injection detection
│   └── ...                   # caching, redaction, skills, trajectory
├── hermes_cli/               # CLI application
│   ├── runtime_provider.py   # Provider resolution + auto-start
│   ├── color_scheme.py       # Theme picker (cyber/synthwave)
│   └── ...                   # auth, config, setup, gateway, etc.
├── hermes_rs/                # Rust prompt scanner (PyO3)
├── local_models/             # Qwen3.5-9B MLX-VLM server
├── web_client/               # WebGPU browser inference bridge
│   ├── bridge.py             # Starlette HTTP ↔ WebSocket ↔ browser
│   └── index.html            # WebLLM model loader UI
├── testbed/                  # Evaluation harness
├── cli.py                    # Interactive REPL (prompt_toolkit)
├── run_agent.py              # AIAgent orchestrator
├── batch_runner.py           # RL trajectory generation
├── trajectory_compressor.py  # Trajectory compression
├── mini-swe-agent/           # SWE-Agent submodule
└── tinker-atropos/           # RL training (Atropos/GRPO)
```
```
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc && hermes setup
```

Local model: hermes model → select Local → Qwen3.5-9B. Or: hermes --provider local
WebGPU: hermes --provider webgpu (needs Chrome 113+)
Rust scanner: cd hermes_rs && maturin develop --release
Faster image paste: brew install pngpaste (optional, PyObjC fallback works)
```
hermes                          # Interactive CLI
hermes --provider local         # Local Qwen3.5-9B (Apple Silicon)
hermes --provider webgpu        # Browser WebGPU inference
hermes model                    # Switch provider/model
hermes setup / doctor / status  # Configuration & diagnostics
hermes gateway                  # Telegram, Discord, Slack, WhatsApp
hermes skills / tools / cron    # Skills, tools, scheduled jobs
```

| Shortcut | Action |
|---|---|
| Cmd+V / Ctrl+V | Paste clipboard image |
| Enter | Send message |
| Alt+Enter / Ctrl+J | Newline (multiline) |
| Ctrl+C | Interrupt agent |
| Command | Action |
|---|---|
| `/provider <name>` | Switch provider |
| `/copycode` | Import Claude Code skills |
| `/thinkon` / `/thinkoff` | Toggle think blocks |
Full upstream docs: hermes-agent.nousresearch.com/docs
```
git clone --recurse-submodules https://github.com/m0at/hermes-agent.git
cd hermes-agent
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[all,dev]"
uv pip install -e ./mini-swe-agent
python3 -m pytest tests/ -q
```

MIT — see LICENSE.
Built by Nous Research. Fork maintained by Andy.
