A personal fork of Nous Research's Hermes Agent — local Qwen3.5-9B vision inference on Apple Silicon, browser-based WebGPU inference, clipboard image paste for VLMs, Rust-accelerated prompt scanning, evaluation testbed, and RL research tooling.
109 commits ahead of upstream | +4,900 lines | 15+ new files | Full upstream compatibility preserved
Everything below is unique to this fork. The upstream repo has none of it.
| Feature | Upstream | This Fork |
|---|---|---|
| Local inference | Cloud-only (OpenRouter, Nous) | Qwen3.5-9B on Apple Silicon GPU via MLX-VLM |
| Browser inference | No | WebGPU client-side via WebLLM bridge |
| Image input | No | Cmd+V clipboard paste with VLM vision support |
| Prompt security | Python regex | Rust RegexSet (17x faster, PyO3) |
| Think blocks | Raw `<think>` tags shown | Styled dim italic with `~ thinking ~` header |
| Terminal rendering | Hardcoded 200-char rules | Dynamic shutil.get_terminal_size() |
| Color themes | Fixed colors | Cyber (green/blue) or Synthwave (pink/purple) |
| Provider switching | Restart required | /provider hot-swap mid-session |
| Skill import | Manual | /copycode imports from Claude Code + Anthropic repo |
| Server lifecycle | Orphaned processes | Server tied to hermes session (atexit + SIGTERM) |
| Evaluation | No | Testbed with REPL, eval runner, task definitions |
| RL training | No | Full pipeline: batch runner → trajectory compressor → GRPO |
| Context compression | Basic | Fallback client chain, provider-aware token handling |
Run Qwen3.5-9B locally via MLX-VLM — no cloud, no API keys. Vision + text, 4-bit quantized, unified GPU memory.
```
hermes --provider local --model local/qwen3.5-9b
```

- Auto-starts server on port 8800, polls up to 60s for model load
- Serves OpenAI-compatible `/v1/chat/completions` with full vision support
- Server dies when hermes exits — atexit hook + SIGTERM handler, no orphan processes
- Manual start: `python3 -m local_models.serve qwen`

New: local_models/serve.py | Modified: hermes_cli/runtime_provider.py, agent/model_metadata.py
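The "polls up to 60s for model load" step amounts to a readiness loop. A minimal sketch of that pattern — the function name and injectable `probe`/`clock`/`sleep` parameters are ours for illustration, not the fork's API:

```python
import time

def wait_until(probe, timeout=60.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or timeout seconds elapse.

    Returns True on success, False on timeout. clock and sleep are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Example: a probe that succeeds on its third call (no real network).
attempts = {"n": 0}

def fake_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_until(fake_probe, timeout=5.0, interval=0.0)
```

In the real auto-start path the probe would be an HTTP health check against port 8800.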
Run models on the user's GPU through their browser — zero server compute, nothing leaves the machine. Uses WebLLM (MLC) for quantized model loading via WebGPU.
```
hermes --provider webgpu
```

```
Hermes CLI ──HTTP POST──> Bridge (port 8801) ──WebSocket──> Browser (WebGPU/WebLLM)
     /v1/chat/completions <───────────────── inference results ──┘
```
The bridge (web_client/bridge.py) is a Starlette app: serves the web UI, accepts a WebSocket from the browser, exposes OpenAI-compatible /v1/chat/completions for Hermes, streams SSE responses back.
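The "streams SSE responses back" half of the bridge boils down to wrapping each generated chunk in the OpenAI streaming wire format. A sketch of that framing — field names follow the public OpenAI `chat.completion.chunk` shape, and the helper name is ours, not `bridge.py`'s:

```python
import json

def sse_events(chunks, model="webllm"):
    """Yield OpenAI-style SSE lines for a sequence of text chunks.

    Each chunk becomes one `data: {...}` event shaped like a
    chat.completion.chunk, followed by the `data: [DONE]` sentinel.
    """
    for text in chunks:
        payload = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": {"content": text}}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"

events = list(sse_events(["Hel", "lo"]))
```

Hermes consumes these exactly as it would a cloud provider's stream, which is what makes the browser look like just another OpenAI-compatible endpoint.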
| Model | VRAM | Context |
|---|---|---|
| Qwen3 4B | ~2.5 GB | 8K |
| Qwen2.5 3B Instruct | ~1.8 GB | 8K |
| Llama 3.1 8B Instruct | ~4.5 GB | 8K |
| Mistral 7B v0.3 | ~4 GB | 8K |
| SmolLM2 1.7B | ~1 GB | 4K |
Requires Chrome 113+ or Edge 113+. Works on macOS, Windows, Linux.
New: web_client/bridge.py, web_client/index.html | Modified: hermes_cli/runtime_provider.py, hermes_cli/auth.py
Paste screenshots directly into the chat. Cmd+V on macOS, Ctrl+V on Linux. The image shows as [Image #N] above the input, gets base64-encoded and sent as an OpenAI image_url content part. Works with any vision model — local Qwen3.5-9B, OpenRouter multimodal models, etc.
```
● what word is this?
  📎 attached clip_20260305_171559_1.png (7KB)
```
Terminal apps can't natively receive image data from the clipboard. When you press Cmd+V:
- The terminal emulator intercepts it, not your app. It reads the clipboard, extracts any text, and sends it to stdin wrapped in bracketed paste escape sequences (`\e[200~...text...\e[201~`).
- Terminals only paste text. If the clipboard has an image with no text, the terminal sends an empty bracketed paste — your app gets `data=''` and has no idea an image exists.
- You can't just call the clipboard API. PyObjC's `NSPasteboard` requires a running CFRunLoop for XPC communication with the macOS pasteboard server. But prompt_toolkit's asyncio event loop uses kqueue, not CFRunLoop. So `NSPasteboard.generalPasteboard().dataForType_()` silently returns nil when called in-process — no error, no exception, just nothing. This is the bug that made us think the clipboard was empty when it wasn't.
The clipboard extraction runs in a separate Python subprocess. A fresh process gets its own AppKit runtime with a working CFRunLoop, so NSPasteboard works correctly. The subprocess writes the image to disk, the main process picks it up.
Extraction chain (tried in order on macOS):

1. `pngpaste` — native Obj-C binary, fastest (`brew install pngpaste`)
2. PyObjC `NSPasteboard` in a subprocess — reliable, no extra deps
3. `osascript` fallback — AppleScript `clipboard info` + `«class PNGf»`

Linux: `xclip -selection clipboard -t image/png -o`
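The try-in-order fallback can be expressed as a list of extractor commands where the first non-empty result wins. A minimal sketch, with function names ours and only the tools named above assumed:

```python
import subprocess

def run_extractor(cmd):
    """Run one clipboard extractor command; return bytes or None on failure."""
    try:
        out = subprocess.run(cmd, capture_output=True, timeout=5).stdout
        return out or None
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, or a tool that hung — fall through to the next one.
        return None

def grab_clipboard_image(extractors):
    """Try each extractor in order; first non-empty result wins."""
    for cmd in extractors:
        data = run_extractor(cmd)
        if data:
            return data
    return None

# The chain from the section above; on a machine without these tools
# every step simply returns None and the paste is treated as text-only.
CHAIN = [
    ["pngpaste", "-"],
    ["xclip", "-selection", "clipboard", "-t", "image/png", "-o"],
]
```

The subprocess boundary is doing double duty here: it isolates the CFRunLoop problem described above, and it makes a missing tool a soft failure rather than an exception.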
The BracketedPaste handler in prompt_toolkit fires on Cmd+V (even with empty data), triggers the subprocess clipboard check, and attaches any found image. Images survive through the interrupt queue, multipart content assembly, and the full run_conversation pipeline to the API call.
Modified: cli.py, run_agent.py (multipart content support through the entire agent pipeline)
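The "base64-encoded and sent as an OpenAI `image_url` content part" step can be sketched as follows — the content-part shape is the public OpenAI multipart format; the helper names are ours, not the fork's:

```python
import base64

def image_content_part(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an OpenAI image_url content part (data URI)."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    }

def user_message(text: str, images: list) -> dict:
    """Multipart user message: one text part plus one part per pasted image."""
    parts = [{"type": "text", "text": text}]
    parts += [image_content_part(img) for img in images]
    return {"role": "user", "content": parts}

msg = user_message("what word is this?", [b"\x89PNG..."])
```

Because `content` becomes a list instead of a string, every stage between the REPL and the API call has to tolerate multipart content — which is why `cli.py` and `run_agent.py` both needed changes.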
hermes_rs — PyO3 native module. Compiled Rust RegexSet replaces Python regex for injection detection. 17x faster on real context files. Falls back to pure Python if not installed.
Scans for 10 threat patterns: prompt injection, instruction override, system prompt extraction, HTML comment injection, hidden divs, translate-and-execute, curl exfiltration, secret reading, invisible unicode (U+200B/C/D, U+2060, U+FEFF, U+202A-E).
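The pure-Python fallback is essentially "run every pattern, collect the indices that matched" — the same contract as Rust's `RegexSet::matches()`. A sketch with two illustrative patterns (the real scanner's ten patterns live in `hermes_rs`, and these regexes are our approximations, not the fork's):

```python
import re

# Two of the threat classes named above, as illustrative patterns only.
PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]"),  # invisible unicode
]

def scan(text: str) -> list:
    """Return indices of patterns that matched, mimicking RegexSet.matches()."""
    return [i for i, pat in enumerate(PATTERNS) if pat.search(text)]

hits = scan("Please Ignore previous instructions\u200b")
```

The 17x speedup comes from `RegexSet` compiling all patterns into a single automaton and scanning the input once, where the Python version walks the text once per pattern.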
Includes truncate_content() — smart 70% head / 20% tail truncation preserving UTF-8 boundaries.
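The 70% head / 20% tail split can be sketched in a few lines. This is a character-budget illustration with names of our choosing — the Rust version budgets bytes, which is why it must check UTF-8 boundaries explicitly, a problem Python `str` slicing sidesteps:

```python
def truncate_content(text: str, max_chars: int,
                     marker: str = "\n…[truncated]…\n") -> str:
    """Keep ~70% of the budget from the head and ~20% from the tail."""
    if len(text) <= max_chars:
        return text
    head = int(max_chars * 0.7)
    tail = int(max_chars * 0.2)
    return text[:head] + marker + text[len(text) - tail:]

kept = truncate_content("abc", 10)          # under budget: unchanged
cut = truncate_content("x" * 100, 10)       # over budget: head + marker + tail
```

Biasing toward the head preserves the part of a context file most likely to carry instructions and structure, while the tail keeps any trailing summary or result.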
```
cd hermes_rs && maturin develop --release
```

New: hermes_rs/ (Cargo.toml, src/lib.rs, prompt_scanner.rs, token_estimate.rs) | Modified: agent/prompt_builder.py
- Dynamic terminal width — rules and boxes use `shutil.get_terminal_size()`, no more overflow in VS Code
- Think block styling — `<think>` tags render as dim italic gray with a `~ thinking ~` header, toggleable with `/thinkon` / `/thinkoff`
- Color schemes — "cyber" (green/blue) or "synthwave" (pink/purple), chosen on first launch
- `/provider` hot-swap — switch `openrouter|local|webgpu|custom` mid-session
- `/copycode` — imports skills from `~/.claude/commands/`, local `SKILL.md` files, and the anthropics/skills repo
- Large paste collapse — pastes >20 lines saved to a temp file, expanded on submit
- Server lifecycle — local model server is a child process, killed on exit

New: hermes_cli/color_scheme.py | Modified: cli.py, agent/display.py, hermes_cli/banner.py
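The "child process, killed on exit" guarantee combines an atexit hook with a SIGTERM handler so the server dies whether Hermes exits cleanly or is killed. A sketch of that tie-down — helper names are ours, and the real implementation in the fork may differ in detail:

```python
import atexit
import signal
import subprocess

def spawn_child(cmd):
    """Start a child process and arrange for it to die with this process."""
    proc = subprocess.Popen(cmd)

    def cleanup(*_args):
        if proc.poll() is None:       # still running?
            proc.terminate()          # polite SIGTERM first
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()           # then force it

    atexit.register(cleanup)                 # normal interpreter exit
    signal.signal(signal.SIGTERM, cleanup)   # Hermes itself killed from outside
    return proc, cleanup
```

Note a real SIGTERM handler would also re-raise or exit after cleanup; this sketch only shows the reaping of the child.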
```
python3 -m testbed.repl --query "list files in /tmp"   # single query
python3 -m testbed.eval_runner                         # run eval suite
```

- `testbed/repl.py` — REPL or single-query (`--query`, `--toolsets`, `--model`, `--unsafe`)
- `testbed/eval_runner.py` — runs tasks from `tasks.yaml`, scores results
- `testbed/harness.py` — programmatic AIAgent wrapper
- Default: `google/gemini-2.0-flash`, file toolset only. `--unsafe` enables all tools.
Full pipeline for reinforcement learning with tool-calling agents. See RL_RESEARCH_WITH_HERMES.md.
Dataset → Batch Runner → Raw Trajectories → Compressor → GRPO Training → Fine-tuned Model
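The Compressor stage trims raw trajectories to a token budget before training. A toy sketch of the idea — the chars/4 token estimate and the "elide oldest tool outputs first" policy are our illustrative assumptions, not the fork's actual algorithm:

```python
def estimate_tokens(msg: dict) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(msg.get("content") or "") // 4)

def compress(messages: list, max_tokens: int) -> list:
    """Elide the oldest tool outputs until the trajectory fits the budget."""
    msgs = list(messages)

    def total():
        return sum(estimate_tokens(m) for m in msgs)

    i = 0
    while total() > max_tokens and i < len(msgs):
        if msgs[i]["role"] == "tool":
            msgs[i] = {**msgs[i], "content": "[tool output elided]"}
        i += 1
    return msgs

traj = [
    {"role": "user", "content": "list files"},
    {"role": "tool", "content": "x" * 4000},   # ~1000 estimated tokens
    {"role": "assistant", "content": "done"},
]
compressed = compress(traj, max_tokens=50)
```

The point of compressing before GRPO is to keep the reward-relevant turns (user intent, final answer) while shedding bulky intermediate tool output that would otherwise blow the training context.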
```
python3 batch_runner.py --dataset prompts.jsonl --run-name my-run --workers 4
python3 trajectory_compressor.py --input data/my-run/trajectories.jsonl \
    --output data/my-run/compressed.jsonl --max-tokens 15000
```

| Provider | Flag | How It Works |
|---|---|---|
| OpenRouter | `--provider openrouter` | Cloud API (default) |
| Nous | `--provider nous` | Nous Research portal with OAuth |
| OpenAI Codex | `--provider openai-codex` | Codex Responses API |
| Local (MLX) | `--provider local` | Qwen3.5-9B on Apple Silicon GPU |
| WebGPU | `--provider webgpu` | Browser-side inference via WebLLM |
Switch mid-session: `/provider <name>` | Set permanently: `hermes model`
| Variable | Purpose |
|---|---|
| `HERMES_INFERENCE_PROVIDER` | Override provider |
| `OPENROUTER_API_KEY` | OpenRouter key |
| `OPENAI_API_KEY` / `OPENAI_BASE_URL` | Custom endpoint |
| `HERMES_WEBGPU_PORT` | Bridge port (default: 8801) |
```
hermes-agent/
├── agent/                    # Core agent modules
│   ├── context_compressor.py # Auto context window compression
│   ├── display.py            # Spinner, think-block formatting
│   ├── model_metadata.py     # Token estimation, context lengths
│   ├── prompt_builder.py     # System prompt + injection detection
│   └── ...                   # caching, redaction, skills, trajectory
├── hermes_cli/               # CLI application
│   ├── runtime_provider.py   # Provider resolution + auto-start
│   ├── color_scheme.py       # Theme picker (cyber/synthwave)
│   └── ...                   # auth, config, setup, gateway, etc.
├── hermes_rs/                # Rust prompt scanner (PyO3)
├── local_models/             # Qwen3.5-9B MLX-VLM server
├── web_client/               # WebGPU browser inference bridge
│   ├── bridge.py             # Starlette HTTP ↔ WebSocket ↔ browser
│   └── index.html            # WebLLM model loader UI
├── testbed/                  # Evaluation harness
├── cli.py                    # Interactive REPL (prompt_toolkit)
├── run_agent.py              # AIAgent orchestrator
├── batch_runner.py           # RL trajectory generation
├── trajectory_compressor.py  # Trajectory compression
├── mini-swe-agent/           # SWE-Agent submodule
└── tinker-atropos/           # RL training (Atropos/GRPO)
```
```
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc && hermes setup
```

Local model: hermes model → select Local → Qwen3.5-9B. Or: hermes --provider local
WebGPU: hermes --provider webgpu (needs Chrome 113+)
Rust scanner: cd hermes_rs && maturin develop --release
Faster image paste: brew install pngpaste (optional, PyObjC fallback works)
```
hermes                          # Interactive CLI
hermes --provider local         # Local Qwen3.5-9B (Apple Silicon)
hermes --provider webgpu        # Browser WebGPU inference
hermes model                    # Switch provider/model
hermes setup / doctor / status  # Configuration & diagnostics
hermes gateway                  # Telegram, Discord, Slack, WhatsApp
hermes skills / tools / cron    # Skills, tools, scheduled jobs
```

| Shortcut | Action |
|---|---|
| Cmd+V / Ctrl+V | Paste clipboard image |
| Enter | Send message |
| Alt+Enter / Ctrl+J | Newline (multiline) |
| Ctrl+C | Interrupt agent |
| Command | Action |
|---|---|
| `/provider <name>` | Switch provider |
| `/copycode` | Import Claude Code skills |
| `/thinkon` / `/thinkoff` | Toggle think blocks |
Full upstream docs: hermes-agent.nousresearch.com/docs
```
git clone --recurse-submodules https://github.com/m0at/hermes-agent.git
cd hermes-agent
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[all,dev]"
uv pip install -e ./mini-swe-agent
python3 -m pytest tests/ -q
```

MIT — see LICENSE.
Built by Nous Research. Fork maintained by Andy.
