Home

Hermes Agent - Custom Dangerous Patterns plugin

⚠️ USE AT YOUR OWN RISK

This plugin is in early development and has not been rigorously tested across all Hermes environments, versions, or edge cases. There is no guarantee that all destructive commands will be caught or blocked. Pattern matching is best-effort — creative command obfuscation, shell expansions, piped commands, or edge cases in the approval flow may bypass detection. Do not rely on this plugin as your sole safety net for critical operations.

The Problem

Hermes Agent ships with ~47 hardcoded dangerous command patterns (rm -rf, git reset --hard, docker stop, etc.). These cover common Unix commands but leave a large surface area unprotected:

Cloud CLI tools (vultr, gcloud, aws, az, oci, doctl) — each with their own destructive subcommands
IaC tools (terraform destroy, pulumi up, cdk deploy)
Database operations (DROP TABLE, mongorestore --drop)
CI/CD commands (gh run delete, circleci purge)
Domain-specific tools unique to a user's workflow

Without this plugin, a user who wants to guard vultr instance delete or terraform destroy -auto-approve has no clean mechanism: they either rely on the agent's judgment (risky) or manually approve every command (tedious). The plugin gives users first-class access to Hermes's approval flow (once/session/always/deny choices, gateway /approve and /deny, session persistence, permanent allowlist) for their own patterns.
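As a rough illustration of what guarding such commands means, the sketch below matches a command string against a few user-defined regexes. The pattern list, the descriptions, and the `match_dangerous` helper are hypothetical and are not the plugin's actual schema or API; see the Configuration page for the real YAML format.

```python
import re
from typing import Optional

# Hypothetical custom patterns as (regex, description) pairs.
# These are illustrative only, not the plugin's real config schema.
CUSTOM_PATTERNS = [
    (r"\bvultr\s+instance\s+delete\b", "Deletes a Vultr instance"),
    (r"\bterraform\s+destroy\b", "Destroys Terraform-managed infrastructure"),
    (r"\bgh\s+run\s+delete\b", "Deletes a GitHub Actions run"),
]

# Compile once so repeated checks do not re-parse the regexes.
COMPILED = [(re.compile(rx), desc) for rx, desc in CUSTOM_PATTERNS]


def match_dangerous(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, or None."""
    for rx, desc in COMPILED:
        if rx.search(command):
            return desc
    return None
```

In this sketch, `match_dangerous("terraform destroy -auto-approve")` returns a description that an approval prompt could show, while `match_dangerous("vultr instance list")` returns `None` and the command would proceed without a prompt.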

Where It Fits in the Ecosystem

Command execution safety in AI coding agents has become a defining architectural concern. The landscape breaks into three philosophical camps:

A. Human-in-the-Loop (Approval Gates)

Tool	Approach
Claude Code	Strict read-only by default. Every destructive action requires explicit user approval. Fail-closed architecture.
Aider	No automatic command execution. Every shell command and code modification requires human confirmation. Git-based audit trail.
Cursor	Split-mode UI: "Interactive" (approval required) vs. "Auto-run" (YOLO). Optional `bwrap` sandboxing.

This camp treats the agent as fundamentally untrusted. The user is the ultimate gatekeeper. Hermes (with this plugin) belongs here — the approval prompt is the primary safety mechanism.

B. Sandboxing-First (Environment Isolation)

Tool	Approach
Open Interpreter	Explicit "sandboxing-first" documentation. Recommends Docker/container isolation as the primary safety layer.
OpenClaw	Three sandboxing modes (off / non-main / all). Container-based isolation with read-only mounts. Per-agent, per-session scoping.
E2B / CodeGate	Specialized secure runtimes (WebAssembly, ephemeral containers) as middleware between agent and OS.

This camp says: don't bother blocking individual commands — isolate the agent entirely so it can't damage anything important.

C. Policy-as-Code (Centralized Enforcement)

Tool	Approach
OpenClaw + Cedar	Policy engines evaluate tool requests based on context (who, what action, what resource). Allowlists over denylists.
CodeGate	Middleware proxy that inspects prompts and responses, enforcing security policies at the API boundary.
Enterprise guardrail services	Centralized policy enforcement with immutable audit trails.

This camp treats safety as an architectural property of the orchestration layer, not a feature of the agent itself. The agent never has direct filesystem access to safety configuration.

Where Hermes + This Plugin Fit

Hermes, as an open-source local agent, sits in Camp A with a light touch of Camp B (container backends skip approval checks). This plugin extends Camp A by making the approval gate user-configurable.

Key observation: The plugin operates in a local-agent trust model where the agent has the same filesystem access as the user. This is fundamentally different from OpenClaw, which enforces policies at an orchestration layer outside the agent process. Claude Code is only a partial contrast: its model runs in Anthropic's cloud behind an API boundary, but the CLI itself executes on the user's machine. The plugin cannot change this trust model; it can only work within it.

How Other Solutions Handle Config Tampering

Solution	How it handles config tampering
OpenClaw + Cedar	Policy engine runs outside the agent process. Agent can't modify policy files — they're owned by the orchestration layer.
Claude Code	The model runs in Anthropic's cloud, but the CLI runs locally; permission settings are stored in local settings files rather than behind the API boundary.
Aider + CodeGate	CodeGate intercepts API traffic. The agent never touches the security proxy's config.
Cursor	Desktop app with separate security process. Config files are outside the agent's sandbox.
Hermes + this plugin	Agent and safety config share the same filesystem. The plugin can only detect tampering, not prevent it.

The key insight: Most of the other solutions above place the safety configuration outside the agent's trust boundary. Hermes, as a local-first agent, cannot do this without OS-level user separation or container isolation, both of which the plugin documents but cannot enforce.
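To make "detect, not prevent" concrete, here is a minimal sketch of hash-based change detection in the spirit of the SHA-256 config integrity tracking this page describes. The function names, the JSON state file, and the log wording are assumptions for illustration and are not taken from the plugin's source.

```python
import hashlib
import json
import logging
from pathlib import Path
from typing import Optional

log = logging.getLogger("custom_dangerous_patterns_sketch")


def config_digest(config_path: Path) -> str:
    """SHA-256 hex digest of the raw config bytes."""
    return hashlib.sha256(config_path.read_bytes()).hexdigest()


def check_integrity(config_path: Path, state_path: Path) -> bool:
    """Compare the config hash with the one recorded last session.

    Returns True if the hash is unchanged or no hash was recorded yet.
    The state file location and format are hypothetical.
    """
    current = config_digest(config_path)
    previous: Optional[str] = None
    if state_path.exists():
        previous = json.loads(state_path.read_text()).get("sha256")
    # Record the current hash for the next session.
    state_path.write_text(json.dumps({"sha256": current}))
    if previous is not None and previous != current:
        log.warning("Config changed since last session: %s -> %s",
                    previous[:12], current[:12])
        return False
    return True
```

Note that the state file in this sketch sits on the same filesystem as the config, so an agent with write access could rewrite both. That is why a check like this can surface tampering but cannot stop it.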

What the Plugin Can and Can't Do

What It Does Well

First-class approval integration. Custom patterns get the exact same treatment as built-in ones: same prompts, session persistence, permanent allowlist, gateway /approve and /deny, smart mode assessment.
Deny patterns. Block commands immediately without an approval prompt. Deny patterns are checked after allow patterns and before block patterns.
Config integrity tracking. SHA-256 hash of config persisted across sessions; changes log a WARNING with old/new pattern counts.
Protected pattern tier. Patterns with protected: true have their regex hashes tracked; removal or modification logs a CRITICAL warning at startup.
Directory config loading. Config path can be a directory of *.yaml files, merged alphabetically.
Allow shadowing detection. Warns when an allow pattern could bypass a built-in dangerous pattern.
Allow patterns. Exempt read-only commands (e.g., vultr instance list) while still gating destructive ones. Allow wins over block.
Graceful degradation. Bad config never crashes the agent — warnings are logged, invalid patterns are skipped.
Hardline backstop. Catastrophic commands (rm -rf /, mkfs, dd to raw device) are checked before anything else and cannot be bypassed by any allow pattern, including .*.
CLI management. Full hermes custom-dangerous-patterns CLI for listing, testing, enabling/disabling, adding/removing, and validating patterns without editing YAML directly.
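The evaluation order implied by the items above (hardline first, then allow, then deny, then block patterns) can be sketched as a single function. This is an assumption-laden illustration: the `evaluate` function, the verdict strings, and the two hardline regexes are hypothetical, and the plugin's real implementation and pattern set may differ.

```python
import re
from typing import Iterable

# Illustrative hardline regexes only; the real hardline set is not shown here.
HARDLINE = [r"\brm\s+-rf\s+/(\s|$)", r"\bmkfs(\.\w+)?\b"]


def _matches(patterns: Iterable[str], command: str) -> bool:
    return any(re.search(p, command) for p in patterns)


def evaluate(command: str, allow: Iterable[str],
             deny: Iterable[str], block: Iterable[str]) -> str:
    """Return a verdict label following the order described on this page."""
    if _matches(HARDLINE, command):
        return "hardline"   # checked first; no allow pattern can bypass it
    if _matches(allow, command):
        return "allow"      # allow wins over deny and block
    if _matches(deny, command):
        return "deny"       # rejected without an approval prompt
    if _matches(block, command):
        return "ask"        # goes through the approval prompt
    return "pass"           # no pattern matched
```

With `allow=[r"^vultr\s+instance\s+list\b"]` and `block=[r"^vultr\s+instance\b"]`, listing instances is exempt while `vultr instance delete` still triggers a prompt. An allow pattern of `.*` would exempt everything except the hardline matches.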

Honest Limitations

The agent can modify the config. If it has write access to ~/.hermes/custom-dangerous-patterns.yaml, it can add allow_patterns: [{pattern: '.*'}] and exempt itself from all custom and built-in patterns (except hardline). This is a fundamental property of the local-agent trust model — not a bug.
Regex has blind spots. Obfuscated commands (vultr$(echo) instance$(echo) delete), multi-step attacks, and file-write tool calls bypass command-line pattern matching. No regex-based solution in any ecosystem fully solves this.
No runtime reconfiguration. Config is loaded once at startup. Mid-session edits require a restart. The CLI (hermes custom-dangerous-patterns ...) supports editing configs but a restart is still needed for changes to take effect.
Import order dependency. Patterns must be injected before tools.approval is imported. Hermes's normal load order handles this, but it's not verified at runtime.
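The regex blind spot mentioned above is easy to reproduce. The sketch below uses a hypothetical pattern for vultr instance delete and shows that the obfuscated form from this page does not match, even though a shell would expand each `$(echo)` to an empty string and run the same command.

```python
import re

# Hypothetical custom pattern; not copied from the plugin.
PATTERN = re.compile(r"\bvultr\s+instance\s+delete\b")

plain = "vultr instance delete 1234"
# In a shell, $(echo) expands to an empty string, so this runs the same command.
obfuscated = "vultr$(echo) instance$(echo) delete 1234"


def is_flagged(command: str) -> bool:
    """True if the raw command text matches the pattern."""
    return PATTERN.search(command) is not None


print(is_flagged(plain))        # True
print(is_flagged(obfuscated))   # False: "$" follows "vultr", so \s+ fails
```

The pattern only sees the text before shell expansion, which is why container isolation or a similar second layer is worth pairing with pattern matching.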

See Security-&-Risks for the full self-modification analysis and optional hardening steps.
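One way to approximate the allow shadowing detection listed under What It Does Well is to test each allow pattern against sample commands that the built-in patterns are meant to catch. The probe list and the `shadowing_allows` helper below are hypothetical; how the plugin actually performs this check is not shown on this page.

```python
import re
from typing import List

# Sample commands of the kind built-in patterns flag (illustrative only).
PROBES = [
    "rm -rf ./build",
    "git reset --hard HEAD~1",
    "docker stop web",
]


def shadowing_allows(allow_patterns: List[str]) -> List[str]:
    """Return the allow patterns that match at least one probe command."""
    flagged = []
    for pattern in allow_patterns:
        rx = re.compile(pattern)
        if any(rx.search(cmd) for cmd in PROBES):
            flagged.append(pattern)
    return flagged
```

For example, `shadowing_allows([r".*", r"^vultr\s+instance\s+list\b"])` returns `[".*"]`, flagging the catch-all pattern while leaving the narrowly scoped read-only exemption alone. A probe-based check like this only warns about patterns that happen to match the chosen samples, so it is a heuristic rather than a proof.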

Complementary Solutions

This plugin is complementary to, not competitive with:

Solution	How it complements
Hermes container backends	Container isolation + pattern-based approval = defense in depth. Container catches what patterns miss; patterns gate what containers allow.
Tirith security scanning	Patterns gate known commands; Tirith catches novel threats in command content.
Gateway approval `/approve` `/deny`	Plugin defines what triggers approval; gateway handles how approval is delivered across channels.
`command_allowlist` in config.yaml	Plugin's patterns trigger the prompt; user's "always" choice persists to allowlist.

What Other Ecosystems Teach Us

From OpenClaw: Allowlist-first mentality — "nothing allowed unless explicitly permitted" is more secure but less practical for open-ended coding tasks. The plugin could support a "default-deny" mode as a future feature. OpenClaw's Cedar integration also treats policies as version-controlled, testable artifacts — the plugin's YAML config is a light version of this.

From Claude Code: Transparency by default — showing what the agent is about to do in natural language before asking for approval. The plugin's descriptions serve this purpose but are static. Permission scoping (restricting writes to the working directory) is another idea Hermes could adopt with OS-level changes.

From Aider + CodeGate: Middleware as a security layer. CodeGate sits between Aider and the LLM API, inspecting traffic. This is architecturally cleaner than monkey-patching internal functions. If Hermes ever adds a pre-execution hook that can veto commands, plugins could use it instead.

Wiki Pages

Installation — Install, update, and enable the plugin
CLI-Reference — hermes custom-dangerous-patterns command reference
Configuration — YAML config format, patterns, allow patterns, evaluation order
Architecture — How the plugin works internally, plugin structure, design decisions
Roadmap — Planned features and enhancements
Security-&-Risks — Agent self-modification, evaluation order, hardline backstop
