Stephen Cross edited this page Jun 9, 2026 · 8 revisions

Hermes Agent - Custom Dangerous Patterns plugin

⚠️ USE AT YOUR OWN RISK

This plugin is in early development and has not been rigorously tested across all Hermes environments, versions, or edge cases. There is no guarantee that all destructive commands will be caught or blocked. Pattern matching is best-effort — creative command obfuscation, shell expansions, piped commands, or edge cases in the approval flow may bypass detection. Do not rely on this plugin as your sole safety net for critical operations.


The Problem

Hermes Agent ships with ~47 hardcoded dangerous command patterns (rm -rf, git reset --hard, docker stop, etc.). These cover common Unix commands but leave a large surface area unprotected:

  • Cloud CLI tools (vultr, gcloud, aws, az, oci, doctl) — each with their own destructive subcommands
  • IaC tools (terraform destroy, pulumi up, cdk deploy)
  • Database operations (DROP TABLE, mongorestore --drop)
  • CI/CD commands (gh run delete, circleci purge)
  • Domain-specific tools unique to a user's workflow

Without this plugin, a user who wants to guard vultr instance delete or terraform destroy -auto-approve has no clean mechanism: they either rely on the agent's judgment (risky) or manually approve every command (tedious). The plugin gives users first-class access to Hermes's approval flow (once/session/always/deny prompts, gateway /approve and /deny, session persistence, permanent allowlist) for their own patterns.

Where It Fits in the Ecosystem

Command execution safety in AI coding agents has become a defining architectural concern. The landscape breaks into three philosophical camps:

A. Human-in-the-Loop (Approval Gates)

| Tool | Approach |
|---|---|
| Claude Code | Strict read-only by default. Every destructive action requires explicit user approval. Fail-closed architecture. |
| Aider | No automatic command execution. Every shell command and code modification requires human confirmation. Git-based audit trail. |
| Cursor | Split-mode UI: "Interactive" (approval required) vs. "Auto-run" (YOLO). Optional bwrap sandboxing. |

This camp treats the agent as fundamentally untrusted. The user is the ultimate gatekeeper. Hermes (with this plugin) belongs here — the approval prompt is the primary safety mechanism.

B. Sandboxing-First (Environment Isolation)

| Tool | Approach |
|---|---|
| Open Interpreter | Explicit "sandboxing-first" documentation. Recommends Docker/container isolation as the primary safety layer. |
| OpenClaw | Three sandboxing modes (off / non-main / all). Container-based isolation with read-only mounts. Per-agent, per-session scoping. |
| E2B / CodeGate | Specialized secure runtimes (WebAssembly, ephemeral containers) as middleware between agent and OS. |

This camp says: don't bother blocking individual commands — isolate the agent entirely so it can't damage anything important.

C. Policy-as-Code (Centralized Enforcement)

| Tool | Approach |
|---|---|
| OpenClaw + Cedar | Policy engines evaluate tool requests based on context (who, what action, what resource). Allowlists over denylists. |
| CodeGate | Middleware proxy that inspects prompts and responses, enforcing security policies at the API boundary. |
| Enterprise guardrail services | Centralized policy enforcement with immutable audit trails. |

This camp treats safety as an architectural property of the orchestration layer, not a feature of the agent itself. The agent never has direct filesystem access to safety configuration.

Where Hermes + This Plugin Fit

Hermes, as an open-source local agent, sits in Camp A with a light touch of Camp B (container backends skip approval checks). This plugin extends Camp A by making the approval gate user-configurable.

Key observation: The plugin operates in a local-agent trust model where the agent has the same filesystem access as the user. This is fundamentally different from Claude Code (which runs in Anthropic's cloud with an API boundary) or OpenClaw (which enforces policies at an orchestration layer outside the agent process). The plugin cannot change this — it can only work within it.

How Other Solutions Handle Config Tampering

| Solution | How it handles config tampering |
|---|---|
| OpenClaw + Cedar | Policy engine runs outside the agent process. Agent can't modify policy files; they're owned by the orchestration layer. |
| Claude Code | Runs in Anthropic's cloud. The agent has no access to its own safety configuration. |
| Aider + CodeGate | CodeGate intercepts API traffic. The agent never touches the security proxy's config. |
| Cursor | Desktop app with separate security process. Config files are outside the agent's sandbox. |
| Hermes + this plugin | Agent and safety config share the same filesystem. The plugin can only detect tampering, not prevent it. |

The key insight: Every other major solution places the safety configuration outside the agent's trust boundary. Hermes, as a local-first agent, cannot do this without OS-level user separation or container isolation — both of which the plugin documents but cannot enforce.

What the Plugin Can and Can't Do

What It Does Well

  • First-class approval integration. Custom patterns get exactly the same treatment as built-in ones: the same prompts, session persistence, permanent allowlist, gateway /approve and /deny, and smart mode assessment.
  • Deny patterns. Block commands immediately without an approval prompt — checked after allow, before block patterns.
  • Config integrity tracking. SHA-256 hash of config persisted across sessions; changes log a WARNING with old/new pattern counts.
  • Protected pattern tier. Patterns with protected: true have their regex hashes tracked; removal or modification logs a CRITICAL warning at startup.
  • Directory config loading. Config path can be a directory of *.yaml files, merged alphabetically.
  • Allow shadowing detection. Warns when an allow pattern could bypass a built-in dangerous pattern.
  • Allow patterns. Exempt read-only commands (e.g., vultr instance list) while still gating destructive ones. Allow wins over block.
  • Graceful degradation. Bad config never crashes the agent — warnings are logged, invalid patterns are skipped.
  • Hardline backstop. Catastrophic commands (rm -rf /, mkfs, dd to raw device) are checked before anything else and cannot be bypassed by any allow pattern, including .*.
  • CLI management. Full hermes custom-dangerous-patterns CLI for listing, testing, enabling/disabling, adding/removing, and validating patterns without editing YAML directly.
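
The evaluation order described in the bullets above (hardline first, then allow, then deny, then block) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the plugin's code: the HARDLINE and BUILTIN lists, the Config class, the return labels, and the choice to check built-in patterns alongside custom block patterns are all hypothetical. The shadowing_warnings helper is likewise a hypothetical sampling heuristic for allow shadowing detection; the plugin's real check may work differently.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-ins; the real hardline and built-in lists live inside Hermes.
HARDLINE = [
    r"\brm\s+-rf\s+/(\s|$)",            # rm -rf / (but not rm -rf /tmp/x)
    r"\bmkfs(\.\w+)?\b",                # mkfs, mkfs.ext4, ...
    r"\bdd\b.*\bof=/dev/(sd|nvme|hd)",  # dd writing to a raw disk device
]
BUILTIN = [r"\bgit\s+reset\s+--hard\b", r"\bdocker\s+stop\b"]


@dataclass
class Config:
    allow: list[str] = field(default_factory=list)
    deny: list[str] = field(default_factory=list)
    block: list[str] = field(default_factory=list)


def _hit(patterns: list[str], command: str) -> bool:
    return any(re.search(p, command) for p in patterns)


def evaluate(command: str, cfg: Config) -> str:
    """Classify a command following the documented evaluation order."""
    if _hit(HARDLINE, command):
        return "hardline"  # checked first; no allow pattern (not even .*) overrides it
    if _hit(cfg.allow, command):
        return "allow"     # allow beats deny, custom block, and built-in patterns
    if _hit(cfg.deny, command):
        return "deny"      # rejected immediately, no approval prompt
    if _hit(cfg.block, command) or _hit(BUILTIN, command):
        return "approve"   # would go to the normal approval prompt
    return "run"


def shadowing_warnings(allow: list[str], samples: list[str]) -> list[str]:
    """Flag allow patterns that match a sample command a built-in pattern catches."""
    return [
        f"allow {a!r} shadows a built-in pattern on {cmd!r}"
        for a in allow
        for cmd in samples
        if re.search(a, cmd) and _hit(BUILTIN, cmd)
    ]


cfg = Config(
    allow=[r"^vultr\s+instance\s+list\b"],
    deny=[r"\bterraform\s+destroy\b.*-auto-approve"],
    block=[r"^vultr\s+instance\s+delete\b"],
)
print(evaluate("vultr instance list", cfg))                # allow
print(evaluate("vultr instance delete abc123", cfg))       # approve
print(evaluate("terraform destroy -auto-approve", cfg))    # deny
print(evaluate("rm -rf /", Config(allow=[".*"])))          # hardline
print(evaluate("git reset --hard", Config(allow=[".*"])))  # allow
print(shadowing_warnings([r"^git\b"], ["git reset --hard", "git status"]))
```

In this ordering an allow pattern of .* exempts git reset --hard from approval while rm -rf / still stops at the hardline check, which is the trade-off described under Honest Limitations.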

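Config integrity tracking, the protected pattern tier, and directory config loading could fit together roughly as follows. This is a sketch under several assumptions: the state file and its JSON layout, the logger name, and the config_files, config_digest, and check_integrity helpers are hypothetical, and the parsed patterns are passed in directly because YAML parsing needs a third-party library. Each protected pattern is tracked only by the SHA-256 of its regex, so editing that regex looks the same as deleting it: the old hash disappears.

```python
import hashlib
import json
import logging
from pathlib import Path

log = logging.getLogger("custom_dangerous_patterns")  # hypothetical logger name


def config_files(path: Path) -> list[Path]:
    """One config file, or every *.yaml in a directory, in alphabetical order."""
    return sorted(path.glob("*.yaml")) if path.is_dir() else [path]


def config_digest(path: Path) -> str:
    """SHA-256 over file names and contents, in merge order."""
    h = hashlib.sha256()
    for f in config_files(path):
        h.update(f.name.encode() + b"\0")  # separator avoids name/content ambiguity
        h.update(f.read_bytes() + b"\0")
    return h.hexdigest()


def check_integrity(path: Path, patterns: list[dict], state_file: Path) -> list[str]:
    """Compare with the previous run, log any changes, then persist the new state."""
    digest = config_digest(path)
    protected = sorted(
        hashlib.sha256(p["pattern"].encode()).hexdigest()
        for p in patterns
        if p.get("protected")
    )
    events = []
    if state_file.exists():
        old = json.loads(state_file.read_text())
        if old["digest"] != digest:
            events.append("WARNING")
            log.warning("config changed: %d -> %d patterns", old["count"], len(patterns))
        missing = set(old["protected"]) - set(protected)
        if missing:
            events.append("CRITICAL")
            log.critical("%d protected pattern(s) removed or modified", len(missing))
    state_file.write_text(
        json.dumps({"digest": digest, "count": len(patterns), "protected": protected})
    )
    return events
```

In this sketch the first run only records state; warnings can appear from the second startup onward.
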
Honest Limitations

  • The agent can modify the config. If it has write access to ~/.hermes/custom-dangerous-patterns.yaml, it can add allow_patterns: [{pattern: '.*'}] and exempt itself from all custom and built-in patterns (except hardline). This is a fundamental property of the local-agent trust model — not a bug.
  • Regex has blind spots. Obfuscated commands (vultr$(echo) instance$(echo) delete), multi-step attacks, and file-write tool calls bypass command-line pattern matching. No regex-based solution in any ecosystem fully solves this.
  • No runtime reconfiguration. Config is loaded once at startup, so mid-session edits require a restart. The CLI (hermes custom-dangerous-patterns ...) can edit configs, but a restart is still needed for changes to take effect.
  • Import order dependency. Patterns must be injected before tools.approval is imported. Hermes's normal load order handles this, but it's not verified at runtime.
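
The regex blind spot is easy to reproduce. The sketch below assumes a plausible custom block pattern for vultr instance delete (the pattern itself is hypothetical). Re-joining the command with Python's shlex.split undoes quoting tricks, but shlex performs no command substitution, so the $(echo) variant from the bullet above still slips through.

```python
import re
import shlex

# Hypothetical custom block pattern for the vultr example.
VULTR_DELETE = re.compile(r"\bvultr\s+instance\s+delete\b")


def matches(cmd: str) -> bool:
    return bool(VULTR_DELETE.search(cmd))


def matches_normalized(cmd: str) -> bool:
    # Re-join POSIX shell tokens: empty quotes disappear, $(...) is left untouched.
    # shlex.split raises ValueError on unbalanced quotes.
    return matches(" ".join(shlex.split(cmd)))


for cmd in [
    "vultr instance delete abc123",
    "v''ultr instance delete abc123",              # bash runs: vultr instance delete
    "vultr$(echo) instance$(echo) delete abc123",  # bash runs: vultr instance delete
]:
    print(f"{cmd!r}: raw={matches(cmd)} normalized={matches_normalized(cmd)}")
```

Both obfuscated commands run the same vultr command once the shell is done with them, yet only the quoting trick is recovered by tokenizing; catching the expansion would require actually evaluating the shell, which pattern matching cannot do.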

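The import order dependency noted above follows from a common Python pattern: a module that compiles its pattern list once, at import time, works from a snapshot. The sketch below simulates this with a hypothetical fake_patterns registry and a stand-in approval module built from source with exec. It assumes tools.approval behaves similarly, which may not match its actual internals.

```python
import sys
import types

# Hypothetical registry that a plugin appends custom patterns to.
registry = types.ModuleType("fake_patterns")
registry.DANGEROUS_PATTERNS = [r"\bgit\s+reset\s+--hard\b"]
sys.modules["fake_patterns"] = registry

# Stand-in for an approval module that snapshots the list when imported.
APPROVAL_SOURCE = """
import re
from fake_patterns import DANGEROUS_PATTERNS

_COMPILED = [re.compile(p) for p in DANGEROUS_PATTERNS]  # runs once, at import

def is_dangerous(cmd):
    return any(p.search(cmd) for p in _COMPILED)
"""


def import_approval(name: str) -> types.ModuleType:
    """Execute APPROVAL_SOURCE as a fresh module, mimicking a first import."""
    module = types.ModuleType(name)
    exec(APPROVAL_SOURCE, module.__dict__)
    return module


early = import_approval("approval_early")  # imported before the plugin injects
registry.DANGEROUS_PATTERNS.append(r"\bterraform\s+destroy\b")  # plugin injects
late = import_approval("approval_late")    # imported after injection

print(early.is_dangerous("terraform destroy"))  # False: snapshot predates injection
print(late.is_dangerous("terraform destroy"))   # True
```

Both stand-in modules share the same list object; only the compiled copy inside early is stale, which is why appending to the list afterwards has no effect on it.
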
See Security-&-Risks for the full self-modification analysis and optional hardening steps.

Complementary Solutions

This plugin is complementary to, not competitive with:

| Solution | How it complements |
|---|---|
| Hermes container backends | Container isolation + pattern-based approval = defense in depth. Container catches what patterns miss; patterns gate what containers allow. |
| Tirith security scanning | Patterns gate known commands; Tirith catches novel threats in command content. |
| Gateway approval (/approve, /deny) | Plugin defines what triggers approval; gateway handles how approval is delivered across channels. |
| command_allowlist in config.yaml | Plugin's patterns trigger the prompt; user's "always" choice persists to allowlist. |

What Other Ecosystems Teach Us

From OpenClaw: Allowlist-first mentality — "nothing allowed unless explicitly permitted" is more secure but less practical for open-ended coding tasks. The plugin could support a "default-deny" mode as a future feature. OpenClaw's Cedar integration also treats policies as version-controlled, testable artifacts — the plugin's YAML config is a light version of this.

From Claude Code: Transparency by default — showing what the agent is about to do in natural language before asking for approval. The plugin's descriptions serve this purpose but are static. Permission scoping (restricting writes to the working directory) is another idea Hermes could adopt with OS-level changes.

From Aider + CodeGate: Middleware as a security layer. CodeGate sits between Aider and the LLM API, inspecting traffic. This is architecturally cleaner than monkey-patching internal functions. If Hermes ever adds a pre-execution hook that can veto commands, plugins could use it instead.

Wiki Pages

  • Installation: Install, update, and enable the plugin
  • CLI-Reference: hermes custom-dangerous-patterns command reference
  • Configuration: YAML config format, patterns, allow patterns, evaluation order
  • Architecture: How the plugin works internally, plugin structure, design decisions
  • Roadmap: Planned features and enhancements
  • Security-&-Risks: Agent self-modification, evaluation order, hardline backstop