
Security & Risks

Stephen Cross edited this page Jun 3, 2026 · 3 revisions

⚠️ Experimental Warning

This plugin is in early development and has not been rigorously tested across all Hermes environments, versions, or edge cases. There is no guarantee that all destructive commands will be caught or blocked. Pattern matching is best-effort — creative command obfuscation, shell expansions, piped commands, or edge cases in the approval flow may bypass detection. Do not rely on this plugin as your sole safety net for critical operations.

The Central Tension: Agent Self-Modification

This is the most important issue in the design. The agent can read and write files. If it has write access to ~/.hermes/custom-dangerous-patterns.yaml, it can:

  1. Remove block patterns it finds inconvenient
  2. Add allow patterns that exempt its own dangerous commands from approval
  3. Widen existing patterns to be less restrictive
  4. Replace the entire config with a permissive version

Bypass Vectors Through Custom Patterns

Allow patterns can override built-in protections. The evaluation order is:

1. Hardline (unconditional, not bypassable)
2. Sudo stdin guard (unconditional)
3. Yolo / mode=off
4. Custom allow patterns  ← agent can add arbitrary patterns here
5. Custom block patterns
6. Builtin DANGEROUS_PATTERNS
7. Tirith scan

If the agent adds an allow pattern that matches a command Hermes would otherwise flag (e.g., \baws\b.*), the allow match returns before the built-in check runs, so the command executes without any prompt at all.
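The evaluation order above can be sketched as a chain of first-match checks. This is a hypothetical bash illustration, not the plugin's code: the function names evaluate and matches_any and every pattern list are assumptions, steps 2, 3 and 7 (sudo stdin guard, yolo/mode=off, Tirith) are omitted, and the patterns use POSIX ERE classes such as [[:space:]] instead of \s and \b because bash's =~ operator uses ERE.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the evaluation order (not the plugin's real code).
# Steps 2, 3 and 7 (sudo stdin guard, yolo/mode=off, Tirith) are omitted.
# Patterns are POSIX ERE because bash's =~ operator uses ERE.

HARDLINE=( '^rm[[:space:]]+-rf[[:space:]]+/$' '^mkfs' )   # step 1: no escape
ALLOW=()     # step 4: custom allow patterns (the agent can edit these)
BLOCK=()     # step 5: custom block patterns
BUILTIN=( '^aws[[:space:]]' 'git[[:space:]]+push[[:space:]]+--force' )   # step 6 stand-ins

# Succeeds if $1 matches any of the remaining arguments as an ERE.
matches_any() {
  local cmd=$1 re
  shift
  for re in "$@"; do
    [[ $cmd =~ $re ]] && return 0
  done
  return 1
}

# Prints the first decision reached, in evaluation order.
evaluate() {
  local cmd=$1
  if matches_any "$cmd" "${HARDLINE[@]}"; then echo hardline-block; return; fi
  if matches_any "$cmd" "${ALLOW[@]}"; then echo custom-allow; return; fi
  if matches_any "$cmd" "${BLOCK[@]}"; then echo custom-block; return; fi
  if matches_any "$cmd" "${BUILTIN[@]}"; then echo needs-approval; return; fi
  echo no-match   # the Tirith scan would run here
}

evaluate 'aws ec2 terminate-instances --instance-ids i-0abc'   # needs-approval
ALLOW=( '.*' )   # the agent writes a catch-all allow pattern
evaluate 'aws ec2 terminate-instances --instance-ids i-0abc'   # custom-allow
evaluate 'rm -rf /'                                            # hardline-block
```

The last three lines show the whole problem in miniature: one added allow pattern turns an approval prompt into silent execution, while the hardline check still fires because it runs before the allow list is consulted.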

Concrete bypass scenarios:

  • Agent adds allow_patterns: [{pattern: '.*'}] → all dangerous commands bypass
  • Agent adds allow_patterns: [{pattern: '\baws\s+ec2\s+terminate-instances\b'}] → specific destructive commands exempt
  • Agent removes a block pattern by editing config → pattern no longer triggers
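The first two scenarios can be checked mechanically by testing a candidate allow pattern against a few sample commands that a built-in pattern would flag. This is a hypothetical sketch: shadowed_by and the SAMPLES list are illustrative and are not the plugin's warning logic, and the terminate-instances pattern is rewritten with [[:space:]] because bash's =~ uses POSIX ERE rather than \s and \b.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: list which sample dangerous commands an allow pattern
# would exempt. SAMPLES is illustrative, not Hermes's built-in list.

SAMPLES=(
  'aws ec2 terminate-instances --instance-ids i-0abc'
  'git push --force origin main'
  'psql -c "DROP DATABASE prod;"'
)

# Prints every sample that the allow pattern in $1 (POSIX ERE) matches.
shadowed_by() {
  local re=$1 cmd
  for cmd in "${SAMPLES[@]}"; do
    if [[ $cmd =~ $re ]]; then
      printf '%s\n' "$cmd"
    fi
  done
}

shadowed_by '.*'   # prints all three samples
shadowed_by 'aws[[:space:]]+ec2[[:space:]]+terminate-instances'   # prints only the aws sample
```

An empty result does not prove a pattern is safe; it only means none of the chosen samples matched, so the sample list has to cover the commands you actually care about.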

Agent Has Multiple Paths to Modify the Config

| Command | Why pattern matching doesn't see it |
| --- | --- |
| python -c "import os; open(os.path.expanduser('~/.hermes/...'), 'w').write('...')" | The write happens inside Python; the filename is a Python string, not a shell write target |
| File-write tool call | The tool performs the I/O; the filename never appears in a command string |
| sed -i 's/block/allow/' ~/.hermes/... | sed -i makes the filename visible, but Hermes's _SENSITIVE_WRITE_TARGET only covers config.yaml, NOT this plugin's config |
| tee ~/.hermes/... << 'EOF' | Same: this plugin's config is not in _SENSITIVE_WRITE_TARGET |

The Hardline Backstop

Hardline patterns cannot be bypassed. rm -rf /, mkfs, dd to a raw device, fork bombs, and similar catastrophic commands are checked first, with no allow-pattern escape. This is the only structural defense against self-modification.

The plugin's monkey-patch can only return early after the hardline checks have already run, so even allow_patterns: [{pattern: '.*'}] cannot bypass them.

Plugin's Stance: Detect, Don't Enforce

The plugin's stance on self-modification is: allow it, detect it, log it.

The plugin will NOT:

  • Fail to load if the config is writable by the agent's process user
  • Enforce OS-level ownership or permission checks
  • Block the agent from editing the config via tool calls or Python I/O

The plugin WILL:

  • Track the previous config hash and log a security warning when the config changes between sessions
  • Validate protected: true patterns at load and log at CRITICAL level if they are missing or modified
  • Warn when a newly-added allow pattern appears to bypass built-in dangerous patterns
  • Document the risk and the optional OS-user separation approach
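The first item, tracking the previous config hash across sessions, can be sketched in a few lines of bash. This is an illustration under stated assumptions, not the plugin's implementation: PATTERNS_HASH_FILE, hash_file and check_config_changed are hypothetical names, and only HERMES_CUSTOM_PATTERNS_PATH and the default config path come from this page.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of cross-session change detection by content hash.
# PATTERNS_HASH_FILE and both function names are illustrative assumptions.

CONFIG=${HERMES_CUSTOM_PATTERNS_PATH:-$HOME/.hermes/custom-dangerous-patterns.yaml}
PATTERNS_HASH_FILE=${PATTERNS_HASH_FILE:-$HOME/.hermes/.custom-patterns.sha256}

# Prints the SHA-256 of $1 (GNU sha256sum, or shasum on macOS).
hash_file() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Returns 0 if unchanged (or first run), 1 if changed, 2 if unreadable.
check_config_changed() {
  local current previous changed=0
  if [[ ! -r $CONFIG ]]; then
    echo "cannot read $CONFIG" >&2
    return 2
  fi
  current=$(hash_file "$CONFIG")
  if [[ -f $PATTERNS_HASH_FILE ]]; then
    previous=$(<"$PATTERNS_HASH_FILE")
    if [[ $current != "$previous" ]]; then
      echo "SECURITY WARNING: $CONFIG changed since the last session" >&2
      changed=1
    fi
  fi
  printf '%s\n' "$current" > "$PATTERNS_HASH_FILE"   # remember this session's hash
  return "$changed"
}
```

Call check_config_changed once at session start. Because the stored hash sits in ~/.hermes, an agent that can rewrite the config can also rewrite the hash file, so this detects careless or unconcealed changes only, which is consistent with the detect-don't-enforce stance.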

This matches the feedback: the agent is trusted to self-manage; the plugin's role is to surface changes loudly, not to police them.

Comparison with Other Solutions

| Solution | How it handles config tampering |
| --- | --- |
| OpenClaw + Cedar | Policy engine runs outside the agent process. Agent can't modify policy files. |
| Claude Code | Runs in Anthropic's cloud. The agent has no access to its own safety configuration. |
| Aider + CodeGate | CodeGate intercepts API traffic. The agent never touches the security proxy's config. |
| Cursor | Desktop app with separate security process. Config files are outside the agent's sandbox. |
| Hermes + this plugin | Agent and safety config share the same filesystem. The plugin can only detect tampering, not prevent it. |

The key insight: Every other major solution places the safety configuration outside the agent's trust boundary. Hermes, as a local-first agent, cannot do this without OS-level user separation or container isolation — both of which the plugin documents but cannot enforce.

Can Custom Patterns Circumvent Built-in Protections?

Yes, in bounded ways:

  • Allow patterns run before built-in patterns. If a user (or agent) adds an allow pattern like \baws\b.* while a built-in dangerous pattern covers aws commands, the allow wins and all AWS commands bypass approval.
  • Built-in patterns with no customization are unaffected — they still trigger normally if no allow pattern matches.
  • Hardline checks (rm -rf /, mkfs, dd, etc.) run before everything. These cannot be bypassed by any allow pattern.

Optional Hardening: Config Path Outside Agent Sandbox

For users who need stronger guarantees, place the config file somewhere the agent cannot write:

# Create config owned by a different user
sudo mkdir -p /etc/hermes-patterns
sudo chown root:staff /etc/hermes-patterns
sudo chmod 755 /etc/hermes-patterns

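# Move config there; mv keeps the old mode, so make it readable by all but writable only by root
sudo mv ~/.hermes/custom-dangerous-patterns.yaml /etc/hermes-patterns/custom-dangerous-patterns.yaml
sudo chown root:staff /etc/hermes-patterns/custom-dangerous-patterns.yaml
sudo chmod 644 /etc/hermes-patterns/custom-dangerous-patterns.yaml

# Point the plugin at it (add this line to your shell profile to keep it across sessions)
export HERMES_CUSTOM_PATTERNS_PATH=/etc/hermes-patterns/custom-dangerous-patterns.yaml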

This is an optional hardening step — not the default, not enforced by the plugin.
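After these steps it is worth confirming, as the user that runs Hermes, that the relocated file is still readable (presumably the plugin needs to load it) while neither the file nor its directory is writable; a writable directory would let that user delete and replace the file. The function name verify_readonly_config is hypothetical; the path and HERMES_CUSTOM_PATTERNS_PATH come from the steps above.

```bash
#!/usr/bin/env bash
# Sketch: check that the relocated config is readable but not modifiable.
# Run as the user that runs Hermes, not under sudo (root passes every -w test).

verify_readonly_config() {
  local path=${1:-${HERMES_CUSTOM_PATTERNS_PATH:-}} dir
  if [[ -z $path || ! -r $path ]]; then
    echo "FAIL: '$path' is missing or unreadable" >&2
    return 1
  fi
  dir=$(dirname "$path")
  # A writable directory would let this user delete and replace the file.
  if [[ -w $path || -w $dir ]]; then
    echo "FAIL: $path or $dir is writable by $(id -un)" >&2
    return 1
  fi
  echo "OK: $path is read-only for $(id -un)"
}
```

For example, verify_readonly_config /etc/hermes-patterns/custom-dangerous-patterns.yaml should print an OK line once the chown and chmod steps have been applied.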

Testing Safety

When testing the plugin's approval/blocking logic:

  • Never use real dangerous commands (rm -rf /, DROP DATABASE, git push --force)
  • Use patterns from examples/test-patterns.yaml, which are enabled: false by default and scoped to safe targets (/tmp/, test_ prefixes)
  • Prefix custom test descriptions with [TEST] for clarity
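The guidance above can be followed with a disposable target: create test_-prefixed files under /tmp, then check offline that a scoped test pattern fires on them and not on anything else before enabling it. This is a hypothetical sketch; make_test_target, pattern_fires and TEST_PATTERN are illustrative names, and TEST_PATTERN is an ERE stand-in, not a pattern copied from examples/test-patterns.yaml.

```bash
#!/usr/bin/env bash
# Sketch: a throwaway target under /tmp with a test_ prefix, plus an offline
# check that a scoped test pattern fires on it and not on anything else.
# make_test_target, pattern_fires and TEST_PATTERN are hypothetical names.

# ERE stand-in for a test pattern scoped to /tmp/test_* (illustrative).
TEST_PATTERN='rm[[:space:]]+-rf[[:space:]]+/tmp/test_'

# Creates a directory like /tmp/test_hermes.AbC123 holding dummy files.
make_test_target() {
  local dir
  dir=$(mktemp -d /tmp/test_hermes.XXXXXX) || return 1
  touch "$dir/test_file_1" "$dir/test_file_2"
  printf '%s\n' "$dir"
}

# Succeeds if the command string in $1 matches TEST_PATTERN.
pattern_fires() {
  [[ $1 =~ $TEST_PATTERN ]]
}

target=$(make_test_target)
pattern_fires "rm -rf $target" && echo "fires on $target"
pattern_fires 'rm -rf /home' || echo "does not fire outside /tmp/test_"
rm -rf "$target"   # clean up the throwaway directory ourselves
```

If a pattern ever fails to fire during a live test, the only thing at risk is the contents of that throwaway directory.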
