Security & Risks

⚠️ Experimental Warning

This plugin is in early development and has not been rigorously tested across all Hermes environments, versions, or edge cases. There is no guarantee that all destructive commands will be caught or blocked. Pattern matching is best-effort — creative command obfuscation, shell expansions, piped commands, or edge cases in the approval flow may bypass detection. Do not rely on this plugin as your sole safety net for critical operations.

The Central Tension: Agent Self-Modification

This is the most important issue in the design. The agent can read and write files. If it has write access to ~/.hermes/custom-dangerous-patterns.yaml, it can:

Remove block patterns it finds inconvenient
Add allow patterns that exempt dangerous commands from approval
Widen existing patterns to be less restrictive
Replace the entire config with a permissive version

Bypass Vectors Through Custom Patterns

Allow patterns can override built-in protections. The evaluation order is:

 1. [Plugin]  Deny patterns (custom)        → BLOCKED immediately, no prompt
               (wraps original check_all_command_guards)
 2. [Hermes]  Hardline check                → blocked unconditionally
 3. [Hermes]  Sudo stdin guard              → blocked unconditionally
 4. [Hermes]  Yolo / mode=off               → bypasses steps 5-7
 5. [Plugin]  Allow patterns (custom)       → command runs, no prompt (allow wins)
 6. detect_dangerous_command():             — same approval prompt for both —
    a. [Plugin]  Block patterns (custom)    → [o]nce/[s]ession/[a]lways/[d]eny
    b. [Hermes]  Built-in patterns          → [o]nce/[s]ession/[a]lways/[d]eny
 7. [Hermes]  Tirith security scan          → approval prompt if findings

If the agent adds an allow pattern that matches what Hermes would otherwise flag (e.g., \baws\b.*), the allow wins before the built-in check runs — the command runs without any prompt at all.
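The ordering above can be sketched as a small decision function. This is a minimal illustration, not the plugin's code: the `Guards` container, the `evaluate` function, and the sample regexes are all hypothetical stand-ins, and the sudo stdin guard (step 3) and Tirith scan (step 7) are left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Guards:
    """Hypothetical container for the pattern lists (not the plugin's real types)."""
    deny: list = field(default_factory=list)      # step 1: custom deny patterns
    hardline: list = field(default_factory=list)  # step 2: Hermes hardline patterns
    allow: list = field(default_factory=list)     # step 5: custom allow patterns
    block: list = field(default_factory=list)     # step 6a: custom block patterns
    builtin: list = field(default_factory=list)   # step 6b: Hermes built-in patterns


def _matches(patterns, command):
    return any(re.search(p, command) for p in patterns)


def evaluate(command, guards, yolo=False):
    """Return 'blocked', 'run', or 'prompt' following the documented order."""
    if _matches(guards.deny, command):
        return "blocked"  # step 1: deny wins, no prompt
    if _matches(guards.hardline, command):
        return "blocked"  # step 2: unconditional
    if yolo:
        return "run"      # step 4: skips steps 5-7
    if _matches(guards.allow, command):
        return "run"      # step 5: allow wins before any block check
    if _matches(guards.block, command) or _matches(guards.builtin, command):
        return "prompt"   # step 6: approval prompt
    return "run"


# Placeholder regexes, assumed for illustration only.
guards = Guards(hardline=[r"\brm\s+-rf\s+/(\s|$)"], builtin=[r"\baws\b"])
cmd = "aws ec2 terminate-instances --instance-ids i-123"
print(evaluate(cmd, guards))         # prompt
guards.allow.append(r"\baws\b.*")
print(evaluate(cmd, guards))         # run
print(evaluate("rm -rf /", guards))  # blocked
```

Adding the allow pattern flips the AWS command from an approval prompt to a silent run, while the hardline match stays blocked regardless.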

Concrete bypass scenarios:

Agent adds allow_patterns: [{pattern: '.*'}] → all dangerous commands bypass
Agent adds allow_patterns: [{pattern: '\baws\s+ec2\s+terminate-instances\b'}] → specific destructive commands exempt
Agent removes a block pattern by editing config → pattern no longer triggers
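Scenarios like these can be surfaced mechanically by testing a candidate allow pattern against sample dangerous commands. The sketch below assumes patterns are matched with `re.search`. The probe list and the `shadowed_probes` helper are hypothetical, and a real check would draw on Hermes's actual built-in patterns.

```python
import re

# Hypothetical probe commands, used only for illustration.
PROBES = [
    "aws ec2 terminate-instances --instance-ids i-0abc",
    "git push --force origin main",
    "DROP DATABASE production",
]


def shadowed_probes(allow_pattern, probes=PROBES):
    """Return the probe commands that the allow pattern would exempt."""
    try:
        rx = re.compile(allow_pattern)
    except re.error:
        return []  # an invalid regex exempts nothing
    return [cmd for cmd in probes if rx.search(cmd)]


for pattern in [r".*", r"\baws\s+ec2\s+terminate-instances\b", r"^ls\b"]:
    hits = shadowed_probes(pattern)
    print(f"{pattern!r}: exempts {len(hits)} of {len(PROBES)} probes")
```

A pattern that exempts every probe is almost certainly too broad. One that exempts a single destructive command is narrower but still worth a warning.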

Agent Has Multiple Paths to Modify the Config

Command	Why pattern matching doesn't see it
`python -c "import os; open(os.path.expanduser('~/.hermes/...'), 'w').write('...')"`	filename is in a Python string, not shell-visible
File-write tool call	tool handles I/O; filename not in command string
`sed -i 's/block/allow/' ~/.hermes/...`	sed's `-i` makes the filename visible, but Hermes's `_SENSITIVE_WRITE_TARGET` only covers `config.yaml` — NOT this plugin's config
`tee ~/.hermes/... << 'EOF'`	same: not in `_SENSITIVE_WRITE_TARGET`

The Hardline Backstop

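Hardline patterns cannot be bypassed by allow patterns. rm -rf /, mkfs, dd to a raw device, fork bombs, and similar catastrophic commands are checked at step 2 of the evaluation order, before any allow pattern is evaluated. Only custom deny patterns run earlier, and those can only block. This is the only structural defense against self-modification.

According to the evaluation order above, the plugin's monkey-patch returns early only when a deny pattern matches, and in that case it blocks. Allow patterns are consulted at step 5, after the hardline check, so even allow_patterns: [{pattern: '.*'}] cannot bypass it.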

Plugin's Stance: Detect, Don't Enforce

The plugin's stance on self-modification is: allow it, detect it, log it.

The plugin will NOT:

Fail to load if the config is writable by the agent's process user
Enforce OS-level ownership or permission checks
Block the agent from editing the config via tool calls or Python I/O

The plugin WILL:

Track the previous config hash and log a security warning when the config changes between sessions
Validate protected: true patterns at load and log at CRITICAL level if they are missing or modified
Warn when a newly-added allow pattern appears to bypass built-in dangerous patterns
Block commands matching deny patterns immediately, no prompt. Unlike block patterns, --yolo does not bypass deny patterns.
Document the risk and the optional OS-user separation approach
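The hash-tracking behavior in the list above might look roughly like the following. This is a sketch under stated assumptions, not the plugin's implementation: the logger name, the `config_changed` function, and the idea of keeping the previous digest in a separate state file are all hypothetical.

```python
import hashlib
import logging
from pathlib import Path

log = logging.getLogger("custom_patterns")  # hypothetical logger name


def file_digest(path):
    """SHA-256 of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def config_changed(config_path, state_path):
    """Compare the config digest with the stored one, then store the new one.

    Returns True only when a previous digest exists and differs.
    """
    current = file_digest(config_path)
    state = Path(state_path)
    previous = state.read_text().strip() if state.exists() else None
    state.write_text(current)
    if previous is not None and previous != current:
        log.warning(
            "pattern config changed since last session: %s -> %s",
            previous[:12],
            current[:12],
        )
        return True
    return False
```

Note the limitation this shares with the rest of the design: an agent that can write the config can also write the state file, so this detects changes but does not prevent a determined rewrite of both.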

This matches the feedback: the agent is trusted to self-manage; the plugin's role is to surface changes loudly, not to police them.

Comparison with Other Solutions

Solution	How it handles config tampering
OpenClaw + Cedar	Policy engine runs outside the agent process. Agent can't modify policy files.
Claude Code	The model runs in Anthropic's cloud while the CLI runs locally. Administrator-managed policy settings can sit outside the agent user's write access.
Aider + CodeGate	CodeGate intercepts API traffic. The agent never touches the security proxy's config.
Cursor	Desktop app with separate security process. Config files are outside the agent's sandbox.
Hermes + this plugin	Agent and safety config share the same filesystem. The plugin can only detect tampering, not prevent it.

The key insight: every other solution in the table places the safety configuration outside the agent's trust boundary. Hermes, as a local-first agent, cannot do this without OS-level user separation or container isolation, both of which the plugin documents but cannot enforce.

What Can Custom Patterns Circumvent?

Custom patterns can circumvent built-in protections, but only in bounded ways:

Allow patterns run before built-in patterns. If a user (or agent) adds an allow pattern like \baws\b.* while aws is in a built-in block pattern, the allow wins and all AWS commands bypass approval.
Deny patterns block commands immediately, no prompt. Unlike block patterns, they cannot be bypassed by --yolo or mode=off.
Built-in patterns with no customization are unaffected — they still trigger normally if no allow pattern matches.
Hardline checks (rm -rf /, mkfs, dd to a raw device, etc.) run before any allow pattern is evaluated. These cannot be bypassed by any allow pattern.

Optional Hardening: Config Path Outside Agent Sandbox

For users who need stronger guarantees, place the config file somewhere the agent cannot write:

# Create config owned by a different user
sudo mkdir -p /etc/hermes-patterns
sudo chown root:staff /etc/hermes-patterns
sudo chmod 755 /etc/hermes-patterns

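# Move config there and make it read-only for everyone except root
sudo mv ~/.hermes/custom-dangerous-patterns.yaml /etc/hermes-patterns/custom-dangerous-patterns.yaml
sudo chown root:staff /etc/hermes-patterns/custom-dangerous-patterns.yaml
sudo chmod 644 /etc/hermes-patterns/custom-dangerous-patterns.yaml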

# Point the plugin at it
export HERMES_CUSTOM_PATTERNS_PATH=/etc/hermes-patterns/custom-dangerous-patterns.yaml

This is an optional hardening step — not the default, not enforced by the plugin.
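To confirm the hardening took effect, one option is to run a quick check as the same user the agent runs under. The sketch below is an assumption-laden illustration: `write_exposure` is a hypothetical helper, and the check relies on `os.access`, which always reports write access when run as root.

```python
import os
from pathlib import Path


def write_exposure(config_path):
    """List the ways the current process could alter the config file."""
    path = Path(config_path)
    reasons = []
    if path.exists() and os.access(path, os.W_OK):
        reasons.append("file is writable")
    if os.access(path.parent, os.W_OK):
        # A writable directory lets the file be deleted and recreated.
        reasons.append("parent directory is writable (file can be replaced)")
    return reasons


config = os.environ.get(
    "HERMES_CUSTOM_PATTERNS_PATH",
    os.path.expanduser("~/.hermes/custom-dangerous-patterns.yaml"),
)
for reason in write_exposure(config):
    print(f"warning: {config}: {reason}")
```

No output means the current user can neither edit the file nor replace it through its directory. With the default path under ~/.hermes, both warnings are expected.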

Testing Safety

When testing the plugin's approval/blocking logic:

Never use real dangerous commands (rm -rf /, DROP DATABASE, git push --force)
Use patterns from examples/test-patterns.yaml which are enabled: false by default and scoped to safe targets (/tmp/, test_ prefixes)
Prefix custom test descriptions with [TEST] for clarity
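These conventions can be checked before a test run. The sketch below is illustrative only: `lint_test_patterns` is a hypothetical helper, the `pattern`, `description`, and `enabled` field names are inferred from the text above rather than from the plugin's schema, and the safe-target check is a crude substring heuristic. Entries are given as plain dicts to avoid depending on a YAML parser.

```python
def lint_test_patterns(patterns):
    """Return a list of convention violations for test pattern entries."""
    problems = []
    for i, entry in enumerate(patterns):
        pattern = str(entry.get("pattern", ""))
        if entry.get("enabled", True):
            problems.append(f"entry {i}: should be enabled: false by default")
        if not str(entry.get("description", "")).startswith("[TEST]"):
            problems.append(f"entry {i}: description lacks [TEST] prefix")
        # Heuristic: a safe test pattern mentions /tmp/ or a test_ prefix.
        if "/tmp/" not in pattern and "test_" not in pattern:
            problems.append(f"entry {i}: pattern is not scoped to a safe target")
    return problems


sample = [
    {
        "pattern": r"\brm\s+-rf\s+/tmp/hermes-test\b",
        "description": "[TEST] remove scratch dir",
        "enabled": False,
    },
    {
        "pattern": r"\bgit\s+push\s+--force\b",
        "description": "force push",
        "enabled": True,
    },
]
for problem in lint_test_patterns(sample):
    print(problem)
```

The first entry passes all three checks, and the second fails all three.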
