Security & Risks
This plugin is in early development and has not been rigorously tested across all Hermes environments, versions, or edge cases. There is no guarantee that all destructive commands will be caught or blocked. Pattern matching is best-effort — creative command obfuscation, shell expansions, piped commands, or edge cases in the approval flow may bypass detection. Do not rely on this plugin as your sole safety net for critical operations.
Config self-modification is the most important issue in the design. The agent can read and write files. If it has write access to `~/.hermes/custom-dangerous-patterns.yaml`, it can:
- Remove block patterns it finds inconvenient
- Add allow patterns that exempt dangerous commands from approval
- Widen existing patterns to be less restrictive
- Replace the entire config with a permissive version
Allow patterns can override built-in protections. The evaluation order is:
1. [Plugin] Deny patterns (custom) → BLOCKED immediately, no prompt
(wraps original check_all_command_guards)
2. [Hermes] Hardline check → blocked unconditionally
3. [Hermes] Sudo stdin guard → blocked unconditionally
4. [Hermes] Yolo / mode=off → bypasses steps 5-7
5. [Plugin] Allow patterns (custom) → command runs, no prompt (allow wins)
6. detect_dangerous_command(): — same approval prompt for both —
a. [Plugin] Block patterns (custom) → [o]nce/[s]ession/[a]lways/[d]eny
b. [Hermes] Built-in patterns → [o]nce/[s]ession/[a]lways/[d]eny
7. [Hermes] Tirith security scan → approval prompt if findings
If the agent adds an allow pattern that matches what Hermes would otherwise flag (e.g., \baws\b.*), the allow wins before the built-in check runs — the command runs without any prompt at all.
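The evaluation order above can be sketched in a few lines of Python. This is a simplified model, not the plugin's actual code: the `Guards` container, the `evaluate` function, and every pattern shown are hypothetical, and steps 3 (sudo stdin guard) and 7 (Tirith scan) are left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Guards:
    # All pattern lists are illustrative placeholders, not the real rule sets.
    deny: list = field(default_factory=list)      # [Plugin] step 1
    hardline: list = field(default_factory=list)  # [Hermes] step 2
    allow: list = field(default_factory=list)     # [Plugin] step 5
    block: list = field(default_factory=list)     # [Plugin] step 6a
    builtin: list = field(default_factory=list)   # [Hermes] step 6b


def _hit(patterns, command):
    """True if any regex in the list matches somewhere in the command."""
    return any(re.search(p, command) for p in patterns)


def evaluate(command, guards, yolo=False):
    """Return 'blocked', 'run', or 'prompt', following the listed order."""
    if _hit(guards.deny, command):      # step 1: deny blocks, even under yolo
        return "blocked"
    if _hit(guards.hardline, command):  # step 2: unconditional block
        return "blocked"
    if yolo:                            # step 4: skips steps 5-7
        return "run"
    if _hit(guards.allow, command):     # step 5: allow wins, no prompt
        return "run"
    if _hit(guards.block, command) or _hit(guards.builtin, command):
        return "prompt"                 # step 6: approval prompt
    return "run"


guards = Guards(hardline=[r"\brm\s+-rf\s+/(\s|$)"], builtin=[r"\baws\b"])
before = evaluate("aws ec2 terminate-instances --instance-ids i-0abc", guards)
guards.allow.append(r"\baws\b.*")  # the tampering scenario described above
after = evaluate("aws ec2 terminate-instances --instance-ids i-0abc", guards)
```

In this model `before` is `'prompt'` and `after` is `'run'`: the allow pattern is consulted before the built-in pattern, so the prompt never appears.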
Concrete bypass scenarios:
- Agent adds `allow_patterns: [{pattern: '.*'}]` → all dangerous commands bypass
- Agent adds `allow_patterns: [{pattern: '\baws\s+ec2\s+terminate-instances\b'}]` → specific destructive commands exempt
- Agent removes a block pattern by editing config → pattern no longer triggers
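One way to spot allow patterns like these during review is to run each one against a handful of commands that are expected to need approval. The sketch below is only an illustration of that idea: `SAMPLE_DANGEROUS` and `shadowed_samples` are hypothetical names, and the sample list is an assumption rather than the set of commands Hermes actually flags.

```python
import re

# Hypothetical sample commands assumed to require approval.
SAMPLE_DANGEROUS = [
    "aws ec2 terminate-instances --instance-ids i-0abc",
    "git push --force origin main",
    "DROP DATABASE prod;",
]


def shadowed_samples(allow_pattern, samples=SAMPLE_DANGEROUS):
    """Return the sample commands that the allow pattern would exempt."""
    rx = re.compile(allow_pattern)
    return [cmd for cmd in samples if rx.search(cmd)]


# A catch-all pattern exempts every sample; a narrow one exempts only its target.
catch_all = shadowed_samples(r".*")
narrow = shadowed_samples(r"\baws\s+ec2\s+terminate-instances\b")
```

A non-empty result does not prove the pattern is malicious, only that it deserves a second look before it is accepted.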
| Command | Why pattern matching doesn't see it |
|---|---|
| `python -c "open('~/.hermes/...').write('...')"` | filename is in a Python string, not shell-visible |
| File-write tool call | tool handles I/O; filename not in command string |
| `sed -i 's/block/allow/' ~/.hermes/...` | sed's `-i` makes the filename visible, but Hermes's `_SENSITIVE_WRITE_TARGET` only covers `config.yaml`, NOT this plugin's config |
| `tee ~/.hermes/... << 'EOF'` | same: not in `_SENSITIVE_WRITE_TARGET` |
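Hardline patterns cannot be bypassed by allow patterns. `rm -rf /`, `mkfs`, `dd` to a raw device, fork bombs, and similar catastrophic commands are checked at step 2, before any allow pattern is evaluated, so there is no allow-pattern escape. This is the only structural defense against self-modification.
The plugin's monkey-patch returns early ahead of the hardline check only to block a command (step 1, deny patterns). Allow patterns are not consulted until step 5, so even `allow_patterns: [{pattern: '.*'}]` cannot bypass the hardline checks.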
The plugin's stance on self-modification is: allow it, detect it, log it.
The plugin will NOT:
- Fail to load if the config is writable by the agent's process user
- Enforce OS-level ownership or permission checks
- Block the agent from editing the config via tool calls or Python I/O
The plugin WILL:
- Track the previous config hash and log a security warning when the config changes between sessions
- Validate `protected: true` patterns at load and CRITICAL-log if they are missing or modified
- Warn when a newly-added allow pattern appears to bypass built-in dangerous patterns
- Block commands matching deny patterns immediately, no prompt. Unlike block patterns, deny patterns are not bypassed by `--yolo`.
- Document the risk and the optional OS-user separation approach
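The hash-tracking behavior in the list above could look roughly like the following. This is a minimal sketch under stated assumptions, not the plugin's implementation: the function names, the logger name, and the idea of keeping the previous digest in a separate state file are all hypothetical.

```python
import hashlib
import logging
from pathlib import Path

log = logging.getLogger("pattern_guard")  # hypothetical logger name


def config_digest(path):
    """SHA-256 of the raw config bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_config_changed(config_path, state_path):
    """Compare the current digest with the stored one and warn on a change.

    Returns True only when a previous digest exists and differs.
    """
    current = config_digest(config_path)
    state = Path(state_path)
    previous = state.read_text().strip() if state.exists() else None
    state.write_text(current)  # remember the digest for the next session
    changed = previous is not None and previous != current
    if changed:
        log.warning("Security: %s changed since the last session", config_path)
    return changed
```

Note the limit of this approach: a state file on the same filesystem can be rewritten by the same agent, so the check surfaces casual edits but cannot prove the config is untouched.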
This matches the feedback: the agent is trusted to self-manage; the plugin's role is to surface changes loudly, not to police them.
| Solution | How it handles config tampering |
|---|---|
| OpenClaw + Cedar | Policy engine runs outside the agent process. Agent can't modify policy files. |
| Claude Code | Runs in Anthropic's cloud. The agent has no access to its own safety configuration. |
| Aider + CodeGate | CodeGate intercepts API traffic. The agent never touches the security proxy's config. |
| Cursor | Desktop app with separate security process. Config files are outside the agent's sandbox. |
| Hermes + this plugin | Agent and safety config share the same filesystem. The plugin can only detect tampering, not prevent it. |
The key insight: Every other major solution places the safety configuration outside the agent's trust boundary. Hermes, as a local-first agent, cannot do this without OS-level user separation or container isolation — both of which the plugin documents but cannot enforce.
Custom patterns can change how the built-in protections behave, but only in bounded ways:
- Allow patterns run before built-in patterns. If a user (or agent) adds an allow pattern like `\baws\b.*` while `aws` is in a built-in block pattern, the allow wins and all AWS commands bypass approval.
- Deny patterns block commands immediately, no prompt. Unlike block patterns, they cannot be bypassed by `--yolo` or `mode=off`.
- Built-in patterns with no customization are unaffected: they still trigger normally if no allow pattern matches.
- Hardline checks (`rm -rf /`, `mkfs`, `dd`, etc.) run before any allow pattern is evaluated. These cannot be bypassed by any allow pattern.
For users who need stronger guarantees, place the config file somewhere the agent cannot write:
# Create config owned by a different user
sudo mkdir -p /etc/hermes-patterns
sudo chown root:staff /etc/hermes-patterns
sudo chmod 755 /etc/hermes-patterns
# Move config there
sudo mv ~/.hermes/custom-dangerous-patterns.yaml /etc/hermes-patterns/custom-dangerous-patterns.yaml
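sudo chown root:staff /etc/hermes-patterns/custom-dangerous-patterns.yaml
# Make sure the file is not group- or world-writable
sudo chmod 644 /etc/hermes-patterns/custom-dangerous-patterns.yaml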
# Point the plugin at it
export HERMES_CUSTOM_PATTERNS_PATH=/etc/hermes-patterns/custom-dangerous-patterns.yaml
This is an optional hardening step: not the default, and not enforced by the plugin.
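After relocating the config, it may be worth confirming that the agent's user really cannot modify it. The helper below is a hypothetical sketch using only the standard library (the plugin itself performs no such check). It reports whether the current process can write the file or its parent directory. A writable parent matters because the file could then be deleted and recreated.

```python
import os
from pathlib import Path


def write_access_reasons(config_path):
    """List the ways the current process could modify the config, if any."""
    path = Path(config_path)
    reasons = []
    if path.exists() and os.access(path, os.W_OK):
        reasons.append("file is writable")
    if os.access(path.parent, os.W_OK):
        reasons.append("parent directory is writable (file can be replaced)")
    return reasons
```

Run it as the same OS user the agent runs as, passing the path from `HERMES_CUSTOM_PATTERNS_PATH`. An empty list is the expected result after the hardening steps. Running it as root is not informative, since root can write regardless of the mode bits.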
When testing the plugin's approval/blocking logic:
- Never use real dangerous commands (`rm -rf /`, `DROP DATABASE`, `git push --force`)
- Use patterns from `examples/test-patterns.yaml`, which are `enabled: false` by default and scoped to safe targets (`/tmp/`, `test_` prefixes)
- Prefix custom test descriptions with `[TEST]` for clarity
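A pattern can be checked against command strings without executing anything. The sketch below assumes an entry with `pattern`, `description`, and `enabled` fields. That shape is a guess for illustration and may not match the real schema in `examples/test-patterns.yaml`, and the `matches` helper is hypothetical.

```python
import re

# Hypothetical test entry, scoped to a throwaway directory under /tmp/.
TEST_PATTERN = {
    "pattern": r"\brm\s+-rf\s+/tmp/test_\S+",
    "description": "[TEST] recursive delete under /tmp/test_*",
    "enabled": False,
}


def matches(entry, command):
    """True if the entry's regex matches the command string (nothing is run)."""
    return re.search(entry["pattern"], command) is not None


# Only string matching happens here; the command is never executed.
safe_hit = matches(TEST_PATTERN, "rm -rf /tmp/test_sandbox")
unrelated = matches(TEST_PATTERN, "ls -la /tmp")
```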