██╗ ██╗███╗ ██╗ ██████╗ ███████╗████████╗██╗ ██████╗
██║ ██╔╝████╗ ██║██╔═══██╗██╔════╝╚══██╔══╝██║██╔════╝
█████╔╝ ██╔██╗ ██║██║ ██║███████╗ ██║ ██║██║
██╔═██╗ ██║╚██╗██║██║ ██║╚════██║ ██║ ██║██║
██║ ██╗██║ ╚████║╚██████╔╝███████║ ██║ ██║╚██████╗
╚═╝ ╚═╝╚═╝ ╚═══╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝ ╚═════╝
By Knostic
**openclaw-shield:** Security plugin for OpenClaw. Prevents your AI agent from leaking secrets, exposing PII, or executing destructive commands. CRITICAL: OpenClaw is updated constantly, and without community updates this plugin won't stay effective for more than a few days. We have already had to update it several times.
Also check out:
- openclaw-detect: https://github.com/knostic/openclaw-detect/
- openclaw-telemetry: https://github.com/knostic/openclaw-telemetry/
- Like what we do? Knostic helps you with visibility and control of your coding agents and MCP/extensions, from Cursor and Claude Code, to Copilot.
Five layers of defense-in-depth security, each independently toggleable:
| Layer | What it does | Hook |
|---|---|---|
| L1 Prompt Guard | Injects security policy into agent context so the LLM knows the rules | before_agent_start |
| L2 Output Scanner | Redacts secrets and PII from tool output before it hits the transcript | tool_result_persist |
| L3 Tool Blocker | Hard-blocks dangerous tool calls at the host level | before_tool_call |
| L4 Input Audit | Logs inbound messages and flags any secrets users accidentally send | message_received |
| L5 Security Gate | A gate tool the agent must call before exec or file-read, returning ALLOWED/DENIED | registerTool |
openclaw plugins install @knostic/openclaw-shield
That's it. The plugin activates on the next gateway restart with all layers enabled in enforce mode.
Configure via OpenClaw's plugin config system. All settings are optional — defaults are secure out of the box.
| Option | Type | Default | Description |
|---|---|---|---|
| mode | "enforce" \| "audit" | "enforce" | In audit mode, findings are logged but nothing is blocked or redacted |
| layers | object | all true | Toggle individual layers on/off |
| sensitiveFilePaths | string[] | [] | Additional regex patterns for file-read gating (merged with built-in list) |
| destructiveCommands | string[] | [] | Additional regex patterns for command blocking (merged with built-in list) |
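As an illustration of how these options fit together, the sketch below types the four documented options and merges user-supplied patterns with a built-in list. Only the four top-level option names come from the table; the `ShieldConfig` interface, the layer key names, the `resolveConfig` helper, and the sample built-in patterns are assumptions made for this example, and the plugin's actual schema may differ.

```typescript
// Hypothetical shape of the plugin options. Only the four top-level option
// names are documented; the layer key names are assumed for illustration.
type Mode = "enforce" | "audit";

interface LayerToggles {
  promptGuard: boolean;   // L1
  outputScanner: boolean; // L2
  toolBlocker: boolean;   // L3
  inputAudit: boolean;    // L4
  securityGate: boolean;  // L5
}

interface ShieldConfig {
  mode?: Mode;
  layers?: Partial<LayerToggles>;
  sensitiveFilePaths?: string[];
  destructiveCommands?: string[];
}

interface ResolvedConfig {
  mode: Mode;
  layers: LayerToggles;
  sensitiveFilePaths: RegExp[];
  destructiveCommands: RegExp[];
}

// Sample built-in patterns (assumed; not the plugin's real list).
const BUILTIN_SENSITIVE = ["\\.env$", "\\.pem$"];
const BUILTIN_DESTRUCTIVE = ["\\brm\\b", "\\bmkfs\\b"];

// Apply the documented defaults, then merge custom patterns with the built-ins.
function resolveConfig(user: ShieldConfig = {}): ResolvedConfig {
  const allOn: LayerToggles = {
    promptGuard: true,
    outputScanner: true,
    toolBlocker: true,
    inputAudit: true,
    securityGate: true,
  };
  return {
    mode: user.mode ?? "enforce",
    layers: { ...allOn, ...user.layers },
    sensitiveFilePaths: [...BUILTIN_SENSITIVE, ...(user.sensitiveFilePaths ?? [])]
      .map((p) => new RegExp(p)),
    destructiveCommands: [...BUILTIN_DESTRUCTIVE, ...(user.destructiveCommands ?? [])]
      .map((p) => new RegExp(p)),
  };
}

// Example: audit-only rollout with one layer off and one extra command pattern.
const resolved = resolveConfig({
  mode: "audit",
  layers: { inputAudit: false },
  destructiveCommands: ["\\bshred\\b"],
});
console.log(resolved.mode, resolved.destructiveCommands.length);
```

Starting in audit mode is one way to see what would be flagged before switching to enforce.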
Traditional prompt-injection defenses (telling the LLM "don't do X") fail when the user directly instructs the agent to do something dangerous. The L5 security gate addresses this by registering a tool called knostic_shield that the agent must call before every exec or read operation.
sequenceDiagram
participant User
participant Agent
participant knostic_shield
participant exec/read
User->>Agent: "delete all files in /tmp"
Agent->>knostic_shield: command="rm -rf /tmp/*"
knostic_shield-->>Agent: STATUS: DENIED (destructive)
Agent->>User: "Blocked by security policy"
The gate tool returns ALLOWED or DENIED. If denied, the agent is instructed not to proceed. This works on all OpenClaw versions because it uses the tool registration API, not the before_tool_call hook.
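The decision the gate makes can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `checkGate`, `formatReply`, the request shape, and the short pattern lists are all hypothetical, and the real tool is registered through OpenClaw's tool registration API, which is not shown here.

```typescript
// Minimal sketch of a gate check. All names and patterns are hypothetical.
type GateRequest = { command: string } | { filePath: string };

interface GateResult {
  status: "ALLOWED" | "DENIED";
  reason?: string;
}

// Small sample lists for illustration only.
const DESTRUCTIVE: RegExp[] = [/\brm\b/, /\brmdir\b/, /\bmkfs\b/, /\bdd\s+if=/];
const SENSITIVE: RegExp[] = [/(^|\/)\.env$/, /\.pem$/, /(^|\/)\.aws\/credentials$/];

function matchesAny(patterns: RegExp[], text: string): boolean {
  return patterns.some((re) => re.test(text));
}

// Commands are checked against the destructive list,
// file paths against the sensitive list.
function checkGate(req: GateRequest): GateResult {
  if ("command" in req) {
    return matchesAny(DESTRUCTIVE, req.command)
      ? { status: "DENIED", reason: "destructive" }
      : { status: "ALLOWED" };
  }
  return matchesAny(SENSITIVE, req.filePath)
    ? { status: "DENIED", reason: "sensitive file" }
    : { status: "ALLOWED" };
}

// Format the reply in the style used in the sequence diagram.
function formatReply(r: GateResult): string {
  return r.reason ? `STATUS: ${r.status} (${r.reason})` : `STATUS: ${r.status}`;
}

console.log(formatReply(checkGate({ command: "rm -rf /tmp/*" }))); // STATUS: DENIED (destructive)
console.log(formatReply(checkGate({ command: "ls -la" })));        // STATUS: ALLOWED
```

A plain word-boundary match like this is deliberately conservative: it would also deny a harmless command that merely mentions `rm`, which is usually the safer failure mode for a gate.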
Secrets: AWS keys, Stripe keys, GitHub tokens, OpenAI/Anthropic keys, Slack tokens, SendGrid keys, npm tokens, private keys, JWTs, bearer tokens, generic API keys.
PII: Email addresses, US SSNs, credit card numbers, US/international phone numbers, IBANs.
Destructive commands: rm, rmdir, unlink, del, format, mkfs, dd if= (plus your custom patterns).
Sensitive files: .env, credentials.json, .pem, .key, SSH keys, .netrc, .npmrc, .aws/credentials, .kube/config, /etc/shadow (plus your custom patterns).
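To make the secret and PII categories concrete, here is a small sketch of pattern-based scanning with redaction in enforce mode and log-only behavior in audit mode. The three detectors, the `scan` function, and the `[REDACTED:...]` placeholder format are assumptions for illustration; the plugin's real patterns are not listed in this README and are likely broader.

```typescript
// Illustrative detectors only; not the plugin's actual pattern set.
interface Detector {
  label: string;
  pattern: RegExp;
}

const DETECTORS: Detector[] = [
  // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
  { label: "AWS_ACCESS_KEY_ID", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  // US SSN in the common dashed form.
  { label: "US_SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  // Simplified email matcher.
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
];

interface ScanResult {
  text: string;       // redacted in enforce mode, unchanged in audit mode
  findings: string[]; // one label per match, in detector order
}

function scan(input: string, mode: "enforce" | "audit"): ScanResult {
  const findings: string[] = [];
  let text = input;
  for (const { label, pattern } of DETECTORS) {
    // Record every match against the original input.
    const matches = input.match(pattern);
    if (matches) {
      for (let i = 0; i < matches.length; i++) findings.push(label);
    }
    // Only enforce mode rewrites the text.
    if (mode === "enforce") {
      text = text.replace(pattern, `[REDACTED:${label}]`);
    }
  }
  return { text, findings };
}

// AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
const enforced = scan("key=AKIAIOSFODNN7EXAMPLE mail=dev@example.com", "enforce");
console.log(enforced.text); // key=[REDACTED:AWS_ACCESS_KEY_ID] mail=[REDACTED:EMAIL]
```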
- L3 (Tool Blocker): The before_tool_call hook is not wired in the published OpenClaw binary (v2026.1.30). L3 registers with feature detection and activates automatically when host support ships. L5 covers this gap.
- L2 timing gap: tool_result_persist fires at transcript-write time, not before the LLM processes the result. The LLM sees raw content for the current turn. L5's file-read gating with "don't output raw values" instruction mitigates this.
- L5 is advisory: The gate tool relies on the LLM following the security policy injected by L1. In testing, this is reliable but not cryptographically enforced. L3 will provide hard enforcement when the host wires it.
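The feature-detection idea behind L3 can be sketched as below. Everything about the host interface here is hypothetical: the README does not document OpenClaw's plugin API, so `HostApi`, `supportedHooks`, `on`, and `registerToolBlocker` are invented names that only illustrate the pattern of registering a hook when the host advertises it and doing nothing otherwise.

```typescript
// Hypothetical host interface; the real OpenClaw plugin API may look different.
type ToolCall = { tool: string; args: Record<string, unknown> };
type HookHandler = (call: ToolCall) => { block: boolean; reason?: string };

interface HostApi {
  supportedHooks?: string[];
  on?: (hook: string, handler: HookHandler) => void;
}

const HOOK = "before_tool_call";

// Returns true if the handler was registered, false if the host
// does not (yet) support the hook and the layer stays dormant.
function registerToolBlocker(host: HostApi, handler: HookHandler): boolean {
  if (typeof host.on !== "function") return false;
  const advertised = (host.supportedHooks ?? []).some((h) => h === HOOK);
  if (!advertised) return false;
  host.on(HOOK, handler);
  return true;
}

// Example handler: block any exec call (illustrative policy only).
const blockExec: HookHandler = (call) =>
  call.tool === "exec" ? { block: true, reason: "destructive" } : { block: false };

// A host without the hook leaves the layer inactive.
console.log(registerToolBlocker({ supportedHooks: [] }, blockExec)); // false
```

With this shape, the same plugin build works on hosts with and without the hook, and L3 switches on without a plugin update once the host advertises it.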
# Clone
git clone https://github.com/knostic/openclaw-shield
cd openclaw-shield
# Install locally for testing
openclaw plugins install ./
No build step required: OpenClaw transpiles TypeScript at load time via jiti.
Apache 2.0 — see LICENSE for details.