Open-source security proxy that protects AI agents from prompt injection attacks.
AI agents that browse the web are vulnerable to prompt injection attacks. Malicious websites can embed hidden instructions that hijack your agent's behavior: stealing data, executing commands, or overriding safety guidelines. Simple input filtering isn't enough; this is an adversarial problem that requires defense-in-depth.
No existing open-source tool addresses this. FireClaw fills that gap.
FireClaw sits between your AI agent and the internet. Every web fetch passes through a hardened 4-stage pipeline that strips prompt injection payloads before content reaches your agent's context window.
Your agent calls FireClaw instead of fetching directly. FireClaw returns clean, factual content: no hidden instructions, no Unicode tricks, no encoding exploits.

    Your Agent              FireClaw                           The Web
    |                           |                                 |
    |-- fetch("example.com") -->|                                 |
    |                           |-- GET example.com ------------->|
    |                           |<-- raw HTML --------------------|
    |                           |                                 |
    |                           |  +-- Stage 1: DNS Check -------+
    |                           |  | Block known-malicious URLs  |
    |                           |  +-----------------------------+
    |                           |                |
    |                           |  +-- Stage 2: Sanitize --------+
    |                           |  | Strip HTML tricks, hidden   |
    |                           |  | Unicode, encoding exploits, |
    |                           |  | inject canary tokens        |
    |                           |  +-----------------------------+
    |                           |                |
    |                           |  +-- Stage 3: LLM Summary -----+
    |                           |  | Isolated LLM extracts facts |
    |                           |  | only: no tools, no memory   |
    |                           |  +-----------------------------+
    |                           |                |
    |                           |  +-- Stage 4: Output Scan -----+
    |                           |  | Check residual injections,  |
    |                           |  | canary survival, and        |
    |                           |  | tool-call syntax            |
    |                           |  +-----------------------------+
    |                           |
    |<-- clean content ---------|

Even if the summarization LLM in Stage 3 gets injected, it has no tools, no memory, and no access to your data. It can only return text. And that text still passes through Stage 4 output scanning. The attacker is in a dead end.
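The canary check that links Stage 2 and Stage 4 can be sketched roughly as follows. This is a minimal illustration, not FireClaw's actual code: the function names, the marker format, and the placement strategy are all assumptions. The idea is that markers are added to the sanitized text before summarization, and any marker that reappears in the summary suggests the LLM echoed raw input instead of extracting facts.

```javascript
// Hypothetical marker format; FireClaw's real canary format is not documented here.
// Math.random keeps the sketch dependency-free. Real use should prefer a CSPRNG,
// for example randomBytes from node:crypto.
function makeCanary() {
  const suffix = Math.random().toString(36).slice(2, 10).padEnd(8, '0');
  return `CANARY-${suffix}`;
}

// Stage 2 (sketch): append a marker to evenly spaced paragraphs.
function injectCanaries(text, count = 3) {
  const paragraphs = text.split(/\n{2,}/);
  const step = Math.max(1, Math.floor(paragraphs.length / count));
  const canaries = [];
  const marked = paragraphs.map((paragraph, index) => {
    if (index % step === 0 && canaries.length < count) {
      const canary = makeCanary();
      canaries.push(canary);
      return `${paragraph} [${canary}]`;
    }
    return paragraph;
  });
  return { text: marked.join('\n\n'), canaries };
}

// Stage 4 (sketch): a marker that survives into the summary means
// the summarizer copied raw content verbatim.
function findSurvivingCanaries(summary, canaries) {
  return canaries.filter((canary) => summary.includes(canary));
}
```

A caller would pass the marked text to the summarizer and flag the fetch if `findSurvivingCanaries` returns a non-empty array.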
- 200+ Injection Patterns: Regex-based detection covering structural tricks, injection signatures, exfiltration attempts, and output manipulation
- DNS-Level Blocklists: Integrates URLhaus, PhishTank, OpenPhish, and the FireClaw community blocklist
- Canary Token System: Unique markers injected into content detect if summarization was bypassed
- Domain Trust Tiers: Configure trusted (skip sanitization), neutral (full pipeline), suspicious (aggressive), or blocked (reject) per domain
- Rate Limiting & Cost Controls: Per-minute/hour/day limits with auto-throttle and hard caps
- JSONL Audit Logging: Complete forensic trail of every fetch, detection, and alert
- No Bypass Mode: The pipeline is fixed. Even if your agent is compromised, it cannot disable FireClaw.
- OLED Display Support: Optional Raspberry Pi OLED integration for physical monitoring
- Dashboard: Web-based UI for monitoring, configuration, and log browsing
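To make the regex-based detection in the list above more concrete, here is a small sketch of pattern scanning with a summed severity score. The three patterns, their field names, and the scoring rule are illustrative assumptions; the shipped patterns.json is much larger and its schema may differ.

```javascript
// Illustrative patterns only (assumed schema: id, category, severity, regex).
// No "g" flag is used, so RegExp.prototype.test stays stateless between calls.
const PATTERNS = [
  { id: 'ignore-previous', category: 'injection', severity: 3,
    regex: /ignore\s+(all\s+)?previous\s+instructions/i },
  { id: 'role-override', category: 'injection', severity: 2,
    regex: /\byou\s+are\s+now\b/i },
  { id: 'zero-width', category: 'structural', severity: 1,
    regex: /[\u200B-\u200D\u2060\uFEFF]/ },
];

// Returns every pattern that matched plus the sum of their severities.
function scanForInjections(text, patterns = PATTERNS) {
  const detections = patterns
    .filter((pattern) => pattern.regex.test(text))
    .map(({ id, category, severity }) => ({ id, category, severity }));
  const severity = detections.reduce((sum, d) => sum + d.severity, 0);
  return { detections, severity };
}
```

Summing severities is only one possible scoring rule; taking the maximum, or weighting by category, would be equally reasonable.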
FireClaw runs on a Raspberry Pi as a dedicated security appliance with a live OLED display showing real-time stats, and animated fire claws when it catches a threat.
FireClaw gets smarter when we work together.
When you enable data sharing (opt-in), FireClaw anonymously contributes detection metadata to a shared community threat feed. No page content is ever sent, only:
- Domain name
- Number of detections and severity level
- Domain trust tier
- Whether the fetch was flagged
- Processing duration
This data helps the entire FireClaw community by:
- Identifying emerging threat domains across all instances
- Improving pattern detection through real-world signal
- Building a shared blocklist that benefits everyone
- Tracking injection trends over time
In your data/settings.json, just flip one switch:

    {
      "privacy": {
        "shareData": true
      }
    }

That's it. No API keys to configure: FireClaw ships with the community endpoint built in. All instances write to the same shared threat database, protected by Row Level Security (INSERT-only: no one can read, modify, or delete other instances' data through the public API).
Privacy first: Data sharing is disabled by default. You choose whether to participate. All data is anonymized with a random instance ID: no personal information, no IP addresses, no page content.
All community data submissions are validated and sanitized before being sent:
- Whitelisted fields only (no extra data can sneak in)
- Type checking and range limits on every field
- Supabase URL validated against expected domain patterns (SSRF protection)
- Instance IDs validated as UUID v4 format
- 5-second timeout on all submissions
- Non-blocking: submission failures never affect proxy operation
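The validation steps listed above might look roughly like this. It is a sketch under stated assumptions: the field names, the numeric limits, the severity labels, and the `*.supabase.co` host pattern are guesses made for illustration, not FireClaw's actual schema.

```javascript
// Assumed field names, loosely mirroring the shared metadata described earlier.
const ALLOWED_FIELDS = ['instanceId', 'domain', 'detections', 'severityLevel',
  'tier', 'flagged', 'duration'];
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
const SEVERITY_LEVELS = ['low', 'medium', 'high', 'critical']; // assumed labels
const TIERS = ['trusted', 'neutral', 'suspicious', 'blocked'];

// Copies whitelisted fields only, then type-checks and range-checks each one.
// Returns the cleaned record, or null if anything fails validation.
function sanitizeSubmission(raw) {
  const record = {};
  for (const field of ALLOWED_FIELDS) {
    if (Object.hasOwn(raw, field)) record[field] = raw[field];
  }
  const valid =
    typeof record.instanceId === 'string' && UUID_V4.test(record.instanceId) &&
    typeof record.domain === 'string' &&
    record.domain.length > 0 && record.domain.length <= 253 &&
    Number.isInteger(record.detections) &&
    record.detections >= 0 && record.detections <= 10000 && // assumed cap
    SEVERITY_LEVELS.includes(record.severityLevel) &&
    TIERS.includes(record.tier) &&
    typeof record.flagged === 'boolean' &&
    Number.isFinite(record.duration) && record.duration >= 0;
  return valid ? record : null;
}

// SSRF guard (sketch): accept only HTTPS URLs on an expected host pattern.
function isExpectedEndpoint(urlString) {
  try {
    const url = new URL(urlString);
    return url.protocol === 'https:' && /^[a-z0-9-]+\.supabase\.co$/.test(url.hostname);
  } catch {
    return false; // not a parseable URL
  }
}
```

The 5-second timeout could be added at the call site by passing `signal: AbortSignal.timeout(5000)` to `fetch`, which is available in Node.js 18.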

Prerequisites:

- Node.js 18+
- npm

Clone the repository and install dependencies:

    git clone https://github.com/raiph-ai/fireclaw.git
    cd fireclaw
    npm install

Copy the default settings:

    cp data/settings.example.json data/settings.json

Edit config.yaml for your environment:

    fireclaw:
      enabled: true
      model: "anthropic/claude-haiku-4"  # LLM for Stage 3
      trust_tiers:
        trusted:
          - "wikipedia.org"
          - "github.com"
      alerts:
        enabled: true
        channel: "slack:YOUR_CHANNEL_ID"
        threshold: "medium"

Start the dashboard server:

    node dashboard/server.mjs

The dashboard and proxy API will be available at http://localhost:8420.

    curl -X POST http://localhost:8420/api/proxy \
      -H 'Content-Type: application/json' \
      -H 'X-FireClaw-Action: fetch' \
      -d '{"url":"https://example.com","intent":"Get page summary"}'

Fetch a URL through the FireClaw pipeline.
Headers:

- Content-Type: application/json
- X-FireClaw-Action: fetch

Body:

    {
      "url": "https://example.com",
      "intent": "What is this page about?"
    }

Response:

    {
      "content": "Sanitized summary of the page...",
      "error": null,
      "metadata": {
        "fetchId": "a1b2c3d4",
        "tier": "neutral",
        "detections": 2,
        "severity": 6,
        "severityLevel": "medium",
        "flagged": false,
        "duration": 1234,
        "canaries": 3,
        "skippedSanitization": false
      }
    }

Health check endpoint.
Runtime statistics (detections, blocks, rate limits, cache).
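For agents written in Node.js, the proxy fetch request shown earlier can be wrapped in a small client. This is a sketch based only on the request and response shapes documented above; the helper names are hypothetical, and the treatment of non-2xx responses is an assumption about how the server behaves.

```javascript
// Endpoint taken from the curl example; adjust host and port for your setup.
const FIRECLAW_URL = 'http://localhost:8420/api/proxy';

// Builds the fetch() options for a proxy request (hypothetical helper).
function buildProxyRequest(url, intent) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-FireClaw-Action': 'fetch',
    },
    body: JSON.stringify({ url, intent }),
  };
}

// Sends the request using the global fetch available in Node.js 18+.
// Resolves to the parsed response: { content, error, metadata }.
async function fetchThroughFireClaw(url, intent) {
  const response = await fetch(FIRECLAW_URL, buildProxyRequest(url, intent));
  if (!response.ok) {
    throw new Error(`FireClaw returned HTTP ${response.status}`);
  }
  const result = await response.json();
  if (result.error) {
    throw new Error(`FireClaw error: ${result.error}`);
  }
  return result;
}
```

An agent would then read `result.content` for the sanitized summary and could inspect `result.metadata.flagged` before trusting it.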
| File | Purpose |
|---|---|
| fireclaw.mjs | Main pipeline orchestrator |
| sanitizer.mjs | Pattern matching, sanitization, canary system |
| patterns.json | 200+ regex patterns for injection detection |
| config.yaml | Full configuration |
| proxy-prompt.md | Hardened system prompt for Stage 3 |

- ResultCache: In-memory caching with configurable TTL
- RateLimiter: Token bucket rate limiting (per minute/hour/day)
- DNSBlocklistManager: Threat feed fetching and domain blocking
- DomainTrustManager: Per-domain sanitization intensity
- AuditLogger: Append-only JSONL with replay support
- AlertManager: Severity-tiered alerts with digest mode
- CanaryTokenSystem: Inject and detect bypass markers
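As an illustration of the token bucket technique named for RateLimiter above, here is a minimal generic implementation. It is not FireClaw's class: the name `TokenBucket`, its constructor parameters, and the injectable clock are assumptions chosen to keep the sketch testable.

```javascript
// Generic token bucket: holds up to `capacity` tokens and refills
// continuously at `refillPerSecond`. Each allowed request spends one token.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now;  // timestamp in milliseconds
  }

  // Returns true and spends a token if one is available, otherwise false.
  // `now` is injectable so the behavior can be checked without waiting.
  tryRemove(now = Date.now()) {
    const elapsedSeconds = Math.max(0, (now - this.lastRefill) / 1000);
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

Per-minute, per-hour, and per-day limits could be modeled as three buckets with different capacities and refill rates, with a request allowed only when all three grant a token.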
FireClaw has no bypass mode. The pipeline is fixed and cannot be disabled at runtime:

    inner_alignment:
      allow_override: false        # Cannot be changed
      allow_bypass: false          # Cannot be changed
      log_override_attempts: true

If your agent is compromised, the attacker cannot disable FireClaw. Period.
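One way such a fixed policy could be enforced in code is to freeze the protected keys and reject any runtime request that names them. The sketch below is purely illustrative: `applyRuntimeSettings` and its behavior are assumptions, and FireClaw's actual enforcement mechanism may work differently.

```javascript
// Protected policy, mirroring the inner_alignment keys shown above.
// Object.freeze makes the object itself immutable at runtime.
const INNER_ALIGNMENT = Object.freeze({
  allow_override: false,
  allow_bypass: false,
  log_override_attempts: true,
});

// Hypothetical settings handler: applies ordinary keys, drops and logs
// any attempt to set a protected key.
function applyRuntimeSettings(requested, log = console.warn) {
  const applied = {};
  for (const [key, value] of Object.entries(requested)) {
    if (Object.hasOwn(INNER_ALIGNMENT, key)) {
      if (INNER_ALIGNMENT.log_override_attempts) {
        log(`override attempt rejected: ${key}`);
      }
      continue; // protected keys are never applied
    }
    applied[key] = value;
  }
  return applied;
}
```

For example, a request containing both `allow_bypass: true` and an ordinary setting would have only the ordinary setting applied, with the bypass attempt written to the log.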
FireClaw can run as a dedicated physical appliance on a Raspberry Pi with a 3D-printed enclosure and OLED display.
The 128×64 OLED display (SSD1306, I2C) rotates through five screens every 5 seconds:
| Screen | What It Shows |
|---|---|
| Claw | Animated FireClaw logo that ignites with flames and sparks when a threat is detected, with a !! THREAT !! banner |
| IP/Network | Device hostname and IP address |
| Today's Stats | Live fetch count and threat detections for the current day |
| Uptime | How long the proxy has been running (days/hours/minutes) with a heartbeat indicator |
| Health | CPU temperature, RAM usage, and disk usage |
OLED showing daily fetch and threat counts
When a threat is detected, the display interrupts its rotation to show the claw icon engulfed in animated flames for 5 seconds, a visual confirmation that FireClaw caught something.
See the oled/ directory for the display service, claw bitmap, and wiring details.

- ✅ Embedded instructions in web content
- ✅ Unicode tricks (RTL overrides, zero-width chars, homoglyphs)
- ✅ HTML obfuscation (hidden CSS, comments, data URIs)
- ✅ Encoding exploits (base64 blobs, URL encoding, hex escapes)
- ✅ Jailbreak attempts ("ignore previous instructions", "you are now", "DAN mode")
- ✅ Tool call injection (function syntax, escaped quotes in output)
- ✅ Data exfiltration (webhooks, suspicious URLs, email addresses)
- ✅ Summarization bypass (canary token detection)

- ❌ Image-based injection (text in images): planned
- ❌ PDF-embedded exploits: planned
- ❌ Audio/video injection: out of scope
- ❌ Zero-day LLM vulnerabilities: requires model-level fixes
- ❌ Social engineering: requires human judgment

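To illustrate the Unicode and HTML-comment cases in the coverage list above, here is a small sketch of stripping hidden text. The function name and the exact character ranges are assumptions, and the regex-based comment removal is deliberately naive: a real sanitizer would more likely use an HTML parser.

```javascript
// Zero-width characters: ZWSP, ZWNJ, ZWJ, word joiner, and the BOM.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;
// Bidirectional controls: LRM/RLM, embeddings and overrides, isolates.
const BIDI_CONTROLS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;
// Naive HTML comment matcher, for illustration only.
const HTML_COMMENTS = /<!--[\s\S]*?-->/g;

// Normalizes compatibility forms (for example fullwidth letters) with NFKC,
// then removes comments and invisible control characters.
function stripHiddenText(input) {
  return input
    .normalize('NFKC')
    .replace(HTML_COMMENTS, '')
    .replace(ZERO_WIDTH, '')
    .replace(BIDI_CONTROLS, '');
}
```

NFKC folds compatibility characters but does not map cross-script homoglyphs (such as Cyrillic letters that resemble Latin ones), so those would need a separate confusables check.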
- Image content analysis (OCR + vision model)
- PDF sanitization pipeline
- Machine learning pattern detection
- Federated learning from community data
- Real-time pattern updates from threat feed
- Multi-framework integration guides
FireClaw is a community project and we'd love your contribution. Whether you're a security researcher, an AI engineer, or someone who cares about making AI agents safer, there's a place for you.
- Share injection patterns: Found a new attack vector? Help us detect it.
- Test and break things: Try to bypass the pipeline and report what you find.
- Improve documentation: Make FireClaw easier to understand and adopt.
- Build integrations: Connect FireClaw to other AI agent frameworks.
- Enable data sharing: Every instance that contributes detection data makes the community threat feed stronger.

- GitHub Issues: Bug reports, feature requests, pattern contributions
- Email: security@fireclaw.app for responsible disclosure
- Website: fireclaw.app

If you're interested in contributing or have questions, please open an issue or reach out. We're building this together.
FireClaw is licensed under the GNU Affero General Public License v3.0 (AGPLv3).
See LICENSE for the full text.
The community threat feed data is shared under separate dataset terms.
"FireClaw" is a trademark of Ralph Perez. See TRADEMARK.md for usage guidelines.
Found a bypass or vulnerability? Please report responsibly:
- Email: security@fireclaw.app
- Policy: 90-day coordinated disclosure
FireClaw: Defend Your Agent. Protect Your Data. Join the Community.
fireclaw.app


