
🛡️ FireClaw: A Firewall for Your Agent's Brain


Open-source security proxy that protects AI agents from prompt injection attacks.

Website • Quick Start • How It Works • Community Threat Feed • Want to Help?


The Problem

AI agents that browse the web are vulnerable to prompt injection attacks. Malicious websites can embed hidden instructions that hijack your agent's behavior: stealing data, executing commands, or overriding safety guidelines. Simple input filtering isn't enough; this is an adversarial problem that requires defense-in-depth.

No existing open-source tool addresses this. FireClaw fills that gap.

What FireClaw Does

FireClaw sits between your AI agent and the internet. Every web fetch passes through a hardened 4-stage pipeline that strips prompt injection payloads before content reaches your agent's context window.

Your agent calls FireClaw instead of fetching directly. FireClaw returns clean, factual content: no hidden instructions, no Unicode tricks, no encoding exploits.


How It Works

  Your Agent                    FireClaw                     The Web
      │                            │                            │
      │── fetch("example.com") ───▶│                            │
      │                            │── GET example.com ────────▶│
      │                            │◀── raw HTML ───────────────│
      │                            │                            │
      │                            │  ┌─── Stage 1: DNS Check ──────┐
      │                            │  │ Block known-malicious URLs  │
      │                            │  └─────────────────────────────┘
      │                            │            ↓
      │                            │  ┌─── Stage 2: Sanitize ───────┐
      │                            │  │ Strip HTML tricks, hidden   │
      │                            │  │ Unicode, encoding exploits, │
      │                            │  │ inject canary tokens        │
      │                            │  └─────────────────────────────┘
      │                            │            ↓
      │                            │  ┌─── Stage 3: LLM Summary ────┐
      │                            │  │ Isolated LLM extracts facts │
      │                            │  │ only: no tools, no memory   │
      │                            │  └─────────────────────────────┘
      │                            │            ↓
      │                            │  ┌─── Stage 4: Output Scan ────┐
      │                            │  │ Check for residual          │
      │                            │  │ injections, canary survival │
      │                            │  │ and tool-call syntax        │
      │                            │  └─────────────────────────────┘
      │                            │
      │◀── clean content ──────────│

The Key Insight

Even if the summarization LLM in Stage 3 gets injected, it has no tools, no memory, and no access to your data. It can only return text. And that text still passes through Stage 4 output scanning. The attacker is in a dead end.
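
The flow above can be sketched as a fixed chain of functions. This is a minimal illustration, not FireClaw's actual code: runPipeline and the injected stage functions (isBlocked, fetchRaw, sanitize, summarize, scanOutput) are hypothetical names, and checking the blocklist before fetching is an assumption about ordering. What it is meant to show is that the Stage 3 summarizer only ever receives text and can only hand text back, and that its output is still scanned.

```javascript
// Hypothetical sketch: stage functions are injected so the flow stays readable.
async function runPipeline(url, stages) {
  const { isBlocked, fetchRaw, sanitize, summarize, scanOutput } = stages;

  // Stage 1: reject hosts that appear on a blocklist.
  const host = new URL(url).hostname;
  if (await isBlocked(host)) {
    return { content: null, error: "blocked by DNS check", flagged: true };
  }

  // Stage 2: strip markup tricks and plant canary tokens in the raw page.
  const raw = await fetchRaw(url);
  const { text, canaries } = sanitize(raw);

  // Stage 3: the summarizer is text in, text out. No tools, no memory.
  const summary = await summarize(text);

  // Stage 4: scan the summary itself before it reaches the agent.
  const findings = scanOutput(summary, canaries);
  if (findings.length > 0) {
    return { content: null, error: "output scan rejected summary", flagged: true };
  }
  return { content: summary, error: null, flagged: false };
}
```

Because the agent only sees the return value of the last step, a summarizer that was talked into misbehaving has nowhere to send anything except into the Stage 4 scan.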


Features

  • 200+ Injection Patterns: Regex-based detection covering structural tricks, injection signatures, exfiltration attempts, and output manipulation
  • DNS-Level Blocklists: Integrates URLhaus, PhishTank, OpenPhish, and the FireClaw community blocklist
  • Canary Token System: Unique markers injected into content detect if summarization was bypassed
  • Domain Trust Tiers: Configure trusted (skip sanitization), neutral (full pipeline), suspicious (aggressive), or blocked (reject) per domain
  • Rate Limiting & Cost Controls: Per-minute/hour/day limits with auto-throttle and hard caps
  • JSONL Audit Logging: Complete forensic trail of every fetch, detection, and alert
  • No Bypass Mode: The pipeline is fixed. Even if your agent is compromised, it cannot disable FireClaw.
  • OLED Display Support: Optional Raspberry Pi OLED integration for physical monitoring
  • Dashboard: Web-based UI for monitoring, configuration, and log browsing
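
To make the canary idea concrete, here is a rough sketch of how markers might be planted and later checked. The marker format and the function names (makeCanary, injectCanaries, survivingCanaries) are invented for illustration, and FireClaw's real CanaryTokenSystem may work differently. The assumption is that a genuine summary paraphrases the page, so any marker that survives into the output suggests raw content was passed through verbatim.

```javascript
// Hypothetical sketch of canary injection and detection.
function makeCanary(index) {
  // Math.random keeps the sketch dependency-free; a real implementation
  // would more likely use a cryptographic random source.
  const nonce = Math.random().toString(36).slice(2, 10);
  return `[[CANARY-${index}-${nonce}]]`;
}

function injectCanaries(text, count = 3) {
  const paragraphs = text.split("\n\n");
  const originalLength = paragraphs.length;
  const canaries = [];
  for (let i = 0; i < count; i++) {
    const canary = makeCanary(i);
    canaries.push(canary);
    // Spread markers through the document; "+ i" accounts for markers
    // that were already inserted earlier in the array.
    const position = Math.floor(((i + 1) * originalLength) / (count + 1)) + i;
    paragraphs.splice(position, 0, canary);
  }
  return { text: paragraphs.join("\n\n"), canaries };
}

// Returns the markers that made it into the summarizer's output.
function survivingCanaries(output, canaries) {
  return canaries.filter((canary) => output.includes(canary));
}
```

An empty result from survivingCanaries is the expected case; a non-empty one would be a reason to flag the fetch.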

🔥 Pi Appliance OLED Display

FireClaw runs on a Raspberry Pi as a dedicated security appliance with a live OLED display showing real-time stats, plus animated fire claws when it catches a threat.

Community Threat Feed

FireClaw gets smarter when we work together.

When you enable data sharing (opt-in), FireClaw anonymously contributes detection metadata to a shared community threat feed. No page content is ever sent, only:

  • Domain name
  • Number of detections and severity level
  • Domain trust tier
  • Whether the fetch was flagged
  • Processing duration

This data helps the entire FireClaw community by:

  • Identifying emerging threat domains across all instances
  • Improving pattern detection through real-world signal
  • Building a shared blocklist that benefits everyone
  • Tracking injection trends over time

How to Enable

In your data/settings.json, just flip one switch:

{
  "privacy": {
    "shareData": true
  }
}

That's it. There are no API keys to configure, because FireClaw ships with the community endpoint built in. All instances write to the same shared threat database, protected by Row Level Security (INSERT-only: no one can read, modify, or delete other instances' data through the public API).

Privacy first: Data sharing is disabled by default. You choose whether to participate. All data is anonymized with a random instance ID: no personal information, no IP addresses, no page content.

Input Validation

All community data submissions are validated and sanitized before being sent:

  • Whitelisted fields only (no extra data can sneak in)
  • Type checking and range limits on every field
  • Supabase URL validated against expected domain patterns (SSRF protection)
  • Instance IDs validated as UUID v4 format
  • 5-second timeout on all submissions
  • Non-blocking: submission failures never affect proxy operation
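
As a rough illustration of the whitelisting and type checks listed above, the sketch below builds a submission object from untrusted input. The field names, the numeric limits, and every severity level other than "medium" are assumptions; the README lists what is shared but not the exact schema.

```javascript
// Hypothetical schema and limits, for illustration only.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
const TIERS = ["trusted", "neutral", "suspicious", "blocked"];
const SEVERITY_LEVELS = ["low", "medium", "high", "critical"]; // only "medium" appears in the README

function isIntInRange(value, min, max) {
  return Number.isInteger(value) && value >= min && value <= max;
}

// Returns a clean object containing only whitelisted fields, or null if
// any field has the wrong type or is out of range.
function buildSubmission(input) {
  const valid =
    typeof input.instanceId === "string" && UUID_V4.test(input.instanceId) &&
    typeof input.domain === "string" && input.domain.length > 0 && input.domain.length <= 253 &&
    isIntInRange(input.detections, 0, 10000) &&
    SEVERITY_LEVELS.includes(input.severityLevel) &&
    TIERS.includes(input.tier) &&
    typeof input.flagged === "boolean" &&
    isIntInRange(input.duration, 0, 600000);
  if (!valid) return null;

  // Copy field by field so nothing extra can ride along.
  const { instanceId, domain, detections, severityLevel, tier, flagged, duration } = input;
  return { instanceId, domain, detections, severityLevel, tier, flagged, duration };
}
```

Copying fields explicitly, rather than deleting known-bad keys, means a new field added elsewhere in the code cannot leak into the feed by accident.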

Quick Start

Prerequisites

  • Node.js 18+
  • npm

Install

git clone https://github.com/raiph-ai/fireclaw.git
cd fireclaw
npm install

Configure

Copy the default settings:

cp data/settings.example.json data/settings.json

Edit config.yaml for your environment:

fireclaw:
  enabled: true
  model: "anthropic/claude-haiku-4"  # LLM for Stage 3
  
  trust_tiers:
    trusted:
      - "wikipedia.org"
      - "github.com"
  
  alerts:
    enabled: true
    channel: "slack:YOUR_CHANNEL_ID"
    threshold: "medium"
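
How a hostname is matched against the trust_tiers lists is not spelled out here. One plausible rule, sketched below, is to match the listed domain and its subdomains and to fall back to "neutral". The function name resolveTier, the matching rule, and the precedence order are all assumptions.

```javascript
// Hypothetical sketch: tier names come from the README, the rest is assumed.
function resolveTier(hostname, tiers) {
  const host = hostname.toLowerCase();
  // Check the strictest tiers first so a blocked entry always wins.
  for (const tier of ["blocked", "suspicious", "trusted"]) {
    const domains = tiers[tier] || [];
    // Match the domain itself or any subdomain, but not lookalikes
    // such as "notgithub.com".
    if (domains.some((d) => host === d || host.endsWith("." + d))) {
      return tier;
    }
  }
  return "neutral"; // default: full pipeline
}
```

With the config above, resolveTier("en.wikipedia.org", { trusted: ["wikipedia.org", "github.com"] }) would yield "trusted", while an unlisted host falls through to "neutral".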

Run

node dashboard/server.mjs

The dashboard and proxy API will be available at http://localhost:8420.

Test

curl -X POST http://localhost:8420/api/proxy \
  -H 'Content-Type: application/json' \
  -H 'X-FireClaw-Action: fetch' \
  -d '{"url":"https://example.com","intent":"Get page summary"}'

API

POST /api/proxy

Fetch a URL through the FireClaw pipeline.

Headers:

  • Content-Type: application/json
  • X-FireClaw-Action: fetch

Body:

{
  "url": "https://example.com",
  "intent": "What is this page about?"
}

Response:

{
  "content": "Sanitized summary of the page...",
  "error": null,
  "metadata": {
    "fetchId": "a1b2c3d4",
    "tier": "neutral",
    "detections": 2,
    "severity": 6,
    "severityLevel": "medium",
    "flagged": false,
    "duration": 1234,
    "canaries": 3,
    "skippedSanitization": false
  }
}
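
From the agent's side, a call to this endpoint might be wrapped as follows. The URL, headers, and body fields are taken from the examples above; the function name fetchThroughFireClaw, the options object, and the error handling are assumptions rather than part of FireClaw.

```javascript
// Sketch of an agent-side helper around POST /api/proxy.
async function fetchThroughFireClaw(url, intent, options = {}) {
  const baseUrl = options.baseUrl || "http://localhost:8420";
  // Injectable so the helper can be exercised without a running proxy.
  const fetchImpl = options.fetchImpl || fetch;

  const response = await fetchImpl(`${baseUrl}/api/proxy`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-FireClaw-Action": "fetch",
    },
    body: JSON.stringify({ url, intent }),
  });

  const data = await response.json();
  if (data.error) {
    throw new Error(`FireClaw rejected the fetch: ${data.error}`);
  }

  const metadata = data.metadata || {};
  // Surface the flag so the caller can decide how much to trust the content.
  return { content: data.content, flagged: Boolean(metadata.flagged), metadata };
}
```

A caller could then treat a flagged result more cautiously, for example by not acting on it without review.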

GET /api/health

Health check endpoint.

GET /api/stats

Runtime statistics (detections, blocks, rate limits, cache).


Architecture

Core Components

| File | Purpose |
| --- | --- |
| fireclaw.mjs | Main pipeline orchestrator |
| sanitizer.mjs | Pattern matching, sanitization, canary system |
| patterns.json | 200+ regex patterns for injection detection |
| config.yaml | Full configuration |
| proxy-prompt.md | Hardened system prompt for Stage 3 |

Modules

  • ResultCache: In-memory caching with configurable TTL
  • RateLimiter: Token bucket rate limiting (per minute/hour/day)
  • DNSBlocklistManager: Threat feed fetching and domain blocking
  • DomainTrustManager: Per-domain sanitization intensity
  • AuditLogger: Append-only JSONL with replay support
  • AlertManager: Severity-tiered alerts with digest mode
  • CanaryTokenSystem: Inject and detect bypass markers
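
For readers unfamiliar with token buckets, the sketch below shows the general technique the RateLimiter entry refers to. It is a generic illustration under assumed names (TokenBucket, tryTake), not FireClaw's implementation. Per-minute, per-hour, and per-day limits could be modeled as one bucket per window, with a request allowed only if every bucket grants a token.

```javascript
// Generic token bucket: holds up to `capacity` tokens and refills at a
// steady rate. Each allowed request spends one token.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.last = now;        // timestamp of the last refill, in ms
  }

  // Returns true and spends a token if the request is allowed.
  tryTake(now = Date.now()) {
    const elapsedSeconds = Math.max(0, (now - this.last) / 1000);
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Example: roughly 10 fetches per minute, with bursts of up to 10.
const perMinute = new TokenBucket(10, 10 / 60);
```

Passing the clock in as a parameter keeps the class deterministic, which makes limits easy to verify without waiting in real time.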

Inner Alignment Protection

FireClaw has no bypass mode. The pipeline is fixed and cannot be disabled at runtime:

inner_alignment:
  allow_override: false    # Cannot be changed
  allow_bypass: false      # Cannot be changed
  log_override_attempts: true

If your agent is compromised, the attacker cannot disable FireClaw. Period.
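
The README does not say how these flags are enforced. One way such a guarantee could be implemented, shown here purely as an assumption, is to ignore whatever the loaded configuration requests for the two bypass flags, record the attempt, and freeze the result. The function name lockInnerAlignment is hypothetical.

```javascript
// Hypothetical sketch: whatever the config file says, the bypass flags
// are forced off and the resulting object is frozen.
function lockInnerAlignment(loadedConfig, logAttempt = console.warn) {
  const requested = loadedConfig.inner_alignment || {};
  for (const key of ["allow_override", "allow_bypass"]) {
    if (requested[key] === true) {
      // Mirrors log_override_attempts: note the attempt, then ignore it.
      logAttempt(`override attempt ignored: inner_alignment.${key}`);
    }
  }
  return Object.freeze({
    allow_override: false,
    allow_bypass: false,
    log_override_attempts: true,
  });
}
```

Freezing only guards against in-process mutation; it says nothing about an attacker who can edit files or restart the service, which would need operating-system-level protections.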


Hardware Appliance (Optional)

FireClaw can run as a dedicated physical appliance on a Raspberry Pi with a 3D-printed enclosure and OLED display.


The 128×64 OLED display (SSD1306, I2C) rotates through five screens every 5 seconds:

| Screen | What It Shows |
| --- | --- |
| Claw | Animated FireClaw logo that ignites with flames and sparks when a threat is detected, with a !! THREAT !! banner |
| IP/Network | Device hostname and IP address |
| Today's Stats | Live fetch count and threat detections for the current day |
| Uptime | How long the proxy has been running (days/hours/minutes), with a heartbeat indicator |
| Health | CPU temperature, RAM usage, and disk usage |

[Image: OLED display on the Today's Stats screen, showing daily fetch and threat counts]

When a threat is detected, the display interrupts its rotation to show the claw icon engulfed in animated flames for 5 seconds, a visual confirmation that FireClaw caught something.

See the oled/ directory for the display service, claw bitmap, and wiring details.


Threat Model

Protects Against

✅ Embedded instructions in web content
✅ Unicode tricks (RTL overrides, zero-width chars, homoglyphs)
✅ HTML obfuscation (hidden CSS, comments, data URIs)
✅ Encoding exploits (base64 blobs, URL encoding, hex escapes)
✅ Jailbreak attempts ("ignore previous instructions", "you are now", "DAN mode")
✅ Tool call injection (function syntax, escaped quotes in output)
✅ Data exfiltration (webhooks, suspicious URLs, email addresses)
✅ Summarization bypass (canary token detection)
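
Two of the categories above, invisible Unicode and jailbreak phrases, can be illustrated with a small sketch. The three patterns and their severity weights are made up for the example and are not taken from patterns.json, whose contents are not shown here. The phrases themselves come from the list above.

```javascript
// Zero-width characters and bidirectional controls that can hide or
// reorder text (a partial list, chosen for illustration).
const INVISIBLE = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]/g;

// Hypothetical patterns with arbitrary severity weights.
const PHRASES = [
  { id: "ignore-previous", severity: 5, regex: /ignore\s+(?:all\s+)?previous\s+instructions/i },
  { id: "role-override", severity: 4, regex: /\byou\s+are\s+now\b/i },
  { id: "dan-mode", severity: 4, regex: /\bDAN\s+mode\b/i },
];

function sanitizeAndScan(input) {
  // Strip invisible characters first so they cannot split a trigger phrase.
  const text = input.replace(INVISIBLE, "");
  const hits = PHRASES.filter((p) => p.regex.test(text));
  return {
    text,
    detections: hits.map((p) => p.id),
    severity: hits.reduce((sum, p) => sum + p.severity, 0),
    strippedChars: input.length - text.length,
  };
}
```

The order matters: scanning before stripping would miss a phrase such as "ig\u200Bnore previous instructions", where a zero-width space breaks up the keyword.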

Does Not Protect Against

❌ Image-based injection (text in images): planned
❌ PDF-embedded exploits: planned
❌ Audio/video injection: out of scope
❌ Zero-day LLM vulnerabilities: requires model-level fixes
❌ Social engineering: requires human judgment


Roadmap

  • Image content analysis (OCR + vision model)
  • PDF sanitization pipeline
  • Machine learning pattern detection
  • Federated learning from community data
  • Real-time pattern updates from threat feed
  • Multi-framework integration guides

Want to Help?

FireClaw is a community project and we'd love your contribution. Whether you're a security researcher, an AI engineer, or someone who cares about making AI agents safer, there's a place for you.

Ways to Contribute

  • 🔍 Share injection patterns: Found a new attack vector? Help us detect it.
  • 🧪 Test and break things: Try to bypass the pipeline and report what you find.
  • 📝 Improve documentation: Make FireClaw easier to understand and adopt.
  • 🔧 Build integrations: Connect FireClaw to other AI agent frameworks.
  • 📊 Enable data sharing: Every instance that contributes detection data makes the community threat feed stronger.

Get in Touch

  • GitHub Issues: Bug reports, feature requests, pattern contributions
  • Email: security@fireclaw.app for responsible disclosure
  • Website: fireclaw.app

If you're interested in contributing or have questions, please open an issue or reach out. We're building this together.


License

FireClaw is licensed under the GNU Affero General Public License v3.0 (AGPLv3).

See LICENSE for the full text.

The community threat feed data is shared under separate dataset terms.

"FireClaw" is a trademark of Ralph Perez. See TRADEMARK.md for usage guidelines.


Security

Found a bypass or vulnerability? Please report it responsibly by emailing security@fireclaw.app.


FireClaw: Defend Your Agent. Protect Your Data. Join the Community.

🛡️ fireclaw.app
