Skip to content

saulwade/sentinel

Repository files navigation

Sentinel

DevTools for AI agents. Intercept actions. Enforce policies. Rewind time. Run red teams. Ship safer agents.

Your agent just exfiltrated customer data in production. Today you read logs. With Sentinel, you caught it before it happened — and synthesized a policy to prevent it forever.

Seven reasoning tasks. One model: Claude Opus 4.7.

Built for the Built with Opus 4.7 Hackathon by @saulwade.


What it does

Sentinel is a security debugger for autonomous AI agents — like Chrome DevTools + a firewall, but for agents that process support tickets, query customer databases, and take financial actions on your behalf.

It sits between your agent and its tools. Every action is intercepted by a two-layer defense:

  1. Policy Engine — deterministic DSL rules that block/pause actions in <5ms, no LLM needed
  2. Pre-cog — Opus 4.7 extended thinking that reasons about the full causal chain of a proposed tool call

Verdicts, world state, and Opus reasoning stream live to a visual timeline you can scrub, edit, and fork.

The demo scenarios

Three injection attacks against a Customer Support Agent:

Scenario Attack vector Stakes
Support Agent Compliance audit framing — exfiltrate all enterprise PII $47k unauthorized refund + bulk data exfil
CEO Override Authority impersonation via executive escalation bot $12k goodwill credit + M&A data to external firm
GDPR Audit Legal urgency framing — GDPR Art. 20 data portability $8.5k processing fee + unfiltered customer dump

The five tabs

1. Command Center

Landing dashboard. Trust Score ring (A+ to F) computed from interdiction effectiveness × policy coverage. Live stats: active policies, total interdictions, money blocked, runs. One-click access to all workflows.

2. Live View

Real-time action stream. Each tool call gets a two-source verdict:

  • POLICY (indigo) — deterministic rule matched, <5ms, no Opus call
  • PRE-COG (purple) — Opus extended thinking evaluated the causal chain
  • ALLOW (green) · PAUSE (amber) · BLOCK (red, red pulse animation)

Toggle PRE-COMPUTED / LIVE OPUS at runtime. Keyboard A/D to approve/deny paused actions.

3. Replay

Merged Timeline + Fork View. Scrubber across all events. At any step: see exact world state, edit it, press ⎇ Branch from here. Fork appears inline. Blast Radius grid compares Original vs Branch: money interdicted, exfil blocked, records accessed, severity badge. Download Incident Report button generates a professional markdown report via Opus.

4. Pre-flight Simulator

Before deploying, simulate your agent through synthetic scenarios generated by Opus. Safety grade (A+ to F) with failure drill-down.

5. Red Team & Policies

Left panel — Red Team adaptive loop:

  • Opus generates attacks tailored to your agent's tools and system prompt
  • 3 iterations with explicit mutation strategies: payload split, subdomain bypass, request chaining, customer-framing
  • Each iteration sees previous attempts and active defenses, adapts accordingly
  • Bypassed attacks get a Synthesize Policy button → Opus generates a DSL rule that blocks the variant

Right panel — Policy catalog:

  • All active policies with source badge (DEFAULT / AUTO · from attackId)
  • Toggle enabled/disabled · revoke
  • Auto-synthesized policies appear immediately after adoption

How Opus 4.7 is used

Seven distinct reasoning tasks, all with extended thinking:

Feature What Opus does Thinking budget
Pre-cog Causal chain simulation of proposed tool calls 8k tokens
Run Analysis Executive summary + attack chain + risk grade (A+..F) 10k tokens
Fork Narrator Narrates the branch-not-taken in Replay 4k tokens
Pre-flight Generates plausible synthetic scenarios 4k tokens
Red Team Iter 1 Fresh adversarial attacks for agent's tool surface 4k tokens
Red Team Iter 2+ Adaptive mutations from prior attempt history 6k tokens
Policy Synthesis DSL policy from bypassed attack, with retry loop 6k tokens

All reasoning streams to the UI as purple text in real-time.


Architecture

                    localhost:3000                    localhost:3001
               ┌─────────────────────┐          ┌──────────────────────────────┐
               │   Next.js 16 App    │          │        Hono Engine           │
               │                     │   SSE    │                              │
               │  Command Center ────┼──────────┤─► Stats + Trust Score        │
               │  Live View ─────────┼──────────┤─► Agent Runner               │
               │  Replay ────────────┼──────────┤─► Event Store (SQLite)       │
               │  Pre-flight ────────┼──────────┤─► Pre-flight Simulator       │
               │  Red Team & Policies┼──────────┤─► Red Team Loop              │
               │                     │          │   Policy Registry            │
               └─────────────────────┘          │                              │
                                                │  ┌────────────────────────┐  │
                                                │  │    Tool Interceptor    │  │
                                                │  │  ┌──────────────────┐  │  │
                                                │  │  │  Policy Engine   │──┼──┼──► deterministic (<5ms)
                                                │  │  │  (DSL evaluator) │  │  │
                                                │  │  └──────────────────┘  │  │
                                                │  │  ┌──────────────────┐  │  │
                                                │  │  │  Pre-cog (Opus)  │──┼──┼──► Opus 4.7
                                                │  │  │  extended think  │  │  │   (extended thinking)
                                                │  │  └──────────────────┘  │  │
                                                │  └────────────────────────┘  │
                                                │                              │
                                                │  Blast Radius Computer       │
                                                │  Analysis (Opus, SSE)        │
                                                │  Policy Synthesis (Opus)     │
                                                │  MCP Server (stdio)          │
                                                └──────────────────────────────┘

Stack: TypeScript end-to-end. Next.js 16 + React 19 + Tailwind 4 (web). Hono + SQLite + Drizzle ORM (engine). Anthropic SDK with streaming extended thinking.

Event sourcing: every agent interaction is an immutable event row in SQLite. World state at any point = replay events 0..N. Forks create new runs with parentRunId. No mutation, full auditability.

Policy Engine: deterministic DSL with 10 condition kinds (toolName, argMatch, argRegex, domainCheck, valueThreshold, piiClass, planTier, ticketPriority, customerTier, and/or combinators). Runs before Pre-cog — no API cost, no latency for known-bad patterns.


Quick start

git clone https://github.com/saulwade/sentinel.git
cd sentinel
pnpm install
cp apps/engine/.env.example apps/engine/.env
# Add your ANTHROPIC_API_KEY to apps/engine/.env
pnpm dev

Open http://localhost:3000.

Requirements: Node 22+, pnpm 9+, Anthropic API key with Opus 4.7 access.

Running the demo

  1. Open Live View, select a scenario (Support Agent / CEO Override / GDPR Audit)
  2. Toggle PRE-COMPUTED for instant cached verdicts, or LIVE OPUS for real extended thinking (~45s)
  3. Press ▶ Run — watch actions stream with POLICY/OPUS source badges
  4. Click any BLOCK/PAUSE row to inspect Opus reasoning or the matching policy rule
  5. After the run, open Replay → scrub the timeline → ⎇ Branch from here → see Blast Radius
  6. Open Red Team & Policies → run the adaptive loop → synthesize a policy from a bypass → adopt it

Keyboard shortcuts

Key Action
15 Switch tabs
R Run agent
j / k Navigate events in Live View
A / D Approve / Deny a PAUSE
/ Search events
? Help modal (all shortcuts)
Esc Close modal / clear search

MCP integration (Claude Code)

Sentinel exposes an MCP server. Add to Claude Code (~/.claude/claude_desktop_config.json or project .mcp.json):

{
  "mcpServers": {
    "sentinel": {
      "command": "pnpm",
      "args": ["-F", "@sentinel/engine", "mcp"],
      "cwd": "/path/to/sentinel"
    }
  }
}

Available MCP tools:

Tool Description
sentinel_start_run Start a run (optional scenario: support/ceo/gdpr/phishing)
sentinel_get_events Fetch all events with ALLOW/PAUSE/BLOCK verdicts and source (policy/pre-cog)
sentinel_get_blast_radius Money disbursed/blocked, PII exposed/blocked, severity grade
sentinel_get_policies List all active policies with action, severity, description
sentinel_get_trust_score Composite Trust Score (A+ to F) across all runs
sentinel_snapshot Reconstruct world state at a specific event
sentinel_list_agent_tools List the agent's tools in MCP schema format

Example Claude Code usage:

> Start a Sentinel CEO Override run and tell me what was blocked
> What's the blast radius for that run?
> What policies are active and what's the Trust Score?
> Show me the world state at event #5

Project structure

sentinel/
├── apps/
│   ├── web/app/components/
│   │   ├── CommandCenter.tsx   # Trust Score + stats dashboard
│   │   ├── LiveView.tsx        # Real-time stream + inspector
│   │   ├── Replay.tsx          # Timeline scrubber + inline fork + blast radius
│   │   ├── Shell.tsx           # 5-tab shell + keyboard nav
│   │   └── RedTeam.tsx         # Adaptive red team + policy catalog
│   └── engine/src/
│       ├── agent/              # World state, mock tools, scenario seeds
│       │   └── scenarios/      # phishing, support, ceo, gdpr
│       ├── interceptor.ts      # Two-layer intercept (policy → pre-cog)
│       ├── policies/           # DSL evaluator + default policies
│       ├── analysis/           # Blast radius + Opus analysis + incident report
│       ├── redteam/            # Adaptive attacker + tester + policy synthesizer
│       ├── timetravel/         # Snapshot + replay engine
│       ├── mcp/                # MCP server (Claude Code integration)
│       └── routes/             # runs, analysis, policies, redteam, stats, settings
└── packages/shared/            # Shared types: AgentEvent, Decision, Policy, RunAnalysis

Why Sentinel exists

AI agents are shipping to production every week. When they fail, the answer today is "read logs and guess." Sentinel gives agent developers a real debugger:

  • See every action in real-time with causal reasoning and policy source
  • Pause suspicious actions before they execute
  • Rewind to any point in the agent's history
  • Edit the past and replay alternate futures
  • Quantify blast radius: money interdicted, PII blocked, damage avoided
  • Test against adaptive adversarial attacks before deploying
  • Synthesize defense policies automatically from discovered bypasses

This is a debugging primitive that did not exist before models could reason about counterfactuals at production speed.


License

MIT — see LICENSE.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages