Shows where sensitive or untrusted data flows inside multi-agent AI workflows, and blocks it before it reaches unsafe tools.
Built for AI employees that read tickets, files, emails, MCP outputs, and dispatch sub-agents.
Status: pre-alpha. Design phase. No runnable code yet.
Package: taintctl on npm (and PyPI, pending).
AI employees in production today read external content — tickets, emails, web pages, API responses, MCP server outputs — and act on it through tools. They also dispatch sub-agents to specialize on tasks they can't or shouldn't do themselves.
This creates two security gaps existing guardrails don't close:
- Untrusted data crosses agent boundaries silently. Agent A reads a customer email containing `.env` contents or a prompt-injection payload. Agent A dispatches a sub-agent with that content in its prompt. Sub-agent B has no signal that the data was untrusted or sensitive. It acts.
- Sensitive data leaks downstream invisibly. Agent A reads an internal credentials file. Agent A passes the content (or a summary) to Agent B, which has network egress tools. The data leaves the building. No log explains why.
Existing tools don't fill this gap:
- Single-call guardrails (Lakera, NeMo, guardrails-ai) operate per LLM call. They don't propagate state across sub-agent dispatch boundaries.
- MCP scanners (mcp-scan, mcp-shield) operate on static tool descriptions, not runtime data flow.
- Config pinning (mcp-context-protector) catches drift, not flow.
- Agent platform approval prompts (Claude Code, Cursor) are syntactic allowlists. They don't reason about what the data means.
TaintCTL fills that gap by making data provenance a first-class concept across the entire multi-agent workflow.
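As an illustration only (the project has no runnable code yet), the idea of provenance travelling with data across a dispatch boundary might be sketched as below. Every name here (`Taint`, `Labeled`, `dispatch_prompt`, `allow_tool_call`, the `BLOCKED` table and its tool names) is hypothetical and not part of any published taintctl API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Taint(Enum):
    UNTRUSTED = "untrusted"  # e.g. external email, web page, MCP output
    SENSITIVE = "sensitive"  # e.g. credentials, .env contents


@dataclass(frozen=True)
class Labeled:
    """A piece of content plus the provenance labels it carries."""
    text: str
    source: str
    taints: frozenset = field(default_factory=frozenset)


def dispatch_prompt(parent_inputs, instruction):
    """Build a sub-agent prompt that inherits every input's taints."""
    merged = frozenset().union(*(item.taints for item in parent_inputs))
    body = "\n".join([instruction] + [item.text for item in parent_inputs])
    return Labeled(text=body, source="dispatch", taints=merged)


# Hypothetical policy table: tool name -> taints that must not reach it.
BLOCKED = {
    "http_post": {Taint.SENSITIVE},
    "shell_exec": {Taint.UNTRUSTED},
}


def allow_tool_call(tool, arg, policy=BLOCKED):
    """Fail closed: a tool with no policy entry only accepts untainted data."""
    if tool not in policy:
        return not arg.taints
    return not (arg.taints & policy[tool])


email = Labeled("API_KEY=abc", "customer-email", frozenset({Taint.SENSITIVE}))
prompt = dispatch_prompt([email], "Summarize this ticket.")
```

In this sketch the sub-agent's prompt carries the `SENSITIVE` label from the email, so a call to the hypothetical `http_post` tool with that prompt is refused, and a tool missing from the policy table is refused for any tainted input.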
| Tool | What it does | Cross-subagent provenance? | Multi-framework? | Live visualization? |
|---|---|---|---|---|
| mcp-scan | Static MCP description scanning | ❌ | ❌ | ❌ |
| mcp-context-protector | Trust-on-first-use config pinning | ❌ | ❌ | ❌ |
| Lakera Guard / NeMo / guardrails-ai | Single-call content classification | ❌ | ❌ | ❌ |
| Claude Code / Cursor permission systems | Syntactic allow/deny prompts | ❌ | ❌ | ❌ |
| TaintCTL | Cross-subagent taint ledger + content classification + one flow-graph UI | ✅ | ✅ (Hermes + LangGraph in v1) | ✅ (Stage 3) |
| Stage | Deliverable |
|---|---|
| 1 | Framework-agnostic core engine + Hermes Agent adapter + LangGraph adapter (parallel). Content classifiers, fail-closed policy, terminal UI. AgentDojo native baseline via LangGraph. |
| 2 | Cross-subagent provenance working in both adapters from a shared ledger — channel-a content fingerprints + channel-b in-context warnings. |
| 3 | One static-SPA flow-graph UI that talks to either adapter via the normalized event schema. 30-second screencast demoing Hermes + LangGraph back-to-back. |
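A rough sketch of how the two Stage 2 channels could fit together, under stated assumptions: channel-a is modelled as a sha256 fingerprint per non-empty line stored in a shared in-memory ledger, and channel-b as a warning prepended to the sub-agent's system prompt. The per-line chunking, the class and function names, and the warning wording are all assumptions made for illustration, not the project's design.

```python
import hashlib


def fingerprint(chunk):
    """Channel-a: sha256 over the exact bytes of a chunk."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def _chunks(text):
    # Assumption: one fingerprint per non-empty, stripped line.
    return [line.strip() for line in text.splitlines() if line.strip()]


class TaintLedger:
    """Shared map from content fingerprint to the labels recorded for it."""

    def __init__(self):
        self._labels = {}  # fingerprint -> set of label strings

    def record(self, text, label):
        for chunk in _chunks(text):
            self._labels.setdefault(fingerprint(chunk), set()).add(label)

    def lookup(self, text):
        found = set()
        for chunk in _chunks(text):
            found |= self._labels.get(fingerprint(chunk), set())
        return found


def warn_subagent(system_prompt, labels):
    """Channel-b: prepend an in-context warning when labels were found."""
    if not labels:
        return system_prompt
    note = "WARNING: input contains data labeled " + ", ".join(sorted(labels)) + "."
    return note + "\n" + system_prompt


ledger = TaintLedger()
ledger.record("DB_PASSWORD=hunter2\nAPI_KEY=abc", "sensitive")
verbatim = ledger.lookup("Please handle this:\nAPI_KEY=abc")
paraphrased = ledger.lookup("The API key is abc")
```

Here `verbatim` picks up the `sensitive` label because one line matches byte for byte, while `paraphrased` comes back empty. That is the verbatim-only behaviour listed under Limitations: an exact-hash fingerprint cannot follow content once an LLM rewords it.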
| Framework | Status |
|---|---|
| Hermes Agent (Nous Research, 160K stars) | v1 (parallel ship) |
| LangGraph | v1 (parallel ship) |
| Claude Agent SDK (Python + TypeScript) | v1.1 |
| CrewAI, AutoGen | v1.1-1.2 |
| OpenClaw | v1.2 |
| Generic OpenAI-compatible chat completions | v2 |
Limitations (acknowledged, not hidden)
- v1 only handles verbatim taint flow with high precision. When an LLM paraphrases sensitive data, channel-a (sha256 fingerprint) breaks. Channel-b (system-prompt warnings to sub-agents) is a partial mitigation but its effectiveness is an empirical question, not a guarantee.
- v1 prompt-injection detection is pattern-based. Paraphrased prompt injections will be missed. Documented as known gap, not silently broken.
- Not a defense against a malicious parent agent. Standard guardrail assumption: the agent we sit inside is honest-but-naive, not adversary-controlled.
- Adapter coverage is what the framework exposes. If a framework hides state from plugins, we hide it too.
- AgentDojo prompt-injection-marker subset: Stage 1 gate is recall ≥ 0.65 (deterministic detector floor)
- InjecAgent: baseline numbers in CI on every PR
- Multi-agent scenarios: 8-12 in `benchmarks/multiagent/`, derived from a fork of damn-vulnerable-MCP-server
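For reference, the Stage 1 recall gate reduces to a simple check: recall is true positives divided by all actual positives, compared against 0.65. The sketch below assumes the detector's hits and misses on the marker subset have already been counted; the function names and the example counts are hypothetical, not measured results.

```python
STAGE1_RECALL_GATE = 0.65  # threshold taken from the Stage 1 gate above


def recall(true_positives, false_negatives):
    """Fraction of actual injection cases that the detector flagged."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive cases to score")
    return true_positives / total


def passes_gate(true_positives, false_negatives, gate=STAGE1_RECALL_GATE):
    """True when measured recall meets or exceeds the gate."""
    return recall(true_positives, false_negatives) >= gate


# Hypothetical counts, for illustration only.
ok = passes_gate(true_positives=13, false_negatives=7)       # 13 / 20 = 0.65
too_low = passes_gate(true_positives=12, false_negatives=8)  # 12 / 20 = 0.60
```

A CI job could fail the build whenever `passes_gate` returns `False`, which keeps the gate a hard floor instead of a reported number.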
MIT — see LICENSE
This is the author's second project in MCP/agent security. The first is
MCP-Security-Framework, which scans MCP servers
for vulnerable patterns. The two projects are complementary: MCP-Security-Framework is a
static scanner; taintctl is a runtime provenance layer.