Skip to content

JoKFA/TaintCTL

Repository files navigation

TaintCTL

Shows where sensitive or untrusted data flows inside multi-agent AI workflows, and blocks it before it reaches unsafe tools.

Built for AI employees that read tickets, files, emails, MCP outputs, and dispatch sub-agents.

Status: pre-alpha. Design phase. No runnable code yet. Package: taintctl on npm (and pypi, pending).


Why this exists

AI employees in production today read external content — tickets, emails, web pages, API responses, MCP server outputs — and act on it through tools. They also dispatch sub-agents to specialize on tasks they can't or shouldn't do themselves.

This creates two security gaps existing guardrails don't close:

  1. Untrusted data crosses agent boundaries silently. Agent A reads a customer email containing .env contents or a prompt-injection payload. Agent A dispatches a sub-agent with that content in its prompt. Sub-agent B has no signal that the data was untrusted or sensitive. It acts.
  2. Sensitive data leaks downstream invisibly. Agent A reads an internal credentials file. Agent A passes the content (or a summary) to Agent B, which has network egress tools. The data leaves the building. No log explains why.

Existing tools don't fill this gap:

  • Single-call guardrails (Lakera, NeMo, guardrails-ai) operate per LLM call. They don't propagate state across sub-agent dispatch boundaries.
  • MCP scanners (mcp-scan, mcp-shield) operate on static tool descriptions, not runtime data flow.
  • Config pinning (mcp-context-protector) catches drift, not flow.
  • Agent platform approval prompts (Claude Code, Cursor) are syntactic allowlists. They don't reason about what the data means.

TaintCTL fills that gap by making data provenance a first-class concept across the entire multi-agent workflow.

What's different

Tool What it does Cross-subagent provenance? Multi-framework? Live visualization?
mcp-scan Static MCP description scanning
mcp-context-protector Trust-on-first-use config pinning
Lakera Guard / NeMo / guardrails-ai Single-call content classification
Claude Code / Cursor permission systems Syntactic allow/deny prompts
TaintCTL Cross-subagent taint ledger + content classification + one flow-graph UI ✅ (Hermes + LangGraph in v1) ✅ (Stage 3)

Roadmap

Stage Deliverable
1 Framework-agnostic core engine + Hermes Agent adapter + LangGraph adapter (parallel). Content classifiers, fail-closed policy, terminal UI. AgentDojo native baseline via LangGraph.
2 Cross-subagent provenance working in both adapters from a shared ledger — channel-a content fingerprints + channel-b in-context warnings.
3 One static-SPA flow-graph UI that talks to either adapter via the normalized event schema. 30-second screencast demoing Hermes + LangGraph back-to-back.

Supported frameworks

Framework Status
Hermes Agent (Nous Research, 160K stars) v1 (parallel ship)
LangGraph v1 (parallel ship)
Claude Agent SDK (Python + TypeScript) v1.1
CrewAI, AutoGen v1.1-1.2
OpenClaw v1.2
Generic OpenAI-compatible chat completions v2

Limitations (acknowledged, not hidden)

  • v1 only handles verbatim taint flow with high precision. When an LLM paraphrases sensitive data, channel-a (sha256 fingerprint) breaks. Channel-b (system-prompt warnings to sub-agents) is a partial mitigation but its effectiveness is an empirical question, not a guarantee.
  • v1 prompt-injection detection is pattern-based. Paraphrased prompt injections will be missed. Documented as known gap, not silently broken.
  • Not a defense against a malicious parent agent. Standard guardrail assumption: the agent we sit inside is honest-but-naive, not adversary-controlled.
  • Adapter coverage is what the framework exposes. If a framework hides state from plugins, we hide it too.

Validation

  • AgentDojo prompt-injection-marker subset: Stage 1 gate is recall ≥ 0.65 (deterministic detector floor)
  • InjecAgent: baseline numbers in CI on every PR
  • Multi-agent scenarios: 8-12 in benchmarks/multiagent/, derived from a fork of damn-vulnerable-MCP-server

License

MIT — see LICENSE

Related work

This is the author's second project in MCP/agent security. The first is MCP-Security-Framework, which scans MCP servers for vulnerable patterns. The two projects are complementary: MCP-Security-Framework is a static scanner; taintctl is a runtime provenance layer.

About

TaintCTL shows where sensitive or untrusted data flows inside multi-agent AI workflows, and blocks it before it reaches unsafe tools

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors