Sandboxing an LLM agent's tool authority with Capa's type system #5

nelsonduarte · 2026-05-16T22:58:15Z

Maintainer

Just landed a worked example in v0.8.2-beta: a Capa program that runs an agent loop against a real Anthropic model, where the loop's declared capabilities provably exclude Net, Fs, Env, and Unsafe even though the model itself decides which tools to call.

The pattern has three pieces.

First, each tool is a user-defined capability:

capability SearchWeb { fun query(self, q: String) -> Result<String, IoError> }
capability SendEmail { fun send(self, to: String, body: String) -> Result<Unit, IoError> }
capability RunCode   { fun exec(self, src: String) -> Result<String, IoError> }

Second, the agent loop takes as parameters only the tools it is allowed to call:

fun agent_loop(stdio: Stdio, llm: LlmClient, search: SearchWeb) -> Result<Unit, IoError>

Third, the manifest records the bound: agent_loop's declared_capabilities is [Stdio, LlmClient, SearchWeb], and its provably_excluded_capabilities field lists Net, Fs, Env, Unsafe, and every other built-in or user-defined capability that the call graph does not reach.
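Conceptually, the third piece is a reachability question: collect every capability that appears in any function reachable from agent_loop, then subtract that set from the universe of known capabilities. Below is a minimal Python sketch under that assumption. The call graph, the helper names ask_model, run_tool, and send_report, and the capability universe are all hypothetical, and Capa's actual analysis may differ.

```python
from collections import deque

# Hypothetical universe of capabilities known to the build.
ALL_CAPABILITIES = {
    "Stdio", "Net", "Fs", "Env", "Unsafe",
    "LlmClient", "SearchWeb", "SendEmail", "RunCode",
}

# Hypothetical call graph: each function lists the capabilities in its
# signature and the functions it calls.
CALL_GRAPH = {
    "agent_loop": {"caps": {"Stdio", "LlmClient", "SearchWeb"},
                   "calls": ["ask_model", "run_tool"]},
    "ask_model": {"caps": {"LlmClient"}, "calls": []},
    "run_tool": {"caps": {"SearchWeb"}, "calls": []},
    "send_report": {"caps": {"SendEmail"}, "calls": []},  # defined, never reached
}


def reachable_functions(graph, root):
    """Breadth-first traversal of the call graph starting at root."""
    seen = {root}
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        for callee in graph[fn]["calls"]:
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def manifest_entry(graph, root, universe):
    """Declared = root's signature; excluded = universe minus every reachable capability."""
    fns = reachable_functions(graph, root)
    reachable_caps = set().union(*(graph[f]["caps"] for f in fns))
    return {
        "declared_capabilities": sorted(graph[root]["caps"]),
        "provably_excluded_capabilities": sorted(universe - reachable_caps),
    }


if __name__ == "__main__":
    entry = manifest_entry(CALL_GRAPH, "agent_loop", ALL_CAPABILITIES)
    print(entry["declared_capabilities"])
    # ['LlmClient', 'SearchWeb', 'Stdio']
    print(entry["provably_excluded_capabilities"])
    # ['Env', 'Fs', 'Net', 'RunCode', 'SendEmail', 'Unsafe']
```

SendEmail shows up as excluded even though send_report exists in the program, because nothing reachable from agent_loop calls it.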

Compared with allow-list approaches: an allow-list at the model boundary is advisory, because the model can still be tricked into formatting a disallowed tool call, and the tool remains invocable if the filter misses it. The Capa version is structural: the runtime simply has no RunCode value to invoke.

Code

Static demo: examples/llm_tool_sandbox.capa
Mock LLM + dispatch loop: examples/llm_agent_runner.capa
Real Anthropic, single-turn: examples/llm_anthropic_real.capa
Capstone (real model + Capa dispatch end-to-end): examples/llm_anthropic_agent.capa

Writeup comparing this approach with allow-lists and OS-level sandboxes: docs/llm-tool-sandbox.md.

Honest limits

The discipline addresses the authority axis, not the content axis: a SearchWeb-only agent can still be prompted into leaking conversation history through search.query. Mitigating that is an orthogonal problem (redaction, output filtering, or a user-confirmation step). What Capa enforces is an upper bound on the set of side channels the agent can use at all, and that bound is auditable from the SBOM rather than from a runtime sandbox's configuration.
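One of those orthogonal mitigations, redaction, composes naturally with capability passing: wrap the SearchWeb value in another value with the same interface that scrubs the query before forwarding it, and hand the agent only the wrapper. The Python sketch below assumes a single email-address regex as the redaction policy. RedactingSearch, EchoSearch, and the pattern are hypothetical, and a real policy would need far more than one pattern. It narrows what flows through the channel but does not close it.

```python
import re
from typing import Protocol


class SearchWeb(Protocol):
    def query(self, q: str) -> str: ...


# Hypothetical redaction policy: mask anything that looks like an email address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class EchoSearch:
    """Stub backend that echoes the query it received, to show what would leak."""
    def query(self, q: str) -> str:
        return f"searched: {q}"


class RedactingSearch:
    """Attenuated SearchWeb: same interface, scrubs the query before forwarding."""
    def __init__(self, inner: SearchWeb) -> None:
        self._inner = inner

    def query(self, q: str) -> str:
        return self._inner.query(EMAIL.sub("[redacted]", q))


if __name__ == "__main__":
    search = RedactingSearch(EchoSearch())
    print(search.query("contact alice@example.com about the Q3 plan"))
    # searched: contact [redacted] about the Q3 plan
```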

Question for the room

Does the pattern generalise to other agent frameworks you've used? Does anything break if you swap models, or if the dispatch is not string-keyed?
