Add Agent Host Protocol threat model by rwoll · Pull Request #88 · microsoft/agent-host-protocol

rwoll · 2026-04-27T21:44:43Z

Summary

- adds a concise THREAT_MODEL.md for Agent Host Protocol implementations
- frames the desired untrusted mode goal for remote/SSH/tunnel usage
- documents core risks around hostile peers, token forwarding, client tools, resource APIs, terminals, plugins, packages, and multi-client ownership

Testing

documentation-only change

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rwoll · 2026-04-27T21:52:50Z

@connor4312 @roblourens - here's my first draft. for the purposes of MSRC, it's important we are all on the same page on accepted risks. so please review and comment if you see anything factually incorrect, or that we want to fix. longer term, I do think it's critical we have an "untrusted" mode. Some of these risks (e.g. forwarding a credential to a client) will likely need UX treatments as well as public docs for education.

roblourens

This makes sense to me, thanks for digging into it

connor4312 · 2026-04-28T16:01:46Z

+
+## Current safety status
+
+The current protocol **is not safe by itself for untrusted remote use**. AHP defines the message shapes and state flows, but it does not itself establish peer identity, authorize capabilities, constrain token forwarding, sandbox resource access, sanitize rendered content, or make client-contributed tools safe to invoke. Implementations must add those controls before treating a remote host or client as safe.


I would actually say the opposite. The protocol is just RPC and it doesn't have an "execute locally" command. A vanilla implementation will be 'safe' by default unless a client gives it e.g. local tools it could use to escape.

Maybe it's the wording "the current protocol" -- there are no protocol changes we would make that would make it safer than it is now.

We might add things like sandboxing in the future, but that is an optional implementation choice; it doesn't make the protocol inherently more or less safe.

connor4312 · 2026-04-28T16:12:36Z

+| **Hostile server content compromises the client** | A remote host sends malicious Markdown, HTML/SVG, terminal escapes, links, schema text, diffs, or content references that exploit or trick the client. | Render all peer-provided content as untrusted. Sanitize Markdown, block command links and unsafe URI schemes, constrain SVG/images, neutralize dangerous terminal sequences, and enforce size limits. |
+| **Token or secret exfiltration** | The host advertises `protectedResources`; the client obtains a GitHub/Azure/etc. token and sends it via `authenticate` to a compromised host. | Require authenticated transport, server identity, and explicit per-host/per-resource consent before token delivery. Use scoped, short-lived, audience-bound tokens. Never send refresh tokens or log bearer tokens. |
+| **Client-contributed tool abuse** | The active client contributes tools; the server starts a tool call with `toolClientId` and attacker-controlled input that causes the client to run tasks, inspect local context, or return sensitive output. | Client tools must be explicit, allowlisted, capability-scoped, and locally authorized. Treat server `confirmed: "not-needed"` as advisory, not as client approval. Confirm tools that read/write local data, run commands, open URLs, or use credentials. |
+| **Server-side workspace compromise by malicious clients** | An unauthorized client invokes `resourceWrite`, `resourceDelete`, terminal creation/input, tool confirmations, or config/customization actions against the host. | Authenticate and authorize each client independently. Bind `clientId` to the transport identity. Canonicalize resource URIs, sandbox filesystem access, gate terminal operations, and audit privileged actions. |


There is the converse as well -- resource commands are bidirectional. Clients should be intentional about the URIs they allow the server to interop with, such as by restricting them to only the paths/directories the client has already announced in a ContentRef.
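A minimal sketch of the client-side restriction described above, assuming resource URIs have already been mapped to local filesystem paths. `AnnouncedRoots`, `announce`, and `allows` are hypothetical names for illustration and are not part of AHP. The check is purely lexical and does not resolve symlinks, so a real client would likely need more than this.

```typescript
import * as path from "node:path";

// Hypothetical helper: tracks the directories this client has already
// announced to the server (for example, via a ContentRef).
class AnnouncedRoots {
  private readonly roots = new Set<string>();

  // Record a directory the client is willing to serve from.
  announce(dir: string): void {
    this.roots.add(path.resolve(dir));
  }

  // True only if `requested` normalizes to an announced root or a path below it.
  // Lexical check only: symlinks are NOT resolved here.
  allows(requested: string): boolean {
    const target = path.resolve(requested);
    for (const root of this.roots) {
      const rel = path.relative(root, target);
      const escapes =
        rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
      if (!escapes) {
        return true;
      }
    }
    return false;
  }
}
```

A client would consult `allows` before answering any server-initiated resource request, and reject the request when it returns `false`.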

connor4312 · 2026-04-28T16:13:37Z

+| **Token or secret exfiltration** | The host advertises `protectedResources`; the client obtains a GitHub/Azure/etc. token and sends it via `authenticate` to a compromised host. | Require authenticated transport, server identity, and explicit per-host/per-resource consent before token delivery. Use scoped, short-lived, audience-bound tokens. Never send refresh tokens or log bearer tokens. |
+| **Client-contributed tool abuse** | The active client contributes tools; the server starts a tool call with `toolClientId` and attacker-controlled input that causes the client to run tasks, inspect local context, or return sensitive output. | Client tools must be explicit, allowlisted, capability-scoped, and locally authorized. Treat server `confirmed: "not-needed"` as advisory, not as client approval. Confirm tools that read/write local data, run commands, open URLs, or use credentials. |
+| **Server-side workspace compromise by malicious clients** | An unauthorized client invokes `resourceWrite`, `resourceDelete`, terminal creation/input, tool confirmations, or config/customization actions against the host. | Authenticate and authorize each client independently. Bind `clientId` to the transport identity. Canonicalize resource URIs, sandbox filesystem access, gate terminal operations, and audit privileged actions. |
+| **WebSocket or tunnel exposure** | An agent host listens on a reachable interface; a browser, local malware, or network attacker connects and sends AHP commands. | Bind loopback by default, require explicit opt-in for network exposure, authenticate during WebSocket upgrade, use `wss` for remote connections, enforce Origin checks for browser-reachable endpoints, and rate-limit. |


"Origin checks for browser-reachable endpoints, and rate-limit" is not sufficient to protect loopbacks. Loopbacks should require a randomized/secure token in their URIs to make them unguessable
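One way the unguessable loopback URI could look, as a sketch only. It assumes a Node.js host; the `token` query parameter name and the helper names `createConnectionToken`, `loopbackUrl`, and `isAuthorized` are illustrative assumptions, not something AHP defines.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Generate a 256-bit random token, encoded so it is safe to place in a URL.
function createConnectionToken(): string {
  return randomBytes(32).toString("base64url");
}

// Build the loopback URL handed to the local client.
// The `token` parameter name is an assumption for this sketch.
function loopbackUrl(port: number, token: string): string {
  return `ws://127.0.0.1:${port}/?token=${token}`;
}

// Check the token presented on an incoming upgrade request.
// `requestUrl` may be absolute or a path such as "/?token=...".
function isAuthorized(requestUrl: string, expected: string): boolean {
  const presented =
    new URL(requestUrl, "ws://127.0.0.1").searchParams.get("token") ?? "";
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The host would generate the token once at startup, pass the full URL to the intended client out of band, and refuse any upgrade request for which `isAuthorized` returns `false`.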

connor4312 · 2026-04-28T16:14:41Z

+| **Client-contributed tool abuse** | The active client contributes tools; the server starts a tool call with `toolClientId` and attacker-controlled input that causes the client to run tasks, inspect local context, or return sensitive output. | Client tools must be explicit, allowlisted, capability-scoped, and locally authorized. Treat server `confirmed: "not-needed"` as advisory, not as client approval. Confirm tools that read/write local data, run commands, open URLs, or use credentials. |
+| **Server-side workspace compromise by malicious clients** | An unauthorized client invokes `resourceWrite`, `resourceDelete`, terminal creation/input, tool confirmations, or config/customization actions against the host. | Authenticate and authorize each client independently. Bind `clientId` to the transport identity. Canonicalize resource URIs, sandbox filesystem access, gate terminal operations, and audit privileged actions. |
+| **WebSocket or tunnel exposure** | An agent host listens on a reachable interface; a browser, local malware, or network attacker connects and sends AHP commands. | Bind loopback by default, require explicit opt-in for network exposure, authenticate during WebSocket upgrade, use `wss` for remote connections, enforce Origin checks for browser-reachable endpoints, and rate-limit. |
+| **Multi-client confusion** | One client races another to claim active-client status, approve a tool call, complete a client tool, or write terminal input. | Authorize every action against current server state and authenticated connection identity. Scope approvals, terminal claims, and tool completion to the owning client. Reject stale or replayed decisions. |


> Scope approvals, terminal claims, and tool completion to the owning client

Multi-client support is the goal of AHP in the first place. I would just drop this row.

connor4312 · 2026-04-28T16:16:41Z

+| **Server-side workspace compromise by malicious clients** | An unauthorized client invokes `resourceWrite`, `resourceDelete`, terminal creation/input, tool confirmations, or config/customization actions against the host. | Authenticate and authorize each client independently. Bind `clientId` to the transport identity. Canonicalize resource URIs, sandbox filesystem access, gate terminal operations, and audit privileged actions. |
+| **WebSocket or tunnel exposure** | An agent host listens on a reachable interface; a browser, local malware, or network attacker connects and sends AHP commands. | Bind loopback by default, require explicit opt-in for network exposure, authenticate during WebSocket upgrade, use `wss` for remote connections, enforce Origin checks for browser-reachable endpoints, and rate-limit. |
+| **Multi-client confusion** | One client races another to claim active-client status, approve a tool call, complete a client tool, or write terminal input. | Authorize every action against current server state and authenticated connection identity. Scope approvals, terminal claims, and tool completion to the owning client. Reject stale or replayed decisions. |
+| **Plugin, customization, and package supply-chain execution** | A host loads a remote customization or a YOLO agent runs `npm install` / `pip install`; install scripts or plugin code read secrets and exfiltrate them. | Treat plugin loading and package installation as code execution. Require provenance, signatures or allowlists, sandboxing, restricted environment secrets, and egress monitoring. |


> Require provenance, signatures or allowlists, sandboxing, restricted environment secrets, and egress monitoring

That's quite a high bar. I can't imagine many agent hosts will do this. We don't do any of this for plugins in VS Code today.

connor4312 · 2026-04-28T16:19:12Z

+| **WebSocket or tunnel exposure** | An agent host listens on a reachable interface; a browser, local malware, or network attacker connects and sends AHP commands. | Bind loopback by default, require explicit opt-in for network exposure, authenticate during WebSocket upgrade, use `wss` for remote connections, enforce Origin checks for browser-reachable endpoints, and rate-limit. |
+| **Multi-client confusion** | One client races another to claim active-client status, approve a tool call, complete a client tool, or write terminal input. | Authorize every action against current server state and authenticated connection identity. Scope approvals, terminal claims, and tool completion to the owning client. Reject stale or replayed decisions. |
+| **Plugin, customization, and package supply-chain execution** | A host loads a remote customization or a YOLO agent runs `npm install` / `pip install`; install scripts or plugin code read secrets and exfiltrate them. | Treat plugin loading and package installation as code execution. Require provenance, signatures or allowlists, sandboxing, restricted environment secrets, and egress monitoring. |
+| **Denial of service and privacy leakage** | A peer sends huge JSON frames, deep state snapshots, large resources, unbounded terminal output, or logs prompts/tokens/file contents. | Enforce message, resource, history, subscription, and terminal scrollback limits. Apply backpressure and rate limits. Redact tokens, prompts, file contents, terminal output, and secrets from logs/telemetry. |


A peer is already going to be authenticated. Standard best practice for resource management is good but I don't think this is a big deal. A client isn't authoritative on the state so I'm unsure what "Enforce message, resource, history, subscription, and terminal scrollback limits" would mean in that context
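For the host side, where the party enforcing the limit is unambiguous, the "standard best practice" could be as small as the sketch below. The limit values and the names `parseBoundedFrame` and `exceedsDepth` are assumptions for illustration; nothing here is prescribed by AHP.

```typescript
// Illustrative defaults; real values would be an implementation choice.
const DEFAULT_MAX_FRAME_BYTES = 1024 * 1024;
const DEFAULT_MAX_DEPTH = 64;

// Iteratively check nesting depth so a hostile frame cannot overflow the stack
// in this function. A top-level object or array counts as depth 1.
function exceedsDepth(root: unknown, maxDepth: number): boolean {
  const stack: Array<[unknown, number]> = [[root, 1]];
  while (stack.length > 0) {
    const [node, depth] = stack.pop()!;
    if (node !== null && typeof node === "object") {
      if (depth > maxDepth) {
        return true;
      }
      for (const child of Object.values(node)) {
        stack.push([child, depth + 1]);
      }
    }
  }
  return false;
}

// Parse one incoming JSON frame, failing closed on oversized or deeply nested input.
function parseBoundedFrame(
  raw: string,
  maxBytes: number = DEFAULT_MAX_FRAME_BYTES,
  maxDepth: number = DEFAULT_MAX_DEPTH,
): unknown {
  if (Buffer.byteLength(raw, "utf8") > maxBytes) {
    throw new Error("frame too large");
  }
  const value: unknown = JSON.parse(raw); // throws on malformed JSON
  if (exceedsDepth(value, maxDepth)) {
    throw new Error("frame too deeply nested");
  }
  return value;
}
```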

connor4312 · 2026-04-28T16:20:58Z

+
+## Minimum requirements for implementations
+
+1. **Authenticate before protocol use.** Remote transports must authenticate peers before `initialize`; `clientId` must be bound to the authenticated connection.


We don't do this. We let people stand up agent hosts and connect to them via websocket -- I have an agent host like this on my home server that I connect to from my devices. I don't think we'd gain anything by removing this capability.

connor4312 · 2026-04-28T16:21:46Z

+3. **Make trust local.** Clients and servers must make authorization decisions in their own policy layer, not from peer-provided text, labels, or confirmation flags.
+4. **Gate token delivery.** Sending OAuth/Bearer tokens to an agent host requires explicit consent and must be scoped to the intended resource and host.
+5. **Constrain client tools.** Do not expose powerful client tools to untrusted hosts by default. When enabled, require local allowlists, argument validation, and user confirmation for sensitive operations.
+6. **Constrain resource and terminal APIs.** Servers must sandbox filesystem access and terminal operations; clients that serve local resources must enforce their own scheme/path policy.


> Servers must sandbox filesystem access and terminal operations

This requirement currently cannot be fulfilled on Windows, and even on other OSes I think it is too heavy to have as a "minimum requirement". Sandboxing is complex.

connor4312 · 2026-04-28T16:23:22Z

+6. **Constrain resource and terminal APIs.** Servers must sandbox filesystem access and terminal operations; clients that serve local resources must enforce their own scheme/path policy.
+7. **Handle multi-client ownership.** Active-client state, tool calls, terminal claims, input requests, and approvals must be tied to the owning authenticated connection.
+8. **Treat plugins and packages as executable.** Customizations, MCP servers, package installs, hooks, and skills require supply-chain policy and sandboxing.
+9. **Validate and bound the protocol.** Use schema validation, fail closed on invalid messages, and enforce limits on frame size, JSON depth, resource size, subscriptions, replay history, and terminal output.


Similar to above https://github.com/microsoft/agent-host-protocol/pull/88/changes#r3155649734

rwoll · 2026-04-29T19:27:34Z

I'm going to close this and simplify, including some minimal examples.

rwoll and others added 2 commits April 27, 2026 14:44

Add Agent Host Protocol threat model

fcb2cf4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Clarify current AHP safety posture

3e275ff

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

roblourens approved these changes Apr 27, 2026


connor4312 reviewed Apr 28, 2026


Add Agent Host Protocol threat model #88
rwoll wants to merge 2 commits into main from add-threat-model

3 participants


