Add Agent Host Protocol threat model #88

Draft
rwoll wants to merge 2 commits into main from add-threat-model

@rwoll rwoll commented Apr 27, 2026

Summary

  • adds a concise THREAT_MODEL.md for Agent Host Protocol implementations
  • frames the desired untrusted mode goal for remote/SSH/tunnel usage
  • documents core risks around hostile peers, token forwarding, client tools, resource APIs, terminals, plugins, packages, and multi-client ownership

Testing

  • documentation-only change

rwoll and others added 2 commits April 27, 2026 14:44
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rwoll commented Apr 27, 2026

@connor4312 @roblourens - here's my first draft. For the purposes of MSRC, it's important we are all on the same page on accepted risks, so please review and comment if you see anything that is factually incorrect or that we want to fix. Longer term, I do think it's critical we have an "untrusted" mode. Some of these risks (e.g. forwarding a credential to a client) will likely need UX treatments as well as public docs for education.

@roblourens roblourens left a comment

This makes sense to me, thanks for digging into it

Comment thread THREAT_MODEL.md

## Current safety status

The current protocol **is not safe by itself for untrusted remote use**. AHP defines the message shapes and state flows, but it does not itself establish peer identity, authorize capabilities, constrain token forwarding, sandbox resource access, sanitize rendered content, or make client-contributed tools safe to invoke. Implementations must add those controls before treating a remote host or client as safe.

@connor4312 connor4312 Apr 28, 2026

I would actually say the opposite. The protocol is just RPC and it doesn't have an "execute locally" command. A vanilla implementation will be 'safe' by default unless a client gives it e.g. local tools it could use to escape.

Maybe it's the wording "the current protocol" -- there are no protocol changes we would make that would make it safer than it is now.

We might add things like sandboxing in the future but this is an optional implementation, it doesn't make the protocol inherently more or less safe.

Comment thread THREAT_MODEL.md
| **Hostile server content compromises the client** | A remote host sends malicious Markdown, HTML/SVG, terminal escapes, links, schema text, diffs, or content references that exploit or trick the client. | Render all peer-provided content as untrusted. Sanitize Markdown, block command links and unsafe URI schemes, constrain SVG/images, neutralize dangerous terminal sequences, and enforce size limits. |
| **Token or secret exfiltration** | The host advertises `protectedResources`; the client obtains a GitHub/Azure/etc. token and sends it via `authenticate` to a compromised host. | Require authenticated transport, server identity, and explicit per-host/per-resource consent before token delivery. Use scoped, short-lived, audience-bound tokens. Never send refresh tokens or log bearer tokens. |
| **Client-contributed tool abuse** | The active client contributes tools; the server starts a tool call with `toolClientId` and attacker-controlled input that causes the client to run tasks, inspect local context, or return sensitive output. | Client tools must be explicit, allowlisted, capability-scoped, and locally authorized. Treat server `confirmed: "not-needed"` as advisory, not as client approval. Confirm tools that read/write local data, run commands, open URLs, or use credentials. |
| **Server-side workspace compromise by malicious clients** | An unauthorized client invokes `resourceWrite`, `resourceDelete`, terminal creation/input, tool confirmations, or config/customization actions against the host. | Authenticate and authorize each client independently. Bind `clientId` to the transport identity. Canonicalize resource URIs, sandbox filesystem access, gate terminal operations, and audit privileged actions. |

There is the converse as well -- resource commands are bidirectional. Clients should be intentional about the URIs they allow the server to interoperate with, such as by restricting them to only the paths/directories the client has already announced in a ContentRef.
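
A minimal sketch of that client-side restriction, in Python. Everything here is an assumption for illustration: the function name `is_uri_allowed`, the `announced_roots` list, and the policy of serving only `file` URIs are hypothetical, and how the roots would be derived from a ContentRef is not shown.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def is_uri_allowed(uri: str, announced_roots: list[str]) -> bool:
    """Return True only if `uri` is a local file URI inside an announced root."""
    parsed = urlparse(uri)
    # Hypothetical policy: only local file URIs are ever served to the host.
    if parsed.scheme != "file" or parsed.netloc not in ("", "localhost"):
        return False
    path = PurePosixPath(unquote(parsed.path))
    # Reject relative paths and any ".." segment rather than trying to resolve it.
    if not path.is_absolute() or ".." in path.parts:
        return False
    # Comparison is per path segment, so "/work/project2" is not under "/work/project".
    return any(path.is_relative_to(PurePosixPath(root)) for root in announced_roots)
```

This check is purely lexical. It does not follow symlinks, so a real client would probably also need to resolve the path on disk before reading or writing.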

Comment thread THREAT_MODEL.md
| **WebSocket or tunnel exposure** | An agent host listens on a reachable interface; a browser, local malware, or network attacker connects and sends AHP commands. | Bind loopback by default, require explicit opt-in for network exposure, authenticate during WebSocket upgrade, use `wss` for remote connections, enforce Origin checks for browser-reachable endpoints, and rate-limit. |
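
For the loopback case, one option for authenticating during the WebSocket upgrade is an unguessable token carried in the connection URL. The sketch below is illustrative only: the function names, the `?token=` query parameter, and the URL layout are assumptions, not something AHP defines.

```python
import hmac
import secrets


def new_connection_token() -> str:
    """Generate an unguessable, URL-safe token from 32 random bytes."""
    return secrets.token_urlsafe(32)


def build_loopback_url(port: int, token: str) -> str:
    # Hypothetical URL layout: the token travels as a query parameter.
    return f"ws://127.0.0.1:{port}/?token={token}"


def is_token_valid(presented: str, expected: str) -> bool:
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

The host would generate the token at startup, hand the full URL to the intended client out of band, and reject any upgrade request whose token fails `is_token_valid`.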

"Origin checks for browser-reachable endpoints, and rate-limit" is not sufficient to protect loopbacks. Loopbacks should require a randomized/secure token in their URIs to make them unguessable

Comment thread THREAT_MODEL.md
| **Multi-client confusion** | One client races another to claim active-client status, approve a tool call, complete a client tool, or write terminal input. | Authorize every action against current server state and authenticated connection identity. Scope approvals, terminal claims, and tool completion to the owning client. Reject stale or replayed decisions. |

> Scope approvals, terminal claims, and tool completion to the owning client

Multi-client support is the goal of AHP in the first place. I would just drop this row.

Comment thread THREAT_MODEL.md
| **Plugin, customization, and package supply-chain execution** | A host loads a remote customization or a YOLO agent runs `npm install` / `pip install`; install scripts or plugin code read secrets and exfiltrate them. | Treat plugin loading and package installation as code execution. Require provenance, signatures or allowlists, sandboxing, restricted environment secrets, and egress monitoring. |

> Require provenance, signatures or allowlists, sandboxing, restricted environment secrets, and egress monitoring

That's quite a high bar. I can't imagine many agent hosts will do this. We don't do any of this for plugins in VS Code today.

Comment thread THREAT_MODEL.md
| **Denial of service and privacy leakage** | A peer sends huge JSON frames, deep state snapshots, large resources, unbounded terminal output, or logs prompts/tokens/file contents. | Enforce message, resource, history, subscription, and terminal scrollback limits. Apply backpressure and rate limits. Redact tokens, prompts, file contents, terminal output, and secrets from logs/telemetry. |

A peer is already going to be authenticated. Standard best practice for resource management is good but I don't think this is a big deal. A client isn't authoritative on the state so I'm unsure what "Enforce message, resource, history, subscription, and terminal scrollback limits" would mean in that context

Comment thread THREAT_MODEL.md

## Minimum requirements for implementations

1. **Authenticate before protocol use.** Remote transports must authenticate peers before `initialize`; `clientId` must be bound to the authenticated connection.

We don't do this. We let people stand up agent hosts and connect to them via websocket -- I have an agent host like this on my home server that I connect to from my devices. I don't think we'd gain anything by removing this capability.

Comment thread THREAT_MODEL.md
3. **Make trust local.** Clients and servers must make authorization decisions in their own policy layer, not from peer-provided text, labels, or confirmation flags.
4. **Gate token delivery.** Sending OAuth/Bearer tokens to an agent host requires explicit consent and must be scoped to the intended resource and host.
5. **Constrain client tools.** Do not expose powerful client tools to untrusted hosts by default. When enabled, require local allowlists, argument validation, and user confirmation for sensitive operations.
6. **Constrain resource and terminal APIs.** Servers must sandbox filesystem access and terminal operations; clients that serve local resources must enforce their own scheme/path policy.
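
Items 3 and 5 above could look roughly like this on the client side. This is a sketch under stated assumptions: `ToolPolicy`, `decide_tool_call`, and the three return values are hypothetical names, and the only `confirmed` value taken from the draft is `"not-needed"`; any other value is treated conservatively.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    # Hypothetical local policy: tools the user has allowlisted, and the
    # subset that always requires an interactive confirmation.
    allowed: set[str] = field(default_factory=set)
    sensitive: set[str] = field(default_factory=set)


def decide_tool_call(policy: ToolPolicy, tool_name: str, server_confirmed: str) -> str:
    """Return "deny", "ask-user", or "run" based on local policy."""
    if tool_name not in policy.allowed:
        return "deny"
    if tool_name in policy.sensitive:
        # The server's flag is advisory; it never skips the local prompt.
        return "ask-user"
    # For non-sensitive tools the server may request a prompt, but its
    # "not-needed" value cannot loosen a decision already made above.
    return "run" if server_confirmed == "not-needed" else "ask-user"
```

The point of the design is that the peer-provided flag can only make the outcome stricter, never more permissive than what the local allowlist already grants.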

@connor4312 connor4312 Apr 28, 2026

> Servers must sandbox filesystem access and terminal operations

This requirement currently cannot be fulfilled on Windows, and even on other OSes I think it is too heavy to have as a "minimum requirement". Sandboxing is complex.

Comment thread THREAT_MODEL.md
7. **Handle multi-client ownership.** Active-client state, tool calls, terminal claims, input requests, and approvals must be tied to the owning authenticated connection.
8. **Treat plugins and packages as executable.** Customizations, MCP servers, package installs, hooks, and skills require supply-chain policy and sandboxing.
9. **Validate and bound the protocol.** Use schema validation, fail closed on invalid messages, and enforce limits on frame size, JSON depth, resource size, subscriptions, replay history, and terminal output.
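
As a rough illustration of the frame-size and JSON-depth limits in item 9, a receiver could bound each frame before handing it to schema validation. The limits, the function name `parse_bounded`, and the requirement that a message be a JSON object are all assumptions made for this sketch.

```python
import json

MAX_FRAME_BYTES = 1_000_000  # hypothetical limit
MAX_JSON_DEPTH = 32          # hypothetical limit


def parse_bounded(frame: bytes) -> dict:
    """Parse one frame, failing closed on oversized or deeply nested input."""
    if len(frame) > MAX_FRAME_BYTES:
        raise ValueError("frame too large")
    try:
        message = json.loads(frame)  # invalid JSON raises a ValueError subclass
    except RecursionError as exc:
        raise ValueError("JSON nested too deeply") from exc
    # Walk containers iteratively so the depth check cannot itself overflow.
    stack = [(message, 1)]
    while stack:
        value, depth = stack.pop()
        if isinstance(value, (dict, list)):
            if depth > MAX_JSON_DEPTH:
                raise ValueError("JSON nested too deeply")
            children = value.values() if isinstance(value, dict) else value
            stack.extend((child, depth + 1) for child in children)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    return message
```

Schema validation, subscription limits, and terminal output limits would sit on top of this and are not shown.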

rwoll commented Apr 29, 2026

I'm going to close this and simplify, including some minimal examples.
