Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@connor4312 @roblourens - here's my first draft. For the purposes of MSRC, it's important that we are all on the same page about accepted risks, so please review and comment if you see anything that is factually incorrect or that we want to fix. Longer term, I do think it's critical that we have an "untrusted" mode. Some of these risks (e.g. forwarding a credential to a client) will likely need UX treatments as well as public docs for education.
roblourens
left a comment
This makes sense to me, thanks for digging into it
## Current safety status
The current protocol **is not safe by itself for untrusted remote use**. AHP defines the message shapes and state flows, but it does not itself establish peer identity, authorize capabilities, constrain token forwarding, sandbox resource access, sanitize rendered content, or make client-contributed tools safe to invoke. Implementations must add those controls before treating a remote host or client as safe.
I would actually say the opposite. The protocol is just RPC and it doesn't have an "execute locally" command. A vanilla implementation will be 'safe' by default unless a client gives it e.g. local tools it could use to escape.
Maybe it's the wording "the current protocol" -- there are no protocol changes we would make that would make it safer than it is now.
We might add things like sandboxing in the future but this is an optional implementation, it doesn't make the protocol inherently more or less safe.
| **Hostile server content compromises the client** | A remote host sends malicious Markdown, HTML/SVG, terminal escapes, links, schema text, diffs, or content references that exploit or trick the client. | Render all peer-provided content as untrusted. Sanitize Markdown, block command links and unsafe URI schemes, constrain SVG/images, neutralize dangerous terminal sequences, and enforce size limits. |
| **Token or secret exfiltration** | The host advertises `protectedResources`; the client obtains a GitHub/Azure/etc. token and sends it via `authenticate` to a compromised host. | Require authenticated transport, server identity, and explicit per-host/per-resource consent before token delivery. Use scoped, short-lived, audience-bound tokens. Never send refresh tokens or log bearer tokens. |
| **Client-contributed tool abuse** | The active client contributes tools; the server starts a tool call with `toolClientId` and attacker-controlled input that causes the client to run tasks, inspect local context, or return sensitive output. | Client tools must be explicit, allowlisted, capability-scoped, and locally authorized. Treat server `confirmed: "not-needed"` as advisory, not as client approval. Confirm tools that read/write local data, run commands, open URLs, or use credentials. |
| **Server-side workspace compromise by malicious clients** | An unauthorized client invokes `resourceWrite`, `resourceDelete`, terminal creation/input, tool confirmations, or config/customization actions against the host. | Authenticate and authorize each client independently. Bind `clientId` to the transport identity. Canonicalize resource URIs, sandbox filesystem access, gate terminal operations, and audit privileged actions. |
There is the converse as well: resource commands are bidirectional. Clients should be intentional about the URIs they allow the server to interoperate with, for example by restricting them to only the paths/directories the client has already announced in a ContentRef.
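A minimal sketch of that client-side restriction, assuming the client tracks announced directories as local filesystem paths. `AnnouncedRoots`, `announce`, and `allows` are hypothetical names and not part of AHP. The check is purely lexical and does not resolve symlinks, so a real client would likely need more than this.

```typescript
import * as path from "node:path";

// Hypothetical client-side policy: only serve resources under directories
// the client has already announced to the server (e.g. via a ContentRef).
class AnnouncedRoots {
  private readonly roots = new Set<string>();

  // Record a directory the client has chosen to expose.
  announce(dir: string): void {
    this.roots.add(path.resolve(dir));
  }

  // True only if `requested` is one of the announced roots or lies inside one.
  allows(requested: string): boolean {
    const target = path.resolve(requested); // normalizes "." and ".." segments
    for (const root of this.roots) {
      const rel = path.relative(root, target);
      const climbsOut = rel === ".." || rel.startsWith(".." + path.sep);
      // An absolute `rel` means a different drive on Windows.
      if (!climbsOut && !path.isAbsolute(rel)) {
        return true;
      }
    }
    return false;
  }
}
```

Comparing via `path.relative` rather than a string prefix avoids treating a sibling such as `/work/project-other` as if it were inside `/work/project`.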
| **WebSocket or tunnel exposure** | An agent host listens on a reachable interface; a browser, local malware, or network attacker connects and sends AHP commands. | Bind loopback by default, require explicit opt-in for network exposure, authenticate during WebSocket upgrade, use `wss` for remote connections, enforce Origin checks for browser-reachable endpoints, and rate-limit. |
"Origin checks for browser-reachable endpoints, and rate-limit" is not sufficient to protect loopbacks. Loopbacks should require a randomized/secure token in their URIs to make them unguessable
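A rough sketch of the unguessable-loopback-URL idea, assuming a Node.js host. The helper names, the `token` query parameter, and the URL shape are illustrative assumptions, not something AHP defines.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helper: create an unguessable token for a loopback endpoint.
function createLoopbackToken(): string {
  // 32 random bytes (256 bits), encoded with the URL-safe base64 alphabet.
  return randomBytes(32).toString("base64url");
}

// Hypothetical helper: the URL the host hands to its own trusted client.
function buildLoopbackUrl(port: number, token: string): string {
  return `ws://127.0.0.1:${port}/?token=${encodeURIComponent(token)}`;
}

// Check the token presented in the upgrade request URL in constant time.
function isValidToken(requestUrl: string, expected: string): boolean {
  // The base is only used when `requestUrl` is a relative path such as "/?token=...".
  const presented =
    new URL(requestUrl, "ws://127.0.0.1").searchParams.get("token") ?? "";
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The host would call `isValidToken` during the WebSocket upgrade and reject the connection when it returns `false`, so a browser page or local process that merely guesses the port cannot connect.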
| **Multi-client confusion** | One client races another to claim active-client status, approve a tool call, complete a client tool, or write terminal input. | Authorize every action against current server state and authenticated connection identity. Scope approvals, terminal claims, and tool completion to the owning client. Reject stale or replayed decisions. |
> Scope approvals, terminal claims, and tool completion to the owning client

Multi-client support is the goal of AHP in the first place. I would just drop this row.
| **Plugin, customization, and package supply-chain execution** | A host loads a remote customization or a YOLO agent runs `npm install` / `pip install`; install scripts or plugin code read secrets and exfiltrate them. | Treat plugin loading and package installation as code execution. Require provenance, signatures or allowlists, sandboxing, restricted environment secrets, and egress monitoring. |
> Require provenance, signatures or allowlists, sandboxing, restricted environment secrets, and egress monitoring

That's quite a high bar. I can't imagine many agent hosts will do this. We don't do any of this for plugins in VS Code today.
| **Denial of service and privacy leakage** | A peer sends huge JSON frames, deep state snapshots, large resources, unbounded terminal output, or logs prompts/tokens/file contents. | Enforce message, resource, history, subscription, and terminal scrollback limits. Apply backpressure and rate limits. Redact tokens, prompts, file contents, terminal output, and secrets from logs/telemetry. |
A peer is already going to be authenticated. Standard best practice for resource management is good but I don't think this is a big deal. A client isn't authoritative on the state so I'm unsure what "Enforce message, resource, history, subscription, and terminal scrollback limits" would mean in that context
## Minimum requirements for implementations
1. **Authenticate before protocol use.** Remote transports must authenticate peers before `initialize`; `clientId` must be bound to the authenticated connection.
We don't do this. We let people stand up agent hosts and connect to them via websocket -- I have an agent host like this on my home server that I connect to from my devices. I don't think we'd gain anything by removing this capability.
3. **Make trust local.** Clients and servers must make authorization decisions in their own policy layer, not from peer-provided text, labels, or confirmation flags.
4. **Gate token delivery.** Sending OAuth/Bearer tokens to an agent host requires explicit consent and must be scoped to the intended resource and host.
5. **Constrain client tools.** Do not expose powerful client tools to untrusted hosts by default. When enabled, require local allowlists, argument validation, and user confirmation for sensitive operations.
6. **Constrain resource and terminal APIs.** Servers must sandbox filesystem access and terminal operations; clients that serve local resources must enforce their own scheme/path policy.
> Servers must sandbox filesystem access and terminal operations

This requirement currently cannot be fulfilled on Windows, and even on other OSes I think it is too heavy to have as a "minimum requirement". Sandboxing is complex.
7. **Handle multi-client ownership.** Active-client state, tool calls, terminal claims, input requests, and approvals must be tied to the owning authenticated connection.
8. **Treat plugins and packages as executable.** Customizations, MCP servers, package installs, hooks, and skills require supply-chain policy and sandboxing.
9. **Validate and bound the protocol.** Use schema validation, fail closed on invalid messages, and enforce limits on frame size, JSON depth, resource size, subscriptions, replay history, and terminal output.
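
As one possible illustration of item 9, here is a minimal sketch of bounding a text frame before parsing it. The limits and the names `MAX_FRAME_CHARS`, `MAX_JSON_DEPTH`, `exceedsDepth`, and `parseBoundedFrame` are assumptions chosen for the example, and the size check counts UTF-16 code units rather than bytes. Schema validation would still be needed after this step.

```typescript
// Hypothetical limits; real values depend on the implementation.
const MAX_FRAME_CHARS = 1_000_000;
const MAX_JSON_DEPTH = 32;

// Scan the raw text and report whether object/array nesting exceeds maxDepth.
// Brackets inside string literals are ignored.
function exceedsDepth(text: string, maxDepth: number): boolean {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
      if (depth > maxDepth) return true;
    } else if (ch === "}" || ch === "]") {
      depth--;
    }
  }
  return false;
}

// Fail closed: any oversized, too-deep, or malformed frame throws.
function parseBoundedFrame(frame: string): unknown {
  if (frame.length > MAX_FRAME_CHARS) {
    throw new Error("frame too large");
  }
  if (exceedsDepth(frame, MAX_JSON_DEPTH)) {
    throw new Error("frame too deeply nested");
  }
  return JSON.parse(frame); // throws SyntaxError on malformed JSON
}
```

Checking depth on the raw text keeps the cost linear in the frame size and avoids building a deeply nested structure before rejecting it.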
I'm going to close this and simplify, including some minimal examples.