Client side tool validation to defend against tool poisoning #348
alorispax8
started this conversation in
Ideas - Security
Replies: 1 comment 2 replies
-
|
I think this is not part of the protocol specification.
But best would be - use only MCP servers from trusted vendors. It includes both local STDIO and remote SSE. Example, if you want to integrate Slack with your LLM then use MCP server officially created and maintained by Slack, not someone else. If you use MCP server created by noname then it is your responsibility |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Pre-submission Checklist
Your Idea
Context
The folks at Invariant Labs posted about tool poisoning, I would like to open up a discussion to move towards a solution for situations where the MCP server is hosted remotely.
Proposal
It seems like the root cause of the problem is that clients don't keep track of server side tools locally. It might make sense to implement client side baselining and drift detection for tool descriptions exposed by the server. Possibly as follows:
Client registers with server
a) The client receives a copy of the complete tool description of all the tools in scope
b) The human reviews and approves the descriptions of the current tool set. If the descriptions contain instructions that might be hidden to the UI, then either the human is notified/warned by the client
c) The human is given the option to reject any tool from being used
d) All accepted tool names and descriptions (maybe a signed hash of them) are stored client side
When the human asks the agent to perform some action
a) An MCP server is selected but no tools can be used until descriptions have been checked for drift
b) MCP server gets polled for the description of all the approved tools
c) All descriptions are checked against the values stored locally
d) Only tools that have not changed their descriptions can be used, tools that have drifted are blocked
e) Tools that have drifted are submitted for re-approval by the human
f) LLM and MCP client can now interact with MCP server
Scope
Beta Was this translation helpful? Give feedback.
All reactions