This is a layman's implementation of the idea behind Anthropic's article "Code execution with MCP: building more efficient AI agents". Similar existing implementations are either limited to certain backends or make the concepts too hard to unravel, so I built this over a weekend.
The key building blocks of a basic implementation would be:
- Pick any agentic framework: I chose Pydantic AI
- Provide it with a mechanism to search the file system: this is essentially part of a carefully worded system prompt
- Provide it with a mechanism to execute code (Python in our case): the system prompt also explains how to use this 'tool' function
- Provide it with access to a bunch of MCP servers: I selected a few simple ones, including filesystem and time operations
- Finally, a task that is worthy of all this complexity!!! Count the number of files in a folder (only .py files) and the number of lines across all of them, then save the results to a file with the current datetime as a suffix
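To give a sense of scale, the task itself boils down to a few lines of plain Python, which is roughly what the agent is expected to write and run in one go instead of chaining many individual MCP tool calls. The sketch below is illustrative only: it uses `pathlib` directly rather than the generated MCP wrappers, assumes a non-recursive scan of a single folder, and the function name `count_and_save` and the `py_stats_<timestamp>.txt` file name are hypothetical.

```python
import datetime as dt
from pathlib import Path


def count_and_save(folder: str, out_dir: str = ".") -> Path:
    """Count .py files in `folder` and their lines, then save a timestamped report."""
    # Non-recursive scan: only files directly inside `folder` (an assumption).
    py_files = [p for p in Path(folder).iterdir() if p.is_file() and p.suffix == ".py"]

    total_lines = 0
    for path in py_files:
        with path.open(encoding="utf-8", errors="replace") as fh:
            total_lines += sum(1 for _ in fh)  # a final line without "\n" still counts

    # Current datetime as the file-name suffix, e.g. py_stats_20250101_120000.txt
    stamp = dt.datetime.now().strftime("%Y%m%d_%H%M%S")
    report = Path(out_dir) / f"py_stats_{stamp}.txt"
    report.write_text(f"py_files={len(py_files)}\ntotal_lines={total_lines}\n", encoding="utf-8")
    return report
```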
The actual sample implementation is in pydantic_main.py, which demonstrates the bare minimum needed to use such an agent with 'code execution' capabilities.
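The 'code execution' capability is, at its core, a tool that takes a string of Python and returns what it printed. A minimal sketch, assuming the code runs in a separate interpreter via `subprocess` with a timeout; the name `run_python` is hypothetical and this is not necessarily how pydantic_main.py does it. In Pydantic AI, a plain function like this can be registered as a tool with the `@agent.tool_plain` decorator (check the docs for your version).

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 30.0) -> str:
    """Run agent-generated Python in a fresh interpreter and return its output.

    NOTE: a subprocess is NOT a security sandbox; it only isolates crashes
    from the agent process and enforces a time limit.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],  # same interpreter as the agent
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return f"ERROR: execution exceeded {timeout} seconds"
    if result.returncode != 0:
        # Hand the traceback back to the model so it can fix its own code.
        return f"ERROR (exit {result.returncode}):\n{result.stderr}"
    return result.stdout
```

Returning errors as text rather than raising keeps the agent loop going: the model sees the traceback and can retry with corrected code.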
There is also token_savings_demo.py, which compares plain MCP tool calling against code-execution MCP. It prints the savings along with a lot of intermediate content for debugging. For the relatively trivial task given to the agent, it showed token savings of between 50% and 90%.
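The savings figure is a simple ratio over the two runs. A minimal sketch, assuming savings are measured on total tokens (input plus output) summed over every model request in each run; the `Usage` class and helper names are hypothetical, the numbers are made up, and token_savings_demo.py may compute it differently.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token usage reported for one model request."""
    input_tokens: int
    output_tokens: int


def total_tokens(usages: list[Usage]) -> int:
    # Every request in a run counts, including intermediate tool-call rounds.
    return sum(u.input_tokens + u.output_tokens for u in usages)


def token_savings(plain: list[Usage], code_exec: list[Usage]) -> float:
    """Percentage of tokens saved by the code-execution run relative to plain MCP."""
    baseline = total_tokens(plain)
    if baseline == 0:
        raise ValueError("plain-MCP run reported no tokens")
    return 100.0 * (baseline - total_tokens(code_exec)) / baseline


# Made-up numbers, for illustration only:
plain_run = [Usage(4000, 200), Usage(5500, 300)]  # 10,000 tokens
code_run = [Usage(2500, 500)]                     # 3,000 tokens
print(f"{token_savings(plain_run, code_run):.1f}% saved")  # prints "70.0% saved"
```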
As of today, the code does the bare minimum to help understand the flow. There are many potential enhancements, e.g. the 'search_tools' tool mentioned in the Anthropic article (perhaps with some smart search), the ability to persist code that produced successful outcomes (remembered Skills), the security aspects of a sandbox, etc.
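As a starting point for the 'search_tools' enhancement, even a naive keyword search over the generated wrapper files lets the agent load only the tool definitions it needs. A minimal sketch, assuming a `servers/<server>/<tool>.py` layout; the function name, the `detail` parameter values, and the layout are all hypothetical and not part of the current code.

```python
from pathlib import Path


def search_tools(query: str, servers_dir: str = "servers", detail: str = "name") -> list[str]:
    """Naive case-insensitive keyword search over generated wrapper modules.

    detail="name" returns "server.tool" identifiers; detail="full" returns the
    wrapper source, so the agent only pulls in full definitions when needed.
    """
    if detail not in ("name", "full"):
        raise ValueError("detail must be 'name' or 'full'")
    needle = query.lower()
    matches = []
    # Assumed layout: servers/<server>/<tool>.py
    for path in sorted(Path(servers_dir).rglob("*.py")):
        if path.name == "__init__.py":
            continue
        source = path.read_text(encoding="utf-8")
        # Match on the tool (file) name or anywhere in its source/docstring.
        if needle in path.stem.lower() or needle in source.lower():
            matches.append(f"{path.parent.name}.{path.stem}" if detail == "name" else source)
    return matches
```

A "smarter" search could replace the substring test with embeddings or BM25 ranking without changing the tool's interface.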
- Create a .env file with content like the following:
ANTHROPIC_API_KEY=sk-ant-YOURKEY
# MCP_SERVERS_CONFIG=/absolute/path/to/mcp-servers.json
# ^ If not provided, it looks for a local mcp-servers.json in the execution folder
- Install the dependencies using 'uv':
uv sync
- Run the main script (to test the functionality):
python pydantic_main.py
- Run a comparison against the traditional approach:
python token_savings_demo.py
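The MCP_SERVERS_CONFIG setting in the .env step is optional, with a fallback to a local mcp-servers.json. A minimal sketch of that lookup, assuming the file uses the common `{"mcpServers": {...}}` layout (e.g. an entry `"time": {"command": "uvx", "args": ["mcp-server-time"]}`); the function name `load_mcp_servers` and the exact schema are assumptions, so check the repo's own mcp-servers.json.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def load_mcp_servers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the MCP server definitions, honouring MCP_SERVERS_CONFIG if set."""
    env = os.environ if env is None else env
    configured = env.get("MCP_SERVERS_CONFIG")
    # Fall back to ./mcp-servers.json in the current working (execution) folder.
    path = Path(configured) if configured else Path.cwd() / "mcp-servers.json"
    with path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    # Assumed schema: {"mcpServers": {name: {"command": ..., "args": [...]}}};
    # a bare {name: {...}} mapping is accepted as well.
    return config.get("mcpServers", config)
```

Passing `env` explicitly keeps the function easy to test; by default it reads `os.environ`, which is where a .env loader such as python-dotenv's `load_dotenv()` would have placed the variables.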
Check the logs/ and requests/ folders to see what is happening (number of calls, the output of the code, etc.). Also watch the console for the dynamic code being generated and executed, and finally for the token savings. On the first run, 'wrapper' code is generated around the MCP server logic for the dynamic code generation step to use. These wrappers are placed under the servers/ folder.
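The wrapper generation step turns each MCP tool into an importable Python function, so the agent's code can call tools like ordinary functions. A minimal sketch, assuming one file per tool under servers/<server>/<tool>.py (similar to the per-tool file layout in the Anthropic article); `mcp_client.call_mcp_tool` is a hypothetical runtime helper, and the generated text is illustrative rather than what this repo actually emits.

```python
import keyword
from pathlib import Path

# Hypothetical template: `mcp_client.call_mcp_tool` stands in for whatever
# runtime helper actually forwards the call to the MCP server.
TEMPLATE = '''"""Auto-generated wrapper for MCP tool {tool!r} on server {server!r}."""
from mcp_client import call_mcp_tool


def {func}(**kwargs):
    {doc}
    return call_mcp_tool({server!r}, {tool!r}, kwargs)
'''


def _py_name(name: str) -> str:
    """Turn an MCP name such as 'get-document' into a valid Python identifier."""
    ident = name.replace("-", "_")
    if not ident.isidentifier() or keyword.iskeyword(ident):
        raise ValueError(f"cannot map {name!r} to a Python identifier")
    return ident


def generate_wrapper(server: str, tool: str, description: str, out_root: str = "servers") -> Path:
    """Write servers/<server>/<tool>.py so generated code can import the tool."""
    func = _py_name(tool)
    target = Path(out_root) / _py_name(server) / f"{func}.py"
    target.parent.mkdir(parents=True, exist_ok=True)
    # repr() yields a safely quoted string literal, used as the docstring.
    source = TEMPLATE.format(server=server, tool=tool, func=func, doc=repr(description))
    target.write_text(source, encoding="utf-8")
    return target
```

Generating the files once and reusing them on later runs is what keeps tool definitions out of the prompt: the agent only reads a wrapper when it needs it.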
Feedback welcome!