# Model Context Protocol (MCP)

Any LLM that supports tool calling can be used with MCP. In this repository we use `llama3.2:1b` and `llama3.2:3b`.

If your machine can handle it, you can also download a larger model such as `llama3.1:8b` (Llama 3.2 text models are published only in 1B and 3B sizes).

| Model | Command |
| --- | --- |
| llama 3.2:1b | `ollama pull llama3.2:1b` |
| llama 3.2:3b | `ollama pull llama3.2:3b` |
| llama 3.1:8b | `ollama pull llama3.1:8b` |

In [None]:
import ollama
import os
import sys
from pprint import pprint
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

In [None]:
LARGE_MODEL='llama3.2:3b' # or llama3.2:1b for faster but less accurate results
SMALL_MODEL='llama3.2:1b' # or llama3.2:3b for better but slower results

The `mcp_servers` folder contains our MCP tools, each written as a separate Python script. These scripts are also called servers, because each one runs as its own process and waits for a client to connect. Unlike agents, MCP tools are plug and play, so the same scripts can also run on remote servers.

First, we have a very simple MCP tool that adds two numbers, defined in the script `the_math_server.py`.
We will use this tool first, just to understand how MCP works with an LLM (in this case llama3.2).


In [None]:
# Configuration for an MCP Server (Example: a local python script)
# If you don't have one, you can use a simple 'hello-world' MCP server script

MCP_SERVER_PATH = os.path.abspath("./mcp_servers/the_math_server.py")


In [None]:
async def run_mcp_query(mcp_server_path, user_prompt, model):
    server_params = StdioServerParameters(
        command="python",
        args=[mcp_server_path],
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # 1. Initialize session
            await session.initialize()
            
            # 2. List available tools from the MCP server
            tools = await session.list_tools()
            
            # 3. Ask Ollama to decide which tool to use
            # Note: We format the MCP tools into Ollama's tool format
            ollama_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": t.name,
                        "description": t.description,
                        "parameters": t.inputSchema,
                    },
                }
                for t in tools.tools
            ]

            response = ollama.chat(
                model=model,
                messages=[{'role': 'user', 'content': user_prompt}],
                tools=ollama_tools,
            )

            # 4. Handle Tool Calls
            if response.get('message', {}).get('tool_calls'):
                for tool_call in response['message']['tool_calls']:
                    tool_name = tool_call['function']['name']
                    tool_args = tool_call['function']['arguments']
                    
                    print(f"üõ†Ô∏è Calling tool: {tool_name} with {tool_args}")
                    
                    # Execute tool via MCP
                    result = await session.call_tool(tool_name, tool_args)
                    print(f"✅ Tool Result: {result.content}")
                    
                    # Final response from LLM
                    final_response = ollama.chat(
                        model=model,
                        messages=[
                            {'role': 'user', 'content': user_prompt},
                            response['message'],
                            {'role': 'tool', 'content': str(result.content), 'name': tool_name}
                        ]
                    )
                    return final_response['message']['content']
            
            return response['message']['content']


### Code Explanation

`async def run_mcp_query(mcp_server_path, user_prompt, model):`
- The word `async` (asynchronous) means the program does not block while it waits for a slow step to finish. It can get on with other work in the meantime, like a person putting a phone on speaker while waiting on hold.
- The Layman Translation: "I am preparing to ask my assistant a question (the `user_prompt`), and I'm willing to wait patiently for the answer."
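
The non-blocking behaviour described above can be seen in a small standalone sketch. It does not touch MCP or Ollama: `slow_tool` is a hypothetical stand-in for any slow step, and `asyncio.sleep` plays the role of waiting on a server.

```python
import asyncio
import time


async def slow_tool(name: str, seconds: float) -> str:
    # asyncio.sleep stands in for waiting on a tool server or a network reply
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main() -> list:
    # Both waits overlap, so the total time is close to the longest wait,
    # not the sum of the two
    return await asyncio.gather(
        slow_tool("first", 0.2),
        slow_tool("second", 0.2),
    )


start = time.perf_counter()
# In a notebook cell an event loop is already running, so use
# `results = await main()` there instead of asyncio.run(...)
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

While one `slow_tool` call is waiting, the event loop is free to advance the other, which is the "do other chores" part of the analogy.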

`async with stdio_client(server_params) as (read, write):`
- This step starts the tool server as a separate process and builds the pipe between your main program and that server, using the server's standard input and output. The `server_params` are like the phone number or the address of the office.
- The Layman Translation: "I am plugging a phone line directly into the assistant's office. I have an earpiece to listen (read) and a microphone to speak (write)."

`async with ClientSession(read, write) as session:`
- Just having a wire isn't enough; the computers need rules on how to format their messages. A ClientSession wraps the raw wire in a polite protocol (like agreeing to say "Over" when you finish speaking on a walkie-talkie).
- The Layman Translation: "Now that the wire is connected, we are agreeing to speak the same language so we don't talk over each other."

`await session.initialize()`
- Before you can ask your actual question or ask the server to run tools, you have to do a digital handshake. This proves the server is awake, functioning, and ready to accept commands.
- The Layman Translation: "I am saying 'Hello, are you there and ready to work?' and I will wait (await) until you say 'Yes, I am ready!'"
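
The walkthrough above stops at the handshake. The next step in `run_mcp_query`, reshaping each MCP tool description into the dictionary format passed to `ollama.chat`, can be tried in isolation. This is a minimal sketch: `FakeTool` is a hypothetical stand-in for one entry of `tools.tools`, and the `add` schema is an assumption about what the math server advertises, not a copy of it.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTool:
    """Hypothetical stand-in for one tool entry returned by session.list_tools()."""
    name: str
    description: str
    inputSchema: dict = field(default_factory=dict)


def to_ollama_tool(tool) -> dict:
    # Same mapping as the list comprehension inside run_mcp_query
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.inputSchema,
        },
    }


# Assumed schema for an "add" tool; the real one comes from the server
add_tool = FakeTool(
    name="add",
    description="Add two numbers",
    inputSchema={
        "type": "object",
        "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
        "required": ["a", "b"],
    },
)

ollama_tool = to_ollama_tool(add_tool)
print(ollama_tool["function"]["name"])
print(ollama_tool["function"]["parameters"]["required"])
```

The model only ever sees this dictionary, so a clear `description` and an accurate `inputSchema` are what it relies on when deciding whether and how to call a tool.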

Let us now try our MCP tool.

In [None]:
query = "What is 5 + 5?"

In [None]:
result = await run_mcp_query(MCP_SERVER_PATH, query, SMALL_MODEL)
print(f"LLM Final Response: {result}")

In the output above, the `SMALL_MODEL` fails to answer a very simple question, even though the process successfully calls the `Math` MCP tool and the tool computes the correct result, as shown in the second line of the output under `text='10'`.
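
One possible contributing factor (an assumption, not something verified here) is that `str(result.content)` hands the model the Python representation of the content objects rather than the bare answer. The sketch below shows the difference using `FakeTextContent`, a hypothetical stand-in for an MCP text content item, and `tool_result_to_text`, a hypothetical helper that keeps only the text.

```python
from dataclasses import dataclass


@dataclass
class FakeTextContent:
    """Hypothetical stand-in for an MCP text content item."""
    text: str
    type: str = "text"


def tool_result_to_text(content_items) -> str:
    # Keep only items that carry text and join them with newlines
    parts = [
        item.text
        for item in content_items
        if getattr(item, "type", None) == "text"
    ]
    return "\n".join(parts)


items = [FakeTextContent(text="10")]
raw_repr = str(items)                     # object repr, similar in shape to str(result.content)
plain_text = tool_result_to_text(items)   # just the tool's answer
print(raw_repr)
print(plain_text)
```

Passing the plain text as the `content` of the `tool` message gives a small model less noise to read through, though whether that changes the final answer would need to be checked by running it.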

Let's now try the same query with the `LARGE_MODEL`.

In [None]:
result = await run_mcp_query(MCP_SERVER_PATH, query, LARGE_MODEL)
print(f"LLM Final Response: {result}")

## Run a more complex MCP server

There is another server file, `mcp_tools_server.py`, inside the `mcp_servers` folder.

This MCP server script provides two tools:
1. Web search tool, using DuckDuckGo
2. Local file reading tool

In [None]:
MCP_SERVER_PATH = os.path.abspath("./mcp_servers/mcp_tools_server.py")

Let's first try the `web_search` tool.

In [None]:
# Try a web search:
query = "Who won the FIFA World Cup in 2026?"
print(await run_mcp_query(MCP_SERVER_PATH, query, SMALL_MODEL))

In [None]:
query = "Who won the FIFA World Cup in 2026?"
print(await run_mcp_query(MCP_SERVER_PATH, query, LARGE_MODEL))

Now let's use the `read_local_file` tool.

In [None]:
# Try reading a file
query = "Summarize the contents of ai_response.txt"
pprint(await run_mcp_query(MCP_SERVER_PATH, query, SMALL_MODEL))

If you hit errors or odd outputs, try running the code block above again.

The `SMALL_MODEL`, that is `llama3.2:1b`, does not perform well even on fairly easy tasks.

In [None]:

# Try reading a file
query = "Summarize the contents of ai_response.txt"
pprint(await run_mcp_query(MCP_SERVER_PATH, query, LARGE_MODEL))

### Add System Prompts

Now let's add system prompts and see whether they make a difference.

In [None]:
async def mcp_with_system_prompt(mcp_server_path, user_prompt, model):
    
    server_params = StdioServerParameters(
        command="python",
        args=[mcp_server_path],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            
            # List tools and format for Ollama
            tools_result = await session.list_tools()
            ollama_tools = [{
                "type": "function",
                "function": {
                    "name": t.name,
                    "description": t.description,
                    "parameters": t.inputSchema,
                }
            } for t in tools_result.tools]

            # 1. Ask Ollama
            response = ollama.chat(
                model=model,
                messages=[
                    # --- ADDED THIS SYSTEM ROLE ---
                    {
                        'role': 'system', 
                        'content': (
                            "You are a real-time assistant with access to tools. "
                            "IMPORTANT: Always prefer information from tool results over your internal knowledge. "
                            "The current year is 2026. If tool data contradicts your training, the tool is right."
                        )
                    },
                    {'role': 'user', 'content': user_prompt}
                ],
                tools=ollama_tools,
            )

            # 2. Check for tool calls
            message = response.get('message', {})
            # print(f"LLM Response: {message.get('content', '')}")
            if message.get('tool_calls'):
                # Handle all tool calls (Ollama might suggest more than one)
                tool_messages = []
                for tool_call in message['tool_calls']:
                    name = tool_call['function']['name']
                    args = tool_call['function']['arguments']
                    
                    print(f"üõ†Ô∏è LLM decided to use: {name}")
                    result = await session.call_tool(name, args)
                    
                    tool_messages.append({
                        'role': 'tool',
                        'content': str(result.content),
                        'name': name
                    })
                
                # 3. Final synthesis
                final_response = ollama.chat(
                    model=model,
                    messages=[
                        {
                            'role': 'system', 'content': (
                                "Summarize the tool results accurately only when necessary. "
                                "Do not rely on your training data but use the tools to get up-to-date information. "
                                "Always prefer tool results over your internal knowledge."
                                )
                        },
                        {'role': 'user', 'content': user_prompt},
                        message,
                        *tool_messages
                    ]
                )
                return final_response['message']['content']
            
            return message['content']



Now, let's try again with both models, one at a time.

In [None]:
query = "Who won the FIFA World Cup in 2026?"
pprint(await mcp_with_system_prompt(MCP_SERVER_PATH, query, SMALL_MODEL))

In [None]:
query = "Who won the FIFA World Cup in 2026?"
pprint(await mcp_with_system_prompt(MCP_SERVER_PATH, query, LARGE_MODEL))

In [None]:
query = "Summarize the contents of the text file ./ai_response.txt"
pprint(await mcp_with_system_prompt(MCP_SERVER_PATH, query, LARGE_MODEL))

If you are getting raw JSON responses, try changing the system prompt.

<details>
    <summary>Try adding this (check only if needed)</summary>
    "You are an MCP Assistant. If you need information, use a tool. DO NOT write raw JSON in your response. If you call a tool, use the proper tool-call format."
</details>
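
If changing the prompt is not enough, another option is to detect a raw JSON tool call in the model's text and treat it like a normal tool call. The sketch below is one way this might be done. `extract_json_tool_call` is a hypothetical helper, and the key names it looks for (`name`, plus `parameters` or `arguments`) and the `query` argument are assumptions about what the model emits, so adjust them to match the output you actually see.

```python
import json


def extract_json_tool_call(content: str):
    """Return (name, arguments) if `content` is a raw JSON tool call, else None."""
    try:
        data = json.loads(content.strip())
    except json.JSONDecodeError:
        # Ordinary prose is not valid JSON, so it falls through here
        return None
    if not isinstance(data, dict) or "name" not in data:
        return None
    # Assumed key names: models vary between "parameters" and "arguments"
    args = data.get("parameters", data.get("arguments", {}))
    if not isinstance(args, dict):
        args = {}
    return data["name"], args


raw = '{"name": "web_search", "parameters": {"query": "FIFA World Cup 2026 winner"}}'
parsed = extract_json_tool_call(raw)
not_a_call = extract_json_tool_call("The answer is 10.")
print(parsed)
print(not_a_call)
```

When the helper returns a `(name, arguments)` pair, those values could be passed to `session.call_tool(name, arguments)` in the same way as a proper tool call.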

### Getting simple python code from web search

In [None]:
query = "Search for python code for calculating the first 10 numbers of the Fibonacci sequence."
print(await mcp_with_system_prompt(MCP_SERVER_PATH, query, LARGE_MODEL))

Copy the code from the output above into a Python file named `fibonacci_numbers.py`. Create a folder called `scripts` and move the file into it.

Run the script in the terminal to check whether it works:

```bash
python scripts/fibonacci_numbers.py
```
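
The model's output varies from run to run, so it helps to have something to compare against. The following is a minimal reference sketch, not the code the model returns; it assumes the sequence starts at 0.

```python
def fibonacci(n: int) -> list:
    """Return the first n Fibonacci numbers, starting from 0."""
    numbers = []
    a, b = 0, 1
    for _ in range(n):
        numbers.append(a)
        # Slide the pair forward: the next value is the sum of the last two
        a, b = b, a + b
    return numbers


if __name__ == "__main__":
    print(fibonacci(10))
```

If the script built from the model's answer prints a different list, check whether it starts the sequence at 1 instead of 0 before assuming the code is wrong.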