# Model Context Protocol

Modern AI applications face a fundamental challenge: how to securely and efficiently connect language models to the vast ecosystem of external data sources, APIs, and tools they need to be truly useful. Previously, developers would often build custom integrations for each external service, creating a fragmented landscape that is difficult to maintain, scale, and standardize. This fragmentation limits the potential of agentic AI systems, which require access to diverse external resources to perform complex tasks effectively.

[Model Context Protocol (MCP)](https://modelcontextprotocol.io) addresses this challenge by providing a universal standard for AI-to-service communication. Rather than building point-to-point integrations, MCP establishes a common language that allows any AI model to connect to any compatible service through a standardized interface. This transforms agentic AI workflows by enabling models to dynamically discover and utilize external capabilities—from databases and file systems to specialized APIs and knowledge graphs—without requiring custom integration code for each service. The result is more flexible, maintainable, and powerful AI applications that can adapt to new tools and data sources as they become available.

In this notebook, we will explore how to connect [Granite models](https://www.ibm.com/granite) to MCP servers, enabling advanced agentic workflows.



## MCP with Granite

The **Model Context Protocol (MCP)** is an open standard that enables AI applications to securely connect to external data sources and tools. It consists of two main components:

### MCP Server
The **MCP Server** exposes resources, tools, and prompts that LLMs can access. Servers can provide access to databases, APIs, file systems, or any external service through a standardized interface.

### MCP Client  
The **MCP Client** connects to MCP servers and facilitates communication between your application and the server's capabilities. It handles:
- Authentication and connection management
- Resource discovery (what tools/resources are available)
- Tool execution and data retrieval
- Protocol-level communication with the server

Together, these components will allow your application to connect Granite LLMs to external resources (servers) using a standardized MCP interface.
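
To make the client's responsibilities more concrete, here is a minimal sketch of the kind of JSON-RPC 2.0 messages an MCP client sends for tool discovery and tool execution. The helper `make_request` is a hypothetical name used only for illustration, and the tool name and arguments are example values. In practice, the `mcp` SDK builds and sends these messages for you.

```python
import itertools
import json
from typing import Optional

# Each request that expects a response carries a unique id.
_request_ids = itertools.count(1)


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper for illustration)."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Resource discovery: ask the server which tools it exposes.
list_tools_request = make_request("tools/list")

# Tool execution: call one tool by name with JSON arguments.
# "read_file" and its "path" argument are example values, not fixed by the protocol.
call_tool_request = make_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "memory.txt"}},
)

print(json.dumps(list_tools_request))
print(json.dumps(call_tool_request, indent=2))
```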

First we will install the requisite packages.

In [None]:
! pip install git+https://github.com/ibm-granite-community/utils \
    "mcp[cli]" \
    fastmcp \
    replicate

In [None]:
import json
import os
from typing import Dict, Any, List, Optional
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import replicate
import datetime
from ibm_granite_community.notebook_utils import get_env_var

## Creating the MCP Client

Let's start by building the MCP client. Our client needs a few main methods: 

- start(): Connects the LLM to the MCP server
- close(): Cleans up server connections
- create_system_prompt(): Builds a system prompt that describes the server's tools to the LLM
- parse_model_response(): Parses the LLM's response and extracts any tool calls it contains

We also provide a chat() method, which executes the requested tool calls and lets the model call multiple tools in series to answer a user's prompt.
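
Before looking at the full client, it may help to see the tool-calling loop in isolation. The sketch below is a simplified stand-in, not the client itself: `fake_model`, `TOOLS`, and `run_tool_loop` are hypothetical names, the model replies are scripted, and the prompt format is reduced to plain labels instead of Granite's role tokens.

```python
import json
from typing import Callable, Dict, List

TOOL_CALL_MARKER = "<|tool_call|>"

# Hypothetical local "tools" standing in for an MCP server.
TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": lambda path: "1. User has a dog named Sunday.",
}

# Scripted replies standing in for the LLM: first a tool call, then a final answer.
SCRIPTED_REPLIES = iter([
    TOOL_CALL_MARKER + '[{"name": "read_file", "arguments": {"path": "memory.txt"}}]',
    "Your dog's name is Sunday.",
])


def fake_model(prompt: str) -> str:
    """Return the next scripted reply, ignoring the prompt (illustration only)."""
    return next(SCRIPTED_REPLIES)


def run_tool_loop(user_message: str, max_tool_rounds: int = 5) -> str:
    """Alternate between model replies and tool results until the model answers in plain text."""
    prompt = f"user: {user_message}\nassistant: "
    reply = ""
    for _ in range(max_tool_rounds):
        reply = fake_model(prompt)
        if TOOL_CALL_MARKER not in reply:
            return reply  # plain-text answer, so the loop is done
        calls = json.loads(reply.split(TOOL_CALL_MARKER, 1)[1])
        results: List[str] = [TOOLS[c["name"]](**c.get("arguments", {})) for c in calls]
        # Feed the tool output back so the next model call can use it.
        prompt += f"{reply}\ntool_response: {json.dumps(results)}\nassistant: "
    return reply


answer = run_tool_loop("What is my dog's name?")
print(answer)
```

The cap on tool rounds plays the same role as the execution limit in the client below: it keeps a model that never stops requesting tools from looping forever.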

In [None]:
class GraniteMCPClient:
    def __init__(self, replicate_model: str, server_command: str, server_args: Optional[List[str]] = None):
        """
        Initialize the client with persistent MCP connection.
        
        Args:
            replicate_model: The Replicate model identifier
            server_command: Command to run MCP server (e.g., "npx")
            server_args: Arguments for the server command
        """
        self.model = replicate_model
        self.replicate_client = replicate.Client(api_token=get_env_var("REPLICATE_API_TOKEN"))
        self.server_params = StdioServerParameters(
            command=server_command,
            args=server_args or [],
            env=os.environ.copy()
        )
        self.session = None
        self.read_stream = None
        self.write_stream = None
        self._client_context = None

    async def start(self):
        """Initialize the persistent MCP connection"""
        self._client_context = stdio_client(self.server_params)
        self.read_stream, self.write_stream = await self._client_context.__aenter__()
        
        self.session = ClientSession(self.read_stream, self.write_stream)
        await self.session.__aenter__()
        await self.session.initialize()
        
        # Get and log available tools
        tools_result = await self.session.list_tools()
        available_tools = tools_result.tools
        print(f"🔧 Connected to MCP server. Available tools: {[tool.name for tool in available_tools]}")

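    async def close(self):
        """Clean up MCP connections"""
        # Exit in reverse order of entry and reset state, so that close() is safe
        # to call twice and chat() reports "Client not started" afterwards
        if self.session:
            await self.session.__aexit__(None, None, None)
            self.session = None
        if self._client_context:
            await self._client_context.__aexit__(None, None, None)
            self._client_context = None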

    def create_system_prompt(self, custom_system_prompt: Optional[str], tools: List) -> str:
        """Create system prompt combining custom prompt with tool information."""
        # Default system prompt if none provided
        if custom_system_prompt is None:
            custom_system_prompt = "You are a helpful assistant."
        
        if not tools:
            return f"<|start_of_role|>system<|end_of_role|>{custom_system_prompt}<|end_of_text|>"
        
        # Convert MCP tool definitions to the expected format
        tools_info = []
        for tool in tools:
            # Convert MCP inputSchema to the expected arguments format
            arguments = {}
            if tool.inputSchema and 'properties' in tool.inputSchema:
                for prop_name, prop_info in tool.inputSchema['properties'].items():
                    arguments[prop_name] = {
                        "description": prop_info.get('description', f"The {prop_name} parameter")
                    }
            
            tool_info = {
                "name": tool.name,
                "description": tool.description,
                "arguments": arguments
            }
            tools_info.append(tool_info)
        
        # Create the system prompt combining custom prompt with tool information
        tools_json = json.dumps(tools_info, indent=4)
        
        prompt = f"""<|start_of_role|>system<|end_of_role|>Knowledge Cutoff Date: April 2024.
Today's Date: {datetime.date.today().strftime("%B %d %Y")}. {custom_system_prompt} You have access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request. <|end_of_text|>
<|start_of_role|>available_tools<|end_of_role|>{tools_json}<|end_of_text|>"""
        
        return prompt

    def parse_model_response(self, response: str) -> Optional[List[Dict[str, Any]]]:
        """Parse model response to extract tool calls"""
        try:
            response = response.strip()
            
            # Check for <|tool_call|> format first
            if '<|tool_call|>' in response:
                tool_call_start = response.find('<|tool_call|>') + len('<|tool_call|>')
                json_part = response[tool_call_start:].strip()
                if '<|end_of_text|>' in json_part:
                    json_part = json_part[:json_part.find('<|end_of_text|>')].strip()
            # If no marker, check if the entire response is a JSON array
            elif response.startswith('[') and response.endswith(']'):
                json_part = response
            else:
                return None
            
            json_part = json_part.replace('\\"', '"')

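            tool_calls = json.loads(json_part)
            if not isinstance(tool_calls, list):
                tool_calls = [tool_calls]
            # Keep only well-formed calls so a plain JSON array in the answer
            # (for example "[1, 2, 3]") is not mistaken for a list of tool calls
            tool_calls = [call for call in tool_calls if isinstance(call, dict) and 'name' in call]
            return tool_calls or None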
            
        except (json.JSONDecodeError, ValueError):
            return None

    async def chat(self, user_message: str, system_prompt: Optional[str] = None) -> str:
        """
        Chat with the model using the persistent MCP connection with agentic tool calling loop
        
        Args:
            user_message: The user's message
            system_prompt: Custom system prompt to use (optional)
            
        Returns:
            The final response from the model
        """
        if not self.session:
            raise RuntimeError("Client not started. Call start() first.")
        
        try:
            # Get available tools from the persistent session
            tools_result = await self.session.list_tools()
            available_tools = tools_result.tools
            
            # Build the initial prompt with system instructions and user message
            system_prompt_text = self.create_system_prompt(system_prompt, available_tools)
            current_prompt = f"{system_prompt_text}\n<|start_of_role|>user<|end_of_role|>{user_message}<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
            
            max_tool_executions = 5
            tool_execution_count = 0
            last_response = ""
            
            # Agentic tool calling loop
            while tool_execution_count < max_tool_executions:
                # Call the model
                response = self.replicate_client.run(
                    self.model,
                    input={
                        "prompt": current_prompt,
                        "max_tokens": 4096,
                        "temperature": 0.1,
                    }
                )
                
                # Handle streaming responses
                if hasattr(response, '__iter__') and not isinstance(response, str):
                    response_text = "".join(str(chunk) for chunk in response)
                else:
                    response_text = str(response)
                
                print(response_text)
                last_response = response_text
                
                # Check if the model wants to call tools
                tool_calls = self.parse_model_response(response_text)
                
                if not tool_calls:
                    # No tool calls, exit the loop
                    break
                    
                print(f"🧠 Model wants to use {len(tool_calls)} tool(s)")
                
                # Execute all tool calls
                tool_results = []
                for tool_call in tool_calls:
                    tool_name = tool_call['name']
                    tool_args = tool_call.get('arguments', {})
                    
                    print(f"   • Calling {tool_name}({tool_args})")
                    
                    # Call the tool via the persistent session
                    result = await self.session.call_tool(tool_name, tool_args)
                    print(result)
                    
                    # Extract result text
                    if result.content:
                        if isinstance(result.content, list):
                            result_text = "\n".join(
                                item.text if hasattr(item, 'text') else str(item)
                                for item in result.content
                            )
                        else:
                            result_text = str(result.content)
                    else:
                        result_text = "Tool executed successfully"
                    
                    tool_results.append(result_text)
                
                # Build tool response and append to context
                tool_response_text = json.dumps(tool_results)
                current_prompt = f"{current_prompt}{response_text}\n<|start_of_role|>tool_response<|end_of_role|>{tool_response_text}<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"
                
                tool_execution_count += 1
            
            # If the last response contained tool calls (hit max executions), give one final chance
            if self.parse_model_response(last_response):
                print("🔄 Max tool executions reached, giving final response opportunity")
                
                final_response = self.replicate_client.run(
                    self.model,
                    input={
                        "prompt": current_prompt,
                        "max_tokens": 1024,
                        "temperature": 0.7,
                    }
                )
                
                if hasattr(final_response, '__iter__') and not isinstance(final_response, str):
                    return "".join(str(chunk) for chunk in final_response)
                else:
                    return str(final_response)
            
            # Return the last response (which had no tool calls)
            return last_response
            
        except Exception as e:
            return f"Error: {str(e)}"
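
As a standalone illustration of the conversion performed in create_system_prompt(), the sketch below reduces a tool's JSON Schema to the compact `arguments` description placed in the prompt. The `SimpleNamespace` object stands in for a tool returned by `list_tools()`, `describe_tool` is a hypothetical helper name, and the schema shown is a made-up example rather than the real filesystem server's definition.

```python
import json
from types import SimpleNamespace

# Stand-in for an MCP tool definition (assumed shape: name, description, inputSchema).
fake_tool = SimpleNamespace(
    name="read_file",
    description="Read the contents of a file.",
    inputSchema={
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read"},
            "encoding": {"type": "string"},  # no description provided
        },
        "required": ["path"],
    },
)


def describe_tool(tool) -> dict:
    """Reduce a tool definition to the name/description/arguments form used in the prompt."""
    properties = (tool.inputSchema or {}).get("properties", {})
    arguments = {
        name: {"description": info.get("description", f"The {name} parameter")}
        for name, info in properties.items()
    }
    return {"name": tool.name, "description": tool.description, "arguments": arguments}


print(json.dumps([describe_tool(fake_tool)], indent=4))
```

Note that this reduced form drops the schema's `type` and `required` fields, so the model sees only parameter names and descriptions.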

## Connecting MCP Servers
Now that we have built an MCP client, we can connect our LLM to any MCP server of our choice.

### Memory

Let's connect our model to an external memory store that allows it to save and retrieve information about the user, even when that information was shared in an earlier conversation the model no longer has access to.

We will be using the [official MCP Filesystem Server](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem). This enables our model to create, read, edit, and remove files in a specified directory. Let's first connect our MCP client to the server.

In [None]:
replicate_model = "ibm-granite/granite-3.3-8b-instruct"
server_command = "npx"
server_args = ["-y", "@modelcontextprotocol/server-filesystem", "."]

client = GraniteMCPClient(replicate_model, server_command, server_args)

await client.start()

Now let's write some instructions to tell the model how to store, edit and retrieve memories.

In [None]:
instructions = """
You are an AI assistant with persistent memory capabilities. You can store and retrieve information about users across conversations using a memory.txt file.

ALWAYS start every response by using read_file with path "memory.txt" to check for relevant user information. If the read_file call fails because memory.txt doesn't exist, immediately use write_file with path "memory.txt" and content starting with "1. " to create it.

WHENEVER the user shares information about themselves, use read_file first to get current contents, then use write_file with path "memory.txt" to save the updated memory list. When adding new information, append it as a new numbered entry to the existing list - do not overwrite existing entries.

When the user provides new information, use read_file to get the current memory contents, then use write_file to save the complete updated numbered list with the new information added to the end.

If the user disputes information or asks to delete a memory, use read_file to get current memories, remove that specific numbered entry, renumber the remaining list, then use write_file to save the updated list. If they provide information that conflicts with existing memories, use read_file first, then use write_file to replace the old memory with the new one in the complete updated list.

Always use read_file on memory.txt first when the user sends a message and reference relevant memories naturally in your responses when they provide helpful context. Don't mention irrelevant memories and don't announce when you're reading or writing to memory unless specifically asked. When the user says "forget that" or "delete that memory" use read_file then write_file to remove the specified memory. When they ask "what do you remember about me" use read_file to get memories and summarize them. Be natural and conversational when using stored information to provide more personalized responses.
"""

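The instructions above ask the model to maintain memory.txt as a numbered list. As a rough sketch of the file contents those instructions imply, the following helpers append an entry and delete one with renumbering. The function names and the exact `N. text` line format are assumptions made for illustration; in the notebook, the model performs these edits itself through read_file and write_file.

```python
import re
from typing import List

# Matches a numbered line such as "2. some text" and captures the text part.
_ENTRY = re.compile(r"^\s*\d+\.\s*(.*)$")


def parse_memories(text: str) -> List[str]:
    """Split a numbered list into its entries, dropping the numbers and empty lines."""
    entries = []
    for line in text.splitlines():
        match = _ENTRY.match(line)
        body = match.group(1).strip() if match else line.strip()
        if body:
            entries.append(body)
    return entries


def format_memories(entries: List[str]) -> str:
    """Render entries as a list numbered from 1."""
    return "\n".join(f"{i}. {entry}" for i, entry in enumerate(entries, start=1))


def add_memory(text: str, new_entry: str) -> str:
    """Append a new entry to the end of the existing list."""
    return format_memories(parse_memories(text) + [new_entry])


def delete_memory(text: str, number: int) -> str:
    """Remove the entry with the given 1-based number and renumber the rest."""
    entries = parse_memories(text)
    del entries[number - 1]
    return format_memories(entries)


memory = "1. "  # initial file content, as described in the instructions
memory = add_memory(memory, "User is an AI engineer at IBM Research.")
memory = add_memory(memory, "User has a dog named Sunday.")
memory = add_memory(memory, "Sunday is a 13-year-old border collie.")
memory = delete_memory(memory, 1)
print(memory)
```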
Now we can start a conversation with our memory-powered AI assistant:

In [None]:
message = "Hello Granite, I am an AI engineer at IBM Research. I have a dog named Sunday. She is a border collie and she is 13 years old."

prompt = f"{instructions} User's message: {message}"

response = await client.chat(prompt)

The LLM will store this information in its memory file. Now we can start a new chat and ask it a question about this information.

In [None]:
message = "What is my dog's name?"

prompt = f"{instructions} User's message: {message}"

response = await client.chat(prompt)

## Conclusion

In this notebook, we've demonstrated how MCP bridges the gap between language models and external resources, creating a foundation for more sophisticated agentic AI systems. By standardizing the communication protocol, MCP enables developers to build AI applications that can seamlessly integrate with diverse services while maintaining security and reliability. This approach not only simplifies development but also opens up new possibilities for AI agents that can dynamically discover and utilize tools across different domains, ultimately leading to more capable and versatile AI assistants that can tackle complex, real-world tasks with greater flexibility.