# Model Context Protocol

Modern AI applications face a fundamental challenge: how to securely and efficiently connect language models to the vast ecosystem of external data sources, APIs, and tools they need to be truly useful. Previously, developers would often build custom integrations for each external service, creating a fragmented landscape that is difficult to maintain, scale, and standardize. This fragmentation limits the potential of agentic AI systems, which require access to diverse external resources to perform complex tasks effectively.

[Model Context Protocol (MCP)](https://modelcontextprotocol.io) addresses this challenge by providing a universal standard for AI-to-service communication. Rather than building point-to-point integrations, MCP establishes a common language that allows any AI model to connect to any compatible service through a standardized interface. This transforms agentic AI workflows by enabling models to dynamically discover and utilize external capabilities, without requiring custom integration code for each service. The result is more flexible, maintainable, and powerful AI applications that can adapt to new tools and data sources as they become available.

This notebook shows how to connect [Granite models](https://www.ibm.com/granite) to MCP servers to enable agentic workflows.



## MCP with Granite

The **Model Context Protocol (MCP)** is an open standard that enables AI applications to securely connect to external data sources and tools. It consists of two main components:

### MCP Server
The **MCP Server** exposes resources, tools, and prompts that LLMs can access. Servers can provide access to databases, APIs, file systems, or any external service through a standardized interface.

### MCP Client  
The **MCP Client** connects to MCP servers and facilitates communication between your application and the server's capabilities. It handles:
- Authentication and connection management
- Resource discovery (what tools/resources are available)
- Tool execution and data retrieval
- Protocol-level communication with the server

Together, these components will allow your application to connect Granite LLMs to external resources (servers) using a standardized MCP interface.
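
On the wire, MCP messages are JSON-RPC 2.0 objects. As a rough illustration of the discovery and execution steps listed above, the sketch below builds the two requests a client sends: `tools/list` and `tools/call`. The helper `jsonrpc_request` is a hypothetical name used only for this example, and the `read_file` tool with its `path` argument is borrowed from the filesystem server used later in this notebook. In practice the `mcp` package builds and sends these messages for you.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # each request in a session gets a fresh id


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Discovery: ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Execution: invoke one tool by name with JSON arguments.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "memory.txt"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```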

First, we install the required packages.

In [None]:
%pip install git+https://github.com/ibm-granite-community/utils langchain langgraph langchain-mcp-adapters langchain_core langchain-community replicate

In [None]:
from langchain_community.llms import Replicate
from langchain_core.messages import AIMessage, ToolMessage, ToolCall
from langchain_core.prompts import (
    MessagesPlaceholder,
    HumanMessagePromptTemplate,
    PromptTemplate,
)
from langchain_core.utils.json import parse_json_markdown
from ibm_granite_community.langchain import TokenizerChatPromptTemplate
from transformers import AutoTokenizer
import os, datetime
from typing import Dict, Any, List, Optional
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

## Creating the MCP Client

Let's start by building the MCP client. Our client needs four main methods:

- `start()`: Launches the MCP server and opens a session with it
- `close()`: Cleans up the server connection
- `create_tool_prompt()`: Describes the server's tools in a form the LLM can read
- `parse_model_response()`: Extracts tool calls from the LLM's response

We also provide a `chat()` method that runs the agent loop: it executes the tool calls the model requests and feeds the results back, so the model can call multiple tools in series to answer a user's prompt.
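
Before reading the full class, it may help to see the agent loop in isolation. The sketch below is a simplified, standard-library-only version of the idea: the model either answers in plain text or emits `<|tool_call|>` followed by JSON, and the loop keeps executing tools until a plain answer arrives. The names `extract_tool_calls`, `run_agent`, `fake_model`, and the `add` tool are all hypothetical stand-ins; the real client uses a Granite model, `parse_json_markdown`, and an MCP session instead.

```python
import json
from typing import Any, Callable, Dict, List, Optional

TOOL_TAG = "<|tool_call|>"


def extract_tool_calls(response: str) -> Optional[List[Dict[str, Any]]]:
    """Return the tool calls in a response, or None for a plain answer."""
    if TOOL_TAG not in response:
        return None
    # Keep the JSON after the tag and drop any trailing role tags.
    blob = response.split(TOOL_TAG, 1)[1].split("<|", 1)[0]
    try:
        calls = json.loads(blob)
    except json.JSONDecodeError:
        return None
    if isinstance(calls, dict):
        calls = [calls]
    return calls if isinstance(calls, list) else None


def run_agent(
    model: Callable[[List[str]], str],
    tools: Dict[str, Callable[..., Any]],
    user_message: str,
    max_turns: int = 5,
) -> str:
    """Call the model, run requested tools, and repeat until it answers."""
    history = [user_message]
    for _ in range(max_turns):
        reply = model(history)
        calls = extract_tool_calls(reply)
        if not calls:
            return reply  # natural-language answer ends the loop
        history.append(reply)
        for call in calls:
            result = tools[call["name"]](**call.get("arguments", {}))
            history.append(f"tool:{call['name']} -> {result}")
    return model(history)  # turn budget exhausted: ask for a final answer


def fake_model(history: List[str]) -> str:
    """Stand-in model: request one tool call, then report its result."""
    if not any(entry.startswith("tool:") for entry in history):
        return '<|tool_call|>[{"name": "add", "arguments": {"a": 2, "b": 3}}]'
    return f"The answer is {history[-1].split('-> ')[1]}."


answer = run_agent(fake_model, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(answer)
```

The `max_turns` cap plays the same role as `max_tool_executions` in the class below: it keeps a model that never stops requesting tools from looping forever.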

In [None]:
class GraniteMCPClient:
    def __init__(self, replicate_model: str, server_command: str, server_args: Optional[List[str]] = None):
        """
        Initialize the client with persistent MCP connection.
        
        Args:
            replicate_model: The Replicate model identifier
            server_command: Command to run MCP server (e.g., "npx")
            server_args: Arguments for the server command
        """
        self.model = replicate_model

        self.llm = Replicate(
            model=replicate_model,
            model_kwargs={"max_tokens": 4096, "temperature": 0.1},
        )

        self.tokenizer = AutoTokenizer.from_pretrained(replicate_model, trust_remote_code=True)

        self.server_params = StdioServerParameters(
            command=server_command,
            args=server_args or [],
            env=os.environ.copy(),
        )
        self.session = self.read_stream = self.write_stream = self._client_context = None

    async def start(self):
        """Initialize the persistent MCP connection"""
        self._client_context = stdio_client(self.server_params)
        self.read_stream, self.write_stream = await self._client_context.__aenter__()
        
        self.session = ClientSession(self.read_stream, self.write_stream)
        await self.session.__aenter__()
        await self.session.initialize()
        
        # Get and log available tools
        tools_result = await self.session.list_tools()
        available_tools = tools_result.tools
        print(f"🔧 Connected to MCP server. Available tools: {[tool.name for tool in available_tools]}")

    async def close(self):
        """Clean up MCP connections"""
        if self.session:
            await self.session.__aexit__(None, None, None)
        if self._client_context:
            await self._client_context.__aexit__(None, None, None)

    def create_tool_prompt(
        self,
        tools: List,
    ) -> List[Dict[str, Any]]:
        """Return a JSON-serializable list that describes the MCP tools.

        Each entry holds the tool's name, its description, and a short
        description of every argument. The calling code passes this list
        to the chat prompt template as the `tools` variable.
        """
        if not tools:
            return []

        tools_info = []
        for tool in tools:
            arguments = {}
            if tool.inputSchema and "properties" in tool.inputSchema:
                for name, meta in tool.inputSchema["properties"].items():
                    arguments[name] = {
                        "description": meta.get("description", f"The {name} parameter")
                    }

            tools_info.append(
                {
                    "name": tool.name,
                    "description": tool.description,
                    "arguments": arguments,
                }
            )
            
        return tools_info

    def parse_model_response(self, response: str) -> Optional[List[Dict[str, Any]]]:
        """Parse model response to extract tool calls"""
        try:
            # If Granite used its special tag, isolate the JSON segment first
            if "<|tool_call|>" in response:
                json_blob = response.split("<|tool_call|>", 1)[1]
                # Trim any trailing role tags
                if "<|" in json_blob:
                    json_blob = json_blob.split("<|", 1)[0]
            else:
                json_blob = response

            tool_calls = parse_json_markdown(json_blob)
            if tool_calls is None:
                return None
            if isinstance(tool_calls, dict):
                tool_calls = [tool_calls]
            return tool_calls
        except Exception:
            return None
        
    async def chat(     
        self,
        user_message: str,
        system_prompt: str | None = None,
        *,
        max_tool_executions: int = 5,
    ) -> str:
        """
        Agent loop for Granite + MCP.

        ‣ Prints the raw model output, parsed tool calls,
          and tool responses each turn.
        """

        if not self.session:
            raise RuntimeError("Client not started. Call start() first.")
        
        if system_prompt is None:
            system_prompt = (
                f"Knowledge Cutoff Date: April 2024.\n"
                f"Today's Date: {datetime.date.today():%B %d, %Y}.\n"
                "You are a helpful assistant. "
                "When a tool is required, reply ONLY with <|tool_call|> "
                "followed by a JSON array."
            )

        prompt_template = TokenizerChatPromptTemplate.from_messages(
            [
                ("system", "{sys}"),
                HumanMessagePromptTemplate(
                    prompt=PromptTemplate.from_template("{query}")
                ),
                MessagesPlaceholder("tool_results", optional=True),
            ],
            tokenizer=self.tokenizer,
        )

        tools_schema = self.create_tool_prompt(
            (await self.session.list_tools()).tools
        )  # list[dict] JSON schema objects

        tool_results_msgs: list = []             # running AI / Tool messages
        query_text = user_message                # first human turn
        turn = 0

        while turn < max_tool_executions:
            turn += 1
            prompt_text = prompt_template.format_prompt(
                sys=system_prompt,
                query=query_text,
                tools=tools_schema,
                tool_results=tool_results_msgs,
            ).text

            # Call Granite via LangChain-Replicate wrapper
            response_text = str(self.llm.invoke(prompt_text))

            print("\n========== MODEL RAW RESPONSE ==========")
            print(response_text)
            print("========================================")

            tool_calls = self.parse_model_response(response_text)

            print("\nParsed tool calls:", tool_calls)

            # Natural-language answer → exit loop
            if not tool_calls:
                return response_text

            # Execute each tool call
            for idx, call_dict in enumerate(tool_calls, start=1):
                name = call_dict["name"]
                args_dict = call_dict.get("arguments", {})
                print(f"\n>>> Executing tool {name} {args_dict}")

                result = await self.session.call_tool(name, args_dict)

                # Flatten MCP result to string
                if result.content:
                    if isinstance(result.content, list):
                        result_text = "\n".join(
                            x.text if hasattr(x, "text") else str(x)
                            for x in result.content
                        )
                    else:
                        result_text = str(result.content)
                else:
                    result_text = "Tool executed successfully"

                print("Tool output -->", result_text[:300], "...\n")

                # Record messages for next prompt
                tc_obj = ToolCall(name=name, args=args_dict, id=str(idx))
                tool_results_msgs.append(
                    AIMessage(content=response_text, tool_calls=[tc_obj])
                )
                tool_results_msgs.append(
                    ToolMessage(content=result_text, tool_call_id=str(idx))
                )

            # After first round the human has no new text; keep placeholder only
            query_text = " "

        # If we hit max_tool_executions without a final answer
        prompt_text = prompt_template.format_prompt(
            sys=system_prompt,
            query=" ",                       # no new user input
            tools=tools_schema,
            tool_results=tool_results_msgs,  # full history of tool use
        ).text

        final_answer = str(self.llm.invoke(prompt_text))

        print(final_answer)

        return final_answer
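
To make the tool description step concrete, the sketch below applies the same flattening logic as `create_tool_prompt()` to a stand-in tool object, without needing a running server. `summarize_tools` is a hypothetical standalone name, and the `SimpleNamespace` object, including its description text and schema, is an assumed example that only mimics the `name`, `description`, and `inputSchema` attributes the client reads from real MCP tool objects.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


def summarize_tools(tools: List[Any]) -> List[Dict[str, Any]]:
    """Reduce each tool to its name, description, and argument descriptions."""
    summary = []
    for tool in tools:
        props = (tool.inputSchema or {}).get("properties", {})
        summary.append(
            {
                "name": tool.name,
                "description": tool.description,
                "arguments": {
                    # Fall back to a generic description when the schema has none.
                    arg: {"description": meta.get("description", f"The {arg} parameter")}
                    for arg, meta in props.items()
                },
            }
        )
    return summary


# Assumed example tool; a real server supplies its own schema and wording.
read_file = SimpleNamespace(
    name="read_file",
    description="Read the contents of a file",
    inputSchema={
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
)

tool_summary = summarize_tools([read_file])
print(json.dumps(tool_summary, indent=2))
```

Note that this summary drops argument types and the `required` list, so the model only sees names and descriptions.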


## Connecting MCP Servers
Now that we have built an MCP client, we can connect our LLM to any MCP server of our choice.

### Memory

Let's connect our model to an external memory store that lets it save and retrieve information about the user, even when that information came from an earlier conversation it can no longer see.

We will be using the [official MCP Filesystem Server](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem). This enables our model to create, read, edit, and remove files in a specified directory. Let's first connect the MCP client to the server.

In [None]:
replicate_model = "ibm-granite/granite-3.3-8b-instruct"
server_command = "npx"
server_args = ["-y", "@modelcontextprotocol/server-filesystem", "."]

client = GraniteMCPClient(replicate_model, server_command, server_args)

await client.start()

Now let's write some instructions to tell the model how to store, edit and retrieve memories.

In [None]:
instructions = """
You are an AI assistant with persistent memory stored in 'memory.txt'.

At the start of each response:
- Always call read_file with path 'memory.txt' to get the current memory.

If 'memory.txt' does not exist:
- Create it by calling write_file with content starting with '1. '.

When the user shares new personal information:
- Read the current memory.
- Append the new fact as a new numbered line.
- Save the updated list with write_file.

When the user asks to remove or replace information:
- Read the current memory.
- Update the list accordingly (remove or replace).
- Renumber the entries starting from 1.
- Save the updated list with write_file.

When the user asks something:
- Read 'memory.txt' and use this information to answer naturally.

When you answer, always:
- Start by performing tool calls first.
- Then give a clear, final natural language response to the user.

Never read or edit any file except 'memory.txt'.
You are strictly forbidden from using the following tools: list_directory, get_file_info.
"""

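The instructions above ask the model to maintain `memory.txt` as a numbered list. The model does this itself through `read_file` and `write_file` tool calls, but the intended file contents can be illustrated in plain Python. The functions `parse_memory`, `render_memory`, `append_fact`, and `remove_fact` are hypothetical helpers written for this sketch and are not part of the client.

```python
from typing import List


def parse_memory(text: str) -> List[str]:
    """Strip the 'N. ' prefix from each non-empty line of the memory file."""
    facts = []
    for line in text.splitlines():
        _, _, fact = line.strip().partition(". ")
        if fact:  # skips blank lines and a bare '1. ' placeholder
            facts.append(fact)
    return facts


def render_memory(facts: List[str]) -> str:
    """Write the facts back as a list numbered from 1."""
    return "\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, start=1))


def append_fact(text: str, fact: str) -> str:
    """Add a new fact as the next numbered line."""
    return render_memory(parse_memory(text) + [fact])


def remove_fact(text: str, number: int) -> str:
    """Delete entry `number` (1-based) and renumber the rest."""
    facts = parse_memory(text)
    del facts[number - 1]
    return render_memory(facts)


memory = append_fact("1. ", "Works as an AI engineer at IBM Research")
memory = append_fact(memory, "Has a 13-year-old border collie named Sunday")
memory = remove_fact(memory, 1)
print(memory)
```
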
Now we can start a conversation with our memory-powered AI assistant:

In [None]:
user_text = (
    "Hello Granite, I am an AI engineer at IBM Research. "
    "I have a dog named Sunday. She is a border collie and she is 13 years old."
)

# prepend memory instructions, separated by a line of dashes
combined_prompt = f"{instructions}\n\n---\n\n{user_text}"

response = await client.chat(combined_prompt)
print(response)


The LLM will store this information in its memory file. Now we can start a new chat and ask it a question about this information.

In [None]:
user_text = (
    "what kind of dog do I have?"
)

prompt = f"{instructions}\n\n---\n\n{user_text}"

response = await client.chat(prompt)
print(response)

## Conclusion

In this notebook, we demonstrated how MCP bridges the gap between language models and external resources. We built a small MCP client around a Granite model, connected it to the filesystem server, and used file reads and writes to give the model memory that persists across conversations. Because the protocol is standardized, the same client can connect to other MCP servers without new integration code, which lets agents discover and use tools across different domains and tackle more complex, real-world tasks.