# Model Context Protocol

Modern AI applications face a fundamental challenge: how to securely and efficiently connect language models to the vast ecosystem of external data sources, APIs, and tools they need to be truly useful. Previously, developers would often build custom integrations for each external service, creating a fragmented landscape that is difficult to maintain, scale, and standardize. This fragmentation limits the potential of agentic AI systems, which require access to diverse external resources to perform complex tasks effectively.

[Model Context Protocol (MCP)](https://modelcontextprotocol.io) addresses this challenge by providing a universal standard for AI-to-service communication. Rather than building point-to-point integrations, MCP establishes a common language that allows any AI model to connect to any compatible service through a standardized interface. This transforms agentic AI workflows by enabling models to dynamically discover and utilize external capabilities, without requiring custom integration code for each service. The result is more flexible, maintainable, and powerful AI applications that can adapt to new tools and data sources as they become available.

In this notebook, we explore how to connect [Granite models](https://www.ibm.com/granite) to MCP servers, enabling advanced agentic workflows.



## MCP with Granite

The **Model Context Protocol (MCP)** is an open standard that enables AI applications to securely connect to external data sources and tools. It consists of two main components:

### MCP Server
The **MCP Server** exposes resources, tools, and prompts that LLMs can access. Servers can provide access to databases, APIs, file systems, or any external service through a standardized interface.

### MCP Client  
The **MCP Client** connects to MCP servers and facilitates communication between your application and the server's capabilities. It handles:
- Authentication and connection management
- Resource discovery (what tools/resources are available)
- Tool execution and data retrieval
- Protocol-level communication with the server

Together, these components will allow your application to connect Granite LLMs to external resources (servers) using a standardized MCP interface.

First, we install the required packages.

In [None]:
! echo "::group::Install Dependencies"
%pip install uv
! uv pip install git+https://github.com/ibm-granite-community/utils.git \
    langchain_core \
    'langchain_replicate @ git+https://github.com/ibm-granite-community/langchain-replicate.git' \
    transformers \
    mcp \
    mcp-server-fetch
! echo "::endgroup::"

In [None]:
import ast
import json
import os
from contextlib import AbstractAsyncContextManager, AsyncExitStack
from typing import Any, Self

from ibm_granite_community.langchain import TokenizerChatPromptTemplate
from ibm_granite_community.notebook_utils import get_env_var
from langchain_core.messages import AIMessage, ToolCall, ToolMessage
from langchain_core.prompts import HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_core.utils.json import parse_json_markdown
from langchain_replicate import Replicate
from mcp import ClientSession, StdioServerParameters, stdio_client
from transformers import AutoTokenizer

## Creating the MCP Client

Let's start by building the MCP client. Our client needs a few methods: 

- `__aenter__`: Connects to the MCP server and initializes the session
- `__aexit__`: Cleans up server connections
- `create_tool_prompt`: Builds a description of the server's tools that is passed to the LLM
- `parse_model_response`: Parses the LLM's response and extracts any tool calls it contains

We also provide a `chat` method, which executes the parsed tool calls and lets the model call multiple tools in series to answer a user's prompt.
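
The control flow of such a loop can be sketched without any model or server. In the sketch below, `run_agent`, `fake_model`, and `add` are hypothetical stand-ins: the "model" is a plain function, and the "tool" is a local Python function rather than an MCP call. It is not the notebook's implementation, only an illustration of the pattern: call the model, run the requested tool, feed the result back, and stop when the model answers in plain text or the tool budget runs out.

```python
import json
from typing import Callable


def run_agent(
    ask_model: Callable[[list[str]], str],
    tools: dict[str, Callable[..., str]],
    user_message: str,
    max_tool_executions: int = 5,
) -> str:
    """Alternate between the model and its tools until the model answers in prose."""
    transcript = [f"user: {user_message}"]
    for _ in range(max_tool_executions):
        reply = ask_model(transcript)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text is treated as the final answer
        if not isinstance(call, dict) or "name" not in call:
            return reply  # valid JSON, but not a tool call
        result = tools[call["name"]](**call.get("arguments", {}))
        transcript.append(f"assistant: {reply}")
        transcript.append(f"tool: {result}")
    # Tool budget exhausted: ask once more for a direct answer.
    transcript.append("user: Directly answer the original message.")
    return ask_model(transcript)


def fake_model(transcript: list[str]) -> str:
    """Stand-in model: requests the `add` tool once, then answers from its result."""
    tool_lines = [line for line in transcript if line.startswith("tool: ")]
    if not tool_lines:
        return json.dumps({"name": "add", "arguments": {"a": 2, "b": 3}})
    return f"The sum is {tool_lines[-1].removeprefix('tool: ')}."


def add(a: int, b: int) -> str:
    return str(a + b)


answer = run_agent(fake_model, {"add": add}, "What is 2 + 3?")
print(answer)
```

Capping the number of iterations keeps a model that never stops requesting tools from looping forever, which is the role `max_tool_executions` plays in the `chat` method.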

In [None]:
class GraniteMCPClient(AbstractAsyncContextManager):
    def __init__(self, replicate_model: str, server_command: str, server_args: list[str] | None = None):
        """
        Initialize the client with persistent MCP connection.

        Args:
            replicate_model: The Replicate model identifier
            server_command: Command to run MCP server (e.g., "npx")
            server_args: Arguments for the server command
        """
        self.model = replicate_model

        self.llm = Replicate(
            model=replicate_model,
            model_kwargs={"max_tokens": 4096, "temperature": 0},
            replicate_api_token=get_env_var('REPLICATE_API_TOKEN'),
        )

        self.tokenizer = AutoTokenizer.from_pretrained(replicate_model, trust_remote_code=True)

        self.server_params = StdioServerParameters(
            command=server_command,
            args=server_args or [],
            env=os.environ.copy(),
        )
        self.session = None
        self._exit_stack = AsyncExitStack()

    async def __aenter__(self) -> Self:
        """Initialize the persistent MCP connection"""
        client_context = stdio_client(self.server_params)
        read_stream, write_stream = await self._exit_stack.enter_async_context(client_context)

        self.session = ClientSession(read_stream, write_stream)
        await self._exit_stack.enter_async_context(self.session)
        await self.session.initialize()

        # Get and log available tools
        tools_result = await self.session.list_tools()
        available_tools = tools_result.tools
        print("🔧 Connected to MCP server.")
        print(f"Available tools: {[tool.name for tool in available_tools]}")
        return self

    async def __aexit__(self, *exc_details):
        """Clean up MCP connections"""
        await self._exit_stack.aclose()
        print("🔧 Disconnected from MCP server.")

    def create_tool_prompt(
        self,
        tools: list,
    ) -> list[dict[str, Any]]:
        """Return a JSON-serializable list describing each MCP tool's name, description, and arguments."""
        if not tools:
            return []

        tools_info = []
        for tool in tools:
            arguments = {}
            if tool.inputSchema and "properties" in tool.inputSchema:
                for name, meta in tool.inputSchema["properties"].items():
                    arguments[name] = {
                        "description": meta.get("description", f"The {name} parameter")
                    }

            tools_info.append(
                {
                    "name": tool.name,
                    "description": tool.description,
                    "arguments": arguments,
                }
            )

        return tools_info

    def parse_model_response(self, response: str) -> list[dict[str, Any]] | None:
        """Parse model response to extract tool calls"""
        try:
            # If Granite used its special tag, isolate the JSON segment first
            if "<tool_call>" in response:
                json_blob = response.split("<tool_call>", 1)[1]
                # Trim any trailing role tags
                if "</tool_call>" in json_blob:
                    json_blob = json_blob.split("</tool_call>", 1)[0]
            else:
                json_blob = response

            tool_calls = parse_json_markdown(json_blob)
            if tool_calls is None:
                return None
            if isinstance(tool_calls, dict):
                tool_calls = [tool_calls]
            return tool_calls
        except Exception:
            return None

    async def chat(
        self,
        user_message: str,
        *,
        max_tool_executions: int = 5,
    ) -> str:
        """
        Agent loop for Granite + MCP.

        ‣ Prints the raw model output, parsed tool calls,
          and tool responses each turn.
        """

        if not self.session:
            raise RuntimeError("Client not started.")

        prompt_template = TokenizerChatPromptTemplate.from_messages(
            [
                HumanMessagePromptTemplate.from_template("{query}"),
                MessagesPlaceholder("tool_results", optional=True),
            ],
            tokenizer=self.tokenizer,
        )

        tools_schema = self.create_tool_prompt(
            (await self.session.list_tools()).tools
        )  # list[dict] JSON schema objects

        tool_results_msgs: list = []             # running AI / Tool messages
        query_text = user_message                # first human turn
        turn = 0

        while turn < max_tool_executions:
            turn += 1
            prompt_text = prompt_template.format(
                query=query_text,
                tools=tools_schema,
                tool_results=tool_results_msgs,
            )

            # Call Granite via LangChain-Replicate wrapper
            response_text = str(self.llm.invoke(prompt_text))

            print("\n========== MODEL RAW RESPONSE ==========")
            print(response_text)
            print("========================================")

            tool_calls = self.parse_model_response(response_text)

            print("\nParsed tool calls:", tool_calls)

            # Natural-language answer → exit loop
            if not tool_calls:
                return response_text

            # Execute each tool call
            for idx, call_dict in enumerate(tool_calls, start=1):
                name = call_dict["name"]
                args_dict = call_dict.get("arguments", {})
                # The model may return the arguments as a quoted string rather than a dict
                if isinstance(args_dict, str) and len(args_dict) >= 2:
                    quote = args_dict[0]
                    if quote in "\"'" and args_dict[-1] == quote:  # Remove surrounding quotes
                        args_dict = args_dict[1:-1].encode().decode("unicode_escape")
                    # Check if valid JSON
                    try:
                        args_dict = json.loads(args_dict)
                    except json.JSONDecodeError:
                        # Not valid JSON; assume dict repr
                        args_dict = ast.literal_eval(args_dict)

                print(f"\n>>> Executing tool {name} {args_dict}")

                result = await self.session.call_tool(name, args_dict) # type: ignore

                # Flatten MCP result to string
                if result.content:
                    if isinstance(result.content, list):
                        result_text = "\n".join(
                            x.text if hasattr(x, "text") else str(x)
                            for x in result.content
                        )
                    else:
                        result_text = str(result.content)
                else:
                    result_text = "Tool executed successfully"

                print("Tool output -->", result_text[:300], "...\n")

                # Record messages for next prompt
                tc_obj = ToolCall(name=name, args=args_dict, id=str(idx))
                tool_results_msgs.append(
                    AIMessage(content=response_text, tool_calls=[tc_obj])
                )
                tool_results_msgs.append(
                    ToolMessage(content=result_text, tool_call_id=str(idx))
                )

            # After first round the human has no new text; keep placeholder only
            query_text = " "

        # If we hit max_tool_executions without a final answer
        prompt_text = prompt_template.format(
            query="Directly answer the user's original message",  # no new user input
            tools=tools_schema,
            tool_results=tool_results_msgs,  # full history of tool use
        )

        final_answer = str(self.llm.invoke(prompt_text))

        return final_answer


## Connecting MCP Servers
Now that we have built an MCP client, we can connect our LLM to any MCP server of our choice!

### Web Fetch

Let's connect our model to a web content fetch server, which lets it retrieve and process information from web pages in real time.

We will use the [official MCP Fetch server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch). It enables our model to fetch, read, and summarize content from a given URL, making it possible to answer questions using up-to-date information from the internet.
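
To get a feel for what the model sees, the sketch below flattens a tool description the same way `create_tool_prompt` does. The `fetch_tool` object is a hypothetical stand-in built with `SimpleNamespace`; its description and schema are illustrative assumptions, not the server's exact output. The real values come from `session.list_tools()`.

```python
import json
from types import SimpleNamespace

# Hypothetical stand-in for a tool object returned by an MCP session.
# The description and schema below are assumptions for illustration only.
fetch_tool = SimpleNamespace(
    name="fetch",
    description="Fetches a URL and returns its contents.",
    inputSchema={
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "URL to fetch"},
            "max_length": {
                "type": "integer",
                "description": "Maximum number of characters to return",
            },
            "raw": {"type": "boolean"},  # no description on purpose
        },
        "required": ["url"],
    },
)


def summarize_tool(tool) -> dict:
    """Keep only the name, description, and per-argument descriptions of a tool."""
    properties = (tool.inputSchema or {}).get("properties", {})
    return {
        "name": tool.name,
        "description": tool.description,
        "arguments": {
            name: {"description": meta.get("description", f"The {name} parameter")}
            for name, meta in properties.items()
        },
    }


summary = summarize_tool(fetch_tool)
print(json.dumps(summary, indent=2))
```

This flattening keeps the prompt short, but it drops argument types and the `required` list, so the model has to infer them from the descriptions.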

In [None]:
replicate_model = "ibm-granite/granite-4.0-h-small"
server_command = "python"
server_args = ["-m", "mcp_server_fetch"]

async with GraniteMCPClient(replicate_model, server_command, server_args) as client:
    instructions = """
    You are a helpful assistant with access to fetch, a tool that helps you read webpages given a URL. You can use this to help answer user queries. ALWAYS answer the user's query DIRECTLY regardless of whether you use a tool call or not. Do not include a max_length parameter in your calls to fetch. The output will never be truncated. You are provided all of it. Do not tell the user you cannot answer their request because the output is truncated.
    """

    user_text = (
        "Read a maximum of 10000 characters from this website 'https://research.ibm.com/blog/granite-vision-ocr-leaderboard' and then summarize the content."
    )

    combined_prompt = f"{instructions}\n\n---\n\n{user_text}"

    response = await client.chat(combined_prompt, max_tool_executions=1)
    print(response)


## Conclusion

In this notebook, we demonstrated how MCP bridges the gap between language models and external resources, creating a foundation for more sophisticated agentic AI systems. By standardizing the communication protocol, MCP lets developers integrate AI applications with diverse services without writing custom integration code for each one. This simplifies development and lets AI agents dynamically discover and use tools across different domains, so they can tackle complex, real-world tasks with greater flexibility.