# LangGraph Deep Agent with MCP Tools, Memory, and RAG

This notebook demonstrates how to build a LangGraph deep agent that combines:
- **MCP Tools**: Integration with Model Context Protocol servers for external tools
- **Memory**: Thread-scoped conversation persistence using checkpointers
- **RAG**: Retrieval Augmented Generation for knowledge base queries
- **Gemini 2.5 Flash**: Google's Gemini 2.5 Flash model as the agent's LLM

## Architecture
```
+------------------+     +------------------+     +------------------+
|    MCP Tools     |---->|   Deep Agent     |---->|   RAG Retriever  |
|  (SAP Docs, etc) |     | (Gemini 2.5)     |     |  (Vector Store)  |
+------------------+     +------------------+     +------------------+
                               |
                               v
                    +------------------+
                    |   Checkpointer   |
                    |    (Memory)      |
                    +------------------+
```

## 1. Install Dependencies

First, ensure you have the required packages installed:

In [None]:
# Uncomment and run if packages are not installed
# !pip install langchain langgraph langchain-google-genai langchain-mcp-adapters deepagents
# !pip install langchain-core langchain-community langchain-text-splitters
# !pip install langchain-huggingface sentence-transformers  # HuggingFace embeddings (local, no API key)
# !pip install beautifulsoup4 lxml  # Required for WebBaseLoader

## 2. Imports and Setup

In [9]:
import os
import asyncio
from typing import List, Dict, Any

# LangChain core
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain.tools import tool

# LangGraph
from langgraph.checkpoint.memory import MemorySaver

# MCP Adapters
from langchain_mcp_adapters.client import MultiServerMCPClient

# Gemini LLM
from langchain_google_genai import ChatGoogleGenerativeAI

# HuggingFace Embeddings (local, no API key needed)
from langchain_huggingface import HuggingFaceEmbeddings

# Deep Agents
from deepagents import create_deep_agent

print('[OK] All imports successful!')

[OK] All imports successful!


## 3. Environment Configuration

Set up API keys for Gemini and any other services:

In [10]:
# Verify API key is set
# You can set it here or use environment variables
# os.environ['GOOGLE_API_KEY'] = 'your-api-key-here'

api_key = os.environ.get('GEMINI_API_KEY') or os.environ.get('GOOGLE_API_KEY')
if not api_key:
    print('[WARNING] GEMINI_API_KEY or GOOGLE_API_KEY not found in environment')
    print('Please set your API key using: os.environ["GOOGLE_API_KEY"] = "your-key"')
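else:
    # ChatGoogleGenerativeAI reads GOOGLE_API_KEY, so mirror the key there
    # in case only GEMINI_API_KEY was set
    os.environ.setdefault('GOOGLE_API_KEY', api_key)
    print('[OK] API key found')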

[OK] API key found


## 4. Initialize Gemini 2.5 Flash LLM

Create the LLM instance with `temperature=0` so that responses are as repeatable as possible:

In [11]:
# Initialize Gemini 2.5 Flash with temperature=0 for more repeatable responses
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,  # Reduces run-to-run variation in the agent's answers
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # Merges a leading system message into the first human message.
    # Recent langchain-google-genai releases may warn that this flag is deprecated.
    convert_system_message_to_human=True
)

print(f'[OK] LLM initialized: {llm.model}')

[OK] LLM initialized: models/gemini-2.5-flash
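
To see why a temperature of 0 pushes generation toward a single answer, here is a minimal sketch of temperature-scaled softmax. It is a generic illustration and not Gemini's actual decoder; the logits and the `softmax_with_temperature` helper are made up for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return token probabilities after dividing the logits by the temperature."""
    if temperature <= 0:
        # Treat T=0 as greedy decoding: all probability mass on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # Hypothetical scores for three candidate tokens
for t in (1.0, 0.2, 0.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature falls, probability mass concentrates on the top-scoring token. Hosted models may still show small run-to-run differences at temperature 0, so treat the setting as a way to reduce variation rather than a guarantee.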


## 5. RAG Setup - Fetch Knowledge Base from Websites

We'll fetch real documentation about MCP and Deep Agents from their official sources to build a knowledge base.

In [12]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# URLs to fetch documentation from
doc_urls = [
    "https://modelcontextprotocol.io/docs/getting-started/intro",
    "https://modelcontextprotocol.io/docs/learn/architecture",
    "https://docs.langchain.com/oss/python/deepagents/overview",
    "https://docs.langchain.com/oss/python/deepagents/middleware",
    "https://docs.langchain.com/oss/python/langchain/mcp",
]

print("[...] Fetching documentation from websites using LangChain WebBaseLoader...")

# Fetch documents using LangChain's WebBaseLoader
try:
    loader = WebBaseLoader(
        web_paths=doc_urls,
        bs_kwargs={"parse_only": None},  # Parse full page
    )
    raw_docs = loader.load()
    print(f"[OK] Loaded {len(raw_docs)} raw documents from web")
    
    # Split documents into smaller chunks for better retrieval
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1500,
        chunk_overlap=200,
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    documents = text_splitter.split_documents(raw_docs)
    print(f"[OK] Split into {len(documents)} chunks")
    
except Exception as e:
    print(f"[WARNING] Failed to load or split web documents, using curated documents only: {e}")
    documents = []

# Add some curated content as backup/supplement
curated_documents = [
    Document(
        page_content="""MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems. 
        Using MCP, AI applications like Claude or ChatGPT can connect to data sources (e.g. local files, databases), 
        tools (e.g. search engines, calculators) and workflows (e.g. specialized prompts)—enabling them to access key 
        information and perform tasks. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a 
        standardized way to connect electronic devices, MCP provides a standardized way to connect AI applications 
        to external systems. MCP follows a client-server architecture where an MCP host establishes connections to 
        one or more MCP servers. The key participants are: MCP Host (the AI application), MCP Client (maintains 
        connection to server), and MCP Server (provides context to clients).""",
        metadata={"source": "mcp_curated", "topic": "overview", "url": "https://modelcontextprotocol.io"}
    ),
    Document(
        page_content="""Deep agents are built with a modular middleware architecture. Deep agents have access to:
        A planning tool (write_todos) - enables agents to break down complex tasks into discrete steps, track progress
        A filesystem for storing context and long-term memories (ls, read_file, write_file, edit_file)
        The ability to spawn subagents (task tool) - creates ephemeral agents for isolated multi-step tasks
        Each feature is implemented as separate middleware. When you create a deep agent with create_deep_agent, 
        TodoListMiddleware, FilesystemMiddleware, and SubAgentMiddleware are automatically attached.
        Middleware is composable—you can add as many or as few middleware to an agent as needed.""",
        metadata={"source": "deepagents_curated", "topic": "middleware", "url": "https://docs.langchain.com/oss/python/deepagents/middleware"}
    ),
    Document(
        page_content="""deepagents is a standalone library for building agents that can tackle complex, multi-step tasks.
        Built on LangGraph and inspired by applications like Claude Code, Deep Research, and Manus, deep agents come 
        with planning capabilities, file systems for context management, and the ability to spawn subagents.
        Core capabilities include: Planning and task decomposition with write_todos tool, Context management with 
        file system tools preventing context window overflow, Subagent spawning for context isolation and parallel 
        execution, and Long-term memory using LangGraph's Store for persistent memory across threads.""",
        metadata={"source": "deepagents_curated", "topic": "overview", "url": "https://docs.langchain.com/oss/python/deepagents/overview"}
    ),
    Document(
        page_content="""Model Context Protocol (MCP) is an open protocol that standardizes how applications provide 
        tools and context to LLMs. LangChain agents can use tools defined on MCP servers using the 
        langchain-mcp-adapters library. MultiServerMCPClient enables agents to use tools defined across one or 
        more MCP servers. MCP servers can use different transports: stdio for local Python scripts, 
        streamable_http for remote HTTP servers. The MCP client is stateless by default - each tool invocation 
        creates a fresh MCP ClientSession, executes the tool, and then cleans up.""",
        metadata={"source": "langchain_mcp_curated", "topic": "integration", "url": "https://docs.langchain.com/oss/python/langchain/mcp"}
    ),
]

# Combine fetched and curated documents
all_documents = documents + curated_documents

print(f'\n[OK] Created knowledge base with {len(all_documents)} documents:')
print(f'  - {len(documents)} fetched from websites')
print(f'  - {len(curated_documents)} curated documents')

[...] Fetching documentation from websites using LangChain WebBaseLoader...
[OK] Loaded 5 raw documents from web
[OK] Split into 57 chunks

[OK] Created knowledge base with 61 documents:
  - 57 fetched from websites
  - 4 curated documents
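
The `chunk_size` and `chunk_overlap` parameters are easier to picture with a simplified sliding-window splitter. This sketch is only a stand-in: it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` tries the listed separators first so that chunks tend to end on paragraph or sentence boundaries. `sliding_window_chunks` is a hypothetical helper, not part of LangChain.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into chunks whose start offsets are chunk_size - chunk_overlap apart."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last chunk already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, which is the role the 200-character overlap plays above: a sentence that straddles a boundary still appears whole in at least one chunk.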


## 6. Create Vector Store and Retriever

Initialize the vector store with HuggingFace's `all-MiniLM-L6-v2` embedding model (local, no API key needed):

In [13]:
# Initialize HuggingFace embeddings (local, no API key required)
# Using all-MiniLM-L6-v2: Fast, lightweight, 384 dimensions
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={
        'device': 'cpu',           # Works on any machine
        'trust_remote_code': False  # Security
    },
    encode_kwargs={
        'normalize_embeddings': True,  # Better similarity scores
        'batch_size': 32               # Optimize for speed
    }
)
print('[OK] HuggingFace embeddings initialized (all-MiniLM-L6-v2)')

# Create in-memory vector store from fetched documents
vectorstore = InMemoryVectorStore.from_documents(
    documents=all_documents,
    embedding=embeddings
)

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

print(f'[OK] Vector store created with {len(all_documents)} documents')
print('[OK] Retriever configured to return top 3 results')

[OK] HuggingFace embeddings initialized (all-MiniLM-L6-v2)
[OK] Vector store created with 61 documents
[OK] Retriever configured to return top 3 results
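
The retriever's top-k lookup can be sketched in plain Python. The example below assumes cosine similarity as the ranking metric and uses made-up three-dimensional vectors; `normalize` and `top_k` are hypothetical helpers, not the `InMemoryVectorStore` implementation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k):
    """Return (index, score) pairs for the k documents most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(doc_vecs):
        d = normalize(vec)
        # For unit vectors the dot product equals the cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0], [0.9, 0.1, 0.0]]
results = top_k([1.0, 0.0, 0.0], docs, k=3)
print(results)
```

Setting `normalize_embeddings=True` in the cell above has the same effect as the `normalize` step here: once vectors have unit length, a dot product is enough to rank them by cosine similarity.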


## 7. Create RAG Retriever Tool

Wrap the retriever as a tool that the agent can use:

In [14]:
@tool
def search_knowledge_base(query: str) -> str:
    """
    Search the internal knowledge base for information about LangGraph, 
    Deep Agents, MCP, memory, and RAG concepts.
    
    Args:
        query: The search query to find relevant documents
        
    Returns:
        Relevant document excerpts from the knowledge base
    """
    docs = retriever.invoke(query)
    if not docs:
        return "No relevant documents found in the knowledge base."
    
    # Format results with source metadata
    results = []
    for i, doc in enumerate(docs, 1):
        source = doc.metadata.get('source', 'unknown')
        topic = doc.metadata.get('topic', 'general')
        results.append(f"[{i}] Source: {source} | Topic: {topic}\n{doc.page_content}")
    
    return "\n\n".join(results)


# Test the RAG tool
test_result = search_knowledge_base.invoke({"query": "What is LangGraph?"})
print('[OK] RAG tool created and tested')
print('\nTest query result:')
print(test_result[:500] + '...' if len(test_result) > 500 else test_result)

[OK] RAG tool created and tested

Test query result:
[1] Source: https://modelcontextprotocol.io/docs/learn/architecture | Topic: general
For specific implementation details, please refer to the documentation for your language-specific SDK.
​Scope
The Model Context Protocol includes the following projects:

[2] Source: deepagents_curated | Topic: overview
deepagents is a standalone library for building agents that can tackle complex, multi-step tasks.
        Built on LangGraph and inspired by applications like Claude Code, Deep Research, and Manu...


## 8. MCP Tools Integration

Set up the MCP client to connect to external tool servers. The MCP client can connect to multiple servers simultaneously.

In [27]:
async def get_mcp_tools():
    """
    Initialize MCP client and get tools from configured servers.
    
    MCP Servers can be:
    - stdio: Local Python scripts
    - streamable_http: Remote HTTP servers
    
    Returns:
        List of LangChain tools from MCP servers
    """
    client = MultiServerMCPClient(
        {
            # SAP Documentation MCP Server
            # "sap_docs": {
            #     "transport": "streamable_http",
            #     "url": "https://mcp-sap-docs.marianzeis.de/mcp",
            # },
            # You can add more MCP servers here:
            # "math": {
            #     "transport": "stdio",
            #     "command": r"C:\App\Anaconda\python.exe",
            #     "args": [r"path\to\math_server.py"],
            # },
            "weather": {
                "transport": "streamable_http",
                "url": "http://localhost:8000/mcp",
            },
        }
    )
    
    tools = await client.get_tools()
    return tools


# Get MCP tools
mcp_tools = await get_mcp_tools()
print(f'[OK] Loaded {len(mcp_tools)} MCP tools:')
for t in mcp_tools:
    print(f'  - {t.name}')

[OK] Loaded 1 MCP tools:
  - get_weather
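
A misconfigured server entry typically only surfaces when the client tries to connect, so a quick structural check of the config dict can save a round trip. The sketch below is an assumption-based helper: the required keys are inferred from the two transports used in this notebook (`stdio` and `streamable_http`), and `validate_mcp_config` is not part of `langchain-mcp-adapters`, which may accept other transports and options.

```python
# Keys each transport needs, inferred from the examples in this notebook.
REQUIRED_KEYS = {
    "stdio": {"command", "args"},
    "streamable_http": {"url"},
}


def validate_mcp_config(servers):
    """Return a list of human-readable problems found in an MCP server config dict."""
    problems = []
    for name, cfg in servers.items():
        transport = cfg.get("transport")
        if transport not in REQUIRED_KEYS:
            problems.append(f"{name}: unknown transport {transport!r}")
            continue
        missing = REQUIRED_KEYS[transport] - set(cfg)
        for key in sorted(missing):
            problems.append(f"{name}: missing '{key}' for {transport} transport")
    return problems


config = {
    "weather": {"transport": "streamable_http", "url": "http://localhost:8000/mcp"},
    "math": {"transport": "stdio", "command": "python"},  # 'args' left out on purpose
}
print(validate_mcp_config(config))
```

An empty list means every entry has the keys this sketch knows about; it says nothing about whether the server is reachable.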


## 9. Memory/Checkpointer Setup

Set up the checkpointer for conversation memory. This enables:
- Short-term memory within a conversation thread
- Ability to resume conversations
- Multi-turn context retention

In [28]:
# Create checkpointer for memory persistence
# MemorySaver (an alias of InMemorySaver): For development/testing (data lost on restart)
# PostgresSaver: For production (persistent across restarts)
checkpointer = MemorySaver()

print('[OK] Checkpointer initialized (InMemorySaver)')
print('\nMemory types available:')
print('  - Short-term: Conversation history within a thread')
print('  - Long-term: Use StoreBackend for cross-thread persistence')

[OK] Checkpointer initialized (InMemorySaver)

Memory types available:
  - Short-term: Conversation history within a thread
  - Long-term: Use StoreBackend for cross-thread persistence
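
Thread-scoped memory boils down to keeping a separate history per `thread_id`. The toy class below illustrates only that idea; it is not how `MemorySaver` works internally (a checkpointer stores snapshots of graph state), and `ToyThreadMemory` is a hypothetical name used for this sketch.

```python
from collections import defaultdict


class ToyThreadMemory:
    """Keeps a separate message history per thread_id, in process memory only."""

    def __init__(self):
        self._threads = defaultdict(list)

    def append(self, thread_id, role, content):
        self._threads[thread_id].append({"role": role, "content": content})

    def history(self, thread_id):
        # Return a copy so callers cannot mutate the stored history by accident.
        return list(self._threads.get(thread_id, []))


memory = ToyThreadMemory()
memory.append("memory-demo", "user", "My name is Alice.")
memory.append("memory-demo", "user", "What's my name?")
memory.append("rag-demo", "user", "What is MCP?")

print(len(memory.history("memory-demo")))
print(len(memory.history("rag-demo")))
print(memory.history("unknown-thread"))
```

Messages written under one thread ID are invisible to every other thread, and everything disappears when the process exits, which is the same trade-off the in-memory checkpointer makes.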


## 10. Create the Deep Agent

Combine all components into a single Deep Agent:
- **LLM**: Gemini 2.5 Flash
- **Tools**: RAG retriever + MCP tools
- **Memory**: Checkpointer for conversation persistence
- **Middleware**: TodoList, Filesystem, SubAgent (included by default)

In [29]:
# Combine all tools: RAG + MCP
all_tools = [search_knowledge_base] + mcp_tools

# System prompt for the agent
system_prompt = """\
You are an intelligent research assistant with access to multiple knowledge sources.

Your capabilities:
1. **Internal Knowledge Base (RAG)**: Use 'search_knowledge_base' to query information about 
   LangGraph, Deep Agents, MCP, memory systems, and RAG concepts.

2. **External Tools (MCP)**: Use the tools loaded from MCP servers (for example SAP 
   documentation search or weather lookup) when a question needs them.

3. **Planning**: Break down complex tasks into manageable steps using the todo list.

4. **File System**: Store important findings and notes for later reference.

Guidelines:
- Always search relevant knowledge sources before answering technical questions
- Cite your sources when providing information
- For complex questions, create a plan first
- Be concise but thorough in your responses
"""

# Create the deep agent with all components
agent = create_deep_agent(
    model=llm,
    tools=all_tools,
    system_prompt=system_prompt,
    checkpointer=checkpointer,
)

print('[OK] Deep Agent created with:')
print(f'  - LLM: Gemini 2.5 Flash')
print(f'  - Tools: {len(all_tools)} ({len(mcp_tools)} MCP + 1 RAG)')
print(f'  - Memory: MemorySaver checkpointer')
print(f'  - Middleware: TodoList, Filesystem, SubAgent')

[OK] Deep Agent created with:
  - LLM: Gemini 2.5 Flash
  - Tools: 2 (1 MCP + 1 RAG)
  - Memory: MemorySaver checkpointer
  - Middleware: TodoList, Filesystem, SubAgent
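
When tool lists from several sources are concatenated, two tools can end up with the same name, which may make tool selection ambiguous. A small pre-flight check like the one below can catch that before the agent is created. It is a sketch under the assumption that each tool exposes a `.name` attribute (as the MCP tools printed earlier do); `find_duplicate_tool_names` and the stand-in objects are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def find_duplicate_tool_names(tools):
    """Return the sorted names that appear more than once in a list of tool-like objects."""
    counts = Counter(t.name for t in tools)
    return sorted(name for name, n in counts.items() if n > 1)


# Stand-ins for real tools; only the .name attribute matters for this check.
rag_tools_demo = [SimpleNamespace(name="search_knowledge_base")]
mcp_tools_demo = [
    SimpleNamespace(name="get_weather"),
    SimpleNamespace(name="search_knowledge_base"),  # Deliberate clash
]
duplicates = find_duplicate_tool_names(rag_tools_demo + mcp_tools_demo)
print(duplicates)
```

In the notebook itself the same call could be made on `all_tools` just before `create_deep_agent`.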


## 11. Helper Function for Agent Interaction

Create a helper to easily chat with the agent:

In [30]:
from IPython.display import display, Markdown


def extract_text_content(content) -> str:
    """
    Extract text from various content formats.
    
    Handles:
    - Plain strings
    - List of content blocks (e.g., [{"type": "text", "text": "..."}])
    - AIMessage with content attribute
    """
    if content is None:
        return ""
    
    # If it's already a string, return as-is
    if isinstance(content, str):
        return content
    
    # If it's a list of content blocks (common with Gemini/Claude)
    if isinstance(content, list):
        text_parts = []
        for block in content:
            if isinstance(block, str):
                text_parts.append(block)
            elif isinstance(block, dict):
                # Handle {"type": "text", "text": "..."} format
                if block.get("type") == "text":
                    text_parts.append(block.get("text", ""))
                # Handle other dict formats
                elif "text" in block:
                    text_parts.append(block["text"])
                elif "content" in block:
                    text_parts.append(str(block["content"]))
            else:
                # Try to convert to string
                text_parts.append(str(block))
        return "\n".join(text_parts)
    
    # Fallback: convert to string
    return str(content)


async def chat(message: str, thread_id: str = "default") -> str:
    """
    Send a message to the agent and get a response.
    
    Args:
        message: The user's message
        thread_id: Conversation thread ID for memory continuity
        
    Returns:
        The agent's response text
    """
    response = await agent.ainvoke(
        {"messages": [{"role": "user", "content": message}]},
        {"configurable": {"thread_id": thread_id}}
    )
    
    # Extract the last message content
    last_message = response["messages"][-1]
    
    # Handle various content formats
    return extract_text_content(last_message.content)


def display_response(response):
    """Display the response as formatted markdown."""
    # Ensure response is a string
    text = extract_text_content(response) if not isinstance(response, str) else response
    
    if text:
        display(Markdown(text))
    else:
        print("[No response content]")


print('[OK] Helper functions defined')

[OK] Helper functions defined


## 12. Example Usage

### Example 1: RAG Query (Internal Knowledge Base)

In [33]:
# Query the internal knowledge base using RAG
response1 = await chat(
    "What is the difference between short-term and long-term memory in LangGraph?",
    # "How is the weather in Sao Paulo today?",
    thread_id="rag-demo"
)
display_response(response1)

In LangGraph, particularly within the context of Deep Agents, the distinction between short-term and long-term memory is as follows:

*   **Short-term memory** refers to the default filesystem used by tools like `ls`, `read_file`, `write_file`, and `edit_file`. This memory is local to the current graph state and is not persistent across different threads or conversations. Information stored here is lost once the agent's current execution or thread concludes. (Source: [2])

*   **Long-term memory** enables agents to retain information persistently across multiple conversations and threads. This is achieved by configuring a `CompositeBackend` to route specific file paths (e.g., `/memories/`) to a `StoreBackend`, which then utilizes a `Store` (such as `InMemoryStore`). This setup allows agents to save and retrieve information from past interactions, ensuring data availability beyond the current session. (Source: [1], [2])

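### Example 2: MCP Query (SAP Documentation)

This cell has no recorded output. It assumes the `sap_docs` server in section 8 has been uncommented so that SAP documentation tools are available to the agent.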

In [None]:
# Query SAP documentation using MCP tools
response2 = await chat(
    "How do I handle internal tables in modern ABAP? Give me a brief overview.",
    thread_id="mcp-demo"
)
display_response(response2)

### Example 3: Memory Continuity Demo

Demonstrate that the agent remembers previous messages within a thread:

In [23]:
# First message in a new thread
response3a = await chat(
    "My name is Alice and I'm learning about AI agents.",
    thread_id="memory-demo"
)
print("First response:")
display_response(response3a)

First response:


Hello Alice! It's great to meet you. I can help you learn about AI agents. What specifically about AI agents are you interested in? Are there any particular concepts or aspects you'd like to explore first?

In [24]:
# Follow-up message in the same thread - agent should remember the name
response3b = await chat(
    "What's my name? And what was I learning about?",
    thread_id="memory-demo"  # Same thread ID!
)
print("Follow-up response (should remember context):")
display_response(response3b)

Follow-up response (should remember context):


Your name is Alice, and you were learning about AI agents.

### Example 4: Combined Query (RAG + MCP)

The agent can use multiple tools in one query (this example also needs the `sap_docs` MCP server from section 8 to be enabled):

In [None]:
# Combined query that might use both RAG and MCP
response4 = await chat(
    "First, explain what Deep Agents middleware provides (use internal knowledge), "
    "then give me a quick tip about clean ABAP coding (use SAP docs).",
    thread_id="combined-demo"
)
display_response(response4)

## 13. Direct Tool Testing

Test tools directly without the agent:

In [25]:
# Test RAG tool directly
print("=== RAG Tool Test ===")
rag_result = search_knowledge_base.invoke({"query": "What is MCP protocol?"})
print(rag_result)
print()

# Test MCP tool directly (if available)
if mcp_tools:
    print("=== MCP Tool Test (search) ===")
    # Find the search tool (provided by the sap_docs server; absent when only the weather server is configured)
    search_tool = next((t for t in mcp_tools if t.name == 'search'), None)
    if search_tool:
        mcp_result = await search_tool.ainvoke({"query": "ABAP inline declarations"})
        # Print first 1000 chars of result
        print(str(mcp_result)[:1000] + '...' if len(str(mcp_result)) > 1000 else mcp_result)

=== RAG Tool Test ===
[1] Source: langchain_mcp_curated | Topic: integration
Model Context Protocol (MCP) is an open protocol that standardizes how applications provide 
        tools and context to LLMs. LangChain agents can use tools defined on MCP servers using the 
        langchain-mcp-adapters library. MultiServerMCPClient enables agents to use tools defined across one or 
        more MCP servers. MCP servers can use different transports: stdio for local Python scripts, 
        streamable_http for remote HTTP servers. The MCP client is stateless by default - each tool invocation 
        creates a fresh MCP ClientSession, executes the tool, and then cleans up.

[2] Source: mcp_curated | Topic: overview
MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems. 
        Using MCP, AI applications like Claude or ChatGPT can connect to data sources (e.g. local files, databases), 
        tools (e.g. search engines, calculators) and 

## 14. Configuration Options

### Additional MCP Servers

You can add more MCP servers to expand the agent's capabilities:

In [None]:
# Example: Adding more MCP servers
example_mcp_config = """
# MCP Server Configuration Examples:

mcp_servers = {
    # Local stdio server (Python script)
    "math": {
        "transport": "stdio",
        "command": r"C:\\App\\Anaconda\\python.exe",
        "args": ["math_server.py"],
    },
    
    # Remote HTTP server
    "weather": {
        "transport": "streamable_http",
        "url": "https://weather-mcp.example.com/mcp",
    },
    
    # Server with authentication
    "github": {
        "transport": "streamable_http",
        "url": "https://github-mcp.example.com/mcp",
        "headers": {
            "Authorization": "Bearer YOUR_TOKEN"
        }
    },
    
    # FastMCP server
    "fastmcp": {
        "transport": "streamable_http",
        "url": "https://gofastmcp.com/mcp",
    },
}
"""
print(example_mcp_config)
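
Rather than pasting a token into the config, the `Authorization` header can be built from an environment variable. The sketch below assumes a variable named `GITHUB_MCP_TOKEN`, which is made up for this example, and `build_auth_headers` is a hypothetical helper.

```python
import os


def build_auth_headers(env_var, environ=None):
    """Return an Authorization header dict, or raise if the token variable is unset."""
    environ = os.environ if environ is None else environ
    token = environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} before connecting to this MCP server")
    return {"Authorization": f"Bearer {token}"}


# Demo with an explicit mapping so the example does not depend on your shell environment.
fake_env = {"GITHUB_MCP_TOKEN": "example-token"}
github_server = {
    "transport": "streamable_http",
    "url": "https://github-mcp.example.com/mcp",
    "headers": build_auth_headers("GITHUB_MCP_TOKEN", fake_env),
}
print(github_server["headers"])
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the remote server.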

### Production Memory Configuration

For production, use a persistent checkpointer:

In [26]:
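# Production checkpointer example
production_config = """
# For production, use PostgresSaver instead of MemorySaver.
# from_conn_string returns a context manager, so open it with `with`.
# Requires the langgraph-checkpoint-postgres / langgraph-checkpoint-sqlite packages.

from langgraph.checkpoint.postgres import PostgresSaver

# Connection string
connection_string = "postgresql://user:password@localhost:5432/langgraph"

# Create production checkpointer
with PostgresSaver.from_conn_string(connection_string) as checkpointer:
    checkpointer.setup()  # Create the checkpoint tables on first use
    ...  # Build and invoke the agent inside this block

# Or use async version
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
async with AsyncPostgresSaver.from_conn_string(connection_string) as checkpointer:
    await checkpointer.setup()
    ...

# SQLite for local development
from langgraph.checkpoint.sqlite import SqliteSaver
with SqliteSaver.from_conn_string("checkpoints.db") as checkpointer:
    ...
"""
print(production_config)


# For production, use PostgresSaver instead of MemorySaver.
# from_conn_string returns a context manager, so open it with `with`.
# Requires the langgraph-checkpoint-postgres / langgraph-checkpoint-sqlite packages.

from langgraph.checkpoint.postgres import PostgresSaver

# Connection string
connection_string = "postgresql://user:password@localhost:5432/langgraph"

# Create production checkpointer
with PostgresSaver.from_conn_string(connection_string) as checkpointer:
    checkpointer.setup()  # Create the checkpoint tables on first use
    ...  # Build and invoke the agent inside this block

# Or use async version
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
async with AsyncPostgresSaver.from_conn_string(connection_string) as checkpointer:
    await checkpointer.setup()
    ...

# SQLite for local development
from langgraph.checkpoint.sqlite import SqliteSaver
with SqliteSaver.from_conn_string("checkpoints.db") as checkpointer:
    ...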
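
What "persistent" buys you can be shown with the standard library alone. The sketch below stores thread messages in a SQLite file and reads them back over a fresh connection, which stands in for a process restart. The table layout and the `save_message` / `load_messages` helpers are invented for illustration and are not the schema that LangGraph's savers use.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def save_message(db_path, thread_id, content):
    """Append one message to a thread, creating the table on first use."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS messages (thread_id TEXT, content TEXT)")
        conn.execute("INSERT INTO messages VALUES (?, ?)", (thread_id, content))
        conn.commit()


def load_messages(db_path, thread_id):
    """Read a thread's messages back in insertion order."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT content FROM messages WHERE thread_id = ? ORDER BY rowid",
            (thread_id,),
        ).fetchall()
    return [row[0] for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "toy_checkpoints.db")
    save_message(db_path, "memory-demo", "My name is Alice.")
    # load_messages opens a new connection, so the data must have come from disk.
    restored = load_messages(db_path, "memory-demo")
    other = load_messages(db_path, "rag-demo")

print(restored)
print(other)
```

With an in-memory store the second read would only succeed inside the same process; with a file-backed or database-backed store it survives restarts, which is the reason to switch checkpointers for production.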



## Summary

This notebook demonstrated how to build a powerful LangGraph Deep Agent with:

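1. **MCP Tools Integration**: Connected to external tool servers via Model Context Protocol (a local weather server in the recorded run; the SAP documentation server is included as a commented-out option)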
2. **Memory/Checkpointing**: Thread-scoped conversation persistence using MemorySaver
3. **RAG Capabilities**: Knowledge base fetched from official documentation websites
4. **HuggingFace Embeddings**: Local embeddings with `all-MiniLM-L6-v2` (no API key needed)
5. **Gemini 2.5 Flash**: Fast, capable LLM for agent reasoning

### Key Components Used:

| Component | Library | Purpose |
|-----------|---------|--------|
| Deep Agent | `deepagents` | Agent with planning, filesystem, subagents |
| MCP Client | `langchain-mcp-adapters` | External tool integration |
| Embeddings | `langchain-huggingface` | Local embeddings (all-MiniLM-L6-v2) |
| Vector Store | `langchain-core` | RAG knowledge base |
| Web Loader | `langchain-community` | Fetch docs from websites |
| Checkpointer | `langgraph` | Conversation memory |
| LLM | `langchain-google-genai` | Gemini 2.5 Flash |

### Next Steps:

- Add more MCP tool servers for expanded capabilities
- Use `FilesystemBackend` for persistent agent workspace
- Implement `StoreBackend` for cross-thread long-term memory
- Deploy with PostgresSaver for production memory persistence
- Add human-in-the-loop with `interrupt_on` parameter