# Neo4j MCP Agent with LangGraph

Query a Neo4j graph database using natural language with **LangGraph** and **AgentCore Gateway MCP**.

## Overview

This notebook demonstrates how to build an AI agent that can query a Neo4j graph database using natural language. The agent uses:

- **LangGraph**: LangChain's graph-based agent framework for building complex AI workflows
- **Model Context Protocol (MCP)**: A standard protocol for connecting AI agents to external tools
- **Neo4j MCP Server**: Exposes Neo4j databases through MCP, enabling schema introspection and Cypher query execution
- **Amazon Bedrock**: Provides the Claude LLM for reasoning and query generation

**How it works**: When you ask a question, the agent first retrieves the database schema to understand what data exists, then generates and executes Cypher queries to answer your question. LangGraph's ReAct agent pattern handles the reasoning loop automatically.

## Prerequisites

Before running this notebook, ensure you have:

1. **Configured `CONFIG.txt`** with the following values:
   - `MCP_GATEWAY_URL`: The AgentCore Gateway endpoint URL
   - `MCP_ACCESS_TOKEN`: Your authentication token for the gateway

2. **Obtained credentials** either from:
   - Your workshop host, or
   - Deploying the Neo4j MCP Server yourself using [aws-starter](https://github.com/neo4j-partners/aws-starter)

3. **AWS credentials** configured for Amazon Bedrock access (handled automatically in SageMaker)

See the [README](./README.md) for detailed setup instructions.

## 1. Verify Pre-installed Packages

First, let's check which packages are already installed. This helps identify what needs to be installed or upgraded.

The key packages for this notebook are:
- **langgraph**: LangChain's graph-based agent execution framework
- **langchain-aws**: AWS integrations for LangChain (Bedrock support)
- **langchain-mcp-adapters**: Bridges MCP tools to LangChain's tool format
- **mcp**: The Model Context Protocol client library

In [None]:
import importlib.metadata

packages = [
    "langchain",
    "langchain-core",
    "langgraph",
    "langchain-aws",
    "langchain-mcp-adapters",
    "mcp",
    "httpx",
    "boto3",
]

print("Pre-installed packages:")
print("-" * 50)
for pkg in packages:
    try:
        version = importlib.metadata.version(pkg)
        print(f"{pkg:30} {version}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg:30} NOT INSTALLED")

In [None]:
# Install packages
%pip install -U langgraph langchain-aws langchain-mcp-adapters -q

## 2. Imports

Import the required libraries for building the agent:

- **ChatBedrockConverse**: LangChain's chat model wrapper for Amazon Bedrock's Converse API
- **load_mcp_tools**: Converts MCP tools into LangChain tool format
- **create_react_agent**: Creates a ReAct (Reasoning + Acting) agent that can use tools
- **streamablehttp_client**: MCP's modern HTTP transport for server communication
- **ClientSession**: Manages the MCP protocol session lifecycle

In [None]:
import asyncio
import concurrent.futures
import warnings
from datetime import timedelta

from langchain_aws import ChatBedrockConverse
from langchain_mcp_adapters.tools import load_mcp_tools

# Suppress known bug: https://github.com/langchain-ai/langgraph/issues/6404
warnings.filterwarnings("ignore", message="create_react_agent has been moved")
from langgraph.prebuilt import create_react_agent

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

print("All imports successful!")

## 3. Configuration

Load the model and MCP Gateway credentials from `CONFIG.txt`. The configuration includes:

- **MODEL_ID**: The Bedrock model to use for reasoning (e.g., Claude). For US cross-region Claude inference profiles (IDs starting with `us.anthropic.`), we also derive the base model ID by dropping the `us.` prefix.
- **REGION**: AWS region for Bedrock API calls
- **MCP_GATEWAY_URL**: The AgentCore Gateway endpoint that routes requests to the Neo4j MCP Server
- **MCP_ACCESS_TOKEN**: Bearer token for authenticating with the gateway

The gateway URL and token are provided by your workshop host or generated when you deploy your own MCP server.
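The loading and derivation steps described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual loader (which uses `load_dotenv`): the `parse_config` and `derive_base_model_id` helpers are hypothetical names, and every value in `SAMPLE` is a placeholder rather than a real model ID, URL, or token.

```python
from typing import Optional


def parse_config(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def derive_base_model_id(model_id: Optional[str]) -> Optional[str]:
    """Drop the 'us.' prefix from a US Claude inference profile ID."""
    if model_id and model_id.startswith("us.anthropic."):
        return model_id[len("us."):]
    return None


# Placeholder values for illustration only.
SAMPLE = """
# CONFIG.txt (hypothetical contents)
MODEL_ID=us.anthropic.claude-example-v1:0
REGION=us-west-2
MCP_GATEWAY_URL=https://gateway.example.com/mcp
MCP_ACCESS_TOKEN=your-access-token
"""

config = parse_config(SAMPLE)
base = derive_base_model_id(config["MODEL_ID"])
print(config["MODEL_ID"], "->", base)
```

A model ID without the `us.anthropic.` prefix yields `None`, which matches the notebook's behavior of leaving `BASE_MODEL_ID` unset in that case.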

In [None]:
# Load configuration from CONFIG.txt
from dotenv import load_dotenv
import os

load_dotenv("../CONFIG.txt")

MODEL_ID = os.getenv("MODEL_ID")
REGION = os.getenv("REGION", "us-west-2")
GATEWAY_URL = os.getenv("MCP_GATEWAY_URL")
ACCESS_TOKEN = os.getenv("MCP_ACCESS_TOKEN")

# Derive BASE_MODEL_ID for Claude inference profiles
# us.anthropic.claude-* -> anthropic.claude-*
if MODEL_ID and MODEL_ID.startswith("us.anthropic."):
    BASE_MODEL_ID = MODEL_ID.replace("us.anthropic.", "anthropic.")
else:
    BASE_MODEL_ID = None

print(f"Model:   {MODEL_ID}")
if BASE_MODEL_ID:
    print(f"Base:    {BASE_MODEL_ID}")
print(f"Region:  {REGION}")

# Validate gateway credentials
if not GATEWAY_URL or "your-" in GATEWAY_URL:
    print("\nWARNING: Set MCP_GATEWAY_URL in CONFIG.txt before running MCP cells")
elif not ACCESS_TOKEN or "your-" in ACCESS_TOKEN:
    print("\nWARNING: Set MCP_ACCESS_TOKEN in CONFIG.txt before running MCP cells")
else:
    print(f"Gateway: {GATEWAY_URL[:50]}...")
    print("Configuration OK!")

## 4. System Prompt

Define the system prompt that guides the agent's behavior. The quality of the agent's answers depends heavily on this prompt.

**Key instructions in the prompt:**
1. **Schema-first**: Always retrieve the schema before querying to understand available node labels and relationship types
2. **Read-only**: Only execute read queries to prevent accidental data modification
3. **Result formatting**: Present results in a clear, human-readable format
4. **Empty results**: If a query returns no results, explain what was searched and suggest alternatives

The prompt also includes Cypher best practices to help the LLM generate valid queries.

In [None]:
SYSTEM_PROMPT = """You are a helpful Neo4j database assistant with access to tools that let you query a Neo4j graph database.

Your capabilities include:
- Retrieve the database schema to understand node labels, relationship types, and properties
- Execute read-only Cypher queries to answer questions about the data
- Do not execute any write Cypher queries

When answering questions about the database:
1. First retrieve the schema to understand the database structure
2. Formulate appropriate Cypher queries based on the actual schema
3. If a query returns no results, explain what you looked for and suggest alternatives
4. Format results in a clear, human-readable way
5. Cite the actual data returned in your response

Important Cypher notes:
- Use MATCH patterns that align with the actual schema
- For counting, use MATCH (n:Label) RETURN count(n)
- For listing items, add LIMIT to avoid overwhelming results
- Handle potential NULL values gracefully

Be concise but thorough in your responses."""

## 5. Initialize LLM

Create the LangChain chat model connected to Amazon Bedrock.

**ChatBedrockConverse** uses Bedrock's Converse API, which provides a unified interface across different model providers. Key settings:

- **temperature=0**: Minimizes sampling randomness so that Cypher generation is as repeatable as possible (outputs can still vary slightly between runs)
- **base_model_id**: Required for cross-region inference profiles to enable tool use

In [None]:
# Build LLM config - add base_model_id for Claude inference profiles
llm_kwargs = {
    "model": MODEL_ID,
    "region_name": REGION,
    "temperature": 0,
}

if BASE_MODEL_ID:
    llm_kwargs["base_model_id"] = BASE_MODEL_ID

llm = ChatBedrockConverse(**llm_kwargs)

print(f"LLM initialized with {MODEL_ID}!")

## 6. Query Helper

Create helper functions that handle the async/sync complexity of MCP and LangGraph.

**Why the complexity?** 
- MCP uses async I/O for network efficiency
- Jupyter already runs its own event loop, so calling `asyncio.run()` or `loop.run_until_complete()` directly in a cell raises a `RuntimeError`
- The solution: run async code in a separate thread with its own event loop
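The thread-based workaround in the list above can be shown in isolation. This is a minimal sketch using only the standard library; `slow_answer` is a hypothetical stand-in for the network-bound MCP call, and `run_in_worker_thread` is an illustrative name, not a function from this notebook.

```python
import asyncio
import concurrent.futures


async def slow_answer(question: str) -> str:
    """Hypothetical stand-in for a network-bound coroutine."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


def run_in_worker_thread(coro_fn, *args, timeout: float = 5.0):
    """Run coro_fn(*args) on a fresh event loop inside a worker thread."""

    def runner():
        # asyncio.run creates a new loop in this thread, runs the
        # coroutine to completion, and closes the loop afterwards.
        return asyncio.run(coro_fn(*args))

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(runner).result(timeout=timeout)


result = run_in_worker_thread(slow_answer, "via thread")
print(result)
```

Because the worker thread has no running loop of its own, `asyncio.run` succeeds there regardless of whether the calling thread (for example a notebook kernel) already has one.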

**What `query_async` does:**
1. Opens an authenticated HTTP connection to the MCP Gateway
2. Initializes an MCP session and loads available tools
3. Creates a LangGraph ReAct agent with the LLM and tools
4. Invokes the agent with your question
5. Returns the final response

The **ReAct pattern** (Reasoning + Acting) lets the agent iteratively think, call tools, observe results, and decide next steps until it has enough information to answer.
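To make the think, act, observe cycle concrete, here is a toy loop with a scripted "model" and canned tools. It is only a sketch of the general pattern, not how `create_react_agent` is implemented: the tool names `get_schema` and `run_cypher`, their canned outputs, and the `scripted_model` function are all hypothetical.

```python
# Hypothetical stand-in tools; the real ones are loaded from the MCP server.
def get_schema(_: str) -> str:
    return "(:Component)-[:HAS_REQUIREMENT]->(:Requirement)"


def run_cypher(query: str) -> str:
    return f"canned result for: {query}"


def scripted_model(history):
    """Decide the next step from the observations gathered so far."""
    observations = [text for kind, text in history if kind == "observation"]
    if len(observations) == 0:
        return {"tool": "get_schema", "input": ""}
    if len(observations) == 1:
        return {"tool": "run_cypher",
                "input": "MATCH (n:Component) RETURN count(n)"}
    return {"final": f"Answer based on: {observations[-1]}"}


def react_loop(question, model, tools, max_steps=5):
    """Alternate model decisions and tool calls until a final answer."""
    history = [("user", question)]
    for _ in range(max_steps):
        decision = model(history)
        if "final" in decision:
            return decision["final"], history
        observation = tools[decision["tool"]](decision["input"])
        history.append(("action", decision["tool"]))
        history.append(("observation", observation))
    return "Stopped: step limit reached", history


tools = {"get_schema": get_schema, "run_cypher": run_cypher}
answer, history = react_loop("How many components exist?", scripted_model, tools)
actions = [text for kind, text in history if kind == "action"]
print(actions)
print(answer)
```

In the real agent the LLM plays the role of `scripted_model`, choosing each tool call from the conversation so far; the step limit here is an illustrative safeguard against endless loops.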

In [None]:
async def query_async(question: str) -> str:
    """Ask the agent a question about the Neo4j database."""
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    
    async with streamablehttp_client(
        GATEWAY_URL,
        headers,
        timeout=timedelta(seconds=120),
        terminate_on_close=False
    ) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)
            
            agent = create_react_agent(
                model=llm,
                tools=tools,
                prompt=SYSTEM_PROMPT,
            )
            
            result = await agent.ainvoke({
                "messages": [{"role": "user", "content": question}]
            })
            
            messages = result.get("messages", [])
            if messages:
                last_msg = messages[-1]
                return getattr(last_msg, "content", str(last_msg))
            return "No response"


def _run_async(coro):
    """Run a coroutine to completion on a fresh event loop (call this from a worker thread)."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


def query(question: str) -> str:
    """Ask the agent a question about the Neo4j database."""
    print("=" * 70)
    print(f"Q: {question}")
    print("=" * 70)
    
    # Run in a separate thread to avoid Jupyter's event loop conflicts
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(_run_async, query_async(question))
        answer = future.result(timeout=180)
    
    print(f"\nA: {answer}")
    return answer
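
One detail worth noting about the value returned by `query_async`: a LangChain message's `content` can be either a plain string or a list of content blocks (strings or dicts). A small normalizer along these lines could flatten it to text before printing. The helper name `content_to_text` is hypothetical, and the assumption that text blocks look like `{"type": "text", "text": ...}` should be checked against the messages your model actually returns.

```python
def content_to_text(content) -> str:
    """Flatten message content (str or list of blocks) into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Assumed block shape; non-text blocks (e.g. tool use) are skipped.
            parts.append(block.get("text", ""))
    return "".join(parts)


print(content_to_text("plain answer"))
print(content_to_text([{"type": "text", "text": "block "}, "answer"]))
```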

## 7. Demo Queries

Run these sample queries to see the agent in action. The `query` helper prints only the question and the final answer, but behind each answer the agent goes through these steps:

- **Tool calls**: Which MCP tools the agent invokes and in what order
- **Cypher generation**: The queries the LLM creates based on the schema
- **Result synthesis**: How raw data is transformed into natural language answers

Each query demonstrates a different capability—from simple schema inspection to relationship traversal.
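The `query` helper defined earlier returns only the final message, so the intermediate tool calls are not shown by default. If you keep the full `result["messages"]` list from `agent.ainvoke`, a helper like the following could list the calls in order. It relies on LangChain AI messages exposing a `tool_calls` list of dicts with `name` and `args` keys; the `trace_tool_calls` name, the stand-in messages, and the tool names below are hypothetical.

```python
from types import SimpleNamespace


def trace_tool_calls(messages):
    """Return (tool_name, args) pairs in the order the agent issued them."""
    trace = []
    for msg in messages:
        # Messages without tool calls (user, tool, final answer) are skipped.
        for call in getattr(msg, "tool_calls", None) or []:
            trace.append((call["name"], call.get("args", {})))
    return trace


# Stand-in messages that mimic the shape of an agent transcript.
fake_messages = [
    SimpleNamespace(content="How many nodes are there?"),
    SimpleNamespace(content="", tool_calls=[{"name": "get_schema", "args": {}}]),
    SimpleNamespace(content="schema text"),
    SimpleNamespace(content="", tool_calls=[
        {"name": "run_cypher", "args": {"query": "MATCH (n) RETURN count(n)"}}
    ]),
    SimpleNamespace(content="Final answer", tool_calls=[]),
]

for name, args in trace_tool_calls(fake_messages):
    print(name, args)
```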

In [None]:
_ = query("What is the database schema? Give me a brief summary.")

In [None]:
_ = query("How many nodes are in the database by label?")

In [None]:
_ = query("What types of relationships exist in the database?")

## 8. Your Queries

Try your own natural language questions! Some ideas based on the manufacturing dataset:

- "What requirements does the HVB_3900 component have?"
- "What defects have been detected and what are their severities?"
- "Which technology domains does the R2D2 product cover?"
- "What components belong to the Electric Powertrain domain?"
- "What changes affect battery-related requirements?"

The agent will figure out the right Cypher query, so you don't need to know the syntax.

In [None]:
_ = query("List 5 sample records from the most populated node type.")

In [None]:
# Your custom query
# _ = query("Your question here")