In [None]:
!pip install --quiet --upgrade langchain langchain-neo4j langchain-openai langchain-mcp-adapters

The **LangChain framework for Python** is a toolkit for building applications powered by large language models. It provides composable chains and agents, a vast integration ecosystem, memory and retrieval systems, and production essentials like callbacks, tracing, and evaluation tools.

In this notebook, we'll build a company research agent that queries a Neo4j graph database.

In [2]:
import base64
import json

from pydantic import BaseModel

from langchain_core.tools import StructuredTool
from langchain.agents import create_agent
from langchain.tools import tool
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_neo4j import Neo4jGraph, Neo4jVector, AsyncNeo4jSaver
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

LangChain integrates with most major LLM providers, including OpenAI, Anthropic, Google, Cohere, Mistral, AWS Bedrock, and Azure. This makes it easy to swap models or run comparisons without rewriting your application logic.

In this example, we'll use OpenAI as our LLM provider, specifically **GPT-5.1**.

In [3]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key")

OpenAI API key··········


In [4]:
model = ChatOpenAI(model="gpt-5.1")

## Neo4j MCP Server

We'll start by using the official [Neo4j MCP Server](https://github.com/neo4j/mcp) to extend the agent with Neo4j tools. This MCP server provides the agent with capabilities to read the graph schema and execute Cypher queries, enabling it to fetch and analyze data directly from the database.

The following code installs the latest version on Google Colab and similar Linux-based systems. For other operating systems, please consult the [official installation documentation](https://neo4j.com/docs/mcp/current/installation/).

In [5]:
import requests

# Get latest release info from GitHub API
release = requests.get("https://api.github.com/repos/neo4j/mcp/releases/latest").json()
version = release["tag_name"]
print(f"Latest version: {version}")

# Download the latest Linux binary
!wget -q https://github.com/neo4j/mcp/releases/download/{version}/neo4j-mcp_Linux_x86_64.tar.gz

# Extract
!tar -xzf neo4j-mcp_Linux_x86_64.tar.gz

# Make executable
!chmod +x neo4j-mcp

# Cleanup
!rm neo4j-mcp_Linux_x86_64.tar.gz

# Move
!mv neo4j-mcp /usr/local/bin/

# Verify installation
!neo4j-mcp -v

Latest version: v1.2.0
neo4j-mcp version: v1.2.0


For this example, we'll use the companies database from the Neo4j demo server, which contains organizations, people, investors, and news articles.

With HTTP transport (selected via `NEO4J_MCP_TRANSPORT`), the server itself only needs `NEO4J_URI` and, optionally, `NEO4J_DATABASE` if you are connecting to a specific database.

In [6]:
os.environ["NEO4J_URI"] = "neo4j+s://demo.neo4jlabs.com"
os.environ["NEO4J_DATABASE"] = "companies"
os.environ["NEO4J_MCP_TRANSPORT"] = "http"

# Run the server in the background
import subprocess
subprocess.Popen(["neo4j-mcp"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

<Popen: returncode: None args: ['neo4j-mcp']>

Credentials are passed via the `Authorization` header using Basic authentication. The HTTP transport listens on port 80 by default, so the MCP endpoint is available at `http://localhost:80/mcp`.

In [7]:
# Credentials are passed via HTTP Basic auth
os.environ["NEO4J_USERNAME"] = "companies"
os.environ["NEO4J_PASSWORD"] = "companies"

# Use single quotes for the keys: reusing double quotes inside an f-string needs Python 3.12+
credentials = base64.b64encode(
    f"{os.environ['NEO4J_USERNAME']}:{os.environ['NEO4J_PASSWORD']}".encode()
).decode()

cypher_mcp_config = {
    "neo4j-database": {
        "transport": "http",
        "url": "http://localhost:80/mcp",
        "headers": {
            "Authorization": f"Basic {credentials}"
        },
    }
}
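
The encoding step above can be checked in isolation. The following is a minimal sketch using only the standard library; the helper names `basic_auth_header` and `decode_basic_auth_header` are hypothetical and not part of any package used in this notebook. Decoding the header back is a quick way to rule out a credentials typo when the server answers with an authentication error.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth_header(header: str) -> tuple[str, str]:
    """Reverse the encoding to inspect what is actually being sent."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError(f"Unexpected auth scheme: {scheme!r}")
    # Split on the first colon only: the password may itself contain colons.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("companies", "companies")
print(header)
print(decode_basic_auth_header(header))
```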

With the MCP server running, we initialize a client to connect to it and retrieve the available tools. These tools will allow our agent to query the Neo4j database.

In [8]:
# If this cell raises a connection error, rerun it: the MCP server may still be starting up

client = MultiServerMCPClient(cypher_mcp_config)
mcp_tools = await client.get_tools()
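
Instead of rerunning the cell by hand, one option is to wait until the server port accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and an open port is only a proxy for readiness, since the MCP endpoint itself could still be initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait briefly and try again.
            time.sleep(interval)
    return False
```

In this notebook, calling `wait_for_port("localhost", 80)` before `client.get_tools()` would cover the startup delay, assuming the server listens on the port stated above.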

## Temporary Workaround

The `get-schema` and `list-gds-procedures` tools return an invalid `args_schema` format (`{'type': 'object'}` dict instead of a Pydantic `BaseModel` class). This causes validation errors when the tools are invoked.

The following workaround rebuilds the affected LangChain tools with a proper empty Pydantic model until the upstream MCP adapter is fixed.

In [9]:
# Create an empty schema for tools with no arguments
class EmptySchema(BaseModel):
    """Schema for tools that take no arguments."""
    pass

def fix_empty_args_schema(tools: list[StructuredTool]) -> list[StructuredTool]:
    """Fix tools that have dict-based empty schemas instead of proper Pydantic models."""
    fixed_tools = []
    for mcp_tool in tools:  # named mcp_tool to avoid shadowing the imported `tool` decorator
        # Check if args_schema is a dict (incorrect) instead of a BaseModel class
        if isinstance(mcp_tool.args_schema, dict) and mcp_tool.args_schema == {'type': 'object'}:
            # Create a new tool with the proper empty schema
            fixed_tool = StructuredTool(
                name=mcp_tool.name,
                description=mcp_tool.description,
                args_schema=EmptySchema,
                metadata=mcp_tool.metadata,
                response_format=mcp_tool.response_format,
                coroutine=mcp_tool.coroutine,
                func=mcp_tool.func,
            )
            fixed_tools.append(fixed_tool)
        else:
            fixed_tools.append(mcp_tool)
    return fixed_tools

# Usage:
mcp_tools = fix_empty_args_schema(mcp_tools)

We define a system prompt that instructs the agent on its role and capabilities. The `create_agent` function constructs a **ReAct-style** agent that follows a reasoning loop: it observes the current state, decides which tool to use (if any), executes the tool, and incorporates the result into its next step. This architecture allows the agent to chain multiple tool calls together to answer complex questions.

In [10]:
system_prompt = """
You are a helpful assistant with access to a Neo4j graph database containing company data.
Use the available tools to query the database and answer questions.
"""

agent = create_agent(model, mcp_tools, system_prompt=system_prompt)
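
To make the reasoning loop concrete, here is a toy version in plain Python. It is a sketch of the general ReAct pattern, not how `create_agent` is implemented: the message dictionaries, `run_react_loop`, and `scripted_model` are all hypothetical stand-ins, and the "model" is a hard-coded function rather than an LLM.

```python
from typing import Callable

# Hypothetical stand-ins: a "model" maps the message history to either a
# tool request or a final answer; a tool is any callable returning text.
Message = dict
Tool = Callable[..., str]


def run_react_loop(model: Callable[[list[Message]], Message],
                   tools: dict[str, Tool],
                   question: str,
                   max_steps: int = 5) -> str:
    """Alternate between model decisions and tool calls until an answer appears."""
    messages: list[Message] = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = model(messages)          # reason over the current state
        messages.append(decision)
        if "tool" not in decision:          # no tool requested: final answer
            return decision["content"]
        result = tools[decision["tool"]](**decision["args"])   # act
        messages.append({"role": "tool", "content": result})   # observe
    raise RuntimeError("No final answer within max_steps")


def scripted_model(messages: list[Message]) -> Message:
    """Toy policy: call the tool once, then answer using its result."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"There are {messages[-1]['content']} people."}
    return {"role": "assistant", "tool": "count_people", "args": {}}


answer = run_react_loop(scripted_model, {"count_people": lambda: "8064"}, "How many people?")
print(answer)  # There are 8064 people.
```

The `max_steps` guard reflects a common design choice: without an upper bound, a model that keeps requesting tools would loop indefinitely.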

Let's test it!

In [11]:
prompt = "How many people are in the database?"

async for event in agent.astream(
    {"messages": [{"role": "user", "content": prompt}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()


How many people are in the database?
Tool Calls:
  read-cypher (call_yT5PMB85MOU0WSpjb73EsMCD)
 Call ID: call_yT5PMB85MOU0WSpjb73EsMCD
  Args:
    query: MATCH (p:Person) RETURN count(p) AS people
Name: read-cypher

[{'type': 'text', 'text': '[\n  {\n    "people": 8064\n  }\n]', 'id': 'lc_28e9bb93-3c45-4523-b027-15d31e04abc3'}]

There are 8,064 people in the database.
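
The tool message above is a list of content blocks whose `text` field holds JSON. If you need the rows programmatically rather than as text for the model, they can be decoded as sketched below. This assumes the content-block shape shown in the output above; `rows_from_content` is a hypothetical helper, and other tools or adapter versions may return a different structure.

```python
import json

# Content blocks copied from the tool message printed above.
tool_content = [
    {
        "type": "text",
        "text": '[\n  {\n    "people": 8064\n  }\n]',
        "id": "lc_28e9bb93-3c45-4523-b027-15d31e04abc3",
    }
]


def rows_from_content(blocks: list[dict]) -> list[dict]:
    """Decode every JSON-encoded text block into a flat list of result rows."""
    rows: list[dict] = []
    for block in blocks:
        if block.get("type") == "text":
            rows.extend(json.loads(block["text"]))
    return rows


rows = rows_from_content(tool_content)
print(rows[0]["people"])  # 8064
```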


## Custom tools

Beyond using existing MCP servers, you can also implement your own custom tools and add them directly to the agent. This allows you to create specialized functionality tailored to your specific use case. Custom tools can be implemented using the `@tool` decorator, which turns any function into a tool the agent can invoke.

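Here, we use `Neo4jGraph` from the `langchain-neo4j` package, a direct integration in the LangChain ecosystem, to establish a connection to our database and build a tool that queries investment relationships, giving you more control over the query logic. Called without connection arguments, `Neo4jGraph` reads `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, and `NEO4J_DATABASE` from the environment variables set earlier.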


In [12]:
neo4j_graph = Neo4jGraph(refresh_schema=False)

@tool
async def get_investments(company: str) -> str:
    """Return the investments made by the company with the given name, as a JSON list of investment ids, names, and types."""
    try:
        results = neo4j_graph.query("""
            MATCH (o:Organization)-[:HAS_INVESTOR]->(i)
            WHERE o.name = $company
            RETURN i.id as id, i.name as name, head(labels(i)) as type
        """, {"company": company})
        return json.dumps(results, indent=2)
    except Exception as e:
        raise Exception(f"Error fetching investments: {str(e)}")
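
Note that `get_investments` is declared `async` while `neo4j_graph.query` is a synchronous call, so the event loop is blocked for the duration of the query. One possible way around this, assuming Python 3.9 or later, is to run the blocking call in a worker thread with `asyncio.to_thread`. In the sketch below, `blocking_query` is a hypothetical stand-in for the database call and returns placeholder data, not real query results.

```python
import asyncio
import json
import time


def blocking_query(company: str) -> list[dict]:
    """Hypothetical stand-in for a synchronous database call."""
    time.sleep(0.1)  # simulate network latency
    return [{"queried": company}]  # placeholder row, not real data


async def get_investments_nonblocking(company: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    results = await asyncio.to_thread(blocking_query, company)
    return json.dumps(results, indent=2)


async def demo() -> list[str]:
    # Both lookups overlap instead of running back to back.
    return await asyncio.gather(
        get_investments_nonblocking("Google"),
        get_investments_nonblocking("Microsoft"),
    )
```

In a notebook you would run this with `await demo()`; in a script, with `asyncio.run(demo())`.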

The `langchain-neo4j` package also provides `Neo4jVector`, a vector store integration that enables semantic search over your graph data. Here, we connect to an existing vector index and create a tool that uses OpenAI embeddings to search for relevant news chunks.

In [13]:
vector_store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    index_name="news",
    node_label="Chunk",
    retrieval_query="""
    MATCH (node)<-[:HAS_CHUNK]-(a:Article)
    RETURN node.text AS text, score, {date: a.date} AS metadata
    """
)

@tool
def retrieve_news(query: str) -> str:
    """Search for relevant news articles. Returns up to 5 articles with their source metadata and content."""
    retrieved_docs = vector_store.similarity_search(query, k=5)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized
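
To see what the agent receives from `retrieve_news` without calling the vector index, the serialization step can be run on stand-in documents. `FakeDoc` below is a hypothetical class exposing only the two attributes the tool reads, and the chunk texts and dates are placeholders rather than data from the index.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes the tool reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize_docs(docs: list[FakeDoc]) -> str:
    """Join documents in the same 'Source / Content' layout used by retrieve_news."""
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


docs = [
    FakeDoc("First placeholder chunk.", {"date": "2024-01-01"}),
    FakeDoc("Second placeholder chunk.", {"date": "2024-01-02"}),
]
print(serialize_docs(docs))
```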

We combine the MCP tools with our custom tools into a single list and create a new agent with access to all of them.

In [14]:
custom_tools = mcp_tools + [get_investments, retrieve_news]
# If desired, specify custom instructions
custom_system_prompt = (
    "You are a helpful assistant with access to a Neo4j graph database containing company data. "
    "Use the available tools to query the database and answer questions."
)
custom_agent = create_agent(model, custom_tools, system_prompt=custom_system_prompt)

Let's test it!

In [15]:
prompt = "Which companies did Google invest in?"

async for event in custom_agent.astream(
    {"messages": [{"role": "user", "content": prompt}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()


Which companies did Google invest in?
Tool Calls:
  get_investments (call_twCNXWBXezSYoJJH6IouLm6p)
 Call ID: call_twCNXWBXezSYoJJH6IouLm6p
  Args:
    company: Google
Name: get_investments

[
  {
    "id": "ELsv5bECSOiWG_Uhf_txI2w",
    "name": "Ionic Security",
    "type": "Organization"
  },
  {
    "id": "EUkm62r-bMOidNtPjTkdVvg",
    "name": "Avere Systems",
    "type": "Organization"
  },
  {
    "id": "EX-RLztfkOFqTLoM6xIVnlg",
    "name": "FlexiDAO",
    "type": "Organization"
  },
  {
    "id": "EtqXbQ9LaMGq8om4dhYY0Fw",
    "name": "Cloudflare",
    "type": "Organization"
  },
  {
    "id": "EWIvDLNCSMCCBYUyz0oFPVQ",
    "name": "Trifacta",
    "type": "Organization"
  }
]

Google has invested in the following companies in this dataset:

1. Ionic Security  
2. Avere Systems  
3. FlexiDAO  
4. Cloudflare  
5. Trifacta  

If you’d like, I can look up more details (e.g., what these companies do or their other investors).


## Short-term memory - Checkpoint saver

LangChain agents are stateless by default—each invocation starts fresh with no memory of previous interactions. For multi-turn conversations, you need a **checkpointer** to persist the agent's state between calls.

The `langchain-neo4j` package provides `Neo4jSaver` and `AsyncNeo4jSaver`, which store conversation checkpoints directly in Neo4j. This enables:
- **Conversation continuity** across multiple interactions
- **Session recovery** if the application restarts

In [16]:
CHECKPOINT_NEO4J_URI = "bolt://44.203.8.239:7687"
CHECKPOINT_NEO4J_USERNAME = "neo4j"
CHECKPOINT_NEO4J_PASSWORD = "reenlistment-quarts-battleship"
CHECKPOINT_NEO4J_DATABASE = "neo4j"

async with await AsyncNeo4jSaver.from_conn_string(
    uri=CHECKPOINT_NEO4J_URI,
    user=CHECKPOINT_NEO4J_USERNAME,
    password=CHECKPOINT_NEO4J_PASSWORD,
    database=CHECKPOINT_NEO4J_DATABASE
) as checkpointer:
    await checkpointer.setup()

    agent = create_agent(
        model,
        custom_tools,
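        # `prompt` now holds the user question from In [15], so reuse the system prompt from In [10]
        system_prompt=system_prompt,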
        checkpointer=checkpointer,
    )

    async for event in agent.astream(
        {"messages": [{"role": "user", "content": prompt}]},
        {"configurable": {"thread_id": "11"}},
        stream_mode="values",
    ):
        event["messages"][-1].pretty_print()


Which companies did Google invest in?
Tool Calls:
  get_investments (call_iU3Nm1AwB3uJMRGVGx1VRsOV)
 Call ID: call_iU3Nm1AwB3uJMRGVGx1VRsOV
  Args:
    company: Google
Name: get_investments

[
  {
    "id": "ELsv5bECSOiWG_Uhf_txI2w",
    "name": "Ionic Security",
    "type": "Organization"
  },
  {
    "id": "EUkm62r-bMOidNtPjTkdVvg",
    "name": "Avere Systems",
    "type": "Organization"
  },
  {
    "id": "EX-RLztfkOFqTLoM6xIVnlg",
    "name": "FlexiDAO",
    "type": "Organization"
  },
  {
    "id": "EtqXbQ9LaMGq8om4dhYY0Fw",
    "name": "Cloudflare",
    "type": "Organization"
  },
  {
    "id": "EWIvDLNCSMCCBYUyz0oFPVQ",
    "name": "Trifacta",
    "type": "Organization"
  }
]

Based on the data available here, Google has invested in the following companies:

1. Ionic Security  
2. Avere Systems  
3. FlexiDAO  
4. Cloudflare  
5. Trifacta  

If you want, I can add brief descriptions of what each of these companies does.



The `thread_id` in the config uniquely identifies a conversation. Messages with the same thread ID share context, while different thread IDs maintain separate conversation histories.
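
The isolation between threads can be illustrated with a toy in-memory store. This is only a sketch of the semantics described above: `InMemoryThreads` is hypothetical and does not reflect how `AsyncNeo4jSaver` stores checkpoints in Neo4j.

```python
from collections import defaultdict


class InMemoryThreads:
    """Toy store: one message list per thread_id (not the Neo4j schema)."""

    def __init__(self) -> None:
        self._threads: dict[str, list[dict]] = defaultdict(list)

    def append(self, config: dict, message: dict) -> None:
        thread_id = config["configurable"]["thread_id"]
        self._threads[thread_id].append(message)

    def history(self, config: dict) -> list[dict]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._threads[config["configurable"]["thread_id"]])


store = InMemoryThreads()
thread_a = {"configurable": {"thread_id": "11"}}
thread_b = {"configurable": {"thread_id": "12"}}

store.append(thread_a, {"role": "user", "content": "Which companies did Google invest in?"})
store.append(thread_a, {"role": "user", "content": "Tell me more about the first one."})
store.append(thread_b, {"role": "user", "content": "How many people are in the database?"})

print(len(store.history(thread_a)))  # 2
print(len(store.history(thread_b)))  # 1
```

With a real checkpointer, a follow-up such as "Tell me more about the first one." only makes sense when it is sent with the same `thread_id` as the question it refers to.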

---

## Summary

In this notebook, we built a company research agent using LangChain with Neo4j:

1. **MCP Integration** — Connected to Neo4j using the official Neo4j MCP server for schema reading and Cypher queries
2. **ReAct Agent** — Created a reasoning agent with `create_agent` that chains tool calls to answer complex questions
3. **Custom Tools** — Built specialized tools using the `@tool` decorator with direct `Neo4jGraph` or `Neo4jVector` integrations
4. **Short-term Memory** — Added conversation persistence with `AsyncNeo4jSaver` to enable multi-turn interactions

The LangChain framework makes it straightforward to combine MCP servers with custom tools, swap LLM providers, persist conversation state, and build composable agent workflows.