# GraphRAG with MongoDB and LangChain

This notebook is a companion to the [GraphRAG with MongoDB and LangChain](https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/langchain/graph-rag/) tutorial. Refer to that page for setup instructions and detailed explanations.

This notebook demonstrates a GraphRAG implementation using MongoDB Atlas and LangChain. Vector-based RAG structures your data as vector embeddings, whereas GraphRAG structures it as a knowledge graph of entities and their relationships. This enables relationship-aware retrieval and multi-hop reasoning, where answering a question requires following several relationships in sequence.
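
To make the multi-hop idea concrete, here is a minimal pure-Python sketch that walks a hand-written toy graph. The graph contents and the `multi_hop` helper are assumptions made for illustration only; they are not part of MongoDB or LangChain, and no data here was extracted by a model.

```python
from collections import deque

# Toy knowledge graph: each entity maps to (relationship, target) pairs.
# Entities and relationship names are hand-written for illustration.
toy_graph = {
    "Sherlock Holmes": [("created_by", "Arthur Conan Doyle")],
    "Arthur Conan Doyle": [("studied_under", "Joseph Bell")],
    "Joseph Bell": [],
}

def multi_hop(graph, start, max_hops):
    """Return {entity: hop_count} for every entity reachable within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        # Do not expand entities that are already at the hop limit
        if seen[entity] == max_hops:
            continue
        for _relationship, target in graph.get(entity, []):
            if target not in seen:
                seen[target] = seen[entity] + 1
                queue.append(target)
    return seen

reachable = multi_hop(toy_graph, "Sherlock Holmes", max_hops=2)
print(reachable)
```

With `max_hops=1` the walk stops at the author; with `max_hops=2` it also reaches the entity connected to the author, which is the kind of indirect link a similarity search over isolated chunks can miss.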

<a target="_blank" href="https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/ai-integrations/langchain-graphrag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
pip install --quiet --upgrade pymongo langchain_community wikipedia langchain_openai langchain_mongodb pyvis

## Set up your environment

Before you begin, make sure you have the following:

- An Atlas cluster up and running (you'll need the [connection string](https://www.mongodb.com/docs/guides/atlas/connection-string/))
- An API key to access an LLM (this tutorial uses a model from OpenAI, but you can use any chat model [supported by LangChain](https://python.langchain.com/docs/integrations/chat/))

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"
ATLAS_DB_NAME = "langchain_db"    # MongoDB database to store the knowledge graph
ATLAS_COLLECTION = "wikipedia"    # MongoDB collection to store the knowledge graph

## Use MongoDB Atlas as a knowledge graph

Use the `MongoDBGraphStore` component to store your data as a knowledge graph. This component allows you to implement GraphRAG by storing entities (nodes) and their relationships (edges) in a MongoDB collection. It stores each entity as a document with relationship fields that reference other documents in your collection.
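
As a rough picture of that layout, the sketch below builds one hypothetical entity document as a Python dictionary. The field names mirror the ones the visualization code later in this notebook reads (`_id`, `type`, `attributes`, and `relationships` with `target_ids`, `types`, and `attributes`); the values are made up, and the documents the store actually writes may contain additional fields.

```python
# Hypothetical entity document. Field names follow what the visualization
# code in this notebook reads; every value here is invented for illustration.
example_entity = {
    "_id": "Sherlock Holmes",
    "type": "Person",
    "attributes": {"occupation": ["consulting detective"]},
    "relationships": {
        "target_ids": ["Dr. Watson", "Arthur Conan Doyle"],
        "types": ["friend_of", "created_by"],
        "attributes": [{"context": ["shared lodgings"]}, {}],
    },
}

# The three relationship lists are parallel: index i describes one edge
# from this entity to the document whose _id is target_ids[i].
rels = example_entity["relationships"]
edges = list(zip(rels["target_ids"], rels["types"], rels["attributes"]))
for target, rel_type, rel_attrs in edges:
    print(f'{example_entity["_id"]} -[{rel_type}]-> {target} {rel_attrs}')
```

Keeping edges inside the source entity's document means a traversal only needs to match `target_ids` values against `_id` values in the same collection.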

In [None]:
from langchain.chat_models import init_chat_model

# For best results, use a recent, capable chat model such as gpt-4o or Claude 3.5 Sonnet or later
# temperature=0 keeps entity extraction as deterministic as the model allows
chat_model = init_chat_model("gpt-4o", model_provider="openai", temperature=0)

In [None]:
from langchain_community.document_loaders import WikipediaLoader
# Import from langchain_text_splitters; langchain.text_splitter is a legacy import path
from langchain_text_splitters import TokenTextSplitter

# Load Wikipedia pages corresponding to the query "Sherlock Holmes"
wikipedia_pages = WikipediaLoader(query="Sherlock Holmes", load_max_docs=3).load()

# Split the documents into chunks for efficient downstream processing (graph creation)
text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
wikipedia_docs = text_splitter.split_documents(wikipedia_pages)

# Display the first chunk
wikipedia_docs[0]

In [None]:
from langchain_mongodb.graphrag.graph import MongoDBGraphStore

graph_store = MongoDBGraphStore(
    connection_string=ATLAS_CONNECTION_STRING,
    database_name=ATLAS_DB_NAME,
    collection_name=ATLAS_COLLECTION,
    entity_extraction_model=chat_model,
)

In [None]:
# Extract entities and create knowledge graph in Atlas
# This might take a few minutes; you can ignore any warnings
graph_store.add_documents(wikipedia_docs)

## Visualize the knowledge graph

To visualize the knowledge graph, you can read the entity documents back from the collection and pass them to a visualization library like `pyvis`.
This helps you explore and understand the relationships and hierarchies within your data.

In [None]:
import networkx as nx
from pyvis.network import Network
from pymongo import MongoClient

client = MongoClient(ATLAS_CONNECTION_STRING)
collection = client[ATLAS_DB_NAME][ATLAS_COLLECTION]

docs = list(collection.find())

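# Convert an attributes dictionary to an HTML string for the hover tooltip
def format_attributes(attributes):
    if not attributes:
        return ""
    parts = []
    for key, values in attributes.items():
        # Values are expected to be lists; wrap scalars and cast items to str so join() cannot fail
        if not isinstance(values, (list, tuple)):
            values = [values]
        parts.append(f"{key}: {', '.join(str(v) for v in values)}")
    return "<br>".join(parts)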

# Create a NetworkX graph
G = nx.DiGraph()

# Add nodes with their attributes
for doc in docs:
    node_id = doc["_id"]
    # Combine document type and its attributes (if any) for the hover tooltip
    node_info = f"Type: {doc.get('type', '')}"
    if "attributes" in doc:
        attr_str = format_attributes(doc["attributes"])
        if attr_str:
            node_info += "<br>" + attr_str
    G.add_node(node_id, title=node_info, label=node_id)

# Add edges based on relationships
for doc in docs:
    source = doc["_id"]
    rels = doc.get("relationships", {})
    target_ids = rels.get("target_ids", [])
    rel_types = rels.get("types", [])
    rel_attrs = rels.get("attributes", [])
    
    # The three lists are parallel; fall back to defaults if types or attributes are shorter than target_ids
    for i in range(len(target_ids)):
        target = target_ids[i]
        edge_type = rel_types[i] if i < len(rel_types) else ""
        # Get edge attributes info if available
        extra_attr = {}
        if i < len(rel_attrs) and rel_attrs[i]:
            extra_attr = rel_attrs[i]
        edge_info = f"Relationship: {edge_type}"
        if extra_attr:
            edge_info += "<br>" + format_attributes(extra_attr)
        # Add the edge with title attribute for hover
        G.add_edge(source, target, title=edge_info, label=edge_type)

# Create and show the network using pyvis. This might not work in some environments.
net = Network(height="550px", width="100%", notebook=True, directed=True)
net.from_nx(G)
net.show("graph.html")
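
If the interactive view does not render in your environment, a plain-text listing of the edges can serve as a fallback. The following is a minimal sketch; `iter_edges` and `sample_docs` are hypothetical names introduced here, and the sample data stands in for the `docs` list loaded from the collection above.

```python
def iter_edges(docs):
    """Yield (source, relationship_type, target) for each stored edge."""
    for doc in docs:
        rels = doc.get("relationships", {})
        rel_types = rels.get("types", [])
        for i, target in enumerate(rels.get("target_ids", [])):
            # Use an empty label if the types list is shorter than target_ids
            rel_type = rel_types[i] if i < len(rel_types) else ""
            yield doc["_id"], rel_type, target

# Stand-in data so the sketch runs on its own; in the notebook, pass the
# `docs` list loaded from the collection instead.
sample_docs = [
    {
        "_id": "Sherlock Holmes",
        "relationships": {
            "target_ids": ["Dr. Watson", "Arthur Conan Doyle"],
            "types": ["friend_of"],
        },
    },
    {"_id": "Dr. Watson"},
]

edge_list = list(iter_edges(sample_docs))
for source, rel_type, target in edge_list:
    print(f"{source} -[{rel_type}]-> {target}")
```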

## Answer questions on your data

The `MongoDBGraphStore` class provides a `chat_response` method that you can use to answer questions about your data. It retrieves entities related to the question by traversing the graph with the `$graphLookup` aggregation stage, then passes them to the chat model as context.
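
For intuition about what a `$graphLookup` traversal looks like, here is a hedged sketch of an aggregation pipeline over the document layout used in this notebook. It is not necessarily the pipeline that `MongoDBGraphStore` runs internally; `build_graph_lookup_pipeline`, the `connections` output field, and the `depth` field name are assumptions chosen for illustration.

```python
def build_graph_lookup_pipeline(entity_name, collection_name, max_depth=2):
    """Build a pipeline that starts at one entity and follows its relationships."""
    return [
        # Start from the document for the named entity
        {"$match": {"_id": entity_name}},
        {
            "$graphLookup": {
                "from": collection_name,                         # traverse within the same collection
                "startWith": "$relationships.target_ids",        # first hop: this entity's targets
                "connectFromField": "relationships.target_ids",  # follow each match's own targets
                "connectToField": "_id",                         # targets are matched against _id
                "as": "connections",                             # output array of reached documents
                "maxDepth": max_depth,                           # limit on recursion depth
                "depthField": "depth",                           # record the depth at which each was found
            }
        },
    ]

pipeline = build_graph_lookup_pipeline("Sherlock Holmes", "wikipedia")
print(pipeline)

# With a live connection you could run it as:
# results = list(collection.aggregate(pipeline))
```

The `maxDepth` value bounds how far the traversal recurses, which trades broader context against larger results and slower queries.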

In [None]:
query = "Who inspired Sherlock Holmes?"

answer = graph_store.chat_response(query)
answer.content