# LangChain MongoDB Integration - GraphRAG

This notebook is a companion to the [GraphRAG with MongoDB and LangChain](https://www.mongodb.com/docs/atlas/ai-integrations/langchain/graph-rag/) tutorial. Refer to that page for setup instructions and detailed explanations.

This notebook demonstrates a GraphRAG implementation using MongoDB and LangChain. Unlike vector-based RAG, which represents your data as vector embeddings, GraphRAG structures data as a knowledge graph of entities and their relationships. This enables relationship-aware retrieval and multi-hop reasoning, where answering a question means following more than one relationship between entities.
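
To make "multi-hop" concrete, here is a minimal sketch that uses a hand-written, in-memory toy graph instead of MongoDB. The `TOY_GRAPH` data and the `multi_hop` helper are illustrative assumptions, not part of `langchain_mongodb`.

```python
from collections import deque

# Hypothetical toy graph: entity -> list of (relationship, target entity)
TOY_GRAPH = {
    "Sherlock Holmes": [("created_by", "Arthur Conan Doyle")],
    "Arthur Conan Doyle": [("studied_under", "Joseph Bell")],
    "Joseph Bell": [],
}

def multi_hop(graph, start, max_hops):
    """Breadth-first walk returning (hops, source, relationship, target) tuples."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        entity, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for relationship, target in graph.get(entity, []):
            found.append((hops + 1, entity, relationship, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return found

for hop in multi_hop(TOY_GRAPH, "Sherlock Holmes", max_hops=2):
    print(hop)
```

With `max_hops=1` the walk stops at the author. With `max_hops=2` it also reaches the author's teacher, which is the kind of indirect connection a single similarity lookup can miss.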

<a target="_blank" href="https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/ai-integrations/langchain-graphrag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
%pip install --quiet --upgrade pymongo langchain langchain_community langchain_text_splitters wikipedia langchain_openai langchain_mongodb pyvis networkx

## Set up your environment

Before you begin, make sure you have the following:

- A MongoDB cluster up and running (you'll need the [connection string](https://www.mongodb.com/docs/manual/reference/connection-string/))
- An API key to access an LLM (this tutorial uses a model from OpenAI, but you can use any model [supported by LangChain](https://python.langchain.com/docs/integrations/chat/))
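
If you prefer not to paste credentials into the notebook, one option is to read them from environment variables and prompt only when they are missing. The `get_setting` helper below is a hypothetical convenience function, and `MONGODB_URI` as an environment variable name is an assumption. Neither is required by the tutorial.

```python
import os
from getpass import getpass

def get_setting(name, prompt=None):
    """Return an environment variable's value, prompting for it if unset."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value so it does not end up in cell output
        value = getpass(prompt or f"Enter {name}: ")
        os.environ[name] = value
    return value

# Example usage (prompts interactively if the variable is not already set):
# get_setting("OPENAI_API_KEY")
# MONGODB_URI = get_setting("MONGODB_URI", "MongoDB connection string: ")
```

Because the helper writes the value back to `os.environ`, libraries that read `OPENAI_API_KEY` from the environment pick it up without further changes.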

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "<api-key>"
MONGODB_URI = "<connection-string>"
DB_NAME = "langchain_db"    # MongoDB database to store the knowledge graph
COLLECTION = "wikipedia"    # MongoDB collection to store the knowledge graph

## Use MongoDB as a knowledge graph

Use the `MongoDBGraphStore` component to store your data as a knowledge graph. This component allows you to implement GraphRAG by storing entities (nodes) and their relationships (edges) in a MongoDB collection. It stores each entity as a document with relationship fields that reference other documents in your collection.
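
As a rough picture of that document shape, the sketch below uses the field names that the visualization code later in this notebook reads (`_id`, `type`, `attributes`, and `relationships` with parallel `target_ids`, `types`, and `attributes` lists). The values are hand-written for illustration and are not output from the extraction model. `iter_edges` is a hypothetical helper.

```python
# Hypothetical entity document; field names mirror those read by the
# visualization cell in this notebook, values are illustrative only.
example_entity = {
    "_id": "Sherlock Holmes",
    "type": "Person",
    "attributes": {"occupation": ["consulting detective"]},
    "relationships": {
        "target_ids": ["Arthur Conan Doyle", "Dr. Watson"],
        "types": ["created_by", "friend_of"],
        "attributes": [{}, {"note": ["shares rooms at 221B Baker Street"]}],
    },
}

def iter_edges(doc):
    """Yield (source, relationship_type, target) triples from one entity document."""
    rels = doc.get("relationships", {})
    targets = rels.get("target_ids", [])
    types = rels.get("types", [])
    for i, target in enumerate(targets):
        # The lists are parallel: index i of each list describes the same edge
        rel_type = types[i] if i < len(types) else ""
        yield (doc["_id"], rel_type, target)

print(list(iter_edges(example_entity)))
```

Each entry in `target_ids` is the `_id` of another document in the same collection, which is what lets MongoDB follow relationships from one entity to the next.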

In [None]:
from langchain.chat_models import init_chat_model

# For best results, use a recent model such as gpt-4o or Claude 3.5 Sonnet (or later).
# temperature=0 keeps entity extraction as repeatable as possible.
chat_model = init_chat_model("gpt-4o", model_provider="openai", temperature=0)

In [None]:
from langchain_community.document_loaders import WikipediaLoader
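# TokenTextSplitter is provided by the langchain_text_splitters package;
# older LangChain releases also expose it as langchain.text_splitter
from langchain_text_splitters import TokenTextSplitter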

# Load Wikipedia pages corresponding to the query "Sherlock Holmes"
wikipedia_pages = WikipediaLoader(query="Sherlock Holmes", load_max_docs=3).load()

# Split the documents into chunks for efficient downstream processing (graph creation)
text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
wikipedia_docs = text_splitter.split_documents(wikipedia_pages)

# Display the first chunk
wikipedia_docs[0]

In [None]:
from langchain_mongodb.graphrag.graph import MongoDBGraphStore

graph_store = MongoDBGraphStore(
    connection_string = MONGODB_URI,
    database_name = DB_NAME,
    collection_name = COLLECTION,
    entity_extraction_model = chat_model
)

In [None]:
# Extract entities and create knowledge graph in MongoDB
# This might take a few minutes; you can ignore any warnings
graph_store.add_documents(wikipedia_docs)

## Visualize the knowledge graph

To visualize the knowledge graph, you can export the structured data to a visualization library like `pyvis`. This helps you explore and understand the relationships and hierarchies within your data.
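
Before rendering, a plain-text summary can serve as a quick sanity check on a large graph. The sketch below counts outgoing and incoming relationships per entity, assuming each document stores its edges under `relationships.target_ids`, as the visualization function in this notebook does. `summarize_degrees` and `sample_docs` are hypothetical names used only for illustration.

```python
from collections import Counter

def summarize_degrees(docs):
    """Count outgoing and incoming relationships per entity ID."""
    out_degree = Counter()
    in_degree = Counter()
    for doc in docs:
        source = str(doc["_id"])
        targets = doc.get("relationships", {}).get("target_ids", [])
        out_degree[source] += len(targets)
        for target in targets:
            in_degree[str(target)] += 1
    return out_degree, in_degree

# Hand-written sample documents, not output from the extraction model
sample_docs = [
    {"_id": "A", "relationships": {"target_ids": ["B", "C"]}},
    {"_id": "B", "relationships": {"target_ids": ["C"]}},
    {"_id": "C"},
]
out_degree, in_degree = summarize_degrees(sample_docs)
print(in_degree.most_common(1))
```

On the real data, you could pass `collection.find()` instead of `sample_docs`. Entities with a high in-degree are the hubs you would expect to see at the center of the rendered graph.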

In [None]:
import networkx as nx
from pyvis.network import Network

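def visualize_graph(collection):
    """Render the entity documents in `collection` as an interactive pyvis graph and return the HTML."""
    docs = list(collection.find())

    def format_attributes(attrs):
        # Attribute values are expected to be lists; str() guards against non-string items
        return "<br>".join(f"{k}: {', '.join(map(str, v))}" for k, v in attrs.items()) if attrs else ""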
    
    G = nx.DiGraph()

    # Create nodes
    for doc in docs:
        node_id = str(doc["_id"])
        info = f"Type: {doc.get('type', '')}"
        if "attributes" in doc:
            attr_info = format_attributes(doc["attributes"])
            if attr_info:
                info += "<br>" + attr_info
        G.add_node(node_id, label=node_id, title=info.replace("<br>", "\n"))

    # Create edges
    for doc in docs:
        source = str(doc["_id"])
        rels = doc.get("relationships", {})
        targets = rels.get("target_ids", [])
        types = rels.get("types", [])
        attrs = rels.get("attributes", [])
        
        for i, target in enumerate(targets):
            edge_type = types[i] if i < len(types) else ""
            extra = attrs[i] if i < len(attrs) else {}
            edge_info = f"Relationship: {edge_type}"
            if extra:
                edge_info += "<br>" + format_attributes(extra)
            G.add_edge(source, str(target), label=edge_type, title=edge_info.replace("<br>", "\n"))

    # Build and configure network
    nt = Network(notebook=True, cdn_resources='in_line', width="800px", height="600px", directed=True)
    nt.from_nx(G)
    nt.set_options('''
    var options = {
      "interaction": {
        "hover": true,
        "tooltipDelay": 200
      },
      "nodes": {
        "font": {"multi": "html"}
      },
      "physics": {
        "solver": "repulsion",
        "repulsion": {
          "nodeDistance": 300,
          "centralGravity": 0.2,
          "springLength": 200,
          "springStrength": 0.05,
          "damping": 0.09
        }
      }
    }
    ''')

    return nt.generate_html()

In [None]:
from IPython.display import HTML, display
from pymongo import MongoClient

client = MongoClient(MONGODB_URI)

collection = client[DB_NAME][COLLECTION]
html = visualize_graph(collection)

display(HTML(html))

## Answer questions on your data

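The `MongoDBGraphStore` class provides a `chat_response` method that you can use to answer questions about your data. It identifies the entities mentioned in your question, retrieves related entities by traversing the graph with the `$graphLookup` aggregation stage, and passes what it finds to the chat model as context.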
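
To give a sense of what a `$graphLookup` traversal looks like, the following sketch builds an aggregation pipeline that starts from a set of entity IDs and follows `relationships.target_ids` to connected documents. It is not the pipeline that `MongoDBGraphStore` runs internally. `build_graph_lookup_pipeline`, the `connections` output field, and the default `max_depth` are assumptions chosen for illustration.

```python
def build_graph_lookup_pipeline(collection_name, start_ids, max_depth=2):
    """Build an aggregation pipeline that follows relationships from start_ids."""
    return [
        # Select the starting entities by _id
        {"$match": {"_id": {"$in": list(start_ids)}}},
        {
            "$graphLookup": {
                "from": collection_name,                         # collection to traverse
                "startWith": "$relationships.target_ids",        # first hop: direct targets
                "connectFromField": "relationships.target_ids",  # follow outgoing edges
                "connectToField": "_id",                         # match them to entity IDs
                "as": "connections",                             # output array field
                "maxDepth": max_depth,                           # limit on recursive hops
                "depthField": "depth",                           # records each match's depth
            }
        },
    ]

pipeline = build_graph_lookup_pipeline("wikipedia", ["Sherlock Holmes"])
# With a live connection: results = list(collection.aggregate(pipeline))
```

A `maxDepth` of 0 returns only direct neighbors. Larger values follow longer chains of relationships at the cost of returning more documents.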

In [None]:
query = "Who inspired Sherlock Holmes?"

answer = graph_store.chat_response(query)
answer.content