# Knowledge Graph Construction, Querying, and Visualization

This notebook demonstrates building a knowledge graph from a text corpus containing approximately 100 entities.

We will:
1. Load a large text dataset about technology companies, people, and projects
2. Chunk the text into paragraph-sized documents using `RecursiveCharacterTextSplitter`
3. Extract entities and relationships using `MongoDBGraphStore`
4. Query the graph for related entities and natural-language answers
5. Visualize the resulting knowledge graph using HoloViews and NetworkX

The dataset contains news-style articles about:
- Organizations (tech companies, research institutions, nonprofits)
- People (CEOs, scientists, directors)
- Projects and initiatives
- Locations and facilities
- Technologies and products

In [None]:
import os
from pathlib import Path

import holoviews as hv
import networkx as nx
from holoviews import opts
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_mongodb.graphrag.graph import MongoDBGraphStore

In [None]:
hv.extension("bokeh")

In [None]:
# Set default plot dimensions large enough to view graphs with ~100 nodes
defaults = dict(width=1400, height=1000)
hv.opts.defaults(
    opts.EdgePaths(**defaults), opts.Graph(**defaults), opts.Nodes(**defaults)
)

## Configuration

Set up the MongoDB connection and the LLM used for entity extraction.

In [None]:
CONNECTION_STRING = os.environ.get("MONGODB_URI", "")
DB_NAME = "langchain_test_db"
COLLECTION_NAME = "langchain_graphrag_large_example"

# Configure the LLM for entity extraction
# Using gpt-4o for high-quality entity extraction
entity_extraction_model = ChatOpenAI(
    model="gpt-4o", temperature=0.0, cache=False, seed=12345
)

## Load and Chunk Text Data

Load the large text dataset and split it into paragraph-sized chunks.

In [None]:
# Load the text data
data_file = Path("data/articles.txt")
text_content = data_file.read_text()

print(f"Loaded text file with {len(text_content)} characters")
print(f"Preview:\n{text_content[:300]}...")

In [None]:
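# Split text into chunks of at most chunk_size characters using RecursiveCharacterTextSplitter.
# It splits on paragraph breaks first, falling back to line breaks, spaces, and finally
# individual characters when a piece is still longer than chunk_size.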
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=1000,
    chunk_overlap=100,
    length_function=len,
)

text_chunks = text_splitter.split_text(text_content)
print(f"Split text into {len(text_chunks)} chunks")

# Convert to Document objects
documents = [Document(page_content=chunk) for chunk in text_chunks]
print(f"Created {len(documents)} Document objects")
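To make the role of the `separators` list concrete, here is a simplified, dependency-free sketch of recursive character splitting. It is an illustration only, not LangChain's implementation: it ignores `chunk_overlap`, and the helper name `recursive_split` is hypothetical. It splits on the first separator found in the text, greedily merges pieces up to `chunk_size`, and re-splits any oversized piece with the next, finer separators.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Simplified recursive character splitting: no overlap, greedy merging."""
    # Use the first separator that appears in the text ("" splits into characters).
    sep, finer = separators[-1], []
    for i, candidate_sep in enumerate(separators):
        if candidate_sep == "" or candidate_sep in text:
            sep, finer = candidate_sep, separators[i + 1:]
            break
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size and finer:
            # Oversized piece: flush the buffer, then split it with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, finer, chunk_size))
            continue
        merged = piece if not current else current + sep + piece
        if len(merged) <= chunk_size:
            current = merged
        else:
            # Adding this piece would overflow the chunk: start a new one.
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    # Drop surrounding whitespace and empty chunks.
    return [c.strip() for c in chunks if c.strip()]


sample = "Para one is here.\n\nPara two is a bit longer than that.\n\nShort."
print(recursive_split(sample, ["\n\n", "\n", " ", ""], chunk_size=40))
```

With `chunk_size=40`, each paragraph becomes its own chunk because no two adjacent paragraphs fit together; only a paragraph longer than `chunk_size` would fall through to line, word, or character splitting.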

## Build Knowledge Graph

Create `MongoDBGraphStore` and extract entities from the documents.

**Note:** This step takes several minutes as the LLM processes each chunk to extract entities and relationships.

In [None]:
# Create the graph store
graph_store = MongoDBGraphStore(
    connection_string=CONNECTION_STRING,
    database_name=DB_NAME,
    collection_name=COLLECTION_NAME,
    entity_extraction_model=entity_extraction_model,
    max_depth=3,  # Allow deeper graph traversal for complex queries
)

print(f"Created MongoDBGraphStore connected to {DB_NAME}.{COLLECTION_NAME}")

In [None]:
# Extract entities from documents
# This can take several minutes depending on the number of chunks and LLM response time
print(f"Processing {len(documents)} documents...")
bulkwrite_results = graph_store.add_documents(documents)

print(f"Processed {len(bulkwrite_results)} document chunks")
print("Entity extraction complete!")

## Graph Statistics

Examine the extracted knowledge graph.

In [None]:
# Count entities
entity_count = graph_store.collection.count_documents({})
print(f"Total entities extracted: {entity_count}")

# Get all entities
entities = list(graph_store.collection.find({}))

# Analyze entity types
entity_types = {}
for entity in entities:
    entity_type = entity.get("type", "Unknown")
    entity_types[entity_type] = entity_types.get(entity_type, 0) + 1

print("\nEntity types:")
for entity_type, count in sorted(
    entity_types.items(), key=lambda x: x[1], reverse=True
):
    print(f"  {entity_type}: {count}")

# Analyze relationships
entities_with_relationships = [
    entity for entity in entities if entity.get("relationships", {}).get("target_ids")
]

total_relationships = sum(
    len(entity.get("relationships", {}).get("target_ids", []))
    for entity in entities_with_relationships
)

print("\nRelationships:")
print(f"  Entities with relationships: {len(entities_with_relationships)}")
print(f"  Total relationship edges: {total_relationships}")
if entities_with_relationships:
    print(
        f"  Average relationships per connected entity: {total_relationships / len(entities_with_relationships):.1f}"
    )
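The statistics above rely on each entity being one MongoDB document with an `_id` (the entity name), a `type`, and a `relationships` field holding parallel `target_ids` and `types` arrays. The stdlib-only sketch below runs the same tallies with `collections.Counter` on hypothetical hand-written entities (not output from this notebook); the helper `graph_stats` is hypothetical and includes a guard for graphs where no entity has relationships.

```python
from collections import Counter

# Hypothetical entity documents containing only the fields this notebook reads:
# "_id" (the entity name), "type", and parallel "target_ids" / "types" arrays.
sample_entities = [
    {"_id": "Alice Chen", "type": "Person",
     "relationships": {"target_ids": ["Acme Labs", "Project Orion"],
                       "types": ["works_at", "leads"]}},
    {"_id": "Acme Labs", "type": "Organization",
     "relationships": {"target_ids": ["Project Orion"], "types": ["funds"]}},
    {"_id": "Project Orion", "type": "Project"},  # no outgoing relationships
]


def graph_stats(entities):
    """Return (type counts, #connected entities, #edges, average edges per connected entity)."""
    type_counts = Counter(e.get("type", "Unknown") for e in entities)
    degrees = [len(e.get("relationships", {}).get("target_ids", [])) for e in entities]
    connected = [d for d in degrees if d > 0]
    total_edges = sum(connected)
    # Guard against dividing by zero when no entity has relationships.
    average = total_edges / len(connected) if connected else 0.0
    return type_counts, len(connected), total_edges, average


type_counts, n_connected, n_edges, avg = graph_stats(sample_entities)
for entity_type, count in type_counts.most_common():
    print(f"{entity_type}: {count}")
print(f"{n_connected} connected entities, {n_edges} edges, {avg:.1f} per connected entity")
```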

## Sample Entities

Display a few example entities from the graph.

In [None]:
# Show some sample entities with relationships
print("Sample entities with relationships:\n")
for i, entity in enumerate(entities_with_relationships[:5]):
    print(f"{i+1}. {entity['_id']} ({entity.get('type')})")
    relationships = entity.get("relationships", {})
    target_ids = relationships.get("target_ids", [])
    rel_types = relationships.get("types", [])
    print(f"   Relationships: {len(target_ids)}")
    for target, rel_type in zip(target_ids[:3], rel_types[:3]):
        print(f"     - {rel_type} -> {target}")
    if len(target_ids) > 3:
        print(f"     ... and {len(target_ids) - 3} more")
    print()

## Query the Knowledge Graph

Query the graph to find entities related to a question, then generate a natural-language answer using the graph.

In [None]:
# Example query about relationships
query = (
    "What is the connection between Quantum Dynamics Corp and NanoTech Materials Ltd?"
)

# Extract entity names from query
entity_names = graph_store.extract_entity_names(query)
print(f"Extracted entities from query: {entity_names}")

# Find related entities through graph traversal
if entity_names:
    related_entities = graph_store.related_entities(entity_names)
    print(f"\nFound {len(related_entities)} related entities:")
    for entity in related_entities[:10]:
        print(f"  - {entity['_id']} ({entity.get('type')})")
    if len(related_entities) > 10:
        print(f"  ... and {len(related_entities) - 10} more")
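`related_entities` follows relationship edges outward from the starting entities, up to a depth limit. The sketch below models that idea as a breadth-first search over an in-memory adjacency dict. It is a conceptual illustration only: the library queries MongoDB rather than walking a Python dict, and the helper name `related_within` and its hop-counting convention are assumptions that may not match the exact semantics of `max_depth`.

```python
from collections import deque


def related_within(adjacency: dict[str, list[str]], start: list[str], max_hops: int) -> set[str]:
    """Breadth-first search: entities reachable from `start` in at most `max_hops` edges.

    Hypothetical helper for illustration; the starting entities are included in the result.
    """
    seen = set(start)
    queue = deque((name, 0) for name in start)
    while queue:
        name, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for target in adjacency.get(name, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical chain of directed relationships: A -> B -> C -> D
adjacency = {"A": ["B"], "B": ["C"], "C": ["D"]}
print(sorted(related_within(adjacency, ["A"], max_hops=2)))  # ['A', 'B', 'C']
```

Raising the hop limit widens the neighborhood, which is the trade-off behind choosing a depth limit: more context for the answer, but also more (and potentially less relevant) entities.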

In [None]:
# Get a natural language response using the knowledge graph
answer = graph_store.chat_response(query)
print(f"Chat Response:\n{answer.content}")

## Visualize Knowledge Graph

Create interactive visualizations of the knowledge graph using HoloViews and NetworkX.

### Basic Graph View

First, we'll create a basic visualization of the entire graph.

In [None]:
# Create basic view of the entire graph
basic_view = graph_store.view()
basic_view

### Add view options

The default view uses a force-directed layout. We can improve on it by passing styling options, such as a colormap for the nodes, via `node_opts` and `edge_opts`, and by passing arguments to the NetworkX layout algorithm via `nx_opts`. For `nx.spring_layout`, `k` sets the optimal distance between nodes and `iterations` sets the number of optimization steps.

In [None]:
spring_view = graph_store.view(
    layout=nx.spring_layout,
    nx_opts=dict(k=0.5, iterations=100),
    edge_opts=dict(
        edge_line_width=0.5,
        node_color="type",
        cmap="Category20",
        node_size=10,
    ),
    node_opts=dict(size=10, color="type", cmap="Category20", alpha=0.8),
)
spring_view

### Multipartite Layout (by Entity Type)

NetworkX has many different layouts available. Here we visualize the graph with entities grouped by type.

In [None]:
# Create multipartite layout grouping entities by type

type_view = graph_store.view(
    layout=nx.multipartite_layout,
    nx_opts=dict(subset_key="type"),
    edge_opts=dict(
        edge_line_width=1,
        edge_alpha=0.5,
        node_color="type",
        cmap="Category20",
        node_size=10,
    ),
    node_opts=dict(size=10, color="type", cmap="Category20", alpha=0.8),
)
type_view
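Conceptually, `nx.multipartite_layout` with `subset_key="type"` places all nodes that share a `type` attribute in the same layer and spreads the layers apart along one axis. The stdlib sketch below (hypothetical helper `column_layout`) shows only that grouping idea: one column per type, with nodes spaced evenly and centered within each column. NetworkX also orders and rescales the positions (see its `align`, `scale`, and `center` parameters), so its actual coordinates will differ. Bucketing nodes without a type under `"Unknown"` is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Optional


def column_layout(node_types: dict[str, Optional[str]]) -> dict[str, tuple[float, float]]:
    """Place nodes in columns, one column per entity type (hypothetical helper)."""
    columns: dict[str, list[str]] = defaultdict(list)
    for node, node_type in node_types.items():
        # Assumption of this sketch: untyped nodes share an "Unknown" column.
        columns[node_type or "Unknown"].append(node)
    positions = {}
    for x, node_type in enumerate(sorted(columns)):
        members = sorted(columns[node_type])
        for y, node in enumerate(members):
            # Center each column vertically around y = 0.
            positions[node] = (float(x), y - (len(members) - 1) / 2)
    return positions


pos = column_layout({"Alice": "Person", "Bob": "Person", "Acme": "Organization", "Orion": "Project"})
print(pos)
```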

### NetworkX Graph Analytics

Convert to NetworkX for advanced layout algorithms and analytics.

In [None]:
# Convert to NetworkX graph
nx_graph = graph_store.to_networkx()

print(
    f"NetworkX graph: {nx_graph.number_of_nodes()} nodes, {nx_graph.number_of_edges()} edges"
)
print(f"Graph density: {nx.density(nx_graph):.4f}")

# Check if graph is connected
if nx_graph.number_of_nodes() > 0:
    is_connected = nx.is_connected(nx_graph.to_undirected())
    print(f"Is connected: {is_connected}")

    if not is_connected:
        components = list(nx.connected_components(nx_graph.to_undirected()))
        print(f"Number of connected components: {len(components)}")
        print(f"Largest component size: {len(max(components, key=len))}")
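For reference, the density printed above is the fraction of possible edges that actually exist. For a graph with $n$ nodes and $m$ edges, `nx.density` computes the following (and returns 0 when $n \le 1$ or $m = 0$):

$$
D_{\text{directed}} = \frac{m}{n(n-1)}, \qquad D_{\text{undirected}} = \frac{2m}{n(n-1)}
$$

Which formula applies depends on whether `to_networkx()` returns a directed graph; the `to_undirected()` calls in the cell above suggest that it does. As a purely hypothetical example, a directed graph with $n = 100$ and $m = 150$ has $D = 150 / 9900 \approx 0.0152$, so values close to zero are expected for sparse knowledge graphs.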

### Creating Plots from NetworkX

Although the example below again uses HoloViews, it shows how to build plots directly from the NetworkX graph instead of relying on `view` alone.

In [None]:
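# Build a HoloViews Graph from the NetworkX graph with a spring (force-directed) layout;
# the extra keyword arguments (k, iterations) are passed to nx.spring_layout
spring_layout = hv.Graph.from_networkx(nx_graph, nx.spring_layout, k=0.5, iterations=50)

# Overlay the styled nodes (colored by entity type) on top of the graph
spring_layout * spring_layout.nodes.opts(
    size=10, color="type", cmap="Category20", alpha=0.8
)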

### Focused Subgraph Visualization

Visualize a subgraph around specific entities for detailed exploration.

In [None]:
# Pick an entity with many connections
if entities_with_relationships:
    # Find entity with most relationships
    most_connected = max(
        entities_with_relationships,
        key=lambda e: len(e.get("relationships", {}).get("target_ids", [])),
    )

    print(
        f"Most connected entity: {most_connected['_id']} ({most_connected.get('type')})"
    )
    print(
        f"Number of relationships: {len(most_connected.get('relationships', {}).get('target_ids', []))}"
    )

    # Get subgraph around this entity
    focus_entity = most_connected["_id"]
    related = graph_store.related_entities([focus_entity], max_depth=2)

    print(f"\nSubgraph contains {len(related)} entities")

    # Create a subgraph from these entities
    subgraph_nodes = {entity["_id"] for entity in related}
    subgraph = nx_graph.subgraph(subgraph_nodes).copy()

    print(
        f"Subgraph: {subgraph.number_of_nodes()} nodes, {subgraph.number_of_edges()} edges"
    )

    # Visualize subgraph
    if subgraph.number_of_nodes() > 0:
        sub_spring = hv.Graph.from_networkx(
            subgraph, nx.spring_layout, k=0.8, iterations=50
        )

        subgraph_view = sub_spring.opts(
            inspection_policy="edges",
            node_color="type",
            cmap="Category20",
            node_size=15,
            edge_line_width=2,
            edge_alpha=0.6,
            title=f"Subgraph around {focus_entity}",
        ) * sub_spring.nodes.opts(
            size=15,
            color="type",
            cmap="Category20",
            alpha=0.9,
        )
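`nx_graph.subgraph(subgraph_nodes)` returns the *induced* subgraph: every requested node that exists in the graph (IDs not in the graph are silently ignored), plus every edge whose two endpoints are both kept. It is a read-only view, which is why the cell calls `.copy()`. The stdlib sketch below reproduces that filtering rule on plain node and edge lists; the helper name `induced_subgraph` is hypothetical.

```python
def induced_subgraph(nodes, edges, keep):
    """Return (nodes, edges) of the subgraph induced by `keep` (hypothetical helper)."""
    kept_nodes = set(nodes) & set(keep)  # IDs not present in the graph are ignored
    # Keep an edge only if both of its endpoints survive.
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
print(induced_subgraph(nodes, edges, ["A", "B", "C", "Z"]))
```

Dropping `D` also drops the `C -> D` edge, which is why the focused subgraph can have noticeably fewer edges than the full graph around the same entities.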

In [None]:
subgraph_view

## Export Visualizations

Save the visualizations as HTML files for sharing. You can also download a static PNG image of any interactive plot with the save tool in the Bokeh toolbar.

In [None]:
import logging

# Silence warnings from Bokeh's model validation checks so they don't clutter the output
logging.getLogger("bokeh.core.validation.check").setLevel(logging.CRITICAL)

# Save visualizations
hv.save(type_view, "graph_by_type.html")
print("Saved graph_by_type.html")

hv.save(spring_view, "graph_spring.html")
print("Saved graph_spring.html")

print("\nVisualizations exported successfully!")

## Summary

In this notebook, we:

1. ✅ Loaded a large text corpus with ~100 entities
2. ✅ Chunked text into documents using `RecursiveCharacterTextSplitter`
3. ✅ Built a knowledge graph using `MongoDBGraphStore` and LLM-based entity extraction
4. ✅ Analyzed graph statistics (entities, types, relationships)
5. ✅ Queried the graph for related entities and chat responses
6. ✅ Visualized the graph using multiple layout algorithms
7. ✅ Explored focused subgraphs around highly connected entities
8. ✅ Exported interactive visualizations as HTML

The resulting knowledge graph can be used for:
- Complex multi-hop queries
- Entity relationship discovery
- Context-aware chat responses
- Graph analytics and network analysis