# Knowledge Graph Exploration

Enhance retrieval with knowledge graphs and entity relationships.

**What you'll learn:**
- Build entity graphs from documents
- Graph traversal queries
- Combine graphs with RAG
- Entity extraction
- Visualize relationships

**Prerequisites:** Completed notebooks 01-02

In [None]:
# Setup
from dotenv import load_dotenv

from hybridrag import create_hybridrag
from hybridrag.prompts import ENTITY_EXTRACTION_PROMPT, extract_entities_from_query

load_dotenv()
rag = await create_hybridrag()
print("✓ HybridRAG initialized with graph support")

## 1. Entity Extraction from Documents

In [None]:
# Sample document with entities
document = """
MongoDB Atlas is a cloud database service that provides vector search capabilities.
It integrates with Voyage AI for embeddings and supports HNSW indexing.
The service runs on AWS, Azure, and Google Cloud platforms.
"""

print("Document:")
print(document)
print("\nEntity Extraction Prompt:")
print(ENTITY_EXTRACTION_PROMPT[:200] + "...")

# Note: entity extraction requires an LLM call, which this cell does not make.
# The entities printed below illustrate the expected output; they are not computed.
print("\nExpected entities:")
print("- MongoDB Atlas (Product)")
print("- Voyage AI (Company)")
print("- HNSW (Algorithm)")
print("- AWS, Azure, Google Cloud (Platforms)")
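The cell above only prints the prompt, so here is a minimal sketch of the step that would follow the LLM call: parsing a structured response into entities and relationships. It assumes the model was asked to return JSON with `entities` and `relationships` keys. That format, the `parse_extraction` helper, and the sample response are hypothetical, not HybridRAG's actual output.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: str
    relation: str
    target: str


def parse_extraction(raw: str) -> tuple[list[Entity], list[Relationship]]:
    """Parse an LLM response in an assumed {"entities": [...], "relationships": [...]} shape."""
    data = json.loads(raw)

    entities: list[Entity] = []
    seen: set[str] = set()
    for item in data.get("entities", []):
        name = item.get("name", "").strip()
        # Drop empty names and case-insensitive duplicates ("MongoDB Atlas" vs "mongodb atlas")
        if not name or name.lower() in seen:
            continue
        seen.add(name.lower())
        entities.append(Entity(name=name, type=item.get("type", "Unknown")))

    # Keep only relationships whose endpoints were both extracted as entities
    known = {e.name for e in entities}
    relationships = [
        Relationship(r["source"], r.get("relation", "related_to"), r["target"])
        for r in data.get("relationships", [])
        if r.get("source") in known and r.get("target") in known
    ]
    return entities, relationships


# Hypothetical model output, written by hand for illustration
sample_response = """
{"entities": [
   {"name": "MongoDB Atlas", "type": "Product"},
   {"name": "Voyage AI", "type": "Company"},
   {"name": "HNSW", "type": "Algorithm"},
   {"name": "mongodb atlas", "type": "Product"}
 ],
 "relationships": [
   {"source": "MongoDB Atlas", "relation": "integrates_with", "target": "Voyage AI"},
   {"source": "MongoDB Atlas", "relation": "runs_on", "target": "GCP"}
 ]}
"""

entities, rels = parse_extraction(sample_response)
print(entities)
print(rels)
```

Dropping relationships with unknown endpoints keeps the graph consistent at the cost of losing edges the model named but did not list as entities (here, `GCP`).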

## 2. Query Entity Extraction

In [None]:
# Extract entities from query
query = "How does MongoDB Atlas integrate with Voyage AI embeddings?"

print(f"Query: {query}\n")
print("Query Entity Extraction Prompt:")
print(extract_entities_from_query(query)[:300] + "...\n")

print("Expected entities from query:")
print("- MongoDB Atlas (Product)")
print("- Voyage AI (Company)")
print("- embeddings (Concept)")
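Extracted mentions still have to be linked to graph nodes: "mongodb" and "Atlas" should both resolve to the same node. A minimal sketch using a hand-written alias table follows. `ALIASES` and `link_query_entities` are hypothetical, and a dictionary lookup is only a simple stand-in for LLM-based or learned entity linking.

```python
import re

# Hypothetical alias table: lowercase surface forms -> canonical graph node names
ALIASES = {
    "mongodb atlas": "MongoDB Atlas",
    "atlas": "MongoDB Atlas",
    "mongodb": "MongoDB Atlas",
    "voyage ai": "Voyage AI",
    "voyage": "Voyage AI",
    "embeddings": "Embeddings",
    "embedding": "Embeddings",
    "vector search": "Vector Search",
    "hnsw": "HNSW",
}


def link_query_entities(text: str, aliases: dict[str, str]) -> list[str]:
    """Return canonical entity names mentioned in text, in order of first mention."""
    # Longer aliases come first in the alternation so "mongodb atlas" wins over "mongodb"
    alternation = "|".join(re.escape(a) for a in sorted(aliases, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)

    linked: list[str] = []
    for match in pattern.finditer(text):
        canonical = aliases[match.group(1).lower()]
        if canonical not in linked:
            linked.append(canonical)
    return linked


linked = link_query_entities(
    "How does MongoDB Atlas integrate with Voyage AI embeddings?", ALIASES
)
print(linked)
```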

## 3. Graph-Enhanced Search

In [None]:
# In HybridRAG, entity graphs are stored in MongoDB
# Each entity becomes a node with relationships to other entities

# Example: Search with entity expansion
query = "vector search mongodb"

# Standard search
standard_results = await rag.query(query=query, mode="hybrid", top_k=3)

# Graph-enhanced search would:
# 1. Extract entities from query ("vector search", "mongodb")
# 2. Find related entities in graph ("atlas", "embeddings", "hnsw")
# 3. Expand query with related entities
# 4. Retrieve documents mentioning any related entities

print("Standard Search Results:")
for idx, result in enumerate(standard_results, 1):
    print(f"{idx}. Score: {result.score:.4f}")
    print(f"   Content: {result.content[:80]}...\n")

print("Graph-enhanced search would expand to include:")
print("- Documents about 'Atlas' (related to MongoDB)")
print("- Documents about 'embeddings' (related to vector search)")
print("- Documents about 'HNSW' (algorithm for vector search)")
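Steps 2 and 3 of the comment in the cell above can be sketched in plain Python. This assumes a hypothetical edge list and treats relationships as undirected for expansion: if Atlas provides Vector Search, a query about either may benefit from the other. `EDGES`, `build_neighbors`, `expand_query`, and `max_terms` are illustrative names, not HybridRAG APIs.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) edges, similar to those in section 4
EDGES = [
    ("MongoDB Atlas", "provides", "Vector Search"),
    ("MongoDB Atlas", "integrates_with", "Voyage AI"),
    ("Vector Search", "uses", "HNSW"),
    ("Vector Search", "requires", "Embeddings"),
    ("Voyage AI", "provides", "Embeddings"),
]


def build_neighbors(edges):
    """Index edges in both directions; relatedness is treated as symmetric for expansion."""
    neighbors = defaultdict(set)
    for source, _relation, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return neighbors


def expand_query(query: str, seed_entities: list[str], neighbors, max_terms: int = 5) -> str:
    """Append up to max_terms one-hop neighbors of the seed entities to the query."""
    extra: list[str] = []
    for seed in seed_entities:
        for neighbor in sorted(neighbors.get(seed, ())):
            if neighbor not in seed_entities and neighbor not in extra:
                extra.append(neighbor)
    return " ".join([query, *extra[:max_terms]])


expanded = expand_query(
    "vector search mongodb", ["Vector Search", "MongoDB Atlas"], build_neighbors(EDGES)
)
print(expanded)
```

Appending entity names mainly helps the lexical half of a hybrid search. For the vector half, extra terms can pull the query embedding away from the user's intent, which is one reason to cap expansion with `max_terms`.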

## 4. Entity Relationship Patterns

In [None]:
# Common entity relationship patterns
relationships = {
    "MongoDB Atlas": {
        "type": "Product",
        "relationships": [
            ("provides", "Vector Search"),
            ("runs_on", "AWS"),
            ("runs_on", "Azure"),
            ("integrates_with", "Voyage AI"),
        ],
    },
    "Vector Search": {
        "type": "Feature",
        "relationships": [
            ("uses", "HNSW"),
            ("requires", "Embeddings"),
        ],
    },
    "Voyage AI": {
        "type": "Company",
        "relationships": [
            ("provides", "Embeddings"),
        ],
    },
}

print("Entity Graph Structure:\n")
for entity, data in relationships.items():
    print(f"{entity} ({data['type']})")
    for rel_type, target in data["relationships"]:
        print(f"  --{rel_type}--> {target}")
    print()
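One way to persist a graph like this in MongoDB is one document per edge, which lets the built-in `$graphLookup` aggregation stage do the traversal server-side. The `entity_edges` collection name, the field names, and both helpers below are hypothetical and not necessarily how HybridRAG stores its graph. `to_edge_documents` accepts the same shape as the `relationships` dict above.

```python
def to_edge_documents(spec: dict) -> list[dict]:
    """Flatten {entity: {"type": ..., "relationships": [(relation, target), ...]}} into edge documents."""
    docs = []
    for source, data in spec.items():
        for relation, target in data["relationships"]:
            docs.append(
                {
                    "source": source,
                    "source_type": data["type"],
                    "relation": relation,
                    "target": target,
                }
            )
    return docs


def two_hop_pipeline(start: str, collection: str = "entity_edges") -> list[dict]:
    """Direct edges from `start`, each annotated with the edges leaving its target."""
    return [
        # Direct (one-hop) edges out of the start entity
        {"$match": {"source": start}},
        {
            "$graphLookup": {
                "from": collection,
                "startWith": "$target",        # find edges whose source is this edge's target
                "connectFromField": "target",  # only used to recurse when maxDepth > 0
                "connectToField": "source",
                "as": "next_hops",
                "maxDepth": 0,                 # 0 = no recursion beyond the first lookup
                "depthField": "depth",
            }
        },
    ]


# Small spec in the same shape as the `relationships` dict above
spec = {
    "MongoDB Atlas": {
        "type": "Product",
        "relationships": [("provides", "Vector Search"), ("integrates_with", "Voyage AI")],
    },
    "Vector Search": {"type": "Feature", "relationships": [("uses", "HNSW")]},
    "Voyage AI": {"type": "Company", "relationships": [("provides", "Embeddings")]},
}
docs = to_edge_documents(spec)
pipeline = two_hop_pipeline("MongoDB Atlas")

# With pymongo and a live cluster (not run here):
# from pymongo import MongoClient
# edges = MongoClient(uri)["demo"]["entity_edges"]
# edges.insert_many(docs)
# for edge in edges.aggregate(pipeline):
#     print(edge["target"], [e["target"] for e in edge["next_hops"]])
```

With `maxDepth: 0`, each direct edge receives only the edges whose `source` equals its `target`, which mirrors the two-hop traversal in section 6; raising `maxDepth` follows longer paths.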

## 5. Visualize Entity Graph

In [None]:
import matplotlib.pyplot as plt
import networkx as nx

# Create graph
G = nx.DiGraph()

# Add nodes and edges
for entity, data in relationships.items():
    G.add_node(entity, type=data["type"])
    for rel_type, target in data["relationships"]:
        G.add_edge(entity, target, label=rel_type)

# Draw graph
plt.figure(figsize=(12, 8))
pos = nx.spring_layout(G, k=2, iterations=50, seed=42)  # fixed seed: same layout every run

# Draw nodes with colors by type. Nodes added only as edge targets
# (e.g., AWS, HNSW) have no "type" attribute, so they fall back to "Other".
node_colors = {
    "Product": "#4CAF50",
    "Feature": "#2196F3",
    "Company": "#FF9800",
    "Other": "#9E9E9E",
}

for node_type, color in node_colors.items():
    nodes = [n for n, d in G.nodes(data=True) if d.get("type", "Other") == node_type]
    nx.draw_networkx_nodes(
        G, pos, nodelist=nodes, node_color=color, node_size=3000, label=node_type
    )

# Draw edges and labels; node_size must match the nodes so arrowheads stop at node borders
nx.draw_networkx_edges(
    G,
    pos,
    node_size=3000,
    edge_color="gray",
    arrows=True,
    arrowsize=20,
    arrowstyle="->",
    width=2,
)
nx.draw_networkx_labels(G, pos, font_size=10, font_weight="bold")

# Edge labels
edge_labels = nx.get_edge_attributes(G, "label")
nx.draw_networkx_edge_labels(G, pos, edge_labels, font_size=8)

plt.title("Entity Knowledge Graph", fontsize=16, fontweight="bold")
plt.legend(loc="upper left")
plt.axis("off")
plt.tight_layout()
plt.show()

print("\nGraph Statistics:")
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.2f}")
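For a directed graph, `nx.density` is the fraction of possible ordered node pairs that are connected by an edge, with $n$ nodes and $m$ edges:

$$
D = \frac{m}{n(n-1)}
$$

The graph above has $n = 7$ nodes (the three typed entities plus AWS, Azure, HNSW, and Embeddings, which appear only as edge targets) and $m = 7$ edges, so

$$
D = \frac{7}{7 \cdot 6} = \frac{1}{6} \approx 0.17
$$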

## 6. Graph Traversal Queries

In [None]:
# Example: Find all entities related to "MongoDB Atlas"
start_entity = "MongoDB Atlas"

# Direct relationships (depth 1)
direct = list(G.successors(start_entity))
print(f"Direct relationships from '{start_entity}':")
for node in direct:
    edge_data = G.get_edge_data(start_entity, node)
    print(f"  --{edge_data['label']}--> {node}")

print()

# Two-hop relationships (depth 2)
two_hop = set()
for intermediate in direct:
    for target in G.successors(intermediate):
        two_hop.add((intermediate, target))

print(f"Two-hop relationships from '{start_entity}':")
for intermediate, target in sorted(two_hop):  # sorted for a stable print order
    edge1 = G.get_edge_data(start_entity, intermediate)
    edge2 = G.get_edge_data(intermediate, target)
    print(
        f"  {start_entity} --{edge1['label']}--> {intermediate} --{edge2['label']}--> {target}"
    )
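The nested loops above enumerate paths, so an entity reachable two ways (Embeddings, via Vector Search and via Voyage AI) is printed twice. To get the distinct entities within k hops, for example to feed query expansion, networkx's `single_source_shortest_path_length` with a `cutoff` performs the breadth-first search directly. The `k_hop_neighbors` wrapper and the inline demo graph are illustrative.

```python
import networkx as nx


def k_hop_neighbors(graph: nx.DiGraph, source: str, k: int) -> dict[int, list[str]]:
    """Group nodes reachable from source by shortest-path hop count, up to k hops."""
    distances = nx.single_source_shortest_path_length(graph, source, cutoff=k)
    by_depth: dict[int, list[str]] = {}
    for node, depth in distances.items():
        if depth == 0:
            continue  # skip the source itself
        by_depth.setdefault(depth, []).append(node)
    # Sort depths and names for deterministic output
    return {depth: sorted(nodes) for depth, nodes in sorted(by_depth.items())}


# Demo graph with the same edges as the notebook's G
demo = nx.DiGraph()
demo.add_edges_from(
    [
        ("MongoDB Atlas", "Vector Search"),
        ("MongoDB Atlas", "AWS"),
        ("MongoDB Atlas", "Azure"),
        ("MongoDB Atlas", "Voyage AI"),
        ("Vector Search", "HNSW"),
        ("Vector Search", "Embeddings"),
        ("Voyage AI", "Embeddings"),
    ]
)
result = k_hop_neighbors(demo, "MongoDB Atlas", 2)
print(result)
```

On the notebook's graph, `k_hop_neighbors(G, start_entity, 2)` would run the same search. Keep the path loop when you need the relationship labels for explanations; use the node sets when you only need terms to retrieve on.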

## 7. Integration Patterns

### Graph-Enhanced RAG Pipeline:

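```python
# Sketch of the pattern. extract_entities, graph_traverse, and
# rerank_by_entity_coverage are placeholder helpers for illustration;
# they are not imported in this notebook.

# 1. Extract entities from the query (e.g., via an LLM call)
query_entities = extract_entities(query)

# 2. Find related entities by graph traversal (up to 2 hops)
related_entities = graph_traverse(query_entities, depth=2)

# 3. Expand the query with related entity names
expanded_query = query + " " + " ".join(related_entities)

# 4. Hybrid search with the expanded query
results = await rag.query(query=expanded_query, mode="hybrid")

# 5. Re-rank by how many query and related entities each result mentions
#    (list() so this works whether the helpers return lists or sets)
reranked = rerank_by_entity_coverage(
    results, list(query_entities) + list(related_entities)
)
```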
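Step 5 relies on a placeholder helper. Below is a minimal sketch of `rerank_by_entity_coverage`, assuming each result exposes `.content` and `.score` (as in section 3) and that scores fall roughly in [0, 1]. The `Result` dataclass, the substring-matching rule, and the `alpha` weight are assumptions to tune, not HybridRAG behavior.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical stand-in for a HybridRAG result; only .content and .score are assumed
    content: str
    score: float


def rerank_by_entity_coverage(results, entities, alpha: float = 0.7):
    """Blend each result's retrieval score with the fraction of entities it mentions."""
    names = [e.lower() for e in entities]
    if not names:
        return list(results)

    def blended(result) -> float:
        text = result.content.lower()
        coverage = sum(name in text for name in names) / len(names)
        return alpha * result.score + (1 - alpha) * coverage

    return sorted(results, key=blended, reverse=True)


results = [
    Result("Atlas supports HNSW indexes for vector search.", 0.60),
    Result("Pricing overview for cloud databases.", 0.65),
]
reranked = rerank_by_entity_coverage(results, ["Vector Search", "HNSW", "MongoDB Atlas"])
for r in reranked:
    print(f"{r.score:.2f}  {r.content}")
```

Substring matching is crude: it misses aliases and can match inside longer words. Linking mentions through an alias table before counting coverage would make the signal more robust.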

### Benefits:
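- **Improved recall**: Retrieve relevant documents that share no terms with the query but mention related entities
- **Relationship awareness**: Follow explicit links (e.g., Vector Search --uses--> HNSW) instead of relying on lexical or embedding similarity alone
- **Query expansion**: Graph neighbors supply related terms automatically; synonyms are covered only if the graph encodes them as aliases or relationships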

## Next Steps

- `04_prompt_engineering.ipynb` - Optimize system prompts
- `05_performance_tuning.ipynb` - Production optimization
- **Further reading:**
  - MongoDB Graph Queries
  - Knowledge Graph Construction
  - Entity Resolution Techniques