# D&D Campaign RAG Experiment

This notebook lets you experiment with the RAG (Retrieval-Augmented Generation) system for your D&D campaign.

## What this notebook does:
1. Loads and chunks your campaign content
2. Creates embeddings using the nomic-embed-text model
3. Builds a FAISS vector store for similarity search
4. Lets you test queries and see what documents are retrieved
5. Shows the full RAG chain in action


In [7]:
# Import necessary libraries
import sys
import os
import json
from pathlib import Path

# Add the project root to Python path
# If we're in notebooks/, go up one level to find the project root
if Path.cwd().name == 'notebooks':
    project_root = Path.cwd().parent
else:
    project_root = Path.cwd()

sys.path.append(str(project_root))
os.chdir(project_root)  # Change working directory to project root

print(f"Current working directory: {os.getcwd()}")
print(f"Project root: {project_root}")

from src.settings import CONTENT_DIR, INDEX_DIR, LLM_MODEL, EMBED_MODEL, CHUNK_SIZE, CHUNK_OVERLAP, TOP_K
from src.rag.index import create_rag_index
from src.rag.chain import make_graph_rag_chain
from src.rag.graph import (
    ObsidianGraphBuilder, 
    GraphAnalyzer,
    normalize_entity_name, 
    extract_headers, 
    smart_chunk_content
)

from langchain_community.chat_models import ChatOllama
from langchain.prompts import PromptTemplate
from langchain.schema import StrOutputParser

# Additional visualization libraries
import numpy as np
import matplotlib.pyplot as plt
import umap
import seaborn as sns

print(f"Content directory: {CONTENT_DIR}")
print(f"Index directory: {INDEX_DIR}")
print(f"LLM Model: {LLM_MODEL}")
print(f"Embedding Model: {EMBED_MODEL}")
print(f"Chunk size: {CHUNK_SIZE}")
print(f"Top K: {TOP_K}")

Current working directory: d:\DnD\1 - Campaign - THE KEEPERS\kob
Project root: d:\DnD\1 - Campaign - THE KEEPERS\kob
Content directory: content
Index directory: .rag_index
LLM Model: llama3.1
Embedding Model: nomic-embed-text
Chunk size: 600
Top K: 10


## Step 1: Load and Chunk Content

This loads all your campaign documents and splits them into chunks for processing.


In [2]:
# Graph-based content loading is handled in the next step
print("=== GRAPH-BASED CONTENT LOADING ===")
print("Content loading will be performed during index creation...")


=== GRAPH-BASED CONTENT LOADING ===
Content loading will be performed during index creation...


## Step 2: Build Vector Store

This creates embeddings for all chunks and builds a FAISS index for fast similarity search.


In [3]:
# Create RAG index with vector store and graph
print("Creating RAG index...")
vector_store, graph_builder, graph_path = create_rag_index(
    CONTENT_DIR, INDEX_DIR, 
    chunk_size=500,  # Reduced chunk size for more granular content
    chunk_overlap=100, 
    embed_model=EMBED_MODEL
)

# Detailed index information
print(f"\nVector Store Details:")
print(f"- Total vectors: {vector_store.index.ntotal}")
print(f"- Vector dimension: {vector_store.index.d}")

print(f"\nGraph Details:")
print(f"- Total nodes: {len(graph_builder.graph.nodes)}")
print(f"- Total edges: {len(graph_builder.graph.edges)}")

# Node type distribution
type_counts = {}
for _, node_data in graph_builder.graph.nodes(data=True):
    node_type = node_data.get('type', 'unknown')
    type_counts[node_type] = type_counts.get(node_type, 0) + 1

print("\nNode Type Distribution:")
for node_type, count in sorted(type_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"- {node_type}: {count} nodes")


Creating RAG index...


The default value will be `edges="edges" in NetworkX 3.6.


  nx.node_link_data(G, edges="links") to preserve current behavior, or
  nx.node_link_data(G, edges="edges") for forward compatibility.
  embedding = OllamaEmbeddings(model=embed_model)
2025-10-20 18:51:31,008 - INFO - Loading faiss with AVX2 support.
2025-10-20 18:51:31,053 - INFO - Successfully loaded faiss with AVX2 support.


Graph built with 583 nodes and 1420 edges
Loading existing FAISS index...

Vector Store Details:
- Total vectors: 200
- Vector dimension: 1024

Graph Details:
- Total nodes: 583
- Total edges: 1420

Node Type Distribution:
- content_chunk: 200 nodes
- unknown: 197 nodes
- location: 84 nodes
- character: 50 nodes
- faction: 21 nodes
- entry: 20 nodes
- item: 11 nodes


## Step 2.1: Analyze Your Content

Let's get some insights about your campaign content.


In [4]:
# Analyze the content using graph analyzer
print("CONTENT ANALYSIS:")

# Initialize graph analyzer
graph_analyzer = GraphAnalyzer(graph_builder.graph)

# 1. Node Type Distribution
print("\n1. Node Type Distribution:")
type_counts = graph_analyzer.count_nodes_by_type()
for node_type, count in sorted(type_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"  {node_type}: {count} nodes")

# 2. Most Connected Nodes
print("\n2. Most Connected Nodes:")
most_connected = graph_analyzer.find_most_connected_nodes(top_k=10)
for node in most_connected:
    print(f"  {node['node']} (Type: {node['type']}, Connections: {node['total_connections']})")

# 3. Relationship Type Analysis
relationship_types = {}
for u, v, data in graph_builder.graph.edges(data=True):
    rel_type = data.get('type', 'unknown')
    relationship_types[rel_type] = relationship_types.get(rel_type, 0) + 1

print("\n3. Relationship Types:")
for rel_type, count in sorted(relationship_types.items(), key=lambda x: x[1], reverse=True):
    print(f"  {rel_type}: {count} edges")

# 4. Detailed Node Exploration
print("\n4. Detailed Node Exploration:")
example_node = "Baang"  # Choose an interesting node to explore
node_details = graph_analyzer.explore_node(example_node)
print(f"Exploring node: {example_node}")
print(json.dumps(node_details, indent=2))



CONTENT ANALYSIS:

1. Node Type Distribution:
  content_chunk: 200 nodes
  unknown: 197 nodes
  location: 84 nodes
  character: 50 nodes
  faction: 21 nodes
  entry: 20 nodes
  item: 11 nodes

2. Most Connected Nodes:
  The Rock of Bral (Type: location, Connections: 103)
  Bragora (Type: location, Connections: 73)
  La Citta (Type: location, Connections: 65)
  Greyhawk (Type: location, Connections: 49)
  Vax (Type: character, Connections: 41)
  Spelljammer Academy (Type: location, Connections: 36)
  Keepers of the Balance (Type: faction, Connections: 34)
  Baang (Type: character, Connections: 34)
  Rhogar (Type: character, Connections: 33)
  Central Flanaess (Type: location, Connections: 31)

3. Relationship Types:
  wikilink: 764 edges
  has_content_chunk: 200 edges
  parent_of: 88 edges
  child_of: 88 edges
  member_of: 50 edges
  has_member: 50 edges
  related_to: 48 edges
  originates_from: 29 edges
  origin_of: 29 edges
  associated_with: 27 edges
  associates: 27 edges
  adjacent

In [5]:
# Print the content of nodes that are linked to "The Party"

party_node = "The Party"

# Find nodes linked to "The Party"
linked_nodes = set()
for u, v in graph_builder.graph.edges(party_node):
    if u == party_node:
        linked_nodes.add(v)
    else:
        linked_nodes.add(u)

print('\nNodes linked to "The Party" and their details:')
for node in linked_nodes:
    print(f"\nNode: {node}")
    node_data = graph_builder.graph.nodes[node]
    print("Attributes:")
    for key, value in node_data.items():
        print(f"  {key}: {value}")




Nodes linked to "The Party" and their details:

Node: Baang
Attributes:
  type: character
  file_path: content\1 Keepers' Compendium\game\party\Baang.md
  race: Air Genasi
  class: Rogue
  height: 1,70m
  origin: [[Greyhawk]]
  known_locations: []
  factions: ['[[kbβ42]]', '[[The Party]]']
  alignment: 
  appears_in: []
  relates_to: []
  image: [[Baang.jpg]]

Node: Vax
Attributes:
  type: character
  file_path: content\1 Keepers' Compendium\game\party\Vax.md
  race: Black Cat (utilizando Tabaxi de referencia
  class: Warlock, Celestial Patron
  height: 1,20m
  origin: [[Greyhawk]]
  known_locations: []
  factions: ['[[kbβ42]]', '[[The Party]]']
  alignment: 
  appears_in: []
  relates_to: []
  image: [[Vax.webp]]

Node: The Party_content_0
Attributes:
  type: content_chunk
  parent_entity: The Party
  chunk_index: 0
  file_path: content\1 Keepers' Compendium\wiki\faction\The Party.md
  content: # PCS

O grupo de player characters.

Node: Gizmo
Attributes:
  type: character
  file_pat

## Step 3: Test Similarity Search

Let's test the vector store by searching for similar content to your queries.


In [6]:
# Test similarity search
test_queries = [
    "What is the Rock of Bral?",
    "Tell me about the party members",
    "What are the faction relationships?",
    "How does spelljamming work?"
]

for query in test_queries:
    print(f"\n{'='*50}")
    print(f"QUERY: {query}")
    print(f"{'='*50}")
    
    # Search for similar documents
    docs = vector_store.similarity_search(query, k=TOP_K)
    
    for i, doc in enumerate(docs, 1):
        print(f"\n--- Result {i} ---")
        print(doc.metadata)
        # Extract filename from file_path if available
        source = doc.metadata.get('file_path', 'Unknown Source')
        source_filename = Path(source).name if source != 'Unknown Source' else source
        print(f"Source: {source_filename}")
        
        # Try to get node type from graph if possible
        node_type = graph_builder.graph.nodes.get(source_filename, {}).get('type', 'Unknown Type')
        print(f"Node Type: {node_type}")
        
        # Get node details from graph
        node_details = graph_builder.graph.nodes.get(source_filename, {})
        print("Node Details:")
        for key, value in node_details.items():
            if key not in ['type', 'file_path', 'content']:
                print(f"  {key}: {value}")
        
        print(f"Content: {doc.page_content[:300]}...")
        print(f"{'...' if len(doc.page_content) > 300 else ''}")



QUERY: What is the Rock of Bral?


AssertionError: 

## Step 4: Test Full RAG Chain

Now let's test the complete RAG system with the LLM.


In [None]:
# Create the Graph-Enhanced RAG chain
print("Creating Graph-Enhanced RAG chain...")
chain = make_graph_rag_chain(
    vector_store, 
    graph_path, 
    LLM_MODEL, 
    TOP_K
)
print("Graph-Enhanced RAG chain created successfully!")

# Test queries with graph context
test_questions = [
    "Who are the main characters in the party?",
    "What is the current status of Bral?",
    "Tell me about the spelljammer ships",
    "What factions are involved in the campaign?"
]

for question in test_questions:
    print(f"\n{'='*60}")
    print(f"QUESTION: {question}")
    print(f"{'='*60}")
    
    try:
        # Run the Graph-Enhanced RAG chain
        result = chain({"query": question})
        
        print(f"\nANSWER:")
        print(result["result"])
        
        # Show graph context
        print(f"\nGRAPH CONTEXT:")
        print(result.get("graph_context", "No graph context available"))
        
        # Show vector context
        print(f"\nVECTOR CONTEXT:")
        print(result.get("vector_context", "No vector context available"))
    
    except Exception as e:
        print(f"Error: {e}")
    
    print(f"\n{'-'*60}")


2025-10-20 16:36:31,449 - INFO - Creating graph-enhanced RAG chain
The default value will be changed to `edges="edges" in NetworkX 3.6.


  nx.node_link_graph(data, edges="links") to preserve current behavior, or
  nx.node_link_graph(data, edges="edges") for forward compatibility.
  llm = ChatOllama(model=llm_model)
2025-10-20 16:36:31,492 - INFO - Graph-enhanced RAG chain created successfully
2025-10-20 16:36:31,493 - INFO - Processing query: Who are the main characters in the party?
2025-10-20 16:36:31,494 - INFO - Retrieving context for query: Who are the main characters in the party?


Creating Graph-Enhanced RAG chain...
Graph-Enhanced RAG chain created successfully!

QUESTION: Who are the main characters in the party?


2025-10-20 16:37:17,928 - INFO - Processing query: What is the current status of Bral?
2025-10-20 16:37:17,928 - INFO - Retrieving context for query: What is the current status of Bral?



ANSWER:
Based on the provided initial context, I will attempt to answer the query "Who are the main characters in the party?"

From the given context, we see that there is no direct mention of the main characters in the party. However, we have mentions of Bianca Micharle and Saerthe Abizjn (as Spelljammer Training Officers), Kip and Pik Whistleslap (as ship inspectors), and Miken Haverstance (a human cadet). While these entities are related to the Spelljammer campaign setting, they do not appear to be the main characters in the party.

However, there is a mention of "The Keeper of Whispers," which is a tome written by Iggwilv. The book has a section where it writes letters to its reader (Vax) and provides some information about an entity that seems to be a patron or guide for Vax.

To further explore this query, I can use the graph_exploration_tool to request additional context on "Iggwilv" or "The Keeper of Whispers." However, before doing so, let me evaluate if the initial context f

2025-10-20 16:37:51,286 - INFO - Processing query: Tell me about the spelljammer ships
2025-10-20 16:37:51,287 - INFO - Retrieving context for query: Tell me about the spelljammer ships



ANSWER:
It appears that you provided a massive amount of text related to the campaign setting "Bral" in the Spelljammer universe. I'll do my best to provide a concise summary and answer your questions.

**Summary:**

The text describes various locations, entities, and events within the Bral campaign setting. It includes information about the city's layout, its nobility, trade, magic, and the presence of cultists. The text also mentions the Library of the Spheres, which serves as a regional headquarters for The Seekers and contains an impressive collection of books.

**Questions:**

Based on your original request, I assume you want to explore the context related to Bral's current status.

To provide a more accurate answer, please clarify what specific information you're looking for regarding Bral's current status. Are you interested in its:

1. Political landscape?
2. Economic situation?
3. Magical developments?
4. Cultist activities?

Please specify which aspect of Bral's current stat

2025-10-20 16:38:27,981 - INFO - Processing query: What factions are involved in the campaign?
2025-10-20 16:38:27,982 - INFO - Retrieving context for query: What factions are involved in the campaign?



ANSWER:
Based on your request, I'll provide information about spelljammer ships.

**Initial Context Review**

The initial context provides an overview of the Spelljammer campaign and its unique concept. It introduces O SPELLJAMMER (a legendary city-nave ship), Spelljammers (magical vessels that traverse space and the astral plane), and their significance in the campaign.

However, regarding your specific query about spelljammer ships, the context seems a bit general. There are no detailed descriptions or statistics about individual spelljammer ships.

**Additional Exploration**

To gain more insight into the spelljammer ships, I'd like to explore related entities that might provide more context.

Let's examine the entity "Keepers of the Balance" since it's mentioned as being associated with Alpha-one squad and Spelljammer Academy. Understanding their role or connection to spelljammer ships could help clarify the answer.

**Graph Exploration**

Using the graph_exploration_tool, I found

In [None]:
# Advanced Vector Database and Graph Analysis

import json
import numpy as np
import matplotlib.pyplot as plt
import umap
import seaborn as sns

# Initialize graph analyzer
graph_analyzer = GraphAnalyzer(graph_builder.graph)

# 1. Vector Database Visualization
print("\n=== VECTOR DATABASE VISUALIZATION ===")

# Extract embeddings and metadata
embeddings = vector_store.index.reconstruct_n(0, vector_store.index.ntotal)
docs = vector_store.docstore.search(list(range(vector_store.index.ntotal)))

# Prepare metadata for visualization
node_types = []
parent_entities = []
sources = []

for doc in docs:
    node_type = doc.metadata.get('graph_node_type', 'unknown')
    parent_entity = doc.metadata.get('parent_entity', 'unknown')
    source = doc.metadata.get('source', 'unknown')
    
    node_types.append(node_type)
    parent_entities.append(parent_entity)
    sources.append(source)

# Dimensionality Reduction with UMAP
reducer = umap.UMAP(n_components=2, random_state=42)
embedding_2d = reducer.fit_transform(embeddings)

# Visualization
plt.figure(figsize=(16, 12))

# Color mapping for node types
unique_node_types = list(set(node_types))
color_palette = sns.color_palette("husl", len(unique_node_types))
color_map = dict(zip(unique_node_types, color_palette))

# Scatter plot
for node_type in unique_node_types:
    mask = [nt == node_type for nt in node_types]
    plt.scatter(
        embedding_2d[mask, 0], 
        embedding_2d[mask, 1], 
        c=[color_map[node_type]], 
        label=node_type, 
        alpha=0.7
    )

plt.title("2D Projection of Vector Database", fontsize=16)
plt.xlabel("UMAP Dimension 1", fontsize=12)
plt.ylabel("UMAP Dimension 2", fontsize=12)
plt.legend(title="Node Types", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()

# Save the plot
plt.savefig(os.path.join(INDEX_DIR, 'vector_db_projection.png'), dpi=300, bbox_inches='tight')
plt.close()

# Print some statistics
print("\nVector Database Projection Statistics:")
print(f"Total Vectors: {len(embeddings)}")
print("\nNode Type Distribution:")
type_counts = {}
for nt in node_types:
    type_counts[nt] = type_counts.get(nt, 0) + 1
for node_type, count in sorted(type_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"- {node_type}: {count} vectors")

# Detailed Node Exploration
print("\n=== GRAPH ANALYSIS ===")

# 1. Node Type Distribution
print("\n1. Node Type Distribution:")
type_counts = graph_analyzer.count_nodes_by_type()
for node_type, count in sorted(type_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"- {node_type}: {count} nodes")

# 2. Most Connected Nodes
print("\n2. Most Connected Nodes:")
most_connected = graph_analyzer.find_most_connected_nodes(top_k=5)
for node in most_connected:
    print(f"- {node['node']} (Type: {node['type']}, Connections: {node['total_connections']})")

# 3. Detailed Node Exploration
print("\n3. Detailed Node Exploration:")
example_node = "Baang"  # Choose an interesting node to explore
node_details = graph_analyzer.explore_node(example_node)
print(f"Exploring node: {example_node}")
print(json.dumps(node_details, indent=2))

# 4. Vector Search with Enhanced Metadata
print("\n=== VECTOR SEARCH DEMONSTRATION ===")

# Test queries with rich context
test_queries = [
    "Who are the main characters in the party?",
    "What is the Rock of Bral?",
    "Tell me about the Spelljammer ships"
]

for query in test_queries:
    print(f"\n--- Query: {query} ---")
    
    # Perform vector similarity search
    docs = vector_store.similarity_search(query, k=5)
    
    print("Retrieved Documents:")
    for i, doc in enumerate(docs, 1):
        print(f"\n{i}. Source: {doc.metadata.get('source', 'Unknown')}")
        print(f"   Parent Entity: {doc.metadata.get('parent_entity', 'N/A')}")
        print(f"   Node Type: {doc.metadata.get('graph_node_type', 'Unknown')}")
        print(f"   Content Preview: {doc.page_content[:200]}...")

# Demonstrate text processing utilities
print("\n=== TEXT PROCESSING UTILITIES ===")

# Example of header extraction
sample_text = """
# Main Title
Some introductory text.

## Subsection 1
More detailed information about the first subsection.

### Sub-subsection
Even more specific details.

## Subsection 2
Information about the second subsection.
"""

print("Header Extraction:")
headers = extract_headers(sample_text)
for header in headers:
    print(f"Level {header['level']}: {header['text']}")

# Example of smart chunking
print("\nSmart Content Chunking:")
chunks = smart_chunk_content(sample_text, min_content_size=50, target_chunk_size=100)
print(f"Number of chunks: {len(chunks)}")
for i, chunk in enumerate(chunks, 1):
    print(f"\nChunk {i}:")
    print(chunk[:200] + "...")



=== VECTOR DATABASE VISUALIZATION ===


TypeError: unhashable type: 'list'

## Step 5: Interactive Query Testing

Use this cell to test your own queries!


In [None]:
# Interactive query testing with graph context
def test_query(query):
    print(f"\n{'='*60}")
    print(f"QUERY: {query}")
    print(f"{'='*60}")
    
    try:
        # First, let's see what documents are retrieved
        docs = vector_store.similarity_search(query, k=TOP_K)
        print(f"\nRETRIEVED DOCUMENTS:")
        for i, doc in enumerate(docs, 1):
            # Extract filename from file_path if available
            source = doc.metadata.get('file_path', 'Unknown Source')
            source_filename = Path(source).name if source != 'Unknown Source' else source
            print(f"\n{i}. {source_filename}")
            
            # Try to get node type from graph if possible
            node_type = graph_builder.graph.nodes.get(source_filename, {}).get('type', 'Unknown Type')
            print(f"   Node Type: {node_type}")
            
            print(f"   {doc.page_content[:200]}...")
        
        # Explore graph connections for key entities
        print("\n--- GRAPH EXPLORATION ---")
        # Extract potential key entities from the query
        key_entities = graph_builder.find_nodes_by_type('character') + \
                       graph_builder.find_nodes_by_type('faction') + \
                       graph_builder.find_nodes_by_type('location')
        
        # Find relevant entities based on query
        relevant_entities = [
            entity for entity in key_entities 
            if any(word.lower() in entity.lower() for word in query.split())
        ]
        
        # Show graph connections for relevant entities
        for entity in relevant_entities[:3]:  # Limit to top 3
            print(f"\nGraph Connections for {entity}:")
            connections = graph_builder.get_strongest_connections(entity)
            for conn in connections:
                print(f"- {conn['node']} (Type: {conn['type']}, Weight: {conn['weight']:.2f})")
        
        # Now run the full Graph-Enhanced RAG chain
        print(f"\n{'='*40}")
        print("GRAPH-ENHANCED RAG RESPONSE:")
        print(f"{'='*40}")
        
        result = chain({"query": query})
        print(result["result"])
        
    
    except Exception as e:
        print(f"Error: {e}")

# Test your own queries here!
test_query("What is the Rock of Bral and who lives there?")
test_query("Tell me about the party's current mission")
test_query("What are the house rules for this campaign?")
