# D&D Campaign RAG Experiment

This notebook lets you experiment with the RAG (Retrieval-Augmented Generation) system for your D&D campaign.

## What this notebook does:
1. Loads and chunks your campaign content
2. Creates embeddings using the nomic-embed-text model
3. Builds a FAISS vector store for similarity search
4. Lets you test queries and see what documents are retrieved
5. Shows the full RAG chain in action


In [1]:
# Import necessary libraries
import sys
import os
from pathlib import Path

# Add the project root to Python path
# If we're in notebooks/, go up one level to find the project root
if Path.cwd().name == 'notebooks':
    project_root = Path.cwd().parent
else:
    project_root = Path.cwd()

sys.path.append(str(project_root))
os.chdir(project_root)  # Change working directory to project root

print(f"Current working directory: {os.getcwd()}")
print(f"Project root: {project_root}")

from src.settings import CONTENT_DIR, INDEX_DIR, LLM_MODEL, EMBED_MODEL, CHUNK_SIZE, CHUNK_OVERLAP, TOP_K
from src.rag.load import load_and_chunk
from src.rag.index import build_or_load_faiss
from src.rag.chain import make_rag_chain

print(f"Content directory: {CONTENT_DIR}")
print(f"Index directory: {INDEX_DIR}")
print(f"LLM Model: {LLM_MODEL}")
print(f"Embedding Model: {EMBED_MODEL}")
print(f"Chunk size: {CHUNK_SIZE}")
print(f"Top K: {TOP_K}")


Current working directory: d:\DnD\1 - Campaign - THE KEEPERS\kob
Project root: d:\DnD\1 - Campaign - THE KEEPERS\kob
Content directory: content
Index directory: .rag_index
LLM Model: llama3.1
Embedding Model: nomic-embed-text
Chunk size: 1200
Top K: 4


## Step 1: Load and Chunk Content

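This step loads the campaign documents under `CONTENT_DIR` and splits them into overlapping chunks, sized by `CHUNK_SIZE` and `CHUNK_OVERLAP`, so that each piece can be embedded and retrieved on its own.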
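
As a rough illustration of what chunking with overlap means, the sketch below splits a string into fixed-size character windows, each starting `chunk_size - overlap` characters after the previous one. The name `split_with_overlap` is hypothetical, and this is not necessarily how `load_and_chunk` works internally (it may split on separators or count tokens instead). The overlap of 200 in the usage is also an assumption, since the notebook prints `CHUNK_SIZE` but not `CHUNK_OVERLAP`.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into character windows of at most chunk_size,
    where consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Assumed values for illustration: 1200-character chunks with 200 overlap
sample = "x" * 3000
pieces = split_with_overlap(sample, chunk_size=1200, overlap=200)
print([len(p) for p in pieces])  # [1200, 1200, 1000]
```

A larger overlap lowers the chance that a sentence is cut in half at a chunk boundary, at the cost of more chunks to embed.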


In [2]:
# Load and chunk the content
print("=== LOADING AND CHUNKING ===")
print("Loading and chunking content...")

try:
    chunks = load_and_chunk(CONTENT_DIR, CHUNK_SIZE, CHUNK_OVERLAP)
    print(f"\nTotal chunks created: {len(chunks)}")
    
    if chunks:
        print(f"\nFirst few chunks:")
        for i, chunk in enumerate(chunks[:3]):
            print(f"\n--- Chunk {i+1} ---")
            print(f"Source: {chunk.metadata.get('source', 'Unknown')}")
            print(f"Page: {chunk.metadata.get('page', 'N/A')}")
            print(f"Content preview: {chunk.page_content[:200]}...")
    else:
        print("No chunks were created! Check the file discovery above.")
        
except Exception as e:
    print(f"Error during loading: {e}")
    import traceback
    traceback.print_exc()


=== LOADING AND CHUNKING ===
Loading and chunking content...
Loading from directory: content
Absolute path: D:\DnD\1 - Campaign - THE KEEPERS\kob\content
Found 216 text files:
  index.md (size: 640 bytes, suffix: .md)
  1 Keepers' Compendium\.obsidian\index.md (size: 27 bytes, suffix: .md)
  1 Keepers' Compendium\.stfolder\index.md (size: 27 bytes, suffix: .md)
  1 Keepers' Compendium\game\Bral Faction Status.md (size: 1083 bytes, suffix: .md)
  1 Keepers' Compendium\game\Crew.md (size: 410 bytes, suffix: .md)
  1 Keepers' Compendium\imgs\0000 Midjourney Cheat Sheet.md (size: 878 bytes, suffix: .md)
  1 Keepers' Compendium\rules\Dinheiros.md (size: 814 bytes, suffix: .md)
  1 Keepers' Compendium\rules\House Rules.md (size: 1663 bytes, suffix: .md)
  1 Keepers' Compendium\rules\Skill Challenge.md (size: 2322 bytes, suffix: .md)
  1 Keepers' Compendium\wiki\character\ALExA.md (size: 1886 bytes, suffix: .md)
  1 Keepers' Compendium\wiki\character\Andru Cozar.md (size: 1013 bytes, suffix: 

## Step 2: Build Vector Store

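This step embeds every chunk and builds a FAISS index for fast similarity search. In the run logged below, building a new index for 1777 chunks (embeddings included) took about 3750 seconds, just over an hour, so it pays to reuse a saved index when one exists.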
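
Conceptually, similarity search over a flat (exact) index compares the query vector against every stored vector and keeps the closest ones. The sketch below does that in pure Python with squared Euclidean distance on toy 3-dimensional vectors. The real index here holds 768-dimensional vectors, and whether `build_or_load_faiss` uses an exact L2 index is an assumption. The function names and sample vectors are made up for illustration.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_l2(query, vectors, k):
    """Return (index, distance) pairs for the k stored vectors closest to query."""
    scored = ((i, squared_l2(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy vectors standing in for chunk embeddings
stored = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
print(top_k_l2([1.0, 0.0, 0.0], stored, k=2))
```

With these toy values the two nearest entries are index 0 (an exact match) and index 2, which mirrors how the `k` closest chunks are returned for a query embedding.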


In [3]:
# Build or load the FAISS vector store
print("Building FAISS vector store...")
vector_store = build_or_load_faiss(chunks, INDEX_DIR, EMBED_MODEL)

print(f"Vector store created with {vector_store.index.ntotal} vectors")
print(f"Vector dimension: {vector_store.index.d}")


2025-10-15 17:16:50,272 - INFO - Using embedding model: nomic-embed-text
2025-10-15 17:16:50,273 - INFO - Index directory: .rag_index
2025-10-15 17:16:50,274 - INFO - Total chunks to process: 1777
  embeddings = OllamaEmbeddings(model=embed_model)
2025-10-15 17:16:50,276 - INFO - Embeddings initialized successfully
2025-10-15 17:16:50,277 - INFO - Building new FAISS index...


Building FAISS vector store...


2025-10-15 18:19:19,189 - INFO - Loading faiss with AVX2 support.
2025-10-15 18:19:20,478 - INFO - Successfully loaded faiss with AVX2 support.
2025-10-15 18:19:20,981 - INFO - FAISS index built in 3750.70 seconds
2025-10-15 18:19:20,982 - INFO - Total time (including embeddings): 3750.71 seconds
2025-10-15 18:19:20,983 - INFO - Vector store created with 1777 vectors
2025-10-15 18:19:20,984 - INFO - Vector dimension: 768


Vector store created with 1777 vectors
Vector dimension: 768


## Step 3: Test Similarity Search

Let's test the vector store by searching for the chunks most similar to a few sample queries.


In [4]:
# Test similarity search
test_queries = [
    "What is the Rock of Bral?",
    "Tell me about the party members",
    "What are the faction relationships?",
    "How does spelljamming work?"
]

for query in test_queries:
    print(f"\n{'='*50}")
    print(f"QUERY: {query}")
    print(f"{'='*50}")
    
    # Search for similar documents
    docs = vector_store.similarity_search(query, k=TOP_K)
    
    for i, doc in enumerate(docs, 1):
        print(f"\n--- Result {i} ---")
        print(f"Source: {doc.metadata.get('source', 'Unknown')}")
        print(f"Page: {doc.metadata.get('page', 'N/A')}")
        # Show a 300-character preview; add an ellipsis only if content was cut off
        content = doc.page_content
        print(f"Content: {content[:300]}{'...' if len(content) > 300 else ''}")



QUERY: What is the Rock of Bral?

--- Result 1 ---
Source: content\1 Keepers' Compendium\wiki\location\Bragora.md
Page: N/A
Content: ---
type: location
location_type: Quarter
parent:
  - "[[The Rock of Bral|Bral]]"
---
# The Lower City
Bral é quase um degradê. O acobreado da Città escurece conforme chega perto da linha de gravidade. A parte baixa da cidade é de cinzas e marrons - paredes de tijolo e passarelas de madeira molhada,...
...

--- Result 2 ---
Source: content\1 Keepers' Compendium\wiki\location\Torri di Corsario.md
Page: N/A
Content: ---
type: location
location_type: Building
parent:
  - "[[La Citta]]"
aliases:
  - Bral's Tower
  - Tower of Bral
  - Bral Donjon
appears_in: []
image: "[[Il Torre Di Corsario.png]]"
---
# Torri di Corsario

A massive bastion of ash-gray stone with bronze filigree and rotating gun-towers. A stocky g...
...

--- Result 3 ---
Source: content\1 Keepers' Compendium\wiki\location\The Rock of Bral.md
Page: N/A
Content: ---
title: The Rock of Bral
ali

## Step 4: Test Full RAG Chain

Now let's test the complete RAG system with the LLM.
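
A retrieval chain of this kind typically fetches the top-k chunks, pastes them into a prompt as context, and asks the LLM to answer from that context only. The sketch below shows only the prompt-assembly step using plain strings. The template wording, the `build_prompt` name, and the `(source, text)` tuple format are assumptions for illustration, not the contents of `make_rag_chain`.

```python
def build_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Assemble a context-stuffed prompt from (source, text) pairs."""
    context_blocks = [
        f"[{i}] {source}\n{text.strip()}"
        for i, (source, text) in enumerate(retrieved, 1)
    ]
    context = "\n\n".join(context_blocks) if context_blocks else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder snippets, not real campaign content
prompt = build_prompt(
    "What is the Rock of Bral?",
    [
        ("The Rock of Bral.md", "placeholder text for the first chunk"),
        ("Bragora.md", "placeholder text for the second chunk"),
    ],
)
print(prompt)
```

Because the model only sees what lands in the context section, an answer can be no better than the chunks retrieved for it.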


In [5]:
# Create the RAG chain
print("Creating RAG chain...")
chain = make_rag_chain(vector_store, LLM_MODEL, TOP_K)
print("RAG chain created successfully!")

# Test queries
test_questions = [
    "Who are the main characters in the party?",
    "What is the current status of Bral?",
    "Tell me about the spelljammer ships",
    "What factions are involved in the campaign?"
]

for question in test_questions:
    print(f"\n{'='*60}")
    print(f"QUESTION: {question}")
    print(f"{'='*60}")
    
    try:
        # Run the RAG chain
        result = chain({"query": question})
        
        print(f"\nANSWER:")
        print(result["result"])
        
        # Show sources
        if "source_documents" in result:
            print(f"\nSOURCES:")
            for i, doc in enumerate(result["source_documents"], 1):
                print(f"{i}. {Path(doc.metadata.get('source', '')).name} (page {doc.metadata.get('page', 'N/A')})")
                print(f"   Preview: {doc.page_content[:150]}...")
    
    except Exception as e:
        print(f"Error: {e}")
    
    print(f"\n{'-'*60}")


Creating RAG chain...
RAG chain created successfully!

QUESTION: Who are the main characters in the party?


  llm = Ollama(model=llm_model, temperature=0.2)
  result = chain({"query": question})



ANSWER:
I don't know, as there is no information provided about the other members of the party besides Ismark, Phillip (who has Advantage on an Influence action), and Jordal Brambletopple.

SOURCES:
1. DnD 5.5e - Players Handbook 2024 - PHOTOSCAN2OCR.pdf (page 16)
   Preview: Russell: "All right, Ismark. You bought us drinks 
and told us about the Devil Strahd and your sister. 
How can we help?”
Influencing NPCs. Gareth tak...
2. DnD 5.5e - Players Handbook 2024 - PHOTOSCAN2OCR.pdf (page 14)
   Preview: to confess to wrongdoing or try to flatter a guard. 
The Dungeon Master assumes the roles of any non­
player characters who are participating.
An NPC’...
3. Captain Jordal Brambletopple.md (page N/A)
   Preview: ---
origin: Krynn
class: Rogue
race: Gnome
type: character
known_locations: []
factions:
  - "[[kbα1]]"
  - "[[Keepers of the Balance]]"
alignment: Ne...
4. DnD 5.5e - Players Handbook 2024 - PHOTOSCAN2OCR.pdf (page 13)
   Preview: does your character react to those situations?

## Step 5: Interactive Query Testing

Use this cell to test your own queries!


In [6]:
# Interactive query testing
def test_query(query):
    print(f"\n{'='*60}")
    print(f"QUERY: {query}")
    print(f"{'='*60}")
    
    try:
        # First, let's see what documents are retrieved
        docs = vector_store.similarity_search(query, k=TOP_K)
        print(f"\nRETRIEVED DOCUMENTS:")
        for i, doc in enumerate(docs, 1):
            print(f"\n{i}. {Path(doc.metadata.get('source', '')).name} (page {doc.metadata.get('page', 'N/A')})")
            print(f"   {doc.page_content[:200]}...")
        
        # Now run the full RAG chain
        print(f"\n{'='*40}")
        print("RAG RESPONSE:")
        print(f"{'='*40}")
        
        result = chain({"query": query})
        print(result["result"])
        
    except Exception as e:
        print(f"Error: {e}")

# Test your own queries here!
test_query("What is the Rock of Bral and who lives there?")
test_query("Tell me about the party's current mission")
test_query("What are the house rules for this campaign?")



QUERY: What is the Rock of Bral and who lives there?

RETRIEVED DOCUMENTS:

1. Bragora.md (page N/A)
   ---
type: location
location_type: Quarter
parent:
  - "[[The Rock of Bral|Bral]]"
---
# The Lower City
Bral é quase um degradê. O acobreado da Città escurece conforme chega perto da linha de gravidade...

2. Shou Town.md (page N/A)
   ---
type: location
location_type: Sub-district
parent: "[[Bragora]]"
appears_in: []
image: "[[bral shou town.png]]"
---

# Shou Town

**Shou Town** é o enclave cultural da diáspora Shou Lung em Bral. ...

3. Torri di Corsario.md (page N/A)
   ---
type: location
location_type: Building
parent:
  - "[[La Citta]]"
aliases:
  - Bral's Tower
  - Tower of Bral
  - Bral Donjon
appears_in: []
image: "[[Il Torre Di Corsario.png]]"
---
# Torri di Co...

4. Bianca Micharle.md (page N/A)
   ---
type: character
race: Human
class: ""
origin: ""
known_locations:
  - "[[The Rock of Bral|Bral]]"
factions:
  - "[[Monarquia Braliana]]"
alignment: ""
appears_in: []
relate

## Step 6: Analyze Your Content

Let's look at how the indexed chunks are distributed across source files and how large they are on average.
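
Raw chunk counts can hide how lopsided an index is. A minimal sketch of turning per-source counts into percentage shares follows. `source_shares` is a hypothetical helper, and the sample list is synthetic, standing in for the `chunk.metadata['source']` values.

```python
from collections import Counter
from pathlib import PureWindowsPath


def source_shares(sources: list[str]) -> list[tuple[str, int, float]]:
    """Return (file name, chunk count, percent of all chunks), largest first."""
    total = len(sources)
    if total == 0:
        return []
    # PureWindowsPath handles both backslash and forward-slash separators
    counts = Counter(PureWindowsPath(s).name for s in sources)
    return [(name, n, 100 * n / total) for name, n in counts.most_common()]


# Synthetic source paths standing in for chunk.metadata["source"]
sample_sources = ["content\\rules\\a.md"] * 3 + ["content\\b.pdf"] * 1
for name, n, pct in source_shares(sample_sources):
    print(f"{name}: {n} chunks ({pct:.1f}%)")
```

In the run recorded in this notebook, the Player's Handbook PDF contributes 1200 of the 1777 chunks, roughly two thirds of the index, which is the kind of imbalance a percentage column makes obvious.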


In [7]:
# Analyze the content
from collections import Counter

print("CONTENT ANALYSIS:")
print(f"Total documents: {len(chunks)}")

# Count chunks by source file
sources = [chunk.metadata.get('source', 'Unknown') for chunk in chunks]
source_counts = Counter(sources)

print(f"\nDocuments by source:")
for source, count in source_counts.most_common():
    print(f"  {Path(source).name}: {count} chunks")

# Average chunk size
avg_chunk_size = sum(len(chunk.page_content) for chunk in chunks) / max(len(chunks), 1)  # avoid ZeroDivisionError when no chunks were loaded
print(f"\nAverage chunk size: {avg_chunk_size:.0f} characters")

# Show some sample content types
print(f"\nSample content types:")
for i, chunk in enumerate(chunks[:5]):
    source_name = Path(chunk.metadata.get('source', '')).name
    preview = chunk.page_content[:100].replace('\n', ' ')
    print(f"  {i+1}. {source_name}: {preview}...")


CONTENT ANALYSIS:
Total documents: 1777

Documents by source:
  DnD 5.5e - Players Handbook 2024 - PHOTOSCAN2OCR.pdf: 1200 chunks
  Business Facilities.md: 52 chunks
  Map of Bragora.md: 49 chunks
  Map of La Città.md: 26 chunks
  Bastions.md: 13 chunks
  The Rock of Bral.md: 11 chunks
  Betsy.md: 10 chunks
  Tripulações e Oficiais (WIP).md: 10 chunks
  Bastions - Political Facilities.md: 10 chunks
  Cult of Elemental Evil.md: 8 chunks
  Faction Relations.md: 8 chunks
  Poderes Astrais.md: 7 chunks
  Space Combat.md: 7 chunks
  Bragora.md: 7 chunks
  La Citta.md: 6 chunks
  Map of Montevia.md: 6 chunks
  Crafting.md: 6 chunks
  Wildspace, o Mar Astral e as Correntes Astrais.md: 5 chunks
  Arsenale.md: 5 chunks
  Gifftown.md: 5 chunks
  Micelio.md: 5 chunks
  Muro d'Oro.md: 5 chunks
  Stillwater.md: 5 chunks
  The Phlogiston.md: 5 chunks
  AAR - Adv1 - Vax.md: 4 chunks
  Dockside.md: 4 chunks
  Dwarven District.md: 4 chunks
  Grand Bazaar.md: 4 chunks
  Greyhawk.md: 4 chunks
  Lo Strozz