# Multimodal RAG System Demo

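This notebook walks through the main components of the Multimodal RAG System for Document Intelligence: configuration, text chunking, embedding generation, FAISS vector storage, evaluation metrics, hallucination detection, and experiment tracking.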

In [1]:
import sys
sys.path.insert(0, '..')

from pathlib import Path
import numpy as np

## 1. Configuration & Logging

In [2]:
from src.utils import get_config, get_logger

config = get_config()
logger = get_logger(__name__)

print(f"Embedding Model: {config.embedding.model_name}")
print(f"Embedding Dimension: {config.embedding.embedding_dim}")
print(f"Device: {config.embedding.device}")

Embedding Model: sentence-transformers/all-mpnet-base-v2
Embedding Dimension: 768
Device: cuda
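
The internals of `get_config()` are not shown in this notebook. As a rough illustration of how centralized settings with nested attribute access can be arranged, the sketch below uses frozen dataclasses and a cached loader. `EmbeddingSettings`, `DemoConfig`, and `load_demo_config` are hypothetical names, and the defaults simply mirror the values printed above.

```python
from dataclasses import dataclass, field
from functools import lru_cache


@dataclass(frozen=True)
class EmbeddingSettings:
    # Defaults copied from the printed output above (assumption: these are defaults).
    model_name: str = "sentence-transformers/all-mpnet-base-v2"
    embedding_dim: int = 768
    device: str = "cuda"


@dataclass(frozen=True)
class DemoConfig:
    # Hypothetical top-level container; the real config likely has more sections.
    embedding: EmbeddingSettings = field(default_factory=EmbeddingSettings)


@lru_cache(maxsize=1)
def load_demo_config() -> DemoConfig:
    """Build the config once and return the same instance on later calls."""
    return DemoConfig()


demo_config = load_demo_config()
print(f"Embedding Model: {demo_config.embedding.model_name}")
```

Freezing the dataclasses keeps one component from silently changing settings that another component has already read.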


## 2. Document Processing Pipeline

In [3]:
from src.preprocessing import TextChunker

# Initialize chunker (note: chunk_overlap, not overlap)
chunker = TextChunker(chunk_size=500, chunk_overlap=50)

sample_text = """
Machine learning is a subset of artificial intelligence that enables computers 
to learn from data without being explicitly programmed. Deep learning, a more 
advanced form of machine learning, uses neural networks with multiple layers to 
model complex patterns in data.

Natural Language Processing (NLP) is a field that combines linguistics and machine 
learning to enable computers to understand human language. Applications include 
sentiment analysis, machine translation, and question answering.
"""

chunks = chunker.chunk(sample_text)
print(f"Created {len(chunks)} chunks:")
for i, chunk in enumerate(chunks):
    print(f"\n[Chunk {i+1}] ({len(chunk.text)} chars)")
    print(chunk.text[:100] + "...")

2026-02-07 16:01:13 | INFO | TextChunker | Created 1 chunks
Created 1 chunks:

[Chunk 1] (502 chars)

Machine learning is a subset of artificial intelligence that enables computers 
to learn from data ...
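
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal character-based sliding-window sketch. `chunk_text` is a hypothetical helper, not the `TextChunker` implementation; the 502-character chunk above suggests the real chunker does not cut strictly at 500 characters.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo_chunks = chunk_text("abcdefghij" * 120, chunk_size=500, chunk_overlap=50)
print([len(c) for c in demo_chunks])
```

With a 1,200-character input, each window starts 450 characters after the previous one, so consecutive chunks share 50 characters.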


## 3. Embedding Generation

In [4]:
from src.embeddings import CustomEmbedder

# Initialize embedder (use CPU for demo)
embedder = CustomEmbedder(device="cpu")

# Sample texts
texts = [
    "Machine learning enables computers to learn from data.",
    "Deep learning uses neural networks with many layers.",
    "The weather is sunny today."
]

# Generate embeddings
embeddings = embedder.encode(texts)

print(f"Embeddings shape: {embeddings.shape}")

# Calculate similarity
similarities = embedder.similarity(embeddings, embeddings)
print("\nSimilarity Matrix:")
print(np.round(similarities, 3))

2026-02-07 16:01:13 | INFO | CustomEmbedder | Loading embedding model: sentence-transformers/all-mpnet-base-v2
MPNetModel LOAD REPORT from: sentence-transformers/all-mpnet-base-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.

2026-02-07 16:01:46 | INFO | CustomEmbedder | Model loaded: sentence-transformers/all-mpnet-base-v2 (dim=768, device=cpu)
Embeddings shape: (3, 768)

Similarity Matrix:
[[ 1.     0.534 -0.012]
 [ 0.534  1.    -0.003]
 [-0.012 -0.003  1.   ]]
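
The matrix above has ones on the diagonal and is symmetric, which is what cosine similarity produces. Assuming `embedder.similarity` computes cosine similarity (the source does not confirm this), the calculation can be sketched in plain Python as follows. `cosine_similarity`, `similarity_matrix`, and the toy vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


toy_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, -2.0]]
toy_matrix = similarity_matrix(toy_vectors)
for row in toy_matrix:
    print([round(value, 3) for value in row])
```

In the real output, the two machine-learning sentences score 0.534 against each other while the weather sentence scores near zero against both, which is the expected pattern for unrelated text.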


## 4. Vector Search with FAISS

In [5]:
from src.retrieval.vector_db import FAISSVectorStore, Document

# Create documents
documents = [
    Document(id="1", text="Machine learning is a type of AI.", embedding=embeddings[0]),
    Document(id="2", text="Deep learning uses neural networks.", embedding=embeddings[1]),
    Document(id="3", text="The weather is nice.", embedding=embeddings[2]),
]

# Initialize FAISS store
store = FAISSVectorStore(embedding_dim=embeddings.shape[1], index_type="flat")
store.add_documents(documents)

print(f"Documents in store: {store.count}")

2026-02-07 16:01:47 | INFO | FAISSVectorStore | Initialized FAISS flat index
2026-02-07 16:01:47 | INFO | FAISSVectorStore | Added 3 documents to FAISS
Documents in store: 3
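
The cell above stops after indexing, and the query method of `FAISSVectorStore` is not shown, so it is not guessed at here. Conceptually, a flat index performs an exhaustive search: it scores the query against every stored vector and returns the top k. The sketch below shows that idea in plain Python using inner-product scoring; `flat_search` and the toy vectors are hypothetical.

```python
import heapq


def flat_search(query, stored, k=2):
    """Exhaustive top-k search by inner product over (doc_id, vector) pairs."""
    scored = (
        (sum(q * v for q, v in zip(query, vector)), doc_id)
        for doc_id, vector in stored
    )
    # nlargest keeps only the k best (score, doc_id) pairs, highest score first.
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


toy_store = [
    ("1", [0.9, 0.1, 0.0]),
    ("2", [0.8, 0.3, 0.1]),
    ("3", [0.0, 0.1, 0.9]),
]
top_hits = flat_search([1.0, 0.0, 0.0], toy_store, k=2)
print(top_hits)
```

Exhaustive search is exact but scales linearly with the number of stored vectors, which is why approximate index types exist for larger collections.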


## 5. Evaluation Metrics

In [6]:
from src.evaluation import RetrievalMetrics, GenerationMetrics

# Retrieval evaluation
retrieval = RetrievalMetrics(k_values=[1, 3, 5])

retrieved = ["doc1", "doc3", "doc5", "doc2", "doc4"]
relevant = ["doc1", "doc2"]

metrics = retrieval.evaluate(retrieved, relevant)

print("Retrieval Metrics:")
for result in metrics.values():
    print(f"  {result}")

# Generation evaluation
gen_metrics = GenerationMetrics()

prediction = "Machine learning enables computers to learn from data."
reference = "Machine learning allows computers to learn from examples."

rouge_scores = gen_metrics.rouge(prediction, reference)
print("\nGeneration Metrics:")
for result in rouge_scores.values():
    print(f"  {result}")

Retrieval Metrics:
  Precision@1: 1.0000
  Recall@1: 0.5000
  NDCG@1: 1.0000
  Hit Rate@1: 1.0000
  Precision@3: 0.3333
  Recall@3: 0.5000
  NDCG@3: 0.6131
  Hit Rate@3: 1.0000
  Precision@5: 0.4000
  Recall@5: 1.0000
  NDCG@5: 0.8772
  Hit Rate@5: 1.0000
  MRR: 1.0000

Generation Metrics:
  ROUGE1: 0.7500
  ROUGE2: 0.5714
  ROUGEL: 0.7500
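
The reported numbers can be reproduced by hand. The sketch below uses standard binary-relevance definitions of Precision@k, Recall@k, NDCG@k, and reciprocal rank, plus an n-gram F1 for ROUGE-N. These helper functions are illustrative and are not the `RetrievalMetrics` or `GenerationMetrics` implementations. For the two sentences used here, ROUGE precision, recall, and F1 coincide, so the values match whichever variant the library reports.

```python
import math
import re
from collections import Counter


def precision_at_k(retrieved, relevant, k):
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved, relevant, k):
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)


def ndcg_at_k(retrieved, relevant, k):
    # Binary relevance: a relevant document at rank r contributes 1 / log2(r + 1).
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def reciprocal_rank(retrieved, relevant):
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def rouge_n_f1(prediction, reference, n=1):
    def ngrams(text):
        tokens = re.findall(r"\w+", text.lower())
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


retrieved_ids = ["doc1", "doc3", "doc5", "doc2", "doc4"]
relevant_ids = {"doc1", "doc2"}
print(f"NDCG@3: {ndcg_at_k(retrieved_ids, relevant_ids, 3):.4f}")
print(f"NDCG@5: {ndcg_at_k(retrieved_ids, relevant_ids, 5):.4f}")
```

For example, NDCG@3 is 1 / (1 + 1/log2(3)): only `doc1` at rank 1 counts toward the DCG, while the ideal ranking would place both relevant documents in the top two positions.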


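## 6. Hallucination Detection

The n-gram overlap check compares wording, not meaning, so an answer that paraphrases its sources can be flagged even when it is factually grounded, as the first example below shows.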

In [7]:
from src.evaluation import HallucinationDetector

detector = HallucinationDetector()

sources = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers."
]

grounded = "Machine learning is part of AI and uses data to learn."
hallucinated = "Machine learning was invented in 1850 by Charles Darwin."

print("Grounded answer:")
result = detector.detect_ngram_overlap(grounded, sources)
print(f"  Hallucinated: {result.is_hallucinated}")
print(f"  Overlap ratio: {result.details['overlap_ratio']:.2%}")

print("\nHallucinated answer:")
result = detector.detect_ngram_overlap(hallucinated, sources)
print(f"  Hallucinated: {result.is_hallucinated}")
print(f"  Overlap ratio: {result.details['overlap_ratio']:.2%}")

Grounded answer:
  Hallucinated: True
  Overlap ratio: 11.11%

Hallucinated answer:
  Hallucinated: True
  Overlap ratio: 0.00%
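
The reported 11.11% is consistent with trigram overlap: the grounded answer contains 9 trigrams, and only "machine learning is" also appears in the sources (1/9). The sketch below reproduces that ratio under the assumption of lowercase word trigrams. The detector's actual n-gram size and decision threshold are not shown in this notebook, so `threshold=0.5` is a hypothetical cut-off.

```python
import re


def ngram_set(text, n=3):
    """Set of lowercase word n-grams in text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap_ratio(answer, sources, n=3):
    """Fraction of the answer's n-grams that also occur in any source."""
    answer_ngrams = ngram_set(answer, n)
    if not answer_ngrams:
        return 0.0
    source_ngrams = set()
    for source in sources:
        source_ngrams |= ngram_set(source, n)
    return len(answer_ngrams & source_ngrams) / len(answer_ngrams)


def looks_hallucinated(answer, sources, n=3, threshold=0.5):
    # threshold is an assumed cut-off, not the detector's actual setting.
    return ngram_overlap_ratio(answer, sources, n) < threshold


demo_sources = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
]
grounded_ratio = ngram_overlap_ratio(
    "Machine learning is part of AI and uses data to learn.", demo_sources
)
hallucinated_ratio = ngram_overlap_ratio(
    "Machine learning was invented in 1850 by Charles Darwin.", demo_sources
)
print(f"{grounded_ratio:.2%} vs {hallucinated_ratio:.2%}")
```

Because both ratios fall far below any plausible threshold, a purely lexical check cannot separate the paraphrased answer from the fabricated one in this example.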


## 7. Experiment Tracking

In [8]:
from src.mlops import ExperimentTracker

# Initialize without MLflow for demo
tracker = ExperimentTracker(
    experiment_name="demo_experiment",
    use_mlflow=False
)

# Start run
run_id = tracker.start_run(run_name="demo_run")

# Log parameters
tracker.log_params({
    "model": "all-mpnet-base-v2",
    "chunk_size": 500,
    "top_k": 5
})

# Log metrics
tracker.log_metrics({
    "ndcg@5": 0.78,
    "mrr": 0.82,
    "latency_p50": 45.0
})

# End run
tracker.end_run()

# List runs
runs = tracker.list_runs()
print(f"Logged {len(runs)} runs")
print(f"Run ID: {runs[-1].run_id}")
print(f"Metrics: {runs[-1].metrics}")

2026-02-07 16:01:47 | INFO | ExperimentTracker | Started run: 20260207_160147_451012
2026-02-07 16:01:47 | INFO | ExperimentTracker | Ended run with status: FINISHED
Logged 1 runs
Run ID: 20260207_160147_451012
Metrics: {'ndcg@5': 0.78, 'mrr': 0.82, 'latency_p50': 45.0}
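
With `use_mlflow=False`, the tracker presumably keeps runs locally. How it stores them is not shown, so the following is only a minimal in-memory sketch of the same start/log/end/list flow. `Run` and `InMemoryTracker` are hypothetical, and the timestamp-based run ID format is inferred from the ID printed above.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Run:
    run_id: str
    run_name: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    status: str = "RUNNING"


class InMemoryTracker:
    """Hypothetical stand-in for a tracker that works without MLflow."""

    def __init__(self, experiment_name):
        self.experiment_name = experiment_name
        self._runs = []
        self._active = None

    def start_run(self, run_name):
        # Assumed ID format: date, time, and microseconds.
        run_id = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        self._active = Run(run_id=run_id, run_name=run_name)
        return run_id

    def _require_active(self):
        if self._active is None:
            raise RuntimeError("no active run; call start_run() first")
        return self._active

    def log_params(self, params):
        self._require_active().params.update(params)

    def log_metrics(self, metrics):
        self._require_active().metrics.update(metrics)

    def end_run(self, status="FINISHED"):
        run = self._require_active()
        run.status = status
        self._runs.append(run)
        self._active = None

    def list_runs(self):
        return list(self._runs)


demo_tracker = InMemoryTracker("demo_experiment")
demo_run_id = demo_tracker.start_run("demo_run")
demo_tracker.log_params({"chunk_size": 500, "top_k": 5})
demo_tracker.log_metrics({"ndcg@5": 0.78, "mrr": 0.82})
demo_tracker.end_run()
demo_runs = demo_tracker.list_runs()
print(f"Logged {len(demo_runs)} runs")
```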


---
## Summary

This notebook demonstrated:
1. **Configuration & Logging** - Centralized settings management
2. **Document Processing** - Text chunking with overlap
3. **Embeddings** - Vector generation and similarity
4. **Vector Search** - Building a FAISS flat index from document embeddings
5. **Evaluation** - Retrieval and generation metrics
6. **Hallucination Detection** - N-gram overlap method
7. **Experiment Tracking** - MLflow-compatible logging

For full RAG pipeline usage, see the CLI: `python -m src.main --help`