# Notebook 2: Baseline RAG Pipeline

This notebook demonstrates:
1. Loading the FAISS index
2. Implementing basic retrieval
3. Prompt formatting with context
4. LLM answer generation
5. Retrieval evaluation metrics (precision, recall, F1)

**Prerequisites**: Complete Notebook 1 (Data Preparation) first.

In [None]:
# Import required modules
import os
from dotenv import load_dotenv
# Assumes the notebook runs from the project root so that `src` is importable
from src.rag_baseline import BaselineRAG, evaluate_retrieval

# Load environment variables
load_dotenv()

print("✓ Imports successful")
print(f"OpenAI API Key configured: {'Yes' if os.getenv('OPENAI_API_KEY') else 'No (dry-run mode)'}")

## Step 1: Initialize Baseline RAG System

In [None]:
# Initialize RAG system
rag = BaselineRAG(
    faiss_path="faiss_index/",                   # index directory from Notebook 1
    openai_api_key=os.getenv("OPENAI_API_KEY"),  # None -> dry-run mode
    model="gpt-3.5-turbo",
    top_k=3                                      # number of chunks retrieved per query
)

print("✓ Baseline RAG initialized")
print(f"Model: {rag.model}")
print(f"Top-K: {rag.top_k}")
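
Conceptually, `top_k=3` means each query is embedded and compared against the stored chunk vectors, and the three closest chunks are returned. The sketch below illustrates that idea with an exhaustive L2 scan in plain Python. It assumes an exact (flat) index and uses toy 2-D vectors; it is not how `BaselineRAG` or FAISS is implemented. FAISS performs this search in optimized native code, and its approximate index types trade some accuracy for speed. Which index type Notebook 1 built is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_search(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (doc_index, distance) pairs for the k nearest document vectors.

    Exhaustive scan: the query is compared with every stored vector,
    which is what an exact (flat) index does conceptually.
    """
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return scored[:k]


# Toy 2-D vectors standing in for real chunk embeddings (hypothetical data)
doc_vecs = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
hits = top_k_search([1.0, 0.0], doc_vecs, k=3)
print(hits)  # indices 1, 2, 3 in that order; index 0 points the other way
```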

## Step 2: Run Sample Queries

In [None]:
# Define test queries
test_queries = [
    "What are the data subject rights under GDPR?",
    "What is the right to erasure?",
    "What are the lawful bases for processing personal data?"
]

# Run queries
for query in test_queries:
    print(f"\n{'='*60}")
    print(f"Query: {query}")
    print(f"{'='*60}")
    
    result = rag.query(query)
    
    print(f"\nAnswer:\n{result['answer']}")
    print(f"\nSources: {result['num_sources']} documents")
    for i, source in enumerate(result['sources'], 1):
        print(f"  {i}. Article {source.get('article', '?')}, Page {source.get('page', '?')}")
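
The loop above reads only three keys from each result: `answer`, `sources` (dicts carrying `article` and `page` metadata), and `num_sources`. A minimal sketch of how a `query()` call could produce that shape is to chain the stages this notebook also calls separately: retrieval, prompt formatting, and generation. All names below (`run_query`, `toy_retrieve`, `toy_format`, `dry_run_generate`) are hypothetical, the data is toy data, and this is not the source of `BaselineRAG.query`.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]  # a retrieved chunk: its text plus metadata such as article/page


def run_query(
    query: str,
    retrieve: Callable[[str], List[Doc]],
    format_prompt: Callable[[str, List[Doc]], str],
    generate: Callable[[str], str],
) -> Dict[str, Any]:
    """Hypothetical composition of the three RAG stages into one call."""
    docs = retrieve(query)               # 1. fetch the top-k chunks
    prompt = format_prompt(query, docs)  # 2. place the chunks in the prompt as context
    answer = generate(prompt)            # 3. ask the LLM (or a stub) for an answer
    return {
        "answer": answer,
        "sources": [{"article": d.get("article"), "page": d.get("page")} for d in docs],
        "num_sources": len(docs),
    }


# Stub stages with toy data so the sketch runs without an index or an API key
def toy_retrieve(q: str) -> List[Doc]:
    return [{"text": "Toy chunk about erasure.", "article": 17, "page": 1}]


def toy_format(q: str, docs: List[Doc]) -> str:
    context = "\n".join(d["text"] for d in docs)
    return f"Context:\n{context}\n\nQuestion: {q}"


def dry_run_generate(prompt: str) -> str:
    # Stand-in for an LLM call, similar in spirit to running without an API key
    return f"[dry-run] prompt of {len(prompt)} characters"


sketch_result = run_query(
    "What is the right to erasure?", toy_retrieve, toy_format, dry_run_generate
)
print(sketch_result["num_sources"], sketch_result["sources"])
```

Passing the stages in as callables keeps each one swappable, so a stub generator can replace the real LLM call when no key is configured.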

## Step 3: Evaluate Retrieval Quality

In [None]:
# Retrieve documents for evaluation
query = "What are the data subject rights?"
retrieved_docs = rag.retrieve(query)

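# Ground truth: GDPR Articles 15-22 set out the core data subject rights
# (access, rectification, erasure, restriction, notification, portability,
# objection, and automated decision-making)
ground_truth = [15, 16, 17, 18, 19, 20, 21, 22]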

# Evaluate retrieval
metrics = evaluate_retrieval(retrieved_docs, ground_truth)

print("\nRetrieval Evaluation:")
print(f"  Precision: {metrics['precision']:.2%}")
print(f"  Recall: {metrics['recall']:.2%}")
print(f"  F1 Score: {metrics['f1']:.2%}")
print(f"\nRetrieved Articles: {metrics['retrieved_articles']}")
print(f"Ground Truth Articles: {metrics['ground_truth']}")
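
Writing R for the set of retrieved article numbers and G for the ground-truth set, precision is |R ∩ G| / |R|, recall is |R ∩ G| / |G|, and F1 is their harmonic mean 2PR / (P + R). One consequence worth noting: if each retrieved chunk maps to a single article, `top_k=3` against 8 ground-truth articles caps recall at 3/8 = 37.5% for this query, so low recall here partly reflects the configuration rather than poor ranking. The sketch below is a set-based version of these formulas; `retrieval_metrics` is a hypothetical name, and `evaluate_retrieval` may differ (for example in how it treats two chunks from the same article).

```python
from typing import Dict, Iterable


def retrieval_metrics(
    retrieved_articles: Iterable[int], ground_truth: Iterable[int]
) -> Dict[str, float]:
    """Set-based precision, recall and F1 over article numbers."""
    retrieved = set(retrieved_articles)  # two chunks from the same article count once
    relevant = set(ground_truth)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gdpr_rights_articles = [15, 16, 17, 18, 19, 20, 21, 22]

# Hypothetical top-3 result: two relevant articles and one off-topic article
example = retrieval_metrics([15, 17, 5], gdpr_rights_articles)
print(example)  # precision 2/3, recall 2/8, F1 4/11

# Best case for top_k=3: three distinct relevant articles still give recall 3/8
best = retrieval_metrics([15, 16, 17], gdpr_rights_articles)
print(best["recall"])  # 0.375
```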

## Step 4: Analyze Prompt Structure

In [None]:
# Show the formatted prompt
query = "What is the right to data portability?"
docs = rag.retrieve(query)
prompt = rag.format_prompt(query, docs)

print("Formatted Prompt:")
print("="*60)
print(prompt)
print("="*60)
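
When reading the printed prompt, three things are worth checking: whether each context chunk is labelled with its source, whether the instructions restrict the model to the supplied context, and where the question sits relative to the context. The sketch below shows one common template shape with those properties. It is a hypothetical stand-in; the template `rag.format_prompt` actually uses is whatever the cell above prints, and the toy document is not real GDPR text.

```python
from typing import Any, Dict, List


def format_prompt(query: str, docs: List[Dict[str, Any]]) -> str:
    """Hypothetical prompt template: numbered, source-labelled context, then the question."""
    context_blocks = []
    for i, doc in enumerate(docs, 1):
        # Label each chunk with its source so the model can cite it by number
        header = f"[{i}] Article {doc.get('article', '?')}, Page {doc.get('page', '?')}"
        context_blocks.append(f"{header}\n{doc['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


toy_docs = [{"text": "Toy chunk about portability.", "article": 20, "page": 1}]
print(format_prompt("What is the right to data portability?", toy_docs))
```

Putting the question after the context keeps it next to the point where the model starts generating, and the "say so" instruction gives the model an explicit alternative to guessing when retrieval misses.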

## Summary

In this notebook, we:
- ✓ Initialized a baseline RAG system
- ✓ Ran sample GDPR queries
- ✓ Evaluated retrieval quality
- ✓ Analyzed prompt structure

Next: Notebook 3 - Memory Integration with LangGraph