# Milestone 2: Baseline RAG Pipeline

This notebook implements a baseline RAG (Retrieval-Augmented Generation) pipeline:
1. Load FAISS index from disk
2. Implement query → retrieval workflow
3. Format prompts with retrieved context
4. Generate answers with an LLM
5. Measure baseline performance (latency and answer length)

## Setup

In [None]:
# Import required modules
import sys
sys.path.append('..')

from src import rag_baseline
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

print("RAG baseline module loaded successfully!")
print(f"OpenAI API Key present: {bool(os.getenv('OPENAI_API_KEY'))}")
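The cell above only prints whether the API key is present, so a missing key surfaces later as an error inside the generation call. Below is a minimal sketch that fails fast instead. It assumes the variable name `OPENAI_API_KEY` used above. `require_env` is a hypothetical helper and is not part of `src.rag_baseline`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it before running the notebook."
        )
    return value


# Hypothetical usage: call once after load_dotenv() so a missing key
# is reported here rather than inside rag.query().
# api_key = require_env("OPENAI_API_KEY")
```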

## Step 1: Initialize Baseline RAG

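Create a `BaselineRAG` instance that loads the FAISS index from `../faiss_index`, uses `gpt-3.5-turbo` for generation, and retrieves the top 3 chunks per query.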

In [None]:
# Initialize RAG system
rag = rag_baseline.BaselineRAG(
    faiss_path="../faiss_index",
    model="gpt-3.5-turbo",
    top_k=3
)

print("✅ BaselineRAG initialized")
print(f"Model: {rag.model}")
print(f"Top K: {rag.top_k}")

## Step 2: Test Retrieval

Test the retrieval component independently.

In [None]:
# Test retrieval with sample query
query = "What is personal data according to GDPR?"

retrieved_docs = rag.retrieve(query)

print(f"Query: {query}")
print(f"\nRetrieved {len(retrieved_docs)} documents:")

for i, doc in enumerate(retrieved_docs, 1):
    print(f"\n{i}. Article {doc['metadata'].get('article', 'N/A')}")
    print(f"   Content: {doc['content'][:100]}...")
    print(f"   Score: {doc.get('score', 'N/A')}")
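The loop above assumes each retrieved item is a dict with `content`, `metadata`, and an optional `score`. To make that contract concrete, here is a minimal, dependency-free sketch of top-k retrieval by cosine similarity over toy embedding vectors. This is not how `rag.retrieve` is implemented, because the real system queries a FAISS index. Depending on the index type, FAISS may return distances (lower is closer, e.g. `IndexFlatL2`) rather than similarities (higher is closer), so check which one `score` holds before interpreting it. The `embedding` key and the toy vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: Sequence[float], docs: List[Dict], top_k: int = 3) -> List[Dict]:
    """Score every doc against the query and return the top_k, highest similarity first.

    Each doc is assumed to carry an 'embedding' plus the 'content' and 'metadata'
    keys the notebook reads; the returned dicts add a 'score' key.
    """
    scored = [
        {"content": d["content"], "metadata": d["metadata"], "score": cosine(query_vec, d["embedding"])}
        for d in docs
    ]
    scored.sort(key=lambda d: d["score"], reverse=True)
    return scored[:top_k]


# Toy corpus with hand-made 3-d "embeddings" (illustrative only).
toy_docs = [
    {"content": "Definitions: 'personal data' ...", "metadata": {"article": 4}, "embedding": [0.9, 0.1, 0.0]},
    {"content": "Principles relating to processing ...", "metadata": {"article": 5}, "embedding": [0.1, 0.9, 0.0]},
    {"content": "Territorial scope ...", "metadata": {"article": 3}, "embedding": [0.0, 0.2, 0.9]},
]
hits = retrieve_top_k([1.0, 0.0, 0.0], toy_docs, top_k=2)
for i, doc in enumerate(hits, 1):
    print(f"{i}. Article {doc['metadata']['article']}  score={doc['score']:.3f}")
```

A brute-force scan like this is linear in corpus size. Avoiding that scan on larger corpora is the job of an index such as FAISS.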

## Step 3: Test Prompt Formatting

Inspect how the retrieved context is inserted into the prompt sent to the model.

In [None]:
# Format prompt
prompt = rag.format_prompt(query, retrieved_docs)

print("Formatted prompt:")
print("=" * 60)
print(prompt)
print("=" * 60)
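`rag.format_prompt` is defined in `src/rag_baseline.py`, which is not shown here. One common pattern is to number each chunk, tag it with its article, and instruct the model to answer only from the supplied context. The sketch below follows that pattern and assumes it is representative. The template wording and the `build_prompt` name are hypothetical.

```python
from typing import Dict, List


def build_prompt(question: str, docs: List[Dict]) -> str:
    """Assemble a grounded prompt: numbered context chunks, then the question."""
    context_blocks = []
    for i, doc in enumerate(docs, 1):
        article = doc.get("metadata", {}).get("article", "N/A")
        context_blocks.append(f"[{i}] (Article {article})\n{doc['content']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


example = build_prompt(
    "What is personal data according to GDPR?",
    [{"content": "'personal data' means any information relating to an identified or identifiable natural person ...",
      "metadata": {"article": 4}}],
)
print(example)
```

Numbering the chunks lets the model refer to `[1]`, `[2]`, and so on. It also keeps prompt length roughly proportional to `top_k`.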

## Step 4: Run Complete RAG Pipeline

Execute the full query pipeline: retrieve → format → generate.

In [None]:
# Run complete pipeline
result = rag.query(query)

print("RAG Pipeline Result:")
print("=" * 60)
print(f"Question: {result['question']}")
print(f"\nAnswer: {result['answer']}")
print(f"\nSources: {result['num_sources']} documents")
print("=" * 60)
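Step 4 chains the three stages together. The sketch below composes them from injectable callables, so the chain can be exercised offline with a stub generator instead of a paid API call. The output keys `question`, `answer`, and `num_sources` mirror those the notebook prints. The extra `sources` key, the `RAGPipeline` class, and the stubs are hypothetical and are not the `BaselineRAG` implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RAGPipeline:
    retrieve: Callable[[str], List[Dict]]            # query -> retrieved docs
    format_prompt: Callable[[str, List[Dict]], str]  # (query, docs) -> prompt
    generate: Callable[[str], str]                   # prompt -> answer text

    def query(self, question: str) -> Dict:
        """Run retrieve -> format -> generate and return a result dict."""
        docs = self.retrieve(question)
        prompt = self.format_prompt(question, docs)
        answer = self.generate(prompt)
        return {"question": question, "answer": answer, "num_sources": len(docs), "sources": docs}


# Offline stubs: a fixed retriever and a generator that only reports the prompt size.
stub = RAGPipeline(
    retrieve=lambda q: [{"content": "chunk A", "metadata": {}}, {"content": "chunk B", "metadata": {}}],
    format_prompt=lambda q, docs: q + "\n" + "\n".join(d["content"] for d in docs),
    generate=lambda prompt: f"(stub answer from {len(prompt)}-char prompt)",
)
out = stub.query("What rights do data subjects have?")
print(out["answer"], "|", out["num_sources"], "sources")
```

Keeping the stages swappable makes it straightforward to replace one of them, for example the retriever, without touching the rest of the pipeline.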

## Step 5: Test with Multiple Queries

Run the pipeline on several question types (a definition, general principles, data subject rights, territorial scope) and inspect the answers.

In [None]:
# Test queries
test_queries = [
    "What is personal data according to GDPR?",
    "What are the main principles of GDPR?",
    "What rights do data subjects have?",
    "What is the territorial scope of GDPR?"
]

print("Testing multiple queries:")
print("=" * 60)

for i, q in enumerate(test_queries, 1):
    print(f"\n{i}. {q}")
    result = rag.query(q)
    print(f"   Answer: {result['answer'][:150]}...")
    print(f"   Sources: {result['num_sources']}")
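Printing the first 150 characters is useful for eyeballing, but it yields no number that can be compared across runs. One lightweight option is a keyword-coverage check: list the terms you expect a correct answer to mention and report the fraction that appear. The keyword lists below are illustrative assumptions, not a gold standard. The check rewards term overlap, not correctness. `keyword_coverage` and `expected_terms` are hypothetical names.

```python
from typing import Dict, List


def keyword_coverage(answer: str, expected: List[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive substring match).

    Returns 0.0 for an empty keyword list to avoid dividing by zero.
    """
    if not expected:
        return 0.0
    text = answer.lower()
    hits = sum(1 for kw in expected if kw.lower() in text)
    return hits / len(expected)


# Illustrative expectations per question; tune these to your own reference answers.
expected_terms: Dict[str, List[str]] = {
    "What is personal data according to GDPR?": ["identifiable", "natural person"],
    "What is the territorial scope of GDPR?": ["union", "establishment"],
}

# Usage with the notebook's `rag` object (not run here):
# for q, terms in expected_terms.items():
#     print(q, keyword_coverage(rag.query(q)["answer"], terms))
```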

## Step 6: Evaluation Metrics

Measure per-query latency and answer length for the baseline system.

In [None]:
# Simple evaluation metrics: per-query latency and answer length
import time

eval_queries = test_queries[:2]  # Use a subset for a quick evaluation
latencies = []

print("Evaluating baseline performance:")

# Use `q` so the loop does not overwrite the `query` variable from Step 2
for q in eval_queries:
    start_time = time.perf_counter()  # monotonic, high-resolution clock for timing
    result = rag.query(q)
    latency = time.perf_counter() - start_time
    latencies.append(latency)

    print(f"\nQuery: {q[:40]}...")
    print(f"Latency: {latency:.2f}s")
    print(f"Answer length: {len(result['answer'])} chars")

print(f"\nAverage latency: {sum(latencies)/len(latencies):.2f}s")
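With only two queries the average is noisy. API latency is also often skewed by occasional slow calls, so the median and maximum are worth reporting alongside the mean. Here is a minimal standard-library sketch. `summarize_latencies` is a hypothetical helper.

```python
import statistics
from typing import Dict, List


def summarize_latencies(latencies: List[float]) -> Dict[str, float]:
    """Summary statistics for a list of per-query latencies in seconds."""
    if not latencies:
        raise ValueError("no latencies recorded")
    return {
        "n": len(latencies),
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
        # Sample standard deviation needs at least two points; report 0.0 otherwise.
        "stdev": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
    }


# Usage with the notebook's `latencies` list:
# for name, value in summarize_latencies(latencies).items():
#     print(f"{name}: {value:.2f}")
```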

## Summary

In this notebook, we:
- ✅ Initialized the baseline RAG system
- ✅ Tested retrieval functionality
- ✅ Examined prompt formatting
- ✅ Ran the complete RAG pipeline
- ✅ Tested the pipeline with multiple queries
- ✅ Measured baseline latency and answer length

Next: Proceed to `03_memory_integration.ipynb` to add conversational memory.