# Comprehensive Evaluation

In this notebook, we'll evaluate retrieval-augmented generation (RAG) systems using both offline metrics, computed against ground-truth data, and online metrics, collected while the system serves live queries.
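Offline metrics need labeled data, such as ground-truth chunks and reference responses. Online metrics come from live traffic, where no ground truth exists, so they typically track operational and behavioral signals such as latency and user feedback. The sketch below illustrates the online side with a sliding window over recent requests. `RollingMonitor` is a hypothetical name used only for illustration. It is not the `OnlineEvaluator` imported in the setup cell, whose interface isn't shown here.

```python
import math
from collections import deque
from typing import Optional


class RollingMonitor:
    """Hypothetical online monitor over a sliding window of recent requests."""

    def __init__(self, window: int = 100):
        # deque(maxlen=...) silently drops the oldest entry once full
        self.latencies = deque(maxlen=window)  # seconds per request
        self.feedback = deque(maxlen=window)   # last `window` votes: True = helpful

    def record(self, latency_s: float, thumbs_up: Optional[bool] = None) -> None:
        """Log one request; feedback is optional because most users never vote."""
        self.latencies.append(latency_s)
        if thumbs_up is not None:
            self.feedback.append(thumbs_up)

    def p95_latency(self) -> float:
        """Nearest-rank 95th-percentile latency over the current window."""
        if not self.latencies:
            return float("nan")
        ordered = sorted(self.latencies)
        rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
        return ordered[rank - 1]

    def satisfaction_rate(self) -> float:
        """Share of recorded votes that were positive (NaN if there are none)."""
        if not self.feedback:
            return float("nan")
        return sum(self.feedback) / len(self.feedback)


monitor = RollingMonitor(window=3)
for latency, vote in [(0.2, True), (0.4, False), (0.3, True), (0.5, None)]:
    monitor.record(latency, vote)

print(monitor.p95_latency())        # 0.5
print(monitor.satisfaction_rate())  # about 0.667
```

With `window=3`, the first request (0.2 s) has already been evicted. The unvoted last request updates latency but not the satisfaction rate. In production, these values would usually be exported to a metrics backend instead of being held in memory.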

## Learning Objectives
By the end of this notebook, you will:
1. Implement comprehensive evaluation metrics for RAG systems
2. Perform offline evaluation with ground-truth data
3. Set up online evaluation and monitoring
4. Compare different RAG configurations
5. Understand how to interpret evaluation results
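Objectives 1 and 2 rest on retrieval metrics computed against ground truth. The notebook's ground truth is stored as one list of relevant chunk IDs per query. The following minimal sketch shows three common metrics: precision@k, recall@k, and mean reciprocal rank (MRR). These helper functions are hypothetical standalone code. They are not the `RetrievalMetrics` class from `src.evaluation.evaluation_metrics`, whose API isn't shown here.

```python
from typing import Dict, List, Sequence


def precision_at_k(retrieved: Sequence[str], relevant: Sequence[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant chunk IDs."""
    relevant_set = set(relevant)
    hits = sum(1 for doc_id in list(retrieved)[:k] if doc_id in relevant_set)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Sequence[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top k."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    top_k = set(list(retrieved)[:k])
    return len(relevant_set & top_k) / len(relevant_set)


def reciprocal_rank(retrieved: Sequence[str], relevant: Sequence[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none was retrieved."""
    relevant_set = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant_set:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(
    results: List[List[str]], ground_truth: List[List[str]], k: int = 3
) -> Dict[str, float]:
    """Average each metric over all queries; results[i] is the ranked output for query i."""
    if not ground_truth or len(results) != len(ground_truth):
        raise ValueError("need one non-empty ranked result list per ground-truth entry")
    n = len(ground_truth)
    pairs = list(zip(results, ground_truth))
    return {
        f"precision@{k}": sum(precision_at_k(r, g, k) for r, g in pairs) / n,
        f"recall@{k}": sum(recall_at_k(r, g, k) for r, g in pairs) / n,
        "mrr": sum(reciprocal_rank(r, g) for r, g in pairs) / n,
    }


# Hypothetical ranked outputs for two queries whose relevant chunks are chunk1 and chunk2
ground_truth = [["chunk1"], ["chunk2"]]
results = [["chunk1", "chunk3", "chunk2"], ["chunk4", "chunk2", "chunk1"]]
print(evaluate_retrieval(results, ground_truth, k=3))
```

Both queries find their relevant chunk in the top 3, so recall@3 is 1.0. The second query ranks that chunk second, so MRR drops to 0.75. With a single relevant chunk per query, precision@3 can never exceed 1/3. This ceiling is worth keeping in mind when interpreting the numbers (objective 5).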


## Setup and Imports

Let's import the libraries we need for comprehensive evaluation of RAG systems.


In [None]:
# Standard library imports
import json
import time
from collections import defaultdict
from pathlib import Path
from typing import List, Dict, Any

# Third-party imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

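# Add the project root (assumed to be the notebook directory's parent) to sys.path
# so that the local `src` package can be imported
import sys
sys.path.append(str(Path.cwd().parent))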

# Import our modules
from src.evaluation.evaluation_metrics import RAGEvaluator, RetrievalMetrics, GenerationMetrics, OnlineEvaluator
from src.retrieval.retrieval_system import RetrievalSystem, RetrievalConfig
from src.models.llm_models import RAGGenerator, PromptTemplate
from src.config import DATA_DIR

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

print("Libraries imported successfully!")

# Create sample evaluation data
print("Creating sample evaluation data...")

# Sample queries and ground truth
evaluation_data = {
    'queries': [
        "What is machine learning?",
        "How does deep learning work?", 
        "What is natural language processing?",
        "Explain computer vision",
        "What is reinforcement learning?"
    ],
    'ground_truth': [
        ['chunk1'],  # Machine learning chunk
        ['chunk2'],  # Deep learning chunk
        ['chunk3'],  # NLP chunk
        ['chunk4'],  # Computer vision chunk
        ['chunk5']   # Reinforcement learning chunk
    ],
    'reference_responses': [
        "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
        "Deep learning uses neural networks with multiple layers to process complex data patterns.",
        "Natural language processing is a field of AI that focuses on the interaction between computers and human language.",
        "Computer vision is a field of artificial intelligence that trains computers to interpret and understand visual information.",
        "Reinforcement learning is a type of machine learning where agents learn to make decisions through trial and error."
    ]
}

print(f"Created evaluation data with {len(evaluation_data['queries'])} queries")

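# Create sample chunks for evaluation.
# Note: ground_truth above refers to each chunk's 'id' (e.g. 'chunk1'),
# not its 'chunk_id' (e.g. 'chunk_1').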
sample_chunks = [
    {
        'id': 'chunk1',
        'text': 'Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.',
        'title': 'Machine Learning',
        'source': 'wikipedia',
        'chunk_id': 'chunk_1'
    },
    {
        'id': 'chunk2',
        'text': 'Deep learning uses neural networks with multiple layers to process complex data patterns.',
        'title': 'Deep Learning',
        'source': 'wikipedia',
        'chunk_id': 'chunk_2'
    },
    {
        'id': 'chunk3',
        'text': 'Natural language processing is a field of AI that focuses on the interaction between computers and human language.',
        'title': 'NLP',
        'source': 'wikipedia',
        'chunk_id': 'chunk_3'
    },
    {
        'id': 'chunk4',
        'text': 'Computer vision is a field of artificial intelligence that trains computers to interpret and understand visual information.',
        'title': 'Computer Vision',
        'source': 'wikipedia',
        'chunk_id': 'chunk_4'
    },
    {
        'id': 'chunk5',
        'text': 'Reinforcement learning is a type of machine learning where agents learn to make decisions through trial and error.',
        'title': 'Reinforcement Learning',
        'source': 'wikipedia',
        'chunk_id': 'chunk_5'
    }
]

print(f"Created {len(sample_chunks)} sample chunks for evaluation")
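Each query also has a reference response, so generated answers can be scored against it. One simple, widely used option is SQuAD-style token-level F1, which takes the harmonic mean of token precision and recall after light normalization. The following code is a sketch under that assumption, not necessarily what `GenerationMetrics` computes. Because it measures lexical overlap only, it can penalize a correct answer that paraphrases the reference.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # Multiset intersection: a token counts as many times as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "Deep learning uses neural networks with multiple layers to process complex data patterns."
generated = "Deep learning uses neural networks with many layers."  # hypothetical model output
print(round(token_f1(generated, reference), 3))  # 0.667
```

Seven of the eight generated tokens appear in the 13-token reference, giving precision 7/8 and recall 7/13, for an F1 of 2/3. The same function could be applied pairwise to `evaluation_data['reference_responses']` once the generator produces answers.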
