# Day 3 — Exercise 3: Hallucination and Citation Validation

## 🎯 **Learning Objective**
Detect and mitigate hallucinations in RAG outputs by implementing automated checks for unsupported claims and for citation support, so that answers stay grounded in the retrieved documents, and by adding human-in-the-loop validation.

## 📋 **Exercise Structure & Navigation**

### **🧭 Navigation Guide**
| Section | What You'll Do | Expected Outcome | Time |
|---------|----------------|------------------|------|
| **Theory & Foundation** | Understand hallucination types and detection methods | Knowledge of validation frameworks | 15 min |
| **Simple Implementation** | Build basic hallucination detection | Working validation system | 30 min |
| **Intermediate Level** | Add automated citation validation | Advanced detection capabilities | 45 min |
| **Advanced Implementation** | Human-in-the-loop validation | Production-ready correction system | 30 min |
| **Enterprise Integration** | LightLLM hallucination detection | Complete validation pipeline | 20 min |

### **🔍 Code Block Navigation**
Each code block includes:
- **🎯 Purpose**: What the code accomplishes
- **📊 Expected Output**: What you should see
- **💡 Interpretation**: How to understand the results
- **⚠️ Troubleshooting**: Common issues and solutions

---

## 📚 **Theory & Foundation: Understanding Hallucination in RAG Systems**

### **What is Hallucination in RAG?**

**Hallucination** refers to the generation of information that is:
- **Factually incorrect**: Contradicts the retrieved context
- **Not grounded**: Not supported by the source documents
- **Fabricated**: Completely made up by the language model
- **Misleading**: Partially correct but contains errors

### **Types of Hallucinations**

#### **1. Contextual Hallucination**
- **Definition**: Information not present in the retrieved context
- **Example**: Answer mentions "Company X was founded in 1995" when context only says "Company X is a technology company"
- **Detection**: Check if claims are supported by retrieved documents

#### **2. Factual Hallucination**
- **Definition**: Incorrect factual information
- **Example**: "The capital of France is London" (should be Paris)
- **Detection**: Cross-reference with authoritative sources

#### **3. Temporal Hallucination**
- **Definition**: Incorrect time-related information
- **Example**: "The company launched in 2023" when context says "2022"
- **Detection**: Extract and validate temporal entities
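
A minimal sketch of temporal validation, assuming years are the only temporal entities of interest. The helper name `unsupported_years` is illustrative. The pattern uses a non-capturing group `(?:...)` so that `re.findall` returns the full four-digit year rather than only the century prefix.

```python
import re
from typing import List

# Non-capturing group: findall returns the whole match ("2023"), not just "20"
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")

def unsupported_years(answer: str, context: str) -> List[str]:
    """Return years mentioned in the answer that never appear in the context."""
    context_years = set(YEAR_PATTERN.findall(context))
    return [year for year in YEAR_PATTERN.findall(answer) if year not in context_years]

print(unsupported_years("The company launched in 2023.", "The company launched in 2022."))
# → ['2023']
```

Month names and full dates would need additional patterns; this sketch deliberately covers only the year case from the example above.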

#### **4. Numerical Hallucination**
- **Definition**: Incorrect numbers, statistics, or measurements
- **Example**: "Revenue increased by 150%" when context says "50%"
- **Detection**: Extract and validate numerical claims
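
One way to extract and compare numerical claims is sketched below. It assumes numbers are compared as exact strings after dropping thousands separators; the names `extract_numbers` and `unsupported_numbers` are hypothetical. Comparing whole tokens matters here, because a plain substring check would wrongly treat "50%" as present inside "150%".

```python
import re
from typing import List

# Integers, comma-grouped numbers, and decimals, with an optional percent sign
NUMBER_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")

def extract_numbers(text: str) -> List[str]:
    """Extract numeric tokens, dropping thousands separators (10,000 -> 10000)."""
    return [match.replace(",", "") for match in NUMBER_PATTERN.findall(text)]

def unsupported_numbers(answer: str, context: str) -> List[str]:
    """Return numeric tokens in the answer that do not occur in the context."""
    context_numbers = set(extract_numbers(context))
    return [num for num in extract_numbers(answer) if num not in context_numbers]

print(unsupported_numbers("Revenue increased by 150%.", "Revenue increased by 50%."))
# → ['150%']
```

This sketch does not normalize units ("$50 million" versus "50,000,000") and it treats four-digit years as numbers, so a production check would likely need both refinements.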

### **Citation Validation Framework**

#### **Citation Quality Metrics**
1. **Citation Accuracy**: Do citations support the claims?
2. **Citation Completeness**: Are all claims supported by citations?
3. **Citation Relevance**: Are cited passages relevant to the claim?
4. **Citation Attribution**: Are sources properly attributed?
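
The first two metrics can be approximated with a very rough sketch. It assumes each claim is paired with at most one cited passage and uses token overlap as a crude stand-in for "support". The `CitedClaim` class, the `coverage` helper, and the 0.6 threshold are all illustrative assumptions, not part of any library.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class CitedClaim:
    claim: str
    cited_passage: Optional[str]  # None when the claim carries no citation

def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))

def coverage(claim: str, passage: str) -> float:
    """Fraction of claim tokens that also occur in the cited passage."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & _tokens(passage)) / len(claim_tokens)

def citation_metrics(claims: List[CitedClaim], support_threshold: float = 0.6) -> Dict[str, float]:
    """Completeness: share of claims with a citation. Accuracy: share of cited claims that are supported."""
    cited = [c for c in claims if c.cited_passage is not None]
    supported = [c for c in cited if coverage(c.claim, c.cited_passage) >= support_threshold]
    return {
        "completeness": len(cited) / len(claims) if claims else 0.0,
        "accuracy": len(supported) / len(cited) if cited else 0.0,
    }

passage = "Our flagship product, DataFlow Pro, was launched in March 2023. It has since gained over 10,000 users."
metrics = citation_metrics([
    CitedClaim("DataFlow Pro was launched in March 2023", passage),
    CitedClaim("The product was developed by a team of 15 engineers over 18 months", passage),
    CitedClaim("DataFlow Pro has a mobile app", None),
])
print(metrics)
```

In this example two of three claims carry a citation, and only one of those two is backed by the cited passage. Relevance and attribution need semantic judgment that token overlap cannot provide.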

#### **Validation Approaches**
1. **Automated Detection**: Rule-based and ML-based methods
2. **Human-in-the-Loop**: Expert review and correction
3. **Hybrid Approach**: Combine automated and human validation
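
A hybrid approach can be as simple as routing on the automated score. The sketch below is one possible policy under assumed thresholds (0.1 and 0.6 are placeholders to tune, and `route_answer` is a hypothetical name): clearly clean answers pass, clearly bad ones are rejected, and everything else, plus anything in a high-risk domain, goes to a reviewer.

```python
def route_answer(hallucination_score: float,
                 high_risk_domain: bool = False,
                 accept_below: float = 0.1,
                 reject_above: float = 0.6) -> str:
    """Decide whether an answer is accepted, rejected, or sent to a human reviewer."""
    if hallucination_score > reject_above:
        return "auto_reject"
    # High-risk content is never auto-accepted, whatever the score
    if high_risk_domain or hallucination_score >= accept_below:
        return "human_review"
    return "auto_accept"

print(route_answer(0.05))                         # → auto_accept
print(route_answer(0.05, high_risk_domain=True))  # → human_review
print(route_answer(0.444))                        # → human_review
print(route_answer(0.9))                          # → auto_reject
```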

### **Enterprise Considerations**

#### **Risk Assessment**
- **High-Risk Domains**: Medical, legal, financial advice
- **Compliance Requirements**: Regulatory standards for accuracy
- **Reputation Management**: Brand protection through accuracy
- **Liability Reduction**: Minimizing misinformation risks

#### **Quality Assurance**
- **Validation Pipelines**: Automated quality checks
- **Human Review**: Expert validation for critical content
- **Audit Trails**: Documentation of validation decisions
- **Continuous Monitoring**: Ongoing quality assessment
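
As an illustration of an audit trail entry, the sketch below serializes one validation decision as a JSON line. The field names and the `make_audit_record` helper are assumptions chosen for this example; a real deployment would follow its own compliance schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def make_audit_record(query: str, decision: str, hallucination_score: float,
                      reviewer: Optional[str] = None) -> str:
    """Serialize one validation decision as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "decision": decision,
        "hallucination_score": hallucination_score,
        "reviewer": reviewer,  # None for fully automated decisions
    }
    return json.dumps(record, sort_keys=True)

line = make_audit_record("What is the company's market share?", "auto_accept", 0.0)
print(line)
```

Appending such lines to a write-once log keeps each decision traceable to a score, a time, and (where applicable) a named reviewer.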

---

## 🚀 **Simple Implementation: Basic Hallucination Detection**

### **Step 1: Setting Up the Environment**

**🎯 Purpose**: Import necessary libraries and set up the basic environment for hallucination detection.

**📊 Expected Output**: Confirmation that all libraries are imported and basic setup is complete.

**💡 Interpretation**: This establishes the foundation for our hallucination detection system.

**⚠️ Troubleshooting**: If any imports fail, install missing packages using `pip install package_name`.


In [1]:
# Import essential libraries for hallucination detection and citation validation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import re
import json
from typing import List, Dict, Any, Tuple, Optional
from dataclasses import dataclass
from collections import defaultdict, Counter
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("✅ Libraries imported successfully!")
print(f"📅 Notebook initialized on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("🔍 Ready for hallucination detection and citation validation!")


✅ Libraries imported successfully!
📅 Notebook initialized on: 2025-09-20 19:00:52
🔍 Ready for hallucination detection and citation validation!


### **Step 2: Creating Sample Data for Hallucination Detection**

**🎯 Purpose**: Create realistic sample data that includes both accurate and hallucinated responses to test our detection systems.

**📊 Expected Output**: Sample queries, contexts, answers (some with hallucinations), and ground truth data for validation.

**💡 Interpretation**: 
- **Clean Answers**: Responses that are fully supported by context
- **Hallucinated Answers**: Responses containing unsupported claims
- **Mixed Answers**: Responses with both accurate and hallucinated information

**⚠️ Troubleshooting**: If you want to test with your own data, replace the sample data with your domain-specific examples.


In [2]:
# Create comprehensive sample data for hallucination detection testing
@dataclass
class ValidationSample:
    """Data class for storing validation samples."""
    query: str
    context: str
    answer: str
    citations: List[str]
    ground_truth: str
    hallucination_type: str  # 'none', 'contextual', 'factual', 'temporal', 'numerical'
    severity: str  # 'none', 'low', 'medium', 'high'
    expected_citations: List[str]

# Sample data with various types of hallucinations
sample_data = [
    ValidationSample(
        query="What is the company's revenue growth rate?",
        context="TechCorp reported revenue of $50 million in 2022, up from $40 million in 2021. The company's quarterly growth has been consistent at 15%.",
        answer="TechCorp achieved a revenue growth rate of 25% year-over-year, reaching $50 million in 2022. The company has maintained consistent quarterly growth.",
        citations=["TechCorp financial report 2022"],
        ground_truth="TechCorp achieved a revenue growth rate of 25% year-over-year (from $40M to $50M), reaching $50 million in 2022. The company has maintained consistent quarterly growth at 15%.",
        hallucination_type="numerical",  # 25% is not stated in the context, but it follows from $40M -> $50M
        severity="medium",
        expected_citations=["TechCorp financial report 2022"]
    ),
    
    ValidationSample(
        query="When was the product launched?",
        context="Our flagship product, DataFlow Pro, was launched in March 2023. It has since gained over 10,000 users.",
        answer="DataFlow Pro was launched in March 2023 and has gained over 10,000 users. The product was developed by a team of 15 engineers over 18 months.",
        citations=["Product launch announcement"],
        ground_truth="DataFlow Pro was launched in March 2023 and has gained over 10,000 users.",
        hallucination_type="contextual",
        severity="low",
        expected_citations=["Product launch announcement"]
    ),
    
    ValidationSample(
        query="What are the main features of the software?",
        context="The software includes real-time data processing, automated reporting, and cloud synchronization. It supports multiple data formats including CSV, JSON, and XML.",
        answer="The software includes real-time data processing, automated reporting, and cloud synchronization. It also features advanced machine learning algorithms and blockchain integration.",
        citations=["Software documentation"],
        ground_truth="The software includes real-time data processing, automated reporting, and cloud synchronization. It supports multiple data formats including CSV, JSON, and XML.",
        hallucination_type="contextual",
        severity="high",
        expected_citations=["Software documentation"]
    ),
    
    ValidationSample(
        query="Who is the CEO of the company?",
        context="Sarah Johnson has been the CEO of TechCorp since 2020. She previously worked at Microsoft for 10 years before joining our company.",
        answer="Sarah Johnson has been the CEO of TechCorp since 2020. She previously worked at Microsoft for 10 years before joining our company in 2019.",
        citations=["Company leadership page"],
        ground_truth="Sarah Johnson has been the CEO of TechCorp since 2020. She previously worked at Microsoft for 10 years before joining our company.",
        hallucination_type="temporal",
        severity="medium",
        expected_citations=["Company leadership page"]
    ),
    
    ValidationSample(
        query="What is the company's market share?",
        context="TechCorp holds a 12% market share in the enterprise software sector. The company has been growing steadily over the past three years.",
        answer="TechCorp holds a 12% market share in the enterprise software sector. The company has been growing steadily over the past three years.",
        citations=["Market research report 2023"],
        ground_truth="TechCorp holds a 12% market share in the enterprise software sector. The company has been growing steadily over the past three years.",
        hallucination_type="none",
        severity="none",
        expected_citations=["Market research report 2023"]
    ),
    
    ValidationSample(
        query="How many employees does the company have?",
        context="TechCorp currently employs 250 people across three offices. The company is planning to hire 50 more employees in 2024.",
        answer="TechCorp currently employs 250 people across three offices. The company has a diverse workforce with 60% of employees holding advanced degrees.",
        citations=["Company HR report"],
        ground_truth="TechCorp currently employs 250 people across three offices. The company is planning to hire 50 more employees in 2024.",
        hallucination_type="contextual",
        severity="medium",
        expected_citations=["Company HR report"]
    ),
    
    ValidationSample(
        query="What is the company's headquarters location?",
        context="TechCorp is headquartered in San Francisco, California. The company was founded in 2015 and has expanded to include offices in New York and London.",
        answer="TechCorp is headquartered in San Francisco, California. The company was founded in 2015 and has expanded to include offices in New York, London, and Tokyo.",
        citations=["Company website"],
        ground_truth="TechCorp is headquartered in San Francisco, California. The company was founded in 2015 and has expanded to include offices in New York and London.",
        hallucination_type="contextual",
        severity="medium",
        expected_citations=["Company website"]
    ),
    
    ValidationSample(
        query="What are the company's main competitors?",
        context="TechCorp's main competitors include DataSys Inc., CloudTech Solutions, and Analytics Pro. These companies compete in the enterprise data management space.",
        answer="TechCorp's main competitors include DataSys Inc., CloudTech Solutions, and Analytics Pro. The competitive landscape is dominated by these four companies.",
        citations=["Industry analysis report"],
        ground_truth="TechCorp's main competitors include DataSys Inc., CloudTech Solutions, and Analytics Pro. These companies compete in the enterprise data management space.",
        hallucination_type="contextual",
        severity="low",
        expected_citations=["Industry analysis report"]
    ),
    
    ValidationSample(
        query="What is the company's stock price?",
        context="TechCorp's stock (TECH) is currently trading at $45.50 per share. The stock has increased by 12% over the past month.",
        answer="TechCorp's stock (TECH) is currently trading at $45.50 per share. The stock has increased by 15% over the past month and is expected to reach $50 by year-end.",
        citations=["Financial market data"],
        ground_truth="TechCorp's stock (TECH) is currently trading at $45.50 per share. The stock has increased by 12% over the past month.",
        hallucination_type="numerical",
        severity="high",
        expected_citations=["Financial market data"]
    ),
    
    ValidationSample(
        query="What awards has the company received?",
        context="TechCorp received the 'Best Enterprise Software' award at the Tech Innovation Summit in 2023. The company was also recognized for its environmental sustainability efforts.",
        answer="TechCorp received the 'Best Enterprise Software' award at the Tech Innovation Summit in 2023. The company was also recognized for its environmental sustainability efforts and won the 'Green Technology' award.",
        citations=["Award announcements"],
        ground_truth="TechCorp received the 'Best Enterprise Software' award at the Tech Innovation Summit in 2023. The company was also recognized for its environmental sustainability efforts.",
        hallucination_type="contextual",
        severity="medium",
        expected_citations=["Award announcements"]
    )
]

print(f"✅ Sample data created with {len(sample_data)} validation samples")
print(f"📊 Hallucination types included:")
hallucination_counts = Counter([sample.hallucination_type for sample in sample_data])
for h_type, count in hallucination_counts.items():
    print(f"   • {h_type.title()}: {count} samples")

print(f"\n🎯 Severity distribution:")
severity_counts = Counter([sample.severity for sample in sample_data])
for severity, count in severity_counts.items():
    print(f"   • {severity.title()}: {count} samples")

print(f"\n📝 Sample data structure:")
print(f"   • Queries: {len(set(sample.query for sample in sample_data))}")
print(f"   • Contexts: {len(set(sample.context for sample in sample_data))}")
print(f"   • Citations: {len(set(cite for sample in sample_data for cite in sample.citations))}")
print("🔍 Ready for hallucination detection testing!")


✅ Sample data created with 10 validation samples
📊 Hallucination types included:
   • Numerical: 2 samples
   • Contextual: 6 samples
   • Temporal: 1 samples
   • None: 1 samples

🎯 Severity distribution:
   • Medium: 5 samples
   • Low: 2 samples
   • High: 2 samples
   • None: 1 samples

📝 Sample data structure:
   • Queries: 10
   • Contexts: 10
   • Citations: 10
🔍 Ready for hallucination detection testing!
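
A quick check of the first sample is worthwhile: the context never states a growth rate, yet the 25% in the answer follows from the two revenue figures it does give. A minimal sketch of that arithmetic (the helper name `growth_rate_percent` is illustrative, not part of the exercise code):

```python
def growth_rate_percent(previous: float, current: float) -> float:
    """Year-over-year growth as a percentage of the previous value."""
    return (current - previous) / previous * 100

# Revenue figures from the first sample's context, in millions of dollars
print(growth_rate_percent(40, 50))  # → 25.0
```

A detector that only looks for an exact match of each number will flag derived values like this one, which is worth keeping in mind when reading the results for this sample.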


### **Step 3: Basic Hallucination Detection System**

**🎯 Purpose**: Implement a basic hallucination detection system that can identify unsupported claims in generated answers.

**📊 Expected Output**: A working hallucination detector that can classify answers as clean or containing hallucinations.

**💡 Interpretation**: 
- **Hallucination Score**: 0.0 = no hallucinations, 1.0 = complete hallucination
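- **Detection Confidence**: The fraction of the three detectors (numerical, temporal, contextual) that flagged the answer; 0.0 means no detector fired, not that the system is unsure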
- **Supported Claims**: Claims that are backed by the retrieved context

**⚠️ Troubleshooting**: If detection accuracy is low, tune `support_threshold` in `detect_contextual_hallucinations` (0.3 here) or replace the token-overlap comparison with a more sophisticated NLP technique.


In [3]:
class BasicHallucinationDetector:
    """
    Basic hallucination detection system for RAG outputs.
    
    This detector implements rule-based and similarity-based methods to identify
    hallucinations in generated answers by comparing them against retrieved context.
    """
    
    def __init__(self):
        self.detection_history = []
        self.performance_metrics = {}
        
    def tokenize_and_normalize(self, text: str) -> List[str]:
        """Tokenize and normalize text for comparison."""
        # Remove punctuation and convert to lowercase
        normalized = re.sub(r'[^\w\s]', ' ', text.lower())
        # Split into tokens and remove empty strings
        tokens = [token.strip() for token in normalized.split() if token.strip()]
        return tokens
    
    def calculate_jaccard_similarity(self, text1: str, text2: str) -> float:
        """Calculate Jaccard similarity between two texts."""
        tokens1 = set(self.tokenize_and_normalize(text1))
        tokens2 = set(self.tokenize_and_normalize(text2))
        
        if len(tokens1) == 0 and len(tokens2) == 0:
            return 1.0
        if len(tokens1) == 0 or len(tokens2) == 0:
            return 0.0
        
        intersection = len(tokens1.intersection(tokens2))
        union = len(tokens1.union(tokens2))
        
        return intersection / union if union > 0 else 0.0
    
    def extract_claims(self, text: str) -> List[str]:
        """Extract potential claims from text using simple heuristics."""
        # Split into sentences (naive: this also splits at decimal points, e.g. "$45.50")
        sentences = re.split(r'[.!?]+', text)
        claims = []
        
        for sentence in sentences:
            sentence = sentence.strip()
            if len(sentence) > 10:  # Filter out very short sentences
                # Look for factual statements (contain numbers, dates, or specific facts)
                if (re.search(r'\d+', sentence) or  # Contains numbers
                    re.search(r'\b(was|is|are|were|has|have|will|can|should|must)\b', sentence) or  # Contains factual verbs
                    re.search(r'\b(in|on|at|since|from|to)\b', sentence)):  # Contains temporal/location prepositions
                    claims.append(sentence)
        
        return claims
    
    def detect_numerical_hallucinations(self, answer: str, context: str) -> Dict[str, Any]:
        """Detect numerical hallucinations by comparing numbers in answer vs context."""
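        # Extract numbers from answer and context.
        # Note: the trailing \b drops a '%' that is followed by whitespace or
        # punctuation ("25%" is captured as "25"), a unit after a space ("50 million")
        # is not attached, and four-digit years are captured as numbers too.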
        answer_numbers = re.findall(r'\b\d+(?:\.\d+)?(?:%|million|billion|thousand)?\b', answer.lower())
        context_numbers = re.findall(r'\b\d+(?:\.\d+)?(?:%|million|billion|thousand)?\b', context.lower())
        
        # Check if answer numbers are present in context
        unsupported_numbers = []
        for num in answer_numbers:
            if num not in context_numbers:
                unsupported_numbers.append(num)
        
        hallucination_score = len(unsupported_numbers) / max(len(answer_numbers), 1)
        
        return {
            "has_numerical_hallucination": len(unsupported_numbers) > 0,
            "unsupported_numbers": unsupported_numbers,
            "numerical_hallucination_score": hallucination_score,
            "answer_numbers": answer_numbers,
            "context_numbers": context_numbers
        }
    
    def detect_temporal_hallucinations(self, answer: str, context: str) -> Dict[str, Any]:
        """Detect temporal hallucinations by comparing dates and time references."""
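        # Extract temporal expressions.
        # Note: re.findall returns only the captured group when a pattern has one,
        # so the year pattern yields the century prefix ('19'/'20') and the last
        # pattern yields the preposition (e.g. 'in'), not the full matched text.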
        temporal_patterns = [
            r'\b(19|20)\d{2}\b',  # Years
            r'\b(january|february|march|april|may|june|july|august|september|october|november|december)\b',
            r'\b\d{1,2}/\d{1,2}/\d{2,4}\b',  # Dates
            r'\b\d{1,2}-\d{1,2}-\d{2,4}\b',  # Dates with dashes
            r'\b(in|since|from|until|before|after)\s+\d{4}\b'  # Temporal prepositions with years
        ]
        
        answer_temporal = []
        context_temporal = []
        
        for pattern in temporal_patterns:
            answer_temporal.extend(re.findall(pattern, answer.lower()))
            context_temporal.extend(re.findall(pattern, context.lower()))
        
        # Check for unsupported temporal information
        unsupported_temporal = []
        for temp in answer_temporal:
            if temp not in context_temporal:
                unsupported_temporal.append(temp)
        
        hallucination_score = len(unsupported_temporal) / max(len(answer_temporal), 1)
        
        return {
            "has_temporal_hallucination": len(unsupported_temporal) > 0,
            "unsupported_temporal": unsupported_temporal,
            "temporal_hallucination_score": hallucination_score,
            "answer_temporal": answer_temporal,
            "context_temporal": context_temporal
        }
    
    def detect_contextual_hallucinations(self, answer: str, context: str) -> Dict[str, Any]:
        """Detect contextual hallucinations by analyzing claim support."""
        # Extract claims from answer
        claims = self.extract_claims(answer)
        
        # Check support for each claim
        unsupported_claims = []
        supported_claims = []
        
        # Threshold for considering a claim supported
        support_threshold = 0.3
        
        for claim in claims:
            # Calculate similarity between claim and context
            similarity = self.calculate_jaccard_similarity(claim, context)
            
            if similarity < support_threshold:
                unsupported_claims.append({
                    "claim": claim,
                    "similarity": similarity,
                    "reason": "low_similarity"
                })
            else:
                supported_claims.append({
                    "claim": claim,
                    "similarity": similarity
                })
        
        hallucination_score = len(unsupported_claims) / max(len(claims), 1)
        
        return {
            "has_contextual_hallucination": len(unsupported_claims) > 0,
            "unsupported_claims": unsupported_claims,
            "supported_claims": supported_claims,
            "contextual_hallucination_score": hallucination_score,
            "total_claims": len(claims)
        }
    
    def detect_hallucinations(self, query: str, context: str, answer: str) -> Dict[str, Any]:
        """
        Comprehensive hallucination detection for a given query, context, and answer.
        
        Args:
            query: Original query
            context: Retrieved context
            answer: Generated answer
            
        Returns:
            Dictionary with hallucination detection results
        """
        # Run different types of hallucination detection
        numerical_results = self.detect_numerical_hallucinations(answer, context)
        temporal_results = self.detect_temporal_hallucinations(answer, context)
        contextual_results = self.detect_contextual_hallucinations(answer, context)
        
        # Calculate overall hallucination score
        scores = [
            numerical_results["numerical_hallucination_score"],
            temporal_results["temporal_hallucination_score"],
            contextual_results["contextual_hallucination_score"]
        ]
        
        overall_score = np.mean([score for score in scores if score is not None])
        
        # Flags from the three detectors
        detection_methods = [
            numerical_results["has_numerical_hallucination"],
            temporal_results["has_temporal_hallucination"],
            contextual_results["has_contextual_hallucination"]
        ]
        
        # An answer is flagged if any detector fires
        has_hallucination = any(detection_methods)
        
        # "Confidence" is the fraction of detectors that fired (detector agreement),
        # so a clean answer scores 0.0
        confidence = sum(detection_methods) / len(detection_methods)
        
        # Store detection result
        detection_result = {
            "query": query,
            "answer": answer,
            "context": context,
            "has_hallucination": has_hallucination,
            "overall_hallucination_score": overall_score,
            "confidence": confidence,
            "numerical_detection": numerical_results,
            "temporal_detection": temporal_results,
            "contextual_detection": contextual_results,
            "timestamp": datetime.now().isoformat()
        }
        
        self.detection_history.append(detection_result)
        
        return detection_result
    
    def get_detection_summary(self) -> Dict[str, Any]:
        """Get summary statistics of hallucination detection performance."""
        if not self.detection_history:
            return {"error": "No detection history available"}
        
        total_detections = len(self.detection_history)
        hallucinations_detected = sum(1 for result in self.detection_history if result["has_hallucination"])
        
        avg_hallucination_score = np.mean([result["overall_hallucination_score"] for result in self.detection_history])
        avg_confidence = np.mean([result["confidence"] for result in self.detection_history])
        
        # Breakdown by hallucination type
        numerical_hallucinations = sum(1 for result in self.detection_history 
                                     if result["numerical_detection"]["has_numerical_hallucination"])
        temporal_hallucinations = sum(1 for result in self.detection_history 
                                    if result["temporal_detection"]["has_temporal_hallucination"])
        contextual_hallucinations = sum(1 for result in self.detection_history 
                                      if result["contextual_detection"]["has_contextual_hallucination"])
        
        return {
            "total_detections": total_detections,
            "hallucinations_detected": hallucinations_detected,
            "detection_rate": hallucinations_detected / total_detections,
            "average_hallucination_score": avg_hallucination_score,
            "average_confidence": avg_confidence,
            "hallucination_type_breakdown": {
                "numerical": numerical_hallucinations,
                "temporal": temporal_hallucinations,
                "contextual": contextual_hallucinations
            }
        }

# Initialize the basic hallucination detector
basic_detector = BasicHallucinationDetector()

print("✅ Basic hallucination detector initialized!")
print("🔍 Detection capabilities:")
print("   • Numerical hallucination detection")
print("   • Temporal hallucination detection") 
print("   • Contextual hallucination detection")
print("   • Overall hallucination scoring")
print("🎯 Ready for hallucination detection testing!")


✅ Basic hallucination detector initialized!
🔍 Detection capabilities:
   • Numerical hallucination detection
   • Temporal hallucination detection
   • Contextual hallucination detection
   • Overall hallucination scoring
🎯 Ready for hallucination detection testing!
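
Jaccard similarity divides by the union of both token sets, so a short claim compared against a long context scores low even when every word of the claim appears in the context. One possible alternative, sketched here as an assumption and not as part of the detector above, is a containment score that divides by the claim's tokens only. The function names and the `\w+` tokenizer are illustrative.

```python
import re
from typing import Set

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))

def jaccard(claim: str, context: str) -> float:
    """Shared tokens divided by all tokens in either text."""
    a, b = tokens(claim), tokens(context)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def containment(claim: str, context: str) -> float:
    """Fraction of the claim's tokens that appear in the context."""
    a, b = tokens(claim), tokens(context)
    return len(a & b) / len(a) if a else 1.0

context = ("TechCorp reported revenue of $50 million in 2022, up from $40 million in 2021. "
           "The company's quarterly growth has been consistent at 15%.")
claim = "TechCorp reported revenue of $50 million"

print(round(jaccard(claim, context), 3), containment(claim, context))
# → 0.273 1.0
```

The claim is copied from the context, yet its Jaccard score falls below a 0.3 threshold. Containment has its own blind spot: it ignores word order and negation, so it should be read as a complement to other checks.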


### **Step 4: Testing Basic Hallucination Detection**

**🎯 Purpose**: Test our basic hallucination detector with the sample data to see how well it identifies different types of hallucinations.

**📊 Expected Output**: Detection results for each sample, showing which hallucinations were correctly identified and the confidence scores.

**💡 Interpretation**: 
- **High Detection Rate**: Most hallucinated answers are flagged (high recall)
- **Low False Positives**: Few clean answers incorrectly flagged as hallucinations (high precision)
- **Confidence Scores**: The fraction of the three detectors that flagged the answer, so 0.0 means no detector fired

**⚠️ Troubleshooting**: If detection accuracy is low, consider adjusting thresholds or adding more sophisticated detection methods.
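
For reference, the standard definitions behind these terms can be sketched as follows. The counts in the example are hypothetical and are not results from this notebook; `classification_metrics` is an illustrative helper name.

```python
from typing import Dict

def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision and recall use true positives only; true negatives count toward accuracy alone
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 6 hallucinations caught, 2 missed, 1 clean answer wrongly flagged, 1 clean answer passed
example = classification_metrics(tp=6, fp=1, fn=2, tn=1)
print({name: round(value, 3) for name, value in example.items()})
```

With so few clean samples in a test set, precision and recall say more than accuracy, since a detector that flags everything would still look accurate.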


In [4]:
# Test basic hallucination detection on sample data
print("🧪 Testing Basic Hallucination Detection")
print("=" * 60)

detection_results = []

for i, sample in enumerate(sample_data, 1):
    print(f"\n📝 Sample {i}: {sample.query[:50]}...")
    
    # Run hallucination detection
    result = basic_detector.detect_hallucinations(
        query=sample.query,
        context=sample.context,
        answer=sample.answer
    )
    
    # Store result with ground truth for evaluation
    detection_results.append({
        'sample': sample,
        'detection_result': result,
        'ground_truth_type': sample.hallucination_type,
        'ground_truth_severity': sample.severity
    })
    
    # Display results
    print(f"   🎯 Ground Truth: {sample.hallucination_type.title()} ({sample.severity})")
    print(f"   🔍 Detected Hallucination: {'Yes' if result['has_hallucination'] else 'No'}")
    print(f"   📊 Overall Score: {result['overall_hallucination_score']:.3f}")
    print(f"   🎪 Confidence: {result['confidence']:.3f}")
    
    # Show specific detection details
    if result['numerical_detection']['has_numerical_hallucination']:
        print(f"   🔢 Numerical: {result['numerical_detection']['unsupported_numbers']}")
    
    if result['temporal_detection']['has_temporal_hallucination']:
        print(f"   📅 Temporal: {result['temporal_detection']['unsupported_temporal']}")
    
    if result['contextual_detection']['has_contextual_hallucination']:
        unsupported_count = len(result['contextual_detection']['unsupported_claims'])
        print(f"   📝 Contextual: {unsupported_count} unsupported claims")

# Calculate detection performance
print(f"\n📈 DETECTION PERFORMANCE ANALYSIS")
print("=" * 60)

# Count outcomes against the ground truth
true_positives = 0
true_negatives = 0
false_positives = 0
false_negatives = 0

for result in detection_results:
    sample = result['sample']
    detection = result['detection_result']
    
    # Ground truth: has hallucination (not 'none')
    ground_truth_has_hallucination = sample.hallucination_type != 'none'
    detected_hallucination = detection['has_hallucination']
    
    if ground_truth_has_hallucination and detected_hallucination:
        true_positives += 1
    elif not ground_truth_has_hallucination and not detected_hallucination:
        true_negatives += 1
    elif not ground_truth_has_hallucination and detected_hallucination:
        false_positives += 1
    else:
        false_negatives += 1

total_samples = len(detection_results)
correct_detections = true_positives + true_negatives
accuracy = correct_detections / total_samples
# Precision and recall count true positives only; true negatives must not inflate them
precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"📊 Overall Performance:")
print(f"   • Accuracy: {accuracy:.3f} ({correct_detections}/{total_samples})")
print(f"   • Precision: {precision:.3f}")
print(f"   • Recall: {recall:.3f}")
print(f"   • F1 Score: {f1_score:.3f}")
print(f"   • False Positives: {false_positives}")
print(f"   • False Negatives: {false_negatives}")

# Performance by hallucination type
print(f"\n🎯 Performance by Hallucination Type:")
print("-" * 40)

hallucination_types = ['none', 'numerical', 'temporal', 'contextual']
for h_type in hallucination_types:
    type_samples = [r for r in detection_results if r['ground_truth_type'] == h_type]
    if type_samples:
        correct = sum(1 for r in type_samples 
                     if (h_type == 'none' and not r['detection_result']['has_hallucination']) or
                        (h_type != 'none' and r['detection_result']['has_hallucination']))
        type_accuracy = correct / len(type_samples)
        print(f"   • {h_type.title()}: {type_accuracy:.3f} ({correct}/{len(type_samples)})")

# Performance by severity
print(f"\n⚡ Performance by Severity:")
print("-" * 30)

severities = ['none', 'low', 'medium', 'high']
for severity in severities:
    severity_samples = [r for r in detection_results if r['ground_truth_severity'] == severity]
    if severity_samples:
        correct = sum(1 for r in severity_samples 
                     if (severity == 'none' and not r['detection_result']['has_hallucination']) or
                        (severity != 'none' and r['detection_result']['has_hallucination']))
        severity_accuracy = correct / len(severity_samples)
        print(f"   • {severity.title()}: {severity_accuracy:.3f} ({correct}/{len(severity_samples)})")

# Get summary from detector
summary = basic_detector.get_detection_summary()
print(f"\n📋 Detector Summary:")
print("-" * 25)
print(f"   • Total Detections: {summary['total_detections']}")
print(f"   • Hallucinations Detected: {summary['hallucinations_detected']}")
print(f"   • Detection Rate: {summary['detection_rate']:.3f}")
print(f"   • Average Score: {summary['average_hallucination_score']:.3f}")
print(f"   • Average Confidence: {summary['average_confidence']:.3f}")

print(f"\n✅ Basic hallucination detection testing completed!")
print("🎯 Ready to implement advanced citation validation!")


🧪 Testing Basic Hallucination Detection

📝 Sample 1: What is the company's revenue growth rate?...
   🎯 Ground Truth: Numerical (medium)
   🔍 Detected Hallucination: Yes
   📊 Overall Score: 0.444
   🎪 Confidence: 0.667
   🔢 Numerical: ['25']
   📝 Contextual: 2 unsupported claims

📝 Sample 2: When was the product launched?...
   🎯 Ground Truth: Contextual (low)
   🔍 Detected Hallucination: Yes
   📊 Overall Score: 0.300
   🎪 Confidence: 0.667
   🔢 Numerical: ['15', '18']
   📝 Contextual: 1 unsupported claims

📝 Sample 3: What are the main features of the software?...
   🎯 Ground Truth: Contextual (high)
   🔍 Detected Hallucination: No
   📊 Overall Score: 0.000
   🎪 Confidence: 0.000

📝 Sample 4: Who is the CEO of the company?...
   🎯 Ground Truth: Temporal (medium)
   🔍 Detected Hallucination: Yes
   📊 Overall Score: 0.194
   🎪 Confidence: 0.667
   🔢 Numerical: ['2019']
   📅 Temporal: ['in']

📝 Sample 5: What is the company's market share?...
   🎯 Ground Truth: None (none)
   🔍 Detected 
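
The accuracy, precision, recall, and F1 figures reported by the test cell follow the standard confusion-matrix definitions. As a minimal standalone sketch (the helper name `detection_metrics` and the counts are illustrative assumptions, not results from this exercise), they can be computed like this:

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard binary-detection metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Precision: of everything flagged as a hallucination, how much really was one
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real hallucinations, how many were flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    # Accuracy counts both kinds of correct decision (true positives and true negatives)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only: 3 hallucinations caught, 1 clean answer passed, 1 hallucination missed
metrics = detection_metrics(tp=3, tn=1, fp=0, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Only true positives appear in the precision and recall numerators; true negatives contribute to accuracy alone.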

## 🔧 **Intermediate Level: Advanced Citation Validation and Automated Detection**

### **Step 5: Citation Validation System**

**🎯 Purpose**: Implement comprehensive citation validation to ensure claims are properly supported by cited sources.

**📊 Expected Output**: Advanced citation validator that can assess citation quality, completeness, and relevance.

**💡 Interpretation**: 
- **Citation Accuracy**: How well citations support the claims made
- **Citation Completeness**: Whether all claims have supporting citations
- **Citation Relevance**: How relevant cited passages are to the claims

**⚠️ Troubleshooting**: If citation scores are unexpectedly low, check that the answer's citations follow one consistent format and that this format matches one of the patterns in `citation_patterns`.
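
To make the completeness metric concrete before reading the full validator, here is a minimal standalone sketch. It assumes only `[n]`-style citations and treats every sentence as a claim; the helper name `citation_completeness` is hypothetical and not part of the exercise code.

```python
import re

def citation_completeness(answer: str) -> float:
    """Return the fraction of sentences that carry a [n]-style citation."""
    # Split at terminal punctuation followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', answer) if s.strip()]
    if not sentences:
        return 1.0  # nothing to cite
    cited = sum(1 for s in sentences if re.search(r'\[\d+\]', s))
    return cited / len(sentences)

# Illustrative answer: the second sentence has no citation
answer = (
    "Revenue grew 25% in 2023 [1]. "
    "The company was founded in 1995. "
    "Its main product launched in March [2]."
)
score = citation_completeness(answer)
print(f"Completeness: {score:.2f}")  # 2 of 3 sentences are cited
```

The validator below extends this idea with more citation formats, a factual-claim filter, and separate accuracy and relevance scores.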


In [5]:
class CitationValidator:
    """
    Advanced citation validation system for RAG outputs.
    
    This validator assesses the quality, completeness, and relevance of citations
    in generated answers to ensure proper source attribution and claim support.
    """
    
    def __init__(self):
        self.validation_history = []
        self.citation_patterns = [
            r'\[(\d+)\]',  # [1], [2], etc.
            r'\(([^)]+)\)',  # (Source name), (Author, Year)
            r'According to ([^,]+),',  # According to Source,
            r'As stated in ([^,]+),',  # As stated in Source,
            r'Source: ([^\n]+)',  # Source: Name
            r'Reference: ([^\n]+)'  # Reference: Name
        ]
    
    def extract_citations(self, text: str) -> List[Dict[str, Any]]:
        """Extract citations from text using pattern matching."""
        citations = []
        
        for pattern in self.citation_patterns:
            matches = re.finditer(pattern, text, re.IGNORECASE)
            for match in matches:
                citation_text = match.group(1) if match.groups() else match.group(0)
                citations.append({
                    'text': citation_text,
                    'full_match': match.group(0),
                    'start': match.start(),
                    'end': match.end(),
                    'pattern_type': pattern
                })
        
        return citations
    
    def extract_claims_with_citations(self, text: str) -> List[Dict[str, Any]]:
        """Extract claims and their associated citations from text."""
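        # Split text into sentences at terminal punctuation followed by whitespace,
        # so decimals such as "25.5%" are not split in the middle
        sentences = re.split(r'(?<=[.!?])\s+', text)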
        claims_with_citations = []
        
        for sentence in sentences:
            sentence = sentence.strip()
            if len(sentence) > 10:
                # Extract citations in this sentence
                citations = self.extract_citations(sentence)
                
                # Check if sentence contains factual claims
                if self._contains_factual_claim(sentence):
                    claims_with_citations.append({
                        'sentence': sentence,
                        'citations': citations,
                        'has_citation': len(citations) > 0,
                        'citation_count': len(citations)
                    })
        
        return claims_with_citations
    
    def _contains_factual_claim(self, sentence: str) -> bool:
        """Check if a sentence contains factual claims."""
        # Look for factual indicators
        factual_indicators = [
            r'\b(is|are|was|were|has|have|will|can|should|must)\b',
            r'\b(\d+(?:\.\d+)?(?:%|million|billion|thousand)?)\b',  # Numbers
            r'\b(19|20)\d{2}\b',  # Years
            r'\b(january|february|march|april|may|june|july|august|september|october|november|december)\b'
        ]
        
        for pattern in factual_indicators:
            if re.search(pattern, sentence, re.IGNORECASE):
                return True
        
        return False
    
    def validate_citation_accuracy(self, claim: str, citation_text: str, context: str) -> Dict[str, Any]:
        """Validate if a citation accurately supports the claim."""
        # Calculate similarity between claim and context around citation
        citation_context = self._extract_citation_context(context, citation_text)
        
        if citation_context:
            similarity = self._calculate_semantic_similarity(claim, citation_context)
            
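            # Check if the citation text appears in the context. Note: this is
            # always True in this branch, because citation_context is only
            # non-empty when the citation text was found in the context
            citation_mentioned = citation_text.lower() in context.lower()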
            
            # Check if claim information is present in citation context
            claim_supported = self._check_claim_support(claim, citation_context)
            
            accuracy_score = (similarity * 0.4 + 
                            (1.0 if citation_mentioned else 0.0) * 0.3 + 
                            (1.0 if claim_supported else 0.0) * 0.3)
        else:
            accuracy_score = 0.0
            citation_mentioned = False
            claim_supported = False
        
        return {
            'accuracy_score': accuracy_score,
            'citation_mentioned': citation_mentioned,
            'claim_supported': claim_supported,
            'citation_context': citation_context
        }
    
    def _extract_citation_context(self, context: str, citation_text: str) -> str:
        """Extract context around a citation mention."""
        # Look for citation text in context (case-insensitive)
        context_lower = context.lower()
        citation_lower = citation_text.lower()
        
        if citation_lower in context_lower:
            # Find the position of citation in context
            start_pos = context_lower.find(citation_lower)
            # Extract surrounding context (100 characters before and after)
            context_start = max(0, start_pos - 100)
            context_end = min(len(context), start_pos + len(citation_text) + 100)
            return context[context_start:context_end]
        
        return ""
    
    def _calculate_semantic_similarity(self, text1: str, text2: str) -> float:
        """Approximate semantic similarity with word overlap (Jaccard index)."""
        # Lexical overlap only: paraphrases score low (can be enhanced with embeddings)
        words1 = set(re.findall(r'\b\w+\b', text1.lower()))
        words2 = set(re.findall(r'\b\w+\b', text2.lower()))
        
        if len(words1) == 0 or len(words2) == 0:
            return 0.0
        
        intersection = len(words1.intersection(words2))
        union = len(words1.union(words2))
        
        return intersection / union if union > 0 else 0.0
    
    def _check_claim_support(self, claim: str, citation_context: str) -> bool:
        """Check if a claim is supported by citation context."""
        # Extract key information from claim
        claim_numbers = re.findall(r'\b\d+(?:\.\d+)?(?:%|million|billion|thousand)?\b', claim.lower())
        # Non-capturing group, so findall returns full years ("2019") rather than just "19" or "20"
        claim_years = re.findall(r'\b(?:19|20)\d{2}\b', claim.lower())
        claim_names = re.findall(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b', claim)
        
        # Check if key information appears in citation context
        context_lower = citation_context.lower()
        
        # Score only the criteria the claim actually exercises: all() over an
        # empty list is True, so a claim with no numbers or years would
        # otherwise pass two of the three checks automatically
        checks = []
        if claim_numbers:
            checks.append(all(num in context_lower for num in claim_numbers))
        if claim_years:
            checks.append(all(year in context_lower for year in claim_years))
        if claim_names:
            # At least one name should be present
            checks.append(any(name.lower() in context_lower for name in claim_names))
        
        # Nothing checkable in the claim: treat it as supported
        if not checks:
            return True
        
        # Require two passing criteria, or every criterion when fewer than two apply
        return sum(checks) >= min(2, len(checks))
    
    def validate_citation_completeness(self, answer: str) -> Dict[str, Any]:
        """Validate if all claims have supporting citations."""
        claims_with_citations = self.extract_claims_with_citations(answer)
        
        total_claims = len(claims_with_citations)
        claims_with_citations_count = sum(1 for claim in claims_with_citations if claim['has_citation'])
        
        completeness_score = claims_with_citations_count / total_claims if total_claims > 0 else 1.0
        
        # Identify uncited claims
        uncited_claims = [claim['sentence'] for claim in claims_with_citations if not claim['has_citation']]
        
        return {
            'completeness_score': completeness_score,
            'total_claims': total_claims,
            'cited_claims': claims_with_citations_count,
            'uncited_claims': uncited_claims,
            'citation_rate': claims_with_citations_count / total_claims if total_claims > 0 else 0
        }
    
    def validate_citation_relevance(self, answer: str, context: str) -> Dict[str, Any]:
        """Validate if citations are relevant to the claims."""
        claims_with_citations = self.extract_claims_with_citations(answer)
        relevance_scores = []
        
        for claim_data in claims_with_citations:
            if claim_data['has_citation']:
                claim = claim_data['sentence']
                citations = claim_data['citations']
                
                # Calculate relevance for each citation
                citation_relevances = []
                for citation in citations:
                    relevance_result = self.validate_citation_accuracy(claim, citation['text'], context)
                    citation_relevances.append(relevance_result['accuracy_score'])
                
                # Average relevance for this claim
                if citation_relevances:
                    claim_relevance = np.mean(citation_relevances)
                    relevance_scores.append(claim_relevance)
        
        overall_relevance = np.mean(relevance_scores) if relevance_scores else 0.0
        
        return {
            'overall_relevance_score': overall_relevance,
            'individual_relevance_scores': relevance_scores,
            'total_cited_claims': len(relevance_scores),
            'high_relevance_claims': sum(1 for score in relevance_scores if score > 0.7),
            'low_relevance_claims': sum(1 for score in relevance_scores if score < 0.3)
        }
    
    def comprehensive_citation_validation(self, query: str, context: str, answer: str) -> Dict[str, Any]:
        """Perform comprehensive citation validation."""
        # Run all validation checks
        completeness_result = self.validate_citation_completeness(answer)
        relevance_result = self.validate_citation_relevance(answer, context)
        
        # Extract all citations
        citations = self.extract_citations(answer)
        
        # Validate each citation for accuracy
        accuracy_results = []
        for citation in citations:
            accuracy_result = self.validate_citation_accuracy(answer, citation['text'], context)
            accuracy_results.append(accuracy_result)
        
        avg_accuracy = np.mean([result['accuracy_score'] for result in accuracy_results]) if accuracy_results else 0.0
        
        # Calculate overall citation quality score
        overall_score = (completeness_result['completeness_score'] * 0.3 + 
                        relevance_result['overall_relevance_score'] * 0.4 + 
                        avg_accuracy * 0.3)
        
        validation_result = {
            'query': query,
            'answer': answer,
            'context': context,
            'overall_citation_score': overall_score,
            'completeness': completeness_result,
            'relevance': relevance_result,
            'accuracy': {
                'average_accuracy': avg_accuracy,
                'individual_results': accuracy_results
            },
            'citations': citations,
            'timestamp': datetime.now().isoformat()
        }
        
        self.validation_history.append(validation_result)
        
        return validation_result
    
    def get_validation_summary(self) -> Dict[str, Any]:
        """Get summary of citation validation performance."""
        if not self.validation_history:
            return {"error": "No validation history available"}
        
        scores = [result['overall_citation_score'] for result in self.validation_history]
        completeness_scores = [result['completeness']['completeness_score'] for result in self.validation_history]
        relevance_scores = [result['relevance']['overall_relevance_score'] for result in self.validation_history]
        accuracy_scores = [result['accuracy']['average_accuracy'] for result in self.validation_history]
        
        return {
            'total_validations': len(self.validation_history),
            'average_overall_score': np.mean(scores),
            'average_completeness': np.mean(completeness_scores),
            'average_relevance': np.mean(relevance_scores),
            'average_accuracy': np.mean(accuracy_scores),
            'score_distribution': {
                'excellent': sum(1 for score in scores if score > 0.8),
                'good': sum(1 for score in scores if 0.6 <= score <= 0.8),
                'fair': sum(1 for score in scores if 0.4 <= score < 0.6),
                'poor': sum(1 for score in scores if score < 0.4)
            }
        }

# Initialize citation validator
citation_validator = CitationValidator()

print("✅ Citation validator initialized!")
print("🔍 Validation capabilities:")
print("   • Citation extraction and parsing")
print("   • Citation accuracy validation")
print("   • Citation completeness assessment")
print("   • Citation relevance evaluation")
print("   • Comprehensive citation scoring")
print("🎯 Ready for advanced citation validation!")


✅ Citation validator initialized!
🔍 Validation capabilities:
   • Citation extraction and parsing
   • Citation accuracy validation
   • Citation completeness assessment
   • Citation relevance evaluation
   • Comprehensive citation scoring
🎯 Ready for advanced citation validation!
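
The similarity measure inside `CitationValidator` is a Jaccard index over word sets, so it rewards shared vocabulary rather than shared meaning. The standalone sketch below mirrors that calculation; the function name `jaccard_similarity` and the example sentences are illustrative assumptions.

```python
import re

def jaccard_similarity(text1: str, text2: str) -> float:
    """Jaccard index over lowercase word sets: |A ∩ B| / |A ∪ B|."""
    words1 = set(re.findall(r'\b\w+\b', text1.lower()))
    words2 = set(re.findall(r'\b\w+\b', text2.lower()))
    if not words1 or not words2:
        return 0.0
    return len(words1 & words2) / len(words1 | words2)

claim = "Revenue grew by 25 percent"
paraphrase = "Sales increased by a quarter"        # same meaning, different words
near_copy = "Revenue grew by 25 percent last year"  # mostly the same words

paraphrase_score = jaccard_similarity(claim, paraphrase)
near_copy_score = jaccard_similarity(claim, near_copy)
print(f"Paraphrase: {paraphrase_score:.3f}")
print(f"Near copy:  {near_copy_score:.3f}")
```

The paraphrase shares only the word "by" with the claim and scores far lower than the near copy, even though both support it. If citation accuracy looks too low for answers that reword their sources, this lexical measure is a likely cause, and an embedding-based similarity is one possible replacement.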
