# Civic Image Anomaly Detection - Demo Notebook

This notebook shows how to use the Civic Image Anomaly Detector to detect urban infrastructure issues in images.

## 🎯 What We'll Cover
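1. Setup and imports
2. Initialize the detector
3. Define helper functions
4. Process sample images
5. Analyze detection statistics
6. Compare confidence thresholds
7. Measure processing speed on validation images
8. Export results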

## 1. Setup and Imports

In [None]:
# Install required packages (run once)
# !pip install ultralytics opencv-python matplotlib seaborn pandas numpy pillow

import sys
import os
from pathlib import Path

# Add parent directory to path
sys.path.append(str(Path.cwd().parent))

# Core imports
import cv2
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import pandas as pd

# Project imports
from scripts.inference import CivicAnomalyDetector
from app.utils import calculate_detection_stats, create_detection_report

# Set up plotting ('seaborn-v0_8' requires Matplotlib 3.6+; older versions use 'seaborn')
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
%matplotlib inline

print("✅ Setup complete!")

## 2. Initialize the Detector

In [None]:
# Initialize the civic anomaly detector
model_path = "../models/weights/civic_detector_final.pt"

# Check if custom model exists, otherwise use pretrained YOLOv8
if Path(model_path).exists():
    detector = CivicAnomalyDetector(model_path)
    print(f"✅ Loaded custom model: {model_path}")
else:
    print("⚠️  Custom model not found. Using pretrained YOLOv8n...")
    detector = CivicAnomalyDetector()
    print("✅ Loaded pretrained model")

# Display class information
print("\n🎯 Detection Classes:")
for class_id, class_name in detector.class_names.items():
    print(f"  {class_id}: {class_name.replace('_', ' ').title()}")

## 3. Helper Functions

In [None]:
def display_detection_results(image_path, result_image, detections, figsize=(15, 8)):
    """Display original and result images side by side"""
    
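    # Load original image (cv2.imread returns None if the file cannot be read)
    original = cv2.imread(str(image_path))
    if original is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    original_rgb = cv2.cvtColor(original, cv2.COLOR_BGR2RGB)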
    
    # Convert result to RGB
    result_rgb = cv2.cvtColor(result_image, cv2.COLOR_BGR2RGB)
    
    # Create subplot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=figsize)
    
    # Original image
    ax1.imshow(original_rgb)
    ax1.set_title(f"Original Image\n{Path(image_path).name}", fontsize=14, fontweight='bold')
    ax1.axis('off')
    
    # Result image
    ax2.imshow(result_rgb)
    ax2.set_title(f"Detection Results\n{len(detections)} anomalies found", fontsize=14, fontweight='bold')
    ax2.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    # Display detection details
    if detections:
        print("\n🔍 Detection Details:")
        for i, det in enumerate(detections, 1):
            print(f"  {i}. {det['class_name'].replace('_', ' ').title()}: {det['confidence']:.2f}")
    else:
        print("\n✅ No civic anomalies detected!")

def plot_detection_statistics(all_detections):
    """Plot detection statistics"""
    if not all_detections:
        print("No detections to analyze")
        return
    
    # Create DataFrame
    df = pd.DataFrame(all_detections)
    
    # Create subplots
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    
    # 1. Class distribution
    class_counts = df['class_name'].value_counts()
    ax1.pie(class_counts.values, labels=[name.replace('_', ' ').title() for name in class_counts.index], 
            autopct='%1.1f%%', startangle=90)
    ax1.set_title('Detection Distribution by Class', fontweight='bold')
    
    # 2. Confidence distribution
    ax2.hist(df['confidence'], bins=20, alpha=0.7, color='skyblue', edgecolor='black')
    ax2.set_xlabel('Confidence Score')
    ax2.set_ylabel('Frequency')
    ax2.set_title('Confidence Score Distribution', fontweight='bold')
    ax2.axvline(df['confidence'].mean(), color='red', linestyle='--', 
                label=f'Mean: {df["confidence"].mean():.2f}')
    ax2.legend()
    
    # 3. Box plot by class
    df_plot = df.copy()
    df_plot['class_name'] = df_plot['class_name'].str.replace('_', ' ').str.title()
    sns.boxplot(data=df_plot, x='class_name', y='confidence', ax=ax3)
    ax3.set_title('Confidence by Class', fontweight='bold')
    ax3.tick_params(axis='x', rotation=45)
    
    # 4. Detection counts by class
    class_counts.plot(kind='bar', ax=ax4, color='lightcoral')
    ax4.set_title('Detection Counts by Class', fontweight='bold')
    ax4.set_xlabel('Anomaly Type')
    ax4.set_ylabel('Count')
    ax4.tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
    
    # Print summary statistics
    print("\n📊 Summary Statistics:")
    print(f"Total detections: {len(df)}")
    print(f"Average confidence: {df['confidence'].mean():.2f}")
    print(f"Most common anomaly: {class_counts.index[0].replace('_', ' ').title()}")
    print(f"Confidence range: {df['confidence'].min():.2f} - {df['confidence'].max():.2f}")

print("✅ Helper functions defined!")

## 4. Process Sample Images

Let's process some sample images to demonstrate the detector's capabilities.

In [None]:
# Define sample image paths (add your own images here)
sample_images = [
    "../data/raw/sample1.jpg",
    "../data/raw/sample2.jpg", 
    "../data/raw/sample3.jpg"
]

# Check which images exist
existing_images = [img for img in sample_images if Path(img).exists()]

if not existing_images:
    print("⚠️  No sample images found in ../data/raw/")
    print("Please add some images to the data/raw/ directory to test the detector.")
    print("\n💡 You can download sample civic images from:")
    print("  - Google Images (search for 'potholes', 'garbage dump', etc.)")
    print("  - Unsplash.com")
    print("  - Your own photos")
else:
    print(f"✅ Found {len(existing_images)} sample images to process")
    for img in existing_images:
        print(f"  - {Path(img).name}")

In [None]:
# Process each sample image
all_detections = []
confidence_threshold = 0.5

print(f"🔍 Processing images with confidence threshold: {confidence_threshold}")
print("=" * 60)

for i, image_path in enumerate(existing_images, 1):
    print(f"\n📸 Processing image {i}/{len(existing_images)}: {Path(image_path).name}")
    
    # Run detection
    result_image, detections = detector.detect_anomalies(image_path, confidence_threshold)
    
    if result_image is not None:
        # Display results
        display_detection_results(image_path, result_image, detections)
        
        # Collect all detections for statistics
        all_detections.extend(detections)
        
        # Generate report
        if detections:
            report = create_detection_report(detections, image_path)
            print("\n📋 Detection Report:")
            print(report)
    else:
        print(f"❌ Failed to process {Path(image_path).name}")
    
    print("\n" + "-" * 60)

print(f"\n🎉 Processing complete! Total detections across all images: {len(all_detections)}")

## 5. Analyze Overall Statistics
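
The next cell relies on `calculate_detection_stats` from `app.utils`, whose implementation is not shown in this notebook. As a rough sketch only, a function returning the fields used here (`total_count`, `avg_confidence`, `class_counts`, `confidence_distribution`) might look like the following. The name `summarize_detections`, the sample class names, and the bucket boundaries at exactly 0.5 and 0.8 are assumptions, not the project's actual code.

```python
from collections import Counter


def summarize_detections(detections):
    """Aggregate detection dicts (hypothetical stand-in, not app.utils)."""
    confidences = [det["confidence"] for det in detections]
    total = len(detections)
    return {
        "total_count": total,
        # Avoid dividing by zero when there are no detections.
        "avg_confidence": sum(confidences) / total if total else 0.0,
        "class_counts": dict(Counter(det["class_name"] for det in detections)),
        # Bucket edges are assumed: 0.5 and 0.8 both fall in "medium".
        "confidence_distribution": {
            "high (>0.8)": sum(c > 0.8 for c in confidences),
            "medium (0.5-0.8)": sum(0.5 <= c <= 0.8 for c in confidences),
            "low (<0.5)": sum(c < 0.5 for c in confidences),
        },
    }


# Illustrative input; these values are made up, not model output.
example = [
    {"class_name": "pothole", "confidence": 0.9},
    {"class_name": "pothole", "confidence": 0.6},
    {"class_name": "garbage_dump", "confidence": 0.4},
]
stats_example = summarize_detections(example)
print(stats_example["class_counts"])
```

Returning a plain dictionary keeps the result easy to print or serialize, which matches how the statistics are consumed in the cells below.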

In [None]:
# Plot comprehensive statistics
if all_detections:
    print("📊 Analyzing detection statistics across all processed images...")
    plot_detection_statistics(all_detections)
    
    # Additional analysis
    stats = calculate_detection_stats(all_detections)
    
    print("\n🔍 Detailed Analysis:")
    print(f"Images processed: {len(existing_images)}")
    print(f"Total anomalies detected: {stats['total_count']}")
    print(f"Average anomalies per image: {stats['total_count'] / len(existing_images):.1f}")
    print(f"Average confidence: {stats['avg_confidence']:.2f}")
    
    print("\n🏆 Top Detected Anomalies:")
    for class_name, count in sorted(stats['class_counts'].items(), key=lambda x: x[1], reverse=True):
        percentage = (count / stats['total_count']) * 100
        print(f"  {class_name.replace('_', ' ').title()}: {count} ({percentage:.1f}%)")
        
else:
    print("No detections found to analyze. Try:")
    print("1. Adding more sample images to data/raw/")
    print("2. Lowering the confidence threshold")
    print("3. Using images with visible civic issues")

## 6. Confidence Threshold Comparison

Run the detector on one image at several confidence thresholds to see how the threshold affects the number of detections.
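
How a threshold changes the result can be sketched without the model. The snippet below is a minimal illustration, assuming detections are dictionaries with `class_name` and `confidence` keys as used elsewhere in this notebook. The sample values, the class names, and the `count_above` helper are made up for the example and are not part of the project.

```python
# Hypothetical detections; the values are illustrative, not model output.
sample_detections = [
    {"class_name": "pothole", "confidence": 0.92},
    {"class_name": "garbage_dump", "confidence": 0.74},
    {"class_name": "pothole", "confidence": 0.55},
    {"class_name": "broken_streetlight", "confidence": 0.31},
]


def count_above(detections, threshold):
    """Count detections whose confidence is at least `threshold`."""
    return sum(1 for det in detections if det["confidence"] >= threshold)


# Raising the threshold can only keep the count the same or reduce it.
counts = {t: count_above(sample_detections, t) for t in [0.3, 0.5, 0.7, 0.9]}
for threshold, count in counts.items():
    print(f"threshold {threshold}: {count} detections")
```

Because the count can never increase as the threshold rises, the curve plotted in the next cell should be non-increasing from left to right.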

In [None]:
# Compare detection counts at several confidence thresholds
if existing_images:
    test_image = existing_images[0]  # Use first available image
    confidence_levels = [0.3, 0.5, 0.7, 0.9]
    
    print(f"🧪 Testing different confidence thresholds on: {Path(test_image).name}")
    print("=" * 70)
    
    results_by_confidence = {}
    
    for conf in confidence_levels:
        result_image, detections = detector.detect_anomalies(test_image, conf)
        detections = detections or []  # guard in case a failed run returns no list
        results_by_confidence[conf] = len(detections)
        
        print(f"\nConfidence {conf}: {len(detections)} detections")
        if detections:
            for det in detections:
                print(f"  - {det['class_name'].replace('_', ' ').title()}: {det['confidence']:.2f}")
    
    # Plot confidence threshold analysis
    plt.figure(figsize=(10, 6))
    plt.plot(confidence_levels, list(results_by_confidence.values()), 
             marker='o', linewidth=2, markersize=8)
    plt.xlabel('Confidence Threshold')
    plt.ylabel('Number of Detections')
    plt.title('Detection Count vs Confidence Threshold')
    plt.grid(True, alpha=0.3)
    plt.show()
    
    print("\n💡 Observations:")
    print("- Lower thresholds detect more objects but may include false positives")
    print("- Higher thresholds are more conservative but may miss some detections")
    print("- Choose threshold based on your use case (precision vs recall)")

## 7. Model Performance Analysis
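
Per-image timings can be skewed by the first call, which may include one-time setup work. A minimal sketch of a timing loop with an untimed warm-up run is shown below. `time_inference` and `fake_detect` are hypothetical names, and `fake_detect` merely stands in for `detector.detect_anomalies` so the snippet runs without a model.

```python
import time
from statistics import mean


def time_inference(run_once, inputs, warmup=1):
    """Time run_once(item) for each input, after `warmup` untimed calls."""
    for item in inputs[:warmup]:
        run_once(item)  # untimed warm-up call
    timings = []
    for item in inputs:
        start = time.perf_counter()
        run_once(item)
        timings.append(time.perf_counter() - start)
    return timings


def fake_detect(path):
    """Hypothetical stand-in for detector.detect_anomalies."""
    time.sleep(0.01)
    return None, []


timings = time_inference(fake_detect, ["a.jpg", "b.jpg", "c.jpg"])
avg = mean(timings)
print(f"average: {avg:.3f}s, throughput: {1 / avg:.1f} images/s")
```

Passing the detection call in as `run_once` keeps the timing logic separate from the detector, so the same loop could wrap any callable.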

In [None]:
# Analyze model performance if validation data is available
val_images_dir = Path("../data/processed/val/images")

if val_images_dir.exists() and any(val_images_dir.iterdir()):
    print("📊 Running model validation on validation set...")
    
    # Get validation images (case-insensitive match on common extensions)
    val_images = sorted(
        p for p in val_images_dir.iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    
    if val_images:
        print(f"Found {len(val_images)} validation images")
        
        # Process a subset for demo (first 5 images)
        sample_val_images = val_images[:5]
        
        val_detections = []
        processing_times = []
        
        import time

        for img_path in sample_val_images:
            # perf_counter is a monotonic, high-resolution clock suited to timing
            start_time = time.perf_counter()
            
            result_image, detections = detector.detect_anomalies(str(img_path), 0.5)
            
            processing_time = time.perf_counter() - start_time
            processing_times.append(processing_time)
            
            if detections:
                val_detections.extend(detections)
        
        # Performance metrics
        avg_processing_time = np.mean(processing_times)
        fps = 1 / avg_processing_time
        
        print(f"\n⚡ Performance Metrics:")
        print(f"Average processing time: {avg_processing_time:.3f} seconds")
        print(f"Frames per second (FPS): {fps:.1f}")
        print(f"Total detections in validation sample: {len(val_detections)}")
        print(f"Average detections per image: {len(val_detections) / len(sample_val_images):.1f}")
        
        # Plot processing times
        plt.figure(figsize=(10, 4))
        plt.bar(range(len(processing_times)), processing_times, alpha=0.7)
        plt.axhline(avg_processing_time, color='red', linestyle='--', 
                   label=f'Average: {avg_processing_time:.3f}s')
        plt.xlabel('Image Index')
        plt.ylabel('Processing Time (seconds)')
        plt.title('Processing Time per Image')
        plt.legend()
        plt.show()
    else:
        print("⚠️  No supported image files found in the validation directory.")
        
else:
    print("⚠️  No validation images found. To run performance analysis:")
    print("1. Add images to data/processed/val/images/")
    print("2. Or run the data preparation script first")

## 8. Export Results

In [None]:
# Export results to files
if all_detections:
    # Create results directory
    results_dir = Path("../results")
    results_dir.mkdir(exist_ok=True)
    
    # Export to CSV
    df_results = pd.DataFrame(all_detections)
    csv_path = results_dir / "detection_results.csv"
    df_results.to_csv(csv_path, index=False)
    print(f"✅ Results exported to: {csv_path}")
    
    # Export summary statistics
    stats = calculate_detection_stats(all_detections)
    
    summary_text = f"""
CIVIC ANOMALY DETECTION SUMMARY
==============================

Analysis Date: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}
Images Processed: {len(existing_images)}
Total Detections: {stats['total_count']}
Average Confidence: {stats['avg_confidence']:.2f}

DETECTION BREAKDOWN:
"""
    
    for class_name, count in stats['class_counts'].items():
        percentage = (count / stats['total_count']) * 100
        summary_text += f"- {class_name.replace('_', ' ').title()}: {count} ({percentage:.1f}%)\n"
    
    summary_text += f"""
CONFIDENCE DISTRIBUTION:
- High confidence (>0.8): {stats['confidence_distribution']['high (>0.8)']}
- Medium confidence (0.5-0.8): {stats['confidence_distribution']['medium (0.5-0.8)']}
- Low confidence (<0.5): {stats['confidence_distribution']['low (<0.5)']}
"""
    
    summary_path = results_dir / "detection_summary.txt"
    with open(summary_path, 'w', encoding='utf-8') as f:
        f.write(summary_text)
    
    print(f"✅ Summary exported to: {summary_path}")
    print("\n📁 Results saved in ../results/ directory")
    
else:
    print("No results to export. Process some images first!")

## 🎉 Conclusion

This notebook demonstrated the key capabilities of the Civic Image Anomaly Detector:

### ✅ What We Accomplished
1. **Model Loading**: Successfully loaded and initialized the detector
2. **Image Processing**: Processed sample images and visualized results
3. **Statistical Analysis**: Analyzed detection patterns and confidence scores
4. **Performance Testing**: Measured per-image processing time on a sample of validation images
5. **Results Export**: Saved results for further analysis

### 🚀 Next Steps
1. **Add More Images**: Test with diverse civic images
2. **Fine-tune Model**: Train on custom dataset for better accuracy
3. **Deploy Application**: Use Streamlit app or API for production
4. **Integration**: Connect with civic reporting systems

### 💡 Tips for Better Results
- Use high-quality, well-lit images
- Ensure anomalies are clearly visible
- Adjust confidence threshold based on use case
- Train on domain-specific data for best performance

---

**Happy detecting! 🏙️✨**