# YOLO Object Detection Analysis

This notebook runs YOLO object detection on images filtered by date range and analyzes the results.
It serves as a test bed for features that will eventually be integrated into the database.

## Features
- Filter images by date range and originals only
- Run YOLOv8 object detection and segmentation on filtered images
- Build comprehensive DataFrame with all detection metadata
- Visualize sample detections with bounding boxes
- Generate summary statistics and analysis

## Configuration

Set your parameters below:

In [None]:
# ==========================================
# CONFIGURATION PARAMETERS
# ==========================================

from datetime import datetime

# Date range for filtering images
START_DATE = datetime(2024, 1, 1)
END_DATE = datetime(2024, 6, 30)

# YOLO model configuration
YOLO_MODEL = 'yolov8x-seg.pt'  # Segmentation model - provides pixel-level masks + bounding boxes
                                # Options: yolov8n-seg, yolov8s-seg, yolov8m-seg, yolov8l-seg, yolov8x-seg
                                # (n=fastest/smallest, x=slowest/most accurate)
CONFIDENCE_THRESHOLD = 0.25  # Minimum confidence for detections (0.0 - 1.0)
IOU_THRESHOLD = 0.45  # IoU threshold for NMS (non-maximum suppression)

# Processing options
MAX_IMAGES = None  # Set to a number to limit processing (None = process all)
RESIZE_FOR_YOLO = 1280  # Resize images to this size for YOLO (standard is 640)

# Optional additional filters (set to None to disable)
MIN_RATING = None  # e.g., 4 to only process 4-5 star images
CAMERA_MAKE = None  # e.g., 'Canon' to filter by camera
FILE_EXTENSIONS = None  # e.g., ['.jpg', '.jpeg'] to filter by type

print("Configuration:")
print(f"  Date range: {START_DATE.date()} to {END_DATE.date()}")
print(f"  YOLO model: {YOLO_MODEL} (Segmentation)")
print(f"  Confidence threshold: {CONFIDENCE_THRESHOLD}")
print(f"  Max images: {MAX_IMAGES or 'All'}")
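
To make `IOU_THRESHOLD` more concrete: during NMS, a lower-scoring box is typically suppressed when its IoU with a higher-scoring box exceeds the threshold. The sketch below computes IoU for two `[x1, y1, x2, y2]` boxes. The helper name `box_iou` is illustrative only and is not part of Ultralytics.

```python
def box_iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    # Corners of the overlapping rectangle (may be empty)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two 10x10 boxes offset by half their width overlap with IoU 1/3, so with a threshold of 0.45 both would survive NMS.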


## Setup and Imports

In [None]:
# Enable auto-reload for development
%load_ext autoreload
%autoreload 2

print("Auto-reload enabled!")


In [None]:
# Standard library imports
import sys
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any
import warnings
import os

# Add package to path
package_root = Path().resolve().parent
if str(package_root) not in sys.path:
    sys.path.insert(0, str(package_root))

# Data analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

# PyTorch (for GPU configuration)
import torch

# Database
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Home Media AI
from home_media_ai import Media, MediaType, MediaQuery
from home_media_ai.config import get_config
from home_media_ai.io import read_image_as_array

# YOLO
from ultralytics import YOLO

warnings.filterwarnings('ignore')
print("Imports successful!")


## GPU Configuration

Check available GPUs and select which one to use for YOLO inference.

In [None]:
# ==========================================
# GPU CONFIGURATION
# ==========================================

# Check if CUDA is available
print("GPU Information:")
print("=" * 60)

if torch.cuda.is_available():
    num_gpus = torch.cuda.device_count()
    print(f"✓ CUDA is available")
    print(f"✓ Found {num_gpus} GPU(s)\n")
    
    # List all available GPUs
    for i in range(num_gpus):
        gpu_name = torch.cuda.get_device_name(i)
        gpu_memory = torch.cuda.get_device_properties(i).total_memory / 1024**3  # Convert to GB
        print(f"GPU {i}: {gpu_name}")
        print(f"  Memory: {gpu_memory:.2f} GB")
        print(f"  Capability: {torch.cuda.get_device_capability(i)}")
        print()
    
    # SET WHICH GPU TO USE HERE
    # ==========================================
    GPU_ID = 0  # Change this to select a different GPU (0, 1, 2, etc.)
    # ==========================================
    
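    # Set the GPU device
    torch.cuda.set_device(GPU_ID)
    # NOTE: CUDA_VISIBLE_DEVICES is deliberately not set here. It only takes
    # effect before CUDA is initialized, and it renumbers the visible GPUs
    # from 0, which would conflict with the 'cuda:{GPU_ID}' string used below.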
    
    selected_gpu = torch.cuda.get_device_name(GPU_ID)
    print(f"✓ Selected GPU {GPU_ID}: {selected_gpu}")
    print(f"  This GPU will be used for YOLO inference")
    
    # Test GPU
    test_tensor = torch.zeros(1).cuda()
    current_device = torch.cuda.current_device()
    print(f"✓ GPU {current_device} is ready and accessible")
    
    DEVICE = f'cuda:{GPU_ID}'
    
else:
    print("⚠ CUDA is not available")
    print("  YOLO will run on CPU (much slower)")
    DEVICE = 'cpu'

print("=" * 60)
print(f"\nYOLO will use device: {DEVICE}")
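
One subtlety with GPU selection: `CUDA_VISIBLE_DEVICES` is read when CUDA initializes, and the devices it lists are renumbered from 0 inside the process. The pure-Python sketch below illustrates that renumbering. `logical_cuda_index` is a hypothetical helper, and it handles integer IDs only, not GPU UUIDs.

```python
def logical_cuda_index(visible_devices, physical_id):
    """Return the in-process index of a physical GPU, or None if it is hidden.

    visible_devices: the CUDA_VISIBLE_DEVICES value, e.g. "2,0"
                     (None means the variable is unset, so nothing is remapped).
    """
    if visible_devices is None:
        return physical_id
    ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return ids.index(physical_id) if physical_id in ids else None
```

For example, with `CUDA_VISIBLE_DEVICES="1"` set before CUDA starts, the only visible GPU is addressed as `cuda:0`, not `cuda:1`.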


## Verify GPU Usage

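Run this cell to check GPU memory usage from PyTorch's point of view. The `Allocated` and `Reserved` figures cover only this process's PyTorch allocations, so they stay near zero until the model is loaded and inference has run.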

In [None]:
# Check GPU memory usage to verify GPU is being used
if torch.cuda.is_available():
    print("Current GPU Memory Usage:")
    print("=" * 60)
    
    for i in range(torch.cuda.device_count()):
        # Get memory info
        allocated = torch.cuda.memory_allocated(i) / 1024**2  # MB
        reserved = torch.cuda.memory_reserved(i) / 1024**2    # MB
        total = torch.cuda.get_device_properties(i).total_memory / 1024**2  # MB
        
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  Allocated: {allocated:.2f} MB")
        print(f"  Reserved:  {reserved:.2f} MB")
        print(f"  Total:     {total:.2f} MB")
        
        if i == GPU_ID:
            print(f"  ← This is your selected GPU")
        print()
    
    # Show which device PyTorch is currently using
    print(f"PyTorch current device: {torch.cuda.current_device()}")
    print(f"PyTorch device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
    print("=" * 60)
    
    print("\n💡 TIP: Watch the 'Allocated' memory - it should increase significantly")
    print("   when YOLO processes images on the GPU.")
else:
    print("CUDA not available")


## GPU Troubleshooting

If Task Manager shows the wrong GPU being used, here are some things to check:

In [None]:
print("GPU TROUBLESHOOTING GUIDE")
print("=" * 60)
print()
print("1. TASK MANAGER GPU COLUMN:")
print("   - In Task Manager, click 'Performance' tab")
print("   - Look at each GPU individually")
print("   - Check 'CUDA' or 'Compute_0' usage (not just '3D')")
print("   - Intel iGPU might show activity for display rendering only")
print()
print("2. VERIFY WITH NVIDIA-SMI:")
print("   - Open Command Prompt or PowerShell")
print("   - Run: nvidia-smi")
print("   - Look for 'python.exe' process using GPU memory")
print("   - This is the most reliable way to check!")
print()
print("3. GPU PROCESS ASSIGNMENT:")
print("   - Open Windows Settings > System > Display > Graphics")
print("   - Add python.exe and set its graphics preference")
print("   - Choose 'High performance' (prefers the discrete GPU)")
print()
print("4. CHECK HYBRID GRAPHICS:")
print("   - Some laptops use 'Optimus' which routes through iGPU")
print("   - The dGPU does the compute, iGPU does the display")
print("   - This is normal and doesn't affect performance!")
print()
print("=" * 60)

# Provide a command to check nvidia-smi
print("\n💡 RUN THIS COMMAND in a separate terminal to monitor GPU:")
print("   nvidia-smi -l 1")
print("   (Updates every 1 second - watch for python.exe)")
print()
print("   Or for a single check:")
print("   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv")


In [None]:
# Run nvidia-smi directly from notebook to check GPU usage
import subprocess

try:
    print("NVIDIA-SMI Output:")
    print("=" * 60)
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True, timeout=5)
    print(result.stdout)
    
    print("\nActive CUDA Processes:")
    print("=" * 60)
    result2 = subprocess.run(
        ['nvidia-smi', '--query-compute-apps=pid,process_name,used_memory', '--format=csv'],
        capture_output=True, text=True, timeout=5
    )
    print(result2.stdout)
    
except FileNotFoundError:
    print("⚠ nvidia-smi not found in PATH")
    print("  This is normal if NVIDIA drivers aren't installed or accessible")
except subprocess.TimeoutExpired:
    print("⚠ nvidia-smi command timed out")
except Exception as e:
    print(f"⚠ Error running nvidia-smi: {e}")
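
If the CSV output needs to be inspected programmatically rather than just printed, it can be parsed with the standard `csv` module. This is a minimal sketch: `parse_compute_apps` is a hypothetical helper, and the sample text is made up for illustration, since real headers and values depend on the driver version.

```python
import csv
import io

def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=... --format=csv` output into dicts."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return []
    header = [h.strip() for h in rows[0]]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in rows[1:]]

# Illustrative sample only (not captured from a real machine).
sample = "pid, process_name, used_gpu_memory [MiB]\n4242, python.exe, 2048 MiB\n"
apps = parse_compute_apps(sample)
```

In the cell above, `result2.stdout` could be passed to the parser in place of `sample`.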


## Database Connection

In [None]:
# Load configuration and connect to database
config = get_config()
print(f"Database URI: {config.database.uri}")

if config.database.uri:
    engine = create_engine(config.database.uri)
    Session = sessionmaker(bind=engine)
    session = Session()
    print("✓ Database connection established!")
else:
    raise RuntimeError("No database URI configured. Set in config.yaml or HOME_MEDIA_AI_URI environment variable.")
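
Because a database URI often embeds a password, it may be worth masking it before printing in a shared notebook. The following is a sketch using only the standard library. `mask_uri_password` is a hypothetical helper, and it assumes a simple `user:password@host[:port]` authority (no IPv6 literals).

```python
from urllib.parse import urlsplit, urlunsplit

def mask_uri_password(uri):
    """Return the URI with any password replaced by '***'."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

# Example URI is invented for illustration.
masked = mask_uri_password("mysql+pymysql://user:secret@localhost:3306/media")
```

Here `masked` is `mysql+pymysql://user:***@localhost:3306/media`, while a URI without credentials such as `sqlite:///media.db` is returned unchanged.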


## Query Images

Use MediaQuery to filter images by date range and originals only.

In [None]:
# Build query with filters
query = MediaQuery(session)

# Apply date range and originals filter
query = query.date_range(START_DATE, END_DATE).originals_only()

# Apply optional filters
if MIN_RATING is not None:
    query = query.rating_min(MIN_RATING)
    print(f"✓ Filtered to rating >= {MIN_RATING}")

if CAMERA_MAKE is not None:
    query = query.camera_make(CAMERA_MAKE)
    print(f"✓ Filtered to camera make: {CAMERA_MAKE}")

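if FILE_EXTENSIONS is not None:
    # NOTE: this assumes repeated extension() calls widen the match (OR).
    # If each call narrows the query instead (AND), listing more than one
    # extension will return no images.
    for ext in FILE_EXTENSIONS:
        query = query.extension(ext)
    print(f"✓ Filtered to extensions: {FILE_EXTENSIONS}")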

# Sort by date for consistent processing
query = query.sort_by_date(ascending=True)

# Apply limit if specified
if MAX_IMAGES is not None:
    query = query.limit(MAX_IMAGES)
    print(f"✓ Limited to {MAX_IMAGES} images")

# Get results
images = query.all()

print(f"\n{'='*60}")
print(f"Found {len(images)} images matching criteria")
print(f"{'='*60}")

if images:
    print(f"\nFirst few results:")
    for i, media in enumerate(images[:5]):
        print(f"  {i+1}. {media.filename} - {media.created.strftime('%Y-%m-%d')}")
    if len(images) > 5:
        print(f"  ... and {len(images) - 5} more")
else:
    print("\n⚠ No images found. Try adjusting your filters.")
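
The intended behaviour of the extension filter is "match any of the listed extensions". How `MediaQuery.extension` combines repeated calls is not shown in this notebook, so as a reference point, here is a plain-Python sketch of OR semantics applied to filenames. `matches_any_extension` and the sample names are illustrative only.

```python
from pathlib import Path

def matches_any_extension(filename, extensions):
    """True if filename ends with any of the given extensions (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    return Path(filename).suffix.lower() in wanted

names = ["IMG_0001.JPG", "IMG_0002.cr2", "IMG_0003.jpeg"]
kept = [n for n in names if matches_any_extension(n, [".jpg", ".jpeg"])]
```

`kept` contains `IMG_0001.JPG` and `IMG_0003.jpeg`. The same check could be applied to `media.filename` after `query.all()` if the query-level filter turns out to be too strict.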


## Load YOLO Model

Load the YOLOv8 model for object detection.

In [None]:
# Load YOLO model (will download if not already cached)
print(f"Loading YOLO model: {YOLO_MODEL}...")
model = YOLO(YOLO_MODEL)

# Move model to selected device
model.to(DEVICE)

# Get class names
class_names = model.names
print(f"✓ Model loaded successfully!")
print(f"✓ Model moved to device: {DEVICE}")
print(f"\nModel can detect {len(class_names)} classes:")
print(f"  {', '.join(list(class_names.values())[:20])}...")
print(f"\nModel configuration:")
print(f"  Device: {DEVICE}")
print(f"  Confidence threshold: {CONFIDENCE_THRESHOLD}")
print(f"  IoU threshold: {IOU_THRESHOLD}")
print(f"  Input size: {RESIZE_FOR_YOLO}px")


## Load YOLO Model (with GPU verification)

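Load the YOLOv8 model and verify it's on the correct GPU. This cell supersedes the previous one: run one or the other, because loading the model twice skews the before/after memory comparison.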

In [None]:
# Show GPU memory BEFORE loading model
if torch.cuda.is_available():
    mem_before = torch.cuda.memory_allocated(GPU_ID) / 1024**2
    print(f"GPU memory before loading: {mem_before:.2f} MB\n")

# Load YOLO model (will download if not already cached)
print(f"Loading YOLO model: {YOLO_MODEL}...")
model = YOLO(YOLO_MODEL)

# Move model to selected device
print(f"Moving model to {DEVICE}...")
model.to(DEVICE)

# Show GPU memory AFTER loading model
if torch.cuda.is_available():
    mem_after = torch.cuda.memory_allocated(GPU_ID) / 1024**2
    mem_used = mem_after - mem_before
    print(f"\nGPU memory after loading: {mem_after:.2f} MB")
    print(f"Model uses: {mem_used:.2f} MB on GPU")
    
    if mem_used < 10:
        print("\n⚠ WARNING: Model memory usage is very low!")
        print("  The model might not actually be on the GPU.")
        print("  Check that DEVICE is set correctly.")

# Get class names
class_names = model.names
print(f"\n✓ Model loaded successfully!")
print(f"✓ Model device: {DEVICE}")
print(f"\nModel can detect {len(class_names)} classes:")
print(f"  {', '.join(list(class_names.values())[:20])}...")
print(f"\nModel configuration:")
print(f"  Device: {DEVICE}")
print(f"  Confidence threshold: {CONFIDENCE_THRESHOLD}")
print(f"  IoU threshold: {IOU_THRESHOLD}")
print(f"  Input size: {RESIZE_FOR_YOLO}px")


## Run YOLO Detection

Run each image through the model and collect one record per image, including its individual detections.


In [None]:
from tqdm.auto import tqdm

# Storage for all detection data
detection_records = []
processing_errors = []

print(f"Processing {len(images)} images through YOLO...")
print(f"Using device: {DEVICE}\n")

for idx, media in enumerate(tqdm(images, desc="Running YOLO")):
    try:
        # Read image
        img_array = media.read_as_array()
        
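        # Convert to uint8 if needed
        if img_array.dtype == np.uint16:
            img_array = (img_array / 256).astype(np.uint8)
        # NOTE: Ultralytics treats NumPy inputs as BGR (OpenCV channel order).
        # If read_as_array() returns RGB, reverse the channels first:
        # img_array = np.ascontiguousarray(img_array[..., ::-1])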
        
        # Run YOLO detection with explicit device specification
        results = model.predict(
            img_array,
            conf=CONFIDENCE_THRESHOLD,
            iou=IOU_THRESHOLD,
            imgsz=RESIZE_FOR_YOLO,
            device=DEVICE,
            verbose=False
        )
        
        # Extract detection data from results
        result = results[0]  # Single image result
        boxes = result.boxes
        masks = result.masks  # Segmentation masks (if available)
        
        # Count detections by class
        class_counts = {}
        confidence_scores = []
        all_detections = []
        
        if len(boxes) > 0:
            for i, box in enumerate(boxes):
                # Extract box data
                cls_id = int(box.cls.item())
                cls_name = class_names[cls_id]
                confidence = float(box.conf.item())
                bbox = box.xyxy[0].tolist()  # [x1, y1, x2, y2]
                
                # Track counts
                class_counts[cls_name] = class_counts.get(cls_name, 0) + 1
                confidence_scores.append(confidence)
                
                # Extract mask data if available
                mask_data = {}
                if masks is not None and i < len(masks):
                    # Get mask for this detection
                    mask = masks[i]
                    
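                    # Mask area in original-image pixels. mask.data is at the
                    # inference resolution unless retina_masks=True, so rescale
                    # the pixel count to match the bbox coordinates (approximate
                    # when the input was letterbox-padded).
                    mask_array = mask.data.cpu().numpy()[0]  # Convert to numpy
                    orig_h, orig_w = result.orig_shape
                    area_scale = (orig_h * orig_w) / (mask_array.shape[0] * mask_array.shape[1])
                    mask_area = int(mask_array.sum() * area_scale)
                    
                    # Calculate mask coverage (% of bounding box covered by mask)
                    bbox_area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
                    mask_coverage_pct = (mask_area / bbox_area * 100) if bbox_area > 0 else 0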
                    
                    # Get polygon coordinates (simplified)
                    if hasattr(mask, 'xy') and len(mask.xy) > 0:
                        polygon = mask.xy[0].tolist()  # List of [x, y] coordinates
                        # Simplify polygon if too many points (keep every Nth point)
                        if len(polygon) > 100:
                            step = len(polygon) // 50
                            polygon = polygon[::step]
                    else:
                        polygon = None
                    
                    mask_data = {
                        'has_mask': True,
                        'mask_area': mask_area,
                        'mask_coverage_pct': mask_coverage_pct,
                        'mask_polygon': polygon,
                        'mask_shape': mask_array.shape
                    }
                else:
                    mask_data = {
                        'has_mask': False,
                        'mask_area': None,
                        'mask_coverage_pct': None,
                        'mask_polygon': None,
                        'mask_shape': None
                    }
                
                # Store individual detection
                detection_dict = {
                    'class_id': cls_id,
                    'class_name': cls_name,
                    'confidence': confidence,
                    'bbox_x1': bbox[0],
                    'bbox_y1': bbox[1],
                    'bbox_x2': bbox[2],
                    'bbox_y2': bbox[3],
                    'bbox_width': bbox[2] - bbox[0],
                    'bbox_height': bbox[3] - bbox[1],
                    'bbox_area': (bbox[2] - bbox[0]) * (bbox[3] - bbox[1]),
                }
                # Add mask data
                detection_dict.update(mask_data)
                
                all_detections.append(detection_dict)
        
        # Create comprehensive record for this image
        record = {
            # Image metadata
            'media_id': media.id,
            'filename': media.filename,
            'file_path': media.get_full_path(),
            'file_ext': media.file_ext,
            'file_size_mb': media.file_size / (1024 * 1024) if media.file_size else None,
            'created': media.created,
            'rating': media.rating,
            
            # Camera metadata
            'camera_make': media.camera_make,
            'camera_model': media.camera_model,
            'lens_model': media.lens_model,
            
            # Image dimensions
            'width': media.width,
            'height': media.height,
            'megapixels': (media.width * media.height) / 1_000_000 if media.width and media.height else None,
            
            # GPS data
            'gps_latitude': media.gps_latitude,
            'gps_longitude': media.gps_longitude,
            'gps_altitude': media.gps_altitude,
            
            # YOLO detection summary
            'total_detections': len(boxes),
            'unique_classes': len(class_counts),
            'detected_classes': ', '.join(sorted(class_counts.keys())) if class_counts else None,
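            # NOTE: str() gives a Python dict repr, not strict JSON; use json.dumps() if this must be parsed later
            'class_counts_json': str(class_counts) if class_counts else None,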
            
            # Confidence statistics
            'avg_confidence': np.mean(confidence_scores) if confidence_scores else None,
            'min_confidence': np.min(confidence_scores) if confidence_scores else None,
            'max_confidence': np.max(confidence_scores) if confidence_scores else None,
            'std_confidence': np.std(confidence_scores) if confidence_scores else None,
            
            # Mask summary
            'has_masks': masks is not None,
            'total_mask_area': sum(d['mask_area'] for d in all_detections if d['has_mask']) if all_detections else 0,
            
            # Individual detections (as nested list of dicts)
            'detections': all_detections,
        }
        
        # Add individual class counts as separate columns
        for cls_name, count in class_counts.items():
            record[f'count_{cls_name}'] = count
        
        detection_records.append(record)
        
    except FileNotFoundError as e:
        processing_errors.append({
            'media_id': media.id,
            'filename': media.filename,
            'error': f'File not found: {str(e)}'
        })
    except Exception as e:
        processing_errors.append({
            'media_id': media.id,
            'filename': media.filename,
            'error': str(e)
        })

print(f"\n{'='*60}")
print(f"✓ Processing complete!")
print(f"  Successfully processed: {len(detection_records)} images")
print(f"  Errors: {len(processing_errors)} images")
if any(r['has_masks'] for r in detection_records):
    print(f"  ✓ Segmentation masks extracted")
print(f"{'='*60}")

if processing_errors:
    print(f"\n⚠ Errors encountered:")
    for err in processing_errors[:5]:
        print(f"  - {err['filename']}: {err['error']}")
    if len(processing_errors) > 5:
        print(f"  ... and {len(processing_errors) - 5} more errors")
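
As a cross-check on `mask_coverage_pct`, the mask area can also be estimated from the polygon outline, which shares the bounding box's pixel coordinate system. The sketch below applies the shoelace formula. `polygon_area` and `coverage_pct` are hypothetical helpers, and the result is only an estimate for masks with holes or for heavily simplified polygons.

```python
def polygon_area(points):
    """Area of a simple polygon given as [[x, y], ...] (shoelace formula)."""
    n = len(points)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def coverage_pct(polygon, bbox):
    """Percent of an [x1, y1, x2, y2] box covered by the polygon's area."""
    bbox_area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    return polygon_area(polygon) / bbox_area * 100 if bbox_area > 0 else 0.0
```

For instance, a right triangle with corners at (0, 0), (10, 0), and (0, 10) covers 50% of the box `[0, 0, 10, 10]`.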


## Build Comprehensive DataFrame

Convert detection records into a pandas DataFrame for analysis.

In [None]:
# Create main DataFrame (one row per image)
df = pd.DataFrame(detection_records)

print(f"DataFrame shape: {df.shape}")
print(f"Columns: {len(df.columns)}")
print(f"\nColumn names:")
for col in df.columns:
    print(f"  - {col}")

# Display first few rows
print(f"\nFirst few rows:")
df.head()


In [None]:
# Create expanded DataFrame (one row per detection)
# This "explodes" the detections column so each detection gets its own row
detection_rows = []

for idx, row in df.iterrows():
    if row['detections']:  # If there are detections
        for detection in row['detections']:
            detection_row = {
                # Image info
                'media_id': row['media_id'],
                'filename': row['filename'],
                'created': row['created'],
                'width': row['width'],
                'height': row['height'],
                
                # Detection info
                'class_id': detection['class_id'],
                'class_name': detection['class_name'],
                'confidence': detection['confidence'],
                'bbox_x1': detection['bbox_x1'],
                'bbox_y1': detection['bbox_y1'],
                'bbox_x2': detection['bbox_x2'],
                'bbox_y2': detection['bbox_y2'],
                'bbox_width': detection['bbox_width'],
                'bbox_height': detection['bbox_height'],
                'bbox_area': detection['bbox_area'],
                'bbox_area_pct': (detection['bbox_area'] / (row['width'] * row['height']) * 100) if row['width'] and row['height'] else None,
                
                # Mask info (if available)
                'has_mask': detection.get('has_mask', False),
                'mask_area': detection.get('mask_area'),
                'mask_coverage_pct': detection.get('mask_coverage_pct'),
            }
            detection_rows.append(detection_row)

df_detections = pd.DataFrame(detection_rows)

print(f"\nDetections DataFrame shape: {df_detections.shape}")
print(f"Total individual detections: {len(df_detections)}")

if 'has_mask' in df_detections.columns:
    masks_count = df_detections['has_mask'].sum()
    print(f"Detections with masks: {masks_count}")
    if masks_count > 0:
        print(f"\nMask Statistics:")
        print(f"  Average mask area: {df_detections[df_detections['has_mask']]['mask_area'].mean():.0f} pixels")
        print(f"  Average mask coverage: {df_detections[df_detections['has_mask']]['mask_coverage_pct'].mean():.1f}% of bbox")

print(f"\nFirst few detection rows:")
df_detections.head(10)


## Summary Statistics

In [None]:
print("YOLO DETECTION SUMMARY")
print("=" * 60)

print(f"\nImages Processed: {len(df)}")
print(f"Total Detections: {df['total_detections'].sum():.0f}")
print(f"Images with Detections: {(df['total_detections'] > 0).sum()}")
print(f"Images with No Detections: {(df['total_detections'] == 0).sum()}")

print(f"\nDetections per Image:")
print(f"  Mean: {df['total_detections'].mean():.2f}")
print(f"  Median: {df['total_detections'].median():.0f}")
print(f"  Max: {df['total_detections'].max():.0f}")
print(f"  Std Dev: {df['total_detections'].std():.2f}")

print(f"\nConfidence Scores:")
print(f"  Mean: {df['avg_confidence'].mean():.3f}")
print(f"  Min: {df['min_confidence'].min():.3f}")
print(f"  Max: {df['max_confidence'].max():.3f}")

# Top detected classes
if len(df_detections) > 0:
    print(f"\nTop 20 Detected Classes:")
    class_counts = df_detections['class_name'].value_counts()
    for i, (cls, count) in enumerate(class_counts.head(20).items(), 1):
        pct = (count / len(df_detections)) * 100
        print(f"  {i:2d}. {cls:20s}: {count:5d} ({pct:5.1f}%)")

print("\n" + "=" * 60)
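
The "top classes" table is a frequency count with each class's share of all detections. For reference, the same calculation without pandas can be sketched with `collections.Counter`. `top_classes` is an illustrative helper, not something the notebook defines.

```python
from collections import Counter

def top_classes(class_names, n=20):
    """Return [(class_name, count, percent_of_all_detections), ...] for the n most common."""
    counts = Counter(class_names)
    total = sum(counts.values())
    return [(name, c, c / total * 100) for name, c in counts.most_common(n)]

ranking = top_classes(["person", "dog", "person", "car"], n=2)
```

Here `ranking` is `[("person", 2, 50.0), ("dog", 1, 25.0)]`. Percentages are relative to all detections, not only the top `n`.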


## Visualize Detection Distribution

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Histogram of detections per image
axes[0, 0].hist(df['total_detections'], bins=30, edgecolor='black')
axes[0, 0].set_xlabel('Number of Detections')
axes[0, 0].set_ylabel('Number of Images')
axes[0, 0].set_title('Distribution of Detections per Image')
axes[0, 0].axvline(df['total_detections'].mean(), color='red', linestyle='--', label=f"Mean: {df['total_detections'].mean():.1f}")
axes[0, 0].legend()

# 2. Top 15 classes by count
if len(df_detections) > 0:
    top_classes = df_detections['class_name'].value_counts().head(15)
    axes[0, 1].barh(range(len(top_classes)), top_classes.values)
    axes[0, 1].set_yticks(range(len(top_classes)))
    axes[0, 1].set_yticklabels(top_classes.index)
    axes[0, 1].set_xlabel('Count')
    axes[0, 1].set_title('Top 15 Detected Classes')
    axes[0, 1].invert_yaxis()

# 3. Confidence score distribution
if len(df_detections) > 0:
    axes[1, 0].hist(df_detections['confidence'], bins=30, edgecolor='black')
    axes[1, 0].set_xlabel('Confidence Score')
    axes[1, 0].set_ylabel('Number of Detections')
    axes[1, 0].set_title('Distribution of Confidence Scores')
    axes[1, 0].axvline(df_detections['confidence'].mean(), color='red', linestyle='--', label=f"Mean: {df_detections['confidence'].mean():.2f}")
    axes[1, 0].legend()

# 4. Detections over time
if 'created' in df.columns:
    df_sorted = df.sort_values('created')
    axes[1, 1].plot(df_sorted['created'], df_sorted['total_detections'], alpha=0.5, marker='o', markersize=3, linestyle='-')
    axes[1, 1].set_xlabel('Date')
    axes[1, 1].set_ylabel('Total Detections')
    axes[1, 1].set_title('Detections Over Time')
    axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()


## Visualize Sample Detections with Bounding Boxes

Display random sample images with their YOLO detections overlaid.

In [None]:
def plot_image_with_detections(media, detections_list, ax=None):
    """
    Plot an image with YOLO detections overlaid.
    
    Args:
        media: Media object
        detections_list: List of detection dictionaries
        ax: Matplotlib axis (optional)
    """
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    
    # Read and display image
    img_array = media.read_as_array()
    if img_array.dtype == np.uint16:
        img_array = (img_array / 256).astype(np.uint8)
    
    ax.imshow(img_array)
    
    # Draw bounding boxes
    colors = plt.cm.tab20(np.linspace(0, 1, 20))
    
    for i, det in enumerate(detections_list):
        x1, y1 = det['bbox_x1'], det['bbox_y1']
        width = det['bbox_width']
        height = det['bbox_height']
        
        # Choose color based on class
        color = colors[det['class_id'] % len(colors)]
        
        # Draw rectangle
        rect = patches.Rectangle(
            (x1, y1), width, height,
            linewidth=2, edgecolor=color, facecolor='none'
        )
        ax.add_patch(rect)
        
        # Add label
        label = f"{det['class_name']} {det['confidence']:.2f}"
        ax.text(
            x1, y1 - 5,
            label,
            color='white',
            fontsize=10,
            bbox=dict(facecolor=color, alpha=0.8, edgecolor='none', pad=2)
        )
    
    ax.axis('off')
    ax.set_title(f"{media.filename}\n{len(detections_list)} detections", fontsize=10)
    
    return ax


# Select random sample with detections
NUM_SAMPLES = 6
df_with_detections = df[df['total_detections'] > 0]

if len(df_with_detections) > 0:
    sample_df = df_with_detections.sample(min(NUM_SAMPLES, len(df_with_detections)))
    
    # Create grid
    n_cols = 3
    n_rows = (len(sample_df) + n_cols - 1) // n_cols
    
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(18, 6 * n_rows))
    if n_rows == 1:
        axes = axes.reshape(1, -1)
    
    for idx, (_, row) in enumerate(sample_df.iterrows()):
        ax_row = idx // n_cols
        ax_col = idx % n_cols
        
        # Get media object
        media = session.query(Media).filter(Media.id == row['media_id']).first()
        
        if media:
            try:
                plot_image_with_detections(media, row['detections'], axes[ax_row, ax_col])
            except Exception as e:
                axes[ax_row, ax_col].text(0.5, 0.5, f"Error: {str(e)}", ha='center', va='center')
                axes[ax_row, ax_col].axis('off')
    
    # Hide unused subplots
    for idx in range(len(sample_df), n_rows * n_cols):
        ax_row = idx // n_cols
        ax_col = idx % n_cols
        axes[ax_row, ax_col].axis('off')
    
    plt.tight_layout()
    plt.show()
else:
    print("No images with detections to display.")
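
The grid arithmetic used above (ceiling division for the row count, then mapping a flat index to a row and column) can be isolated as follows. This is a small sketch with hypothetical helper names.

```python
def grid_shape(n_items, n_cols=3):
    """Rows needed to fit n_items into n_cols columns (ceiling division)."""
    n_rows = (n_items + n_cols - 1) // n_cols
    return n_rows, n_cols

def grid_position(idx, n_cols=3):
    """Map a flat index to (row, col) in row-major order."""
    return divmod(idx, n_cols)
```

Four images in three columns need `grid_shape(4) == (2, 3)`, and the fifth slot (`idx=4`) sits at row 1, column 1. Passing `squeeze=False` to `plt.subplots` always returns a 2-D axes array, which would remove the need for the single-row `reshape` special case.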


## Comparison: Masks Only vs. Masks + Bounding Boxes

Compare the segmentation masks with and without bounding boxes to see the precision of pixel-level detection.

In [None]:
def plot_image_with_masks(media, detections_list, ax=None, show_boxes=True, mask_alpha=0.4):
    """
    Plot an image with segmentation masks overlaid.
    
    Args:
        media: Media object
        detections_list: List of detection dictionaries (must include mask_polygon)
        ax: Matplotlib axis (optional)
        show_boxes: If True, also draw bounding boxes
        mask_alpha: Transparency of mask overlay (0-1)
    """
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    
    # Read and display image
    img_array = media.read_as_array()
    if img_array.dtype == np.uint16:
        img_array = (img_array / 256).astype(np.uint8)
    
    ax.imshow(img_array)
    
    # Create color map
    colors = plt.cm.tab20(np.linspace(0, 1, 20))
    
    # Draw masks and boxes
    for i, det in enumerate(detections_list):
        # Choose color based on class
        color = colors[det['class_id'] % len(colors)]
        
        # Draw segmentation mask if available
        if det.get('has_mask') and det.get('mask_polygon') is not None:
            polygon = np.array(det['mask_polygon'])
            
            # Draw filled polygon with transparency
            mask_patch = patches.Polygon(
                polygon,
                closed=True,
                facecolor=color,
                edgecolor=color,
                alpha=mask_alpha,
                linewidth=2
            )
            ax.add_patch(mask_patch)
            
            # Draw mask contour (outline)
            contour = patches.Polygon(
                polygon,
                closed=True,
                facecolor='none',
                edgecolor=color,
                alpha=1.0,
                linewidth=2
            )
            ax.add_patch(contour)
        
        # Optionally draw bounding box
        if show_boxes:
            x1, y1 = det['bbox_x1'], det['bbox_y1']
            width = det['bbox_width']
            height = det['bbox_height']
            
            rect = patches.Rectangle(
                (x1, y1), width, height,
                linewidth=1, edgecolor=color, facecolor='none',
                linestyle='--', alpha=0.5
            )
            ax.add_patch(rect)
        
        # Add label at top of detection
        x1, y1 = det['bbox_x1'], det['bbox_y1']
        label = f"{det['class_name']} {det['confidence']:.2f}"
        if det.get('has_mask'):
            label += f" ({det.get('mask_coverage_pct', 0):.0f}%)"
        
        ax.text(
            x1, y1 - 5,
            label,
            color='white',
            fontsize=9,
            bbox=dict(facecolor=color, alpha=0.8, edgecolor='none', pad=2)
        )
    
    ax.axis('off')
    title = f"{media.filename}\n{len(detections_list)} detections"
    masks_count = sum(1 for d in detections_list if d.get('has_mask'))
    if masks_count > 0:
        title += f", {masks_count} with masks"
    ax.set_title(title, fontsize=10)
    
    return ax


# Select random sample with detections that have masks
NUM_SAMPLES = 6
df_with_masks = df[(df['total_detections'] > 0) & (df['has_masks'] == True)]

if len(df_with_masks) > 0:
    sample_df = df_with_masks.sample(min(NUM_SAMPLES, len(df_with_masks)))
    
    # Create grid
    n_cols = 3
    n_rows = (len(sample_df) + n_cols - 1) // n_cols
    
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(18, 6 * n_rows))
    if n_rows == 1:
        axes = axes.reshape(1, -1)
    
    for idx, (_, row) in enumerate(sample_df.iterrows()):
        ax_row = idx // n_cols
        ax_col = idx % n_cols
        
        # Get media object
        media = session.query(Media).filter(Media.id == row['media_id']).first()
        
        if media:
            try:
                plot_image_with_masks(
                    media, 
                    row['detections'], 
                    axes[ax_row, ax_col],
                    show_boxes=True,  # Show both masks and boxes
                    mask_alpha=0.35
                )
            except Exception as e:
                axes[ax_row, ax_col].text(0.5, 0.5, f"Error: {str(e)}", ha='center', va='center')
                axes[ax_row, ax_col].axis('off')
    
    # Hide unused subplots
    for idx in range(len(sample_df), n_rows * n_cols):
        ax_row = idx // n_cols
        ax_col = idx % n_cols
        axes[ax_row, ax_col].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("\n💡 Legend:")
    print("  - Colored filled areas: Segmentation masks (pixel-level object boundaries)")
    print("  - Dashed boxes: Bounding boxes (for reference)")
    print("  - Percentage in label: How much of the bounding box is covered by the mask")
else:
    print("No images with segmentation masks to display.")
    print(f"Make sure YOLO_MODEL is a segmentation model such as yolov8n-seg.pt (current: {YOLO_MODEL})")


In [None]:
# Create side-by-side comparison: Masks Only vs Masks + Boxes
NUM_COMPARISON = 3

if len(df_with_masks) > 0:
    # Select images for comparison
    comparison_df = df_with_masks.sample(min(NUM_COMPARISON, len(df_with_masks)))
    
    # Create grid: 2 columns (masks only, masks+boxes) x N rows
    fig, axes = plt.subplots(len(comparison_df), 2, figsize=(16, 6 * len(comparison_df)))
    
    # Handle single row case
    if len(comparison_df) == 1:
        axes = axes.reshape(1, -1)
    
    for idx, (_, row) in enumerate(comparison_df.iterrows()):
        # Get media object
        media = session.query(Media).filter(Media.id == row['media_id']).first()
        
        if media:
            try:
                # Left: Masks only
                plot_image_with_masks(
                    media, 
                    row['detections'], 
                    axes[idx, 0],
                    show_boxes=False,  # No bounding boxes
                    mask_alpha=0.45
                )
                axes[idx, 0].set_title(f"{media.filename}\nSegmentation Masks Only", fontsize=10)
                
                # Right: Masks + Boxes
                plot_image_with_masks(
                    media, 
                    row['detections'], 
                    axes[idx, 1],
                    show_boxes=True,  # With bounding boxes
                    mask_alpha=0.35
                )
                axes[idx, 1].set_title(f"{media.filename}\nMasks + Bounding Boxes", fontsize=10)
                
            except Exception as e:
                for col in [0, 1]:
                    axes[idx, col].text(0.5, 0.5, f"Error: {str(e)}", ha='center', va='center')
                    axes[idx, col].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("\n📊 Comparison Notes:")
    print("  - Left: Shows only the segmentation masks (pixel-precise object shapes)")
    print("  - Right: Shows both masks and bounding boxes (dashed lines)")
    print("  - Notice how masks follow the actual object contours, not just rectangles")
else:
    print("No images with segmentation masks available for comparison.")


## Visualize Segmentation Masks

Display sample images with pixel-level segmentation masks overlaid. The masks show the exact shape of detected objects, not just bounding boxes.
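
The percentage shown in each label is described in the legend as the share of the bounding box covered by the mask. This section does not show how `mask_coverage_pct` is computed, so the sketch below is only one plausible way to derive such a figure from a mask polygon, using the shoelace formula for the polygon area. The helper name `polygon_bbox_coverage_pct` and the sample coordinates are assumptions made for illustration.

```python
import numpy as np


def polygon_bbox_coverage_pct(polygon, bbox_width, bbox_height):
    """Hypothetical helper: percent of the bounding-box area covered by a polygon."""
    pts = np.asarray(polygon, dtype=float)
    # A polygon needs at least three (x, y) points and the box a positive area
    if pts.ndim != 2 or pts.shape[0] < 3 or bbox_width <= 0 or bbox_height <= 0:
        return 0.0
    x, y = pts[:, 0], pts[:, 1]
    # Shoelace formula: half the absolute sum of cross products of consecutive vertices
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return 100.0 * area / (bbox_width * bbox_height)


# Illustrative data: a right triangle filling half of a 10 x 10 box
triangle = [(0, 0), (10, 0), (0, 10)]
coverage = polygon_bbox_coverage_pct(triangle, 10, 10)
print(f"{coverage:.0f}%")
```

A low percentage suggests a thin or irregular object inside a large box, which is where masks add the most information over bounding boxes.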

## Advanced Analysis: Detection Patterns

In [None]:
# Analyze which classes tend to appear together
if len(df_detections) > 0:
    print("CLASS CO-OCCURRENCE ANALYSIS")
    print("=" * 60)
    
    # Get top 10 classes
    top_10_classes = df_detections['class_name'].value_counts().head(10).index.tolist()
    
    # Build co-occurrence matrix (integers while counting)
    cooccurrence = pd.DataFrame(0, index=top_10_classes, columns=top_10_classes)
    
    for _, row in df.iterrows():
        if row['detections']:
            # Use a set so each class counts once per image; with a plain list, an image
            # containing three people and one dog would add 3 to the person/dog pair
            classes_in_image = sorted({d['class_name'] for d in row['detections'] if d['class_name'] in top_10_classes})
            # Count each unordered pair of distinct classes once per image
            for i, cls1 in enumerate(classes_in_image):
                for cls2 in classes_in_image[i + 1:]:
                    cooccurrence.loc[cls1, cls2] += 1
                    cooccurrence.loc[cls2, cls1] += 1
    
    # The diagonal (class co-occurring with itself) stays at zero because
    # only pairs of distinct classes are counted above
    
    # Create mask for upper triangular part (including diagonal)
    mask = np.triu(np.ones(cooccurrence.shape, dtype=bool))
    
    # Create a float copy for visualization; where() keeps the upper triangle
    # and sets the lower triangle to NaN without writing into .values in place
    cooccurrence_upper = cooccurrence.astype(float).where(mask)
    
    # Visualize co-occurrence matrix (upper triangle only)
    plt.figure(figsize=(12, 10))
    
    # NaN cells (the lower triangle) are left blank by imshow
    im = plt.imshow(cooccurrence_upper.values, cmap='YlOrRd', aspect='auto')
    plt.colorbar(im, label='Co-occurrence Count')
    plt.xticks(range(len(top_10_classes)), top_10_classes, rotation=45, ha='right')
    plt.yticks(range(len(top_10_classes)), top_10_classes)
    plt.title('Class Co-occurrence Matrix (Top 10 Classes)\nUpper Triangle Only - Diagonal Zeroed')
    
    # Add text annotations only for upper triangle
    for i in range(len(top_10_classes)):
        for j in range(len(top_10_classes)):
            if mask[i, j]:  # Only annotate upper triangle
                value = cooccurrence.iloc[i, j]  # Original integer counts
                # Skip missing or zero counts so only real co-occurrences are labelled
                if pd.notna(value) and value > 0:
                    plt.text(j, i, int(value),
                             ha="center", va="center", color="black", fontsize=9)
    
    plt.tight_layout()
    plt.show()
    
    print("\nMost Common Class Pairs:")
    pairs = []
    for i, cls1 in enumerate(top_10_classes):
        for j, cls2 in enumerate(top_10_classes[i+1:], i+1):
            count = cooccurrence.loc[cls1, cls2]
            if count > 0:
                pairs.append((cls1, cls2, int(count)))
    
    pairs.sort(key=lambda x: x[2], reverse=True)
    for i, (cls1, cls2, count) in enumerate(pairs[:15], 1):
        print(f"  {i:2d}. {cls1:15s} + {cls2:15s}: {count:3d} images")
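

The pairwise counting in the cell above can also be written as a matrix product, which avoids the nested Python loops. This is a minimal sketch on toy data, assuming each class is counted once per image (presence, not number of detections); `image_classes`, `classes`, and the other names are made up for illustration and are not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy data: the class names detected in each of four images
image_classes = [
    ["person", "dog", "person"],
    ["person", "car"],
    ["dog", "car", "person"],
    [],
]
classes = ["person", "dog", "car"]

# Binary indicator matrix: one row per image, one column per class
indicator = pd.DataFrame(
    [[int(c in set(found)) for c in classes] for found in image_classes],
    columns=classes,
)

# indicator.T @ indicator counts, for each class pair, the images containing both
counts = indicator.T.dot(indicator).to_numpy(copy=True)
np.fill_diagonal(counts, 0)  # Drop self co-occurrence
cooc = pd.DataFrame(counts, index=classes, columns=classes)
print(cooc)
```

The result is symmetric, so reading either triangle gives the number of images in which both classes appear.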


## Export DataFrames for Further Analysis

In [None]:
# Optional: Export to CSV for external analysis
EXPORT_CSV = False  # Set to True to export

if EXPORT_CSV:
    output_dir = Path('../output')
    output_dir.mkdir(parents=True, exist_ok=True)  # Also create missing parent directories
    
    # Export main DataFrame (without nested detections)
    df_export = df.drop(columns=['detections'])
    df_export.to_csv(output_dir / 'yolo_analysis_images.csv', index=False)
    print(f"✓ Exported image-level data to: {output_dir / 'yolo_analysis_images.csv'}")
    
    # Export detections DataFrame
    df_detections.to_csv(output_dir / 'yolo_analysis_detections.csv', index=False)
    print(f"✓ Exported detection-level data to: {output_dir / 'yolo_analysis_detections.csv'}")
else:
    print("CSV export disabled. Set EXPORT_CSV = True to enable.")


## Summary: DataFrames Available

After running this notebook, you have access to:

1. **`df`** - Main DataFrame with one row per image
   - All image metadata (filename, date, camera, GPS, etc.)
   - YOLO detection summaries (total detections, classes, confidence stats)
   - Nested `detections` column with full detection data

2. **`df_detections`** - Expanded DataFrame with one row per detection
   - Individual detection details (class, confidence, bounding box)
   - Linked to source image via `media_id`

Use these DataFrames to experiment with features you want to add to the database!
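
As a minimal sketch of how the two frames relate, the toy data below mimics their shape: detections are aggregated per image and merged back through `media_id`. Only the column names `media_id`, `filename`, `class_name`, and `confidence` come from the notebook; the values, the `_toy` frames, and the derived columns `n_detections` and `mean_confidence` are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for df (one row per image) and df_detections (one row per detection)
df_toy = pd.DataFrame({
    "media_id": [1, 2, 3],
    "filename": ["a.jpg", "b.jpg", "c.jpg"],
})
df_detections_toy = pd.DataFrame({
    "media_id": [1, 1, 2],
    "class_name": ["person", "dog", "person"],
    "confidence": [0.9, 0.6, 0.8],
})

# Aggregate detection-level rows to one row per image
per_image = (
    df_detections_toy.groupby("media_id")
    .agg(n_detections=("class_name", "size"), mean_confidence=("confidence", "mean"))
    .reset_index()
)

# A left merge keeps images without detections; their counts become 0
merged = df_toy.merge(per_image, on="media_id", how="left")
merged["n_detections"] = merged["n_detections"].fillna(0).astype(int)
print(merged)
```

The left merge is a deliberate choice here: an inner merge would silently drop images with no detections, which matters when computing per-image rates.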

## Cleanup

In [None]:
# Close database connection
if session:
    session.close()
    print("Database session closed.")
