# Introduction to Semantic Segmentation for Autonomous Driving

This notebook introduces the fundamentals of semantic segmentation in the context of autonomous driving. We'll explore basic concepts, load sample data, and implement a simple segmentation pipeline.

## Learning Objectives
- Understand the role of semantic segmentation in autonomous driving
- Load and visualize road scene images with annotations
- Implement basic image preprocessing for segmentation
- Evaluate segmentation performance using IoU metrics

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import cv2
from PIL import Image
import torch
import torchvision.transforms as transforms
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print("Libraries imported successfully!")

## 1. Understanding Semantic Segmentation

Semantic segmentation assigns a class label to every pixel in an image. In autonomous driving, common classes include:

- **Road**: Drivable surface
- **Vehicle**: Cars, trucks, buses
- **Person**: Pedestrians
- **Rider**: Cyclists and motorcyclists
- **Traffic Sign**: Road signs (traffic lights form a separate class)
- **Sky**: Background sky region
- **Building**: Structures and buildings

In [None]:
# Define class colors for visualization
CLASS_COLORS = {
    0: [128, 64, 128],   # Road - purple
    1: [244, 35, 232],   # Sidewalk - pink
    2: [70, 70, 70],     # Building - dark gray
    3: [102, 102, 156],  # Wall - blue-gray
    4: [190, 153, 153],  # Fence - light brown
    5: [153, 153, 153],  # Pole - gray
    6: [250, 170, 30],   # Traffic light - orange (Cityscapes train-ID order)
    7: [220, 220, 0],    # Traffic sign - yellow
    8: [107, 142, 35],   # Vegetation - green
    9: [152, 251, 152],  # Terrain - light green
    10: [70, 130, 180],  # Sky - blue
    11: [220, 20, 60],   # Person - red
    12: [255, 0, 0],     # Rider - bright red
    13: [0, 0, 142],     # Car - dark blue
    14: [0, 0, 70],      # Truck - darker blue
    15: [0, 60, 100],    # Bus - blue
    16: [0, 80, 100],    # Train - blue
    17: [0, 0, 230],     # Motorcycle - bright blue
    18: [119, 11, 32]    # Bicycle - dark red
}

CLASS_NAMES = [
    'road', 'sidewalk', 'building', 'wall', 'fence', 'pole',
    'traffic_light', 'traffic_sign', 'vegetation', 'terrain',
    'sky', 'person', 'rider', 'car', 'truck', 'bus',
    'train', 'motorcycle', 'bicycle'
]

print(f"Number of classes: {len(CLASS_NAMES)}")
print("Classes:", CLASS_NAMES)
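
A palette like the one above can be applied to a label mask with a lookup table. The following is a minimal sketch, assuming a hypothetical helper `colorize_mask` and a three-entry `PALETTE` whose RGB values are copied from the road, building, and sky entries; neither name is part of the notebook.

```python
import numpy as np

# Small palette for illustration; RGB values copied from the road,
# building, and sky entries of the class color table.
PALETTE = {
    0: [128, 64, 128],   # road
    2: [70, 70, 70],     # building
    10: [70, 130, 180],  # sky
}

def colorize_mask(mask, palette, num_entries=256):
    """Map an (H, W) array of class IDs to an (H, W, 3) RGB image.

    colorize_mask is a hypothetical helper, not part of the notebook.
    Class IDs missing from the palette are rendered black.
    """
    lut = np.zeros((num_entries, 3), dtype=np.uint8)
    for class_id, color in palette.items():
        lut[class_id] = color
    # Integer-array indexing applies the lookup table to every pixel at once
    return lut[mask]

demo_mask = np.array([[10, 10, 10],
                      [2, 2, 0],
                      [0, 0, 0]], dtype=np.uint8)
demo_rgb = colorize_mask(demo_mask, PALETTE)
print(demo_rgb.shape)  # (3, 3, 3)
```

Passing the full `CLASS_COLORS` dictionary instead of `PALETTE` should give masks whose colors match the comments in the table, which is easier to read than a generic colormap.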

## 2. Image Loading and Preprocessing

Let's create some synthetic road scene data to demonstrate the concepts:

In [None]:
def create_synthetic_road_scene(width=640, height=480):
    """Create a synthetic road scene with basic elements"""
    # Create base image and label mask
    # (mask pixels not overwritten below keep label 0, the road class)
    image = np.zeros((height, width, 3), dtype=np.uint8)
    mask = np.zeros((height, width), dtype=np.uint8)
    
    # Sky (top portion)
    sky_height = height // 3
    image[:sky_height, :] = [135, 206, 235]  # Sky blue
    mask[:sky_height, :] = 10  # Sky class
    
    # Road (bottom portion)
    road_start = int(height * 0.6)
    image[road_start:, :] = [64, 64, 64]  # Dark gray road
    mask[road_start:, :] = 0  # Road class
    
    # Buildings (middle portion)
    building_start = sky_height
    building_end = road_start
    image[building_start:building_end, :width//3] = [139, 69, 19]  # Brown building
    mask[building_start:building_end, :width//3] = 2  # Building class
    
    image[building_start:building_end, 2*width//3:] = [169, 169, 169]  # Gray building
    mask[building_start:building_end, 2*width//3:] = 2  # Building class
    
    # Add some vehicles (simple rectangles)
    # Car 1
    cv2.rectangle(image, (200, 350), (300, 400), (0, 0, 142), -1)
    cv2.rectangle(mask, (200, 350), (300, 400), 13, -1)  # Car class
    
    # Car 2
    cv2.rectangle(image, (450, 380), (550, 430), (255, 0, 0), -1)
    cv2.rectangle(mask, (450, 380), (550, 430), 13, -1)  # Car class
    
    # Add lane markings
    cv2.line(image, (width//2-20, road_start), (width//2-20, height), (255, 255, 255), 3)
    cv2.line(image, (width//2+20, road_start), (width//2+20, height), (255, 255, 255), 3)
    
    return image, mask

# Create synthetic data
sample_image, sample_mask = create_synthetic_road_scene()

# Display the synthetic scene
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

axes[0].imshow(sample_image)
axes[0].set_title('Synthetic Road Scene')
axes[0].axis('off')

axes[1].imshow(sample_mask, cmap='tab20')
axes[1].set_title('Ground Truth Segmentation')
axes[1].axis('off')

plt.tight_layout()
plt.show()

print(f"Image shape: {sample_image.shape}")
print(f"Mask shape: {sample_mask.shape}")
print(f"Unique classes in mask: {np.unique(sample_mask)}")
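
Beyond listing which classes occur, it often helps to know how many pixels each class covers, since road scenes tend to be dominated by a few large classes. A minimal sketch, assuming a hypothetical helper `class_pixel_fractions` and a toy 4x4 mask rather than the synthetic scene:

```python
import numpy as np

def class_pixel_fractions(mask, num_classes):
    """Return the fraction of pixels assigned to each class ID.

    class_pixel_fractions is a hypothetical helper, not part of the notebook.
    """
    counts = np.bincount(mask.ravel(), minlength=num_classes)[:num_classes]
    return counts / mask.size

# Toy mask: one row of sky (10), one row of building (2), two rows of road (0)
toy_mask = np.array([[10, 10, 10, 10],
                     [2, 2, 2, 2],
                     [0, 0, 0, 0],
                     [0, 0, 0, 0]], dtype=np.uint8)
fractions = class_pixel_fractions(toy_mask, num_classes=19)

# Report only the classes that actually occur
for class_id in np.flatnonzero(fractions):
    print(f"class {class_id}: {fractions[class_id]:.2f}")
```

The same call on `sample_mask` would show how strongly sky and road outweigh the two cars, which is worth keeping in mind when reading pixel accuracy later.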

## 3. Segmentation Metrics

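The standard metric for semantic segmentation is Intersection over Union (IoU), also known as the Jaccard index. For each class, IoU = TP / (TP + FP + FN), where TP, FP, and FN count true-positive, false-positive, and false-negative pixels. Mean IoU (mIoU) averages this value over classes. Pixel accuracy is reported alongside it but is dominated by large classes such as road and sky: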

In [None]:
def calculate_iou(pred_mask, true_mask, num_classes=19):
    """Calculate IoU for each class and mean IoU"""
    ious = []
    
    for class_id in range(num_classes):
        # True positives, false positives, false negatives
        tp = np.sum((pred_mask == class_id) & (true_mask == class_id))
        fp = np.sum((pred_mask == class_id) & (true_mask != class_id))
        fn = np.sum((pred_mask != class_id) & (true_mask == class_id))
        
        if tp + fp + fn == 0:
            iou = np.nan  # Class absent from both masks: exclude it from the mean
        else:
            iou = tp / (tp + fp + fn)
        
        ious.append(iou)
    
    mean_iou = np.nanmean(ious)  # Average only over classes that occur in either mask
    return ious, mean_iou

def calculate_pixel_accuracy(pred_mask, true_mask):
    """Calculate overall pixel accuracy"""
    correct_pixels = np.sum(pred_mask == true_mask)
    total_pixels = pred_mask.size
    return correct_pixels / total_pixels

# Create a noisy prediction for demonstration
noisy_prediction = sample_mask.copy()
# Add some noise
noise_mask = np.random.random(sample_mask.shape) < 0.1
noisy_prediction[noise_mask] = np.random.randint(0, 19, np.sum(noise_mask))

# Calculate metrics
ious, mean_iou = calculate_iou(noisy_prediction, sample_mask)
pixel_acc = calculate_pixel_accuracy(noisy_prediction, sample_mask)

print(f"Mean IoU: {mean_iou:.4f}")
print(f"Pixel Accuracy: {pixel_acc:.4f}")

# Display per-class IoU
plt.figure(figsize=(12, 6))
present_classes = np.unique(sample_mask)
present_ious = [ious[i] for i in present_classes]
present_names = [CLASS_NAMES[i] for i in present_classes]

plt.bar(present_names, present_ious)
plt.title('Per-Class IoU Scores')
plt.ylabel('IoU Score')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
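
The Dice coefficient is a closely related overlap metric, defined per class as 2·TP / (2·TP + FP + FN). The sketch below assumes a hypothetical helper `dice_per_class` and two toy masks; it marks classes that occur in neither mask as NaN so they can be skipped when averaging.

```python
import numpy as np

def dice_per_class(pred_mask, true_mask, num_classes):
    """Per-class Dice score; NaN for classes absent from both masks.

    dice_per_class is a hypothetical helper, not part of the notebook.
    """
    scores = np.full(num_classes, np.nan)
    for class_id in range(num_classes):
        pred = pred_mask == class_id
        true = true_mask == class_id
        denom = pred.sum() + true.sum()  # equals 2*TP + FP + FN
        if denom > 0:
            scores[class_id] = 2.0 * np.sum(pred & true) / denom
    return scores

true_toy = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]])
pred_toy = np.array([[0, 0, 0, 1],
                     [0, 0, 1, 1]])
dice = dice_per_class(pred_toy, true_toy, num_classes=3)
print(dice)                # class 2 is NaN because it never occurs
print(np.nanmean(dice))    # mean over the classes that do occur
```

For class 0 in this toy case the IoU is 4/5 and the Dice score is 8/9, which matches the identity Dice = 2·IoU / (1 + IoU); Dice is therefore never lower than IoU for the same prediction.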

## 4. Basic Image Preprocessing

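Before feeding images to a segmentation model, we typically resize them to a fixed input size, scale pixel values to [0, 1], and standardize each channel. Geometric augmentations such as flips must be applied to the mask as well, while photometric ones such as brightness changes touch only the image: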

In [None]:
def preprocess_image(image, target_size=(512, 512)):
    """Preprocess image for segmentation model"""
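    # Resize image (cv2.resize expects the size as (width, height))
    resized = cv2.resize(image, target_size)
    
    # Normalize to [0, 1]
    normalized = resized.astype(np.float32) / 255.0
    
    # Apply ImageNet normalization (common for pretrained models)
    # float32 statistics keep the result in float32 instead of promoting to float64
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    
    standardized = (normalized - mean) / std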
    
    return standardized

def augment_image(image, mask):
    """Apply data augmentation"""
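    # Random horizontal flip (same flip for image and mask so labels stay aligned)
    if np.random.random() > 0.5:
        # np.fliplr returns a view with negative strides; copy to a contiguous
        # array so the result can be passed to torch.from_numpy later
        image = np.ascontiguousarray(np.fliplr(image))
        mask = np.ascontiguousarray(np.fliplr(mask))
    
    # Random brightness adjustment (image only; expects uint8 values in [0, 255])
    brightness_factor = np.random.uniform(0.8, 1.2)
    image = np.clip(image * brightness_factor, 0, 255).astype(np.uint8)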
    
    return image, mask

# Demonstrate preprocessing
preprocessed = preprocess_image(sample_image)
augmented_img, augmented_mask = augment_image(sample_image.copy(), sample_mask.copy())

# Visualize preprocessing steps
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Original
axes[0, 0].imshow(sample_image)
axes[0, 0].set_title('Original Image')
axes[0, 0].axis('off')

axes[1, 0].imshow(sample_mask, cmap='tab20')
axes[1, 0].set_title('Original Mask')
axes[1, 0].axis('off')

# Preprocessed
# Denormalize for visualization
denorm = (preprocessed * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406]))
denorm = np.clip(denorm, 0, 1)

axes[0, 1].imshow(denorm)
axes[0, 1].set_title('Preprocessed Image')
axes[0, 1].axis('off')

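# Nearest-neighbor interpolation keeps class IDs intact; the default bilinear
# mode would blend neighboring labels into meaningless intermediate values
axes[1, 1].imshow(cv2.resize(sample_mask, (512, 512), interpolation=cv2.INTER_NEAREST), cmap='tab20')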
axes[1, 1].set_title('Resized Mask')
axes[1, 1].axis('off')

# Augmented
axes[0, 2].imshow(augmented_img)
axes[0, 2].set_title('Augmented Image')
axes[0, 2].axis('off')

axes[1, 2].imshow(augmented_mask, cmap='tab20')
axes[1, 2].set_title('Augmented Mask')
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()
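
Random cropping is another common geometric augmentation, and like the flip it has to use the same window for the image and the mask. A minimal sketch, assuming a hypothetical helper `random_crop_pair` and a toy image whose red channel stores the label so alignment can be checked:

```python
import numpy as np

def random_crop_pair(image, mask, crop_size, rng):
    """Crop the same random window from an image and its mask.

    random_crop_pair is a hypothetical helper, not part of the notebook.
    crop_size is given as (height, width).
    """
    crop_h, crop_w = crop_size
    h, w = mask.shape
    if crop_h > h or crop_w > w:
        raise ValueError("crop_size must not exceed the input size")
    # Upper bounds are exclusive, so the window always fits inside the input
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window].copy(), mask[window].copy()

rng = np.random.default_rng(0)
toy_mask = np.arange(80, dtype=np.uint8).reshape(8, 10)
toy_image = np.zeros((8, 10, 3), dtype=np.uint8)
toy_image[..., 0] = toy_mask  # store the label in the red channel

crop_img, crop_mask = random_crop_pair(toy_image, toy_mask, (4, 6), rng)
print(crop_img.shape, crop_mask.shape)  # (4, 6, 3) (4, 6)
```

Drawing the offsets once and reusing them for both arrays is the design point here; sampling them separately for image and mask would silently misalign the labels.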

## 5. Simple Segmentation Algorithm

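As a baseline, we label pixels by thresholding their HSV values. Hand-tuned rules like these are brittle, but they give a reference point for the metrics from Section 3: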

In [None]:
def simple_color_segmentation(image):
    """Simple color-based segmentation of an RGB uint8 image"""
    h, w = image.shape[:2]
    segmentation = np.zeros((h, w), dtype=np.uint8)
    # Track which pixels a rule has labeled; 0 is the road class, so it
    # cannot double as an "unlabeled" marker
    assigned = np.zeros((h, w), dtype=bool)
    
    # Convert to HSV for better color segmentation (OpenCV uint8 hue range is 0-179)
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    hue, sat, val = hsv[:, :, 0], hsv[:, :, 1], hsv[:, :, 2]
    
    # Sky detection: bright, low-saturation pixels, plus blue hues in the upper half
    sky_mask = (val > 100) & (sat < 100)
    sky_mask[:h//2, :] |= (hue[:h//2, :] > 90) & (hue[:h//2, :] < 130)
    segmentation[sky_mask] = 10  # Sky class
    assigned |= sky_mask
    
    # Road detection (dark, low-saturation pixels in the lower half)
    road_mask = np.zeros((h, w), dtype=bool)
    road_mask[h//2:, :] = (val[h//2:, :] < 100) & (sat[h//2:, :] < 50)
    segmentation[road_mask] = 0  # Road class
    assigned |= road_mask
    
    # Vehicle detection: strongly saturated pixels, or red hues with some saturation.
    # Gray pixels have hue 0, so a hue test alone would label them as red.
    red_hue = (hue < 15) | (hue > 165)
    vehicle_mask = ((sat > 150) | (red_hue & (sat > 50))) & (val > 50)  # Not too dark
    segmentation[vehicle_mask] = 13  # Car class
    assigned |= vehicle_mask
    
    # Building detection (everything still unlabeled in the middle region)
    building_mask = np.zeros((h, w), dtype=bool)
    building_mask[h//4:3*h//4, :] = ~assigned[h//4:3*h//4, :]
    segmentation[building_mask] = 2  # Building class
    
    # Pixels matched by no rule keep the default label 0 (road)
    return segmentation

# Apply simple segmentation
simple_seg = simple_color_segmentation(sample_image)

# Compare with ground truth
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

axes[0].imshow(sample_image)
axes[0].set_title('Original Image')
axes[0].axis('off')

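# Fix the color scale so a class ID maps to the same color in both panels
axes[1].imshow(sample_mask, cmap='tab20', vmin=0, vmax=len(CLASS_NAMES) - 1)
axes[1].set_title('Ground Truth')
axes[1].axis('off')

axes[2].imshow(simple_seg, cmap='tab20', vmin=0, vmax=len(CLASS_NAMES) - 1)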
axes[2].set_title('Simple Color Segmentation')
axes[2].axis('off')

plt.tight_layout()
plt.show()

# Calculate performance
simple_ious, simple_mean_iou = calculate_iou(simple_seg, sample_mask)
simple_pixel_acc = calculate_pixel_accuracy(simple_seg, sample_mask)

print(f"Simple Segmentation Mean IoU: {simple_mean_iou:.4f}")
print(f"Simple Segmentation Pixel Accuracy: {simple_pixel_acc:.4f}")
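
Threshold-based masks often contain isolated mislabeled pixels. One simple cleanup, shown here instead of the morphological operations that OpenCV offers, is a 3x3 majority filter that replaces each pixel with the most frequent class among its neighbors. The helper `majority_filter_3x3` and the toy mask are assumptions made for this sketch.

```python
import numpy as np

def majority_filter_3x3(mask, num_classes):
    """Replace each pixel with the most frequent class in its 3x3 neighborhood.

    majority_filter_3x3 is a hypothetical helper, not part of the notebook.
    Ties are resolved in favor of the lowest class ID.
    """
    h, w = mask.shape
    # One boolean plane per class, shape (num_classes, H, W)
    one_hot = mask[None, :, :] == np.arange(num_classes)[:, None, None]
    # Replicate border pixels so edge neighborhoods also have nine votes
    padded = np.pad(one_hot.astype(np.int32), ((0, 0), (1, 1), (1, 1)), mode="edge")
    votes = np.zeros((num_classes, h, w), dtype=np.int32)
    for dy in range(3):
        for dx in range(3):
            votes += padded[:, dy:dy + h, dx:dx + w]
    return votes.argmax(axis=0).astype(mask.dtype)

noisy = np.zeros((5, 5), dtype=np.uint8)
noisy[2, 2] = 13  # a single stray "car" pixel on the road
cleaned = majority_filter_3x3(noisy, num_classes=19)
print(np.unique(cleaned))  # the stray pixel is voted away
```

The filter also erodes thin structures such as poles or lane markings, so it is a trade-off rather than a free improvement; comparing mean IoU before and after is the way to judge it.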

## 6. Confusion Matrix Analysis

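A confusion matrix shows which classes are mistaken for one another. Each row is normalized by the number of pixels of the true class, so the diagonal reads as per-class recall: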

In [None]:
def plot_confusion_matrix(y_true, y_pred, class_names, title='Confusion Matrix'):
    """Plot confusion matrix for segmentation results"""
    # Flatten arrays
    y_true_flat = y_true.flatten()
    y_pred_flat = y_pred.flatten()
    
    # Get unique classes present in the data
    present_classes = np.unique(np.concatenate([y_true_flat, y_pred_flat]))
    present_class_names = [class_names[i] for i in present_classes]
    
    # Calculate confusion matrix
    cm = confusion_matrix(y_true_flat, y_pred_flat, labels=present_classes)
    
    # Normalize each row by its true-class pixel count (rows with no true pixels stay 0)
    row_sums = cm.sum(axis=1, keepdims=True)
    cm_normalized = np.divide(cm, row_sums, out=np.zeros(cm.shape, dtype=float), where=row_sums > 0)
    
    # Plot
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm_normalized, 
                xticklabels=present_class_names,
                yticklabels=present_class_names,
                annot=True, 
                fmt='.2f',
                cmap='Blues')
    plt.title(title)
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.xticks(rotation=45)
    plt.yticks(rotation=0)
    plt.tight_layout()
    plt.show()
    
    return cm, cm_normalized

# Plot confusion matrix for simple segmentation
cm, cm_norm = plot_confusion_matrix(sample_mask, simple_seg, CLASS_NAMES, 
                                   'Simple Color Segmentation Confusion Matrix')
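
The raw (unnormalized) confusion matrix also contains everything needed for IoU, precision, and recall: the diagonal holds the true positives, column sums give the predicted pixels, and row sums give the true pixels. A minimal NumPy-only sketch, assuming the hypothetical helpers `confusion_from_masks` and `metrics_from_confusion` and two toy masks:

```python
import numpy as np

def confusion_from_masks(true_mask, pred_mask, num_classes):
    """Build a matrix with true classes as rows and predictions as columns."""
    index = (true_mask.ravel().astype(np.int64) * num_classes
             + pred_mask.ravel().astype(np.int64))
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def metrics_from_confusion(cm):
    """Return per-class IoU, precision, and recall (NaN where undefined)."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belongs to the class, but missed
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
    return iou, precision, recall

true_toy = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]])
pred_toy = np.array([[0, 0, 0, 1],
                     [0, 0, 1, 1]])
toy_cm = confusion_from_masks(true_toy, pred_toy, num_classes=2)
iou, precision, recall = metrics_from_confusion(toy_cm)
print(toy_cm)
print("IoU:", iou, "precision:", precision, "recall:", recall)
```

Accumulating one confusion matrix over a whole dataset and computing the metrics once at the end is usually preferable to averaging per-image scores, because rare classes are then weighted by how often they actually appear.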

## 7. Next Steps and Exercises

This notebook provided a foundation for understanding semantic segmentation in autonomous driving. Here are some exercises to extend your learning:

### Exercise 1: Improve Color Segmentation
- Enhance the simple color segmentation algorithm
- Add more sophisticated color ranges
- Use morphological operations for noise reduction

### Exercise 2: Dataset Analysis
- Load real automotive datasets (Cityscapes, KITTI)
- Analyze class distributions and imbalances
- Implement data augmentation strategies

### Exercise 3: Performance Metrics
- Implement additional metrics (Dice coefficient, F1-score)
- Analyze per-class performance in detail
- Compare different segmentation approaches

### Exercise 4: Preprocessing Pipeline
- Build a complete preprocessing pipeline
- Implement various augmentation techniques
- Optimize for model training efficiency

In [None]:
# Summary of key concepts
print("Key Concepts Covered:")
print("1. Semantic segmentation for autonomous driving")
print("2. Class definitions and color coding")
print("3. IoU and pixel accuracy metrics")
print("4. Image preprocessing and augmentation")
print("5. Simple color-based segmentation")
print("6. Confusion matrix analysis")
print("\nNext: Proceed to U-Net implementation notebook")