<!--
Copyright (c) 2025 Milin Patel
Hochschule Kempten - University of Applied Sciences

Autonomous Driving: AI Safety and Security Workshop
This project is licensed under the MIT License.
See LICENSE file in the root directory for full license text.
-->

*Copyright © 2025 Milin Patel. All Rights Reserved.*

# Notebook 16: Monte Carlo Dropout and Deep Ensembles

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/milinpatel07/Autonomous-Driving_AI-Safety-and-Security/blob/main/AV_Perception_Safety_Workshop/Session_4_Uncertainty_Estimation_and_Validation/notebooks/16_MC_Dropout_and_Ensembles.ipynb)

**Session 4: Uncertainty Estimation and Validation**  

## Learning Objectives
- Understand Monte Carlo (MC) Dropout for uncertainty estimation
- Implement Deep Ensembles for robust predictions
- Compare different uncertainty quantification methods
- Apply to AV perception tasks
- Analyze computational costs vs accuracy trade-offs

---

## Introduction

In the previous notebook, we learned about uncertainty types. Now we'll learn **how to quantify** them.

**Two practical methods:**
1. **Monte Carlo Dropout:** Cheap, single model, approximate Bayesian inference
2. **Deep Ensembles:** More expensive, multiple models, strong empirical performance

Both are widely used baselines for uncertainty estimation in AV perception.

In [None]:
# Setup
!pip install -q torch torchvision matplotlib seaborn numpy scipy scikit-learn pandas

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
from sklearn.datasets import make_moons, make_classification
from sklearn.model_selection import train_test_split
import time
from copy import deepcopy

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

# Set random seeds
np.random.seed(42)
torch.manual_seed(42)

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

## 1. Monte Carlo Dropout

### 1.1 Theory

**Key Insight (Gal & Ghahramani, 2016):** 
- Dropout during training = approximate Bayesian inference
- Enable dropout at test time = sample from approximate posterior
- Multiple forward passes = Monte Carlo sampling

**Mathematical intuition:**
```
Regular prediction:    y = f(x; θ)
MC Dropout:            y₁, y₂, ..., yₙ = f(x; θ * mask₁), f(x; θ * mask₂), ..., f(x; θ * maskₙ)
Mean prediction:       ŷ = (1/N) Σ yᵢ
Uncertainty:           σ² = (1/N) Σ (yᵢ - ŷ)²
```
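
The sampling scheme above can be traced on a toy example. The sketch below is not the notebook's PyTorch model: it assumes a hypothetical one-hidden-layer network with fixed random weights (`W1`, `W2`) and applies inverted-dropout masks by hand in NumPy, only to show how the mean and variance fall out of repeated stochastic passes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed weights standing in for a trained network (illustration only).
W1 = rng.normal(size=(2, 16))   # input -> hidden
W2 = rng.normal(size=(16, 1))   # hidden -> output

def stochastic_forward(x, drop_rate, rng):
    """One forward pass with a fresh dropout mask on the hidden layer."""
    h = np.maximum(x @ W1, 0.0)                  # ReLU
    keep = rng.random(h.shape) >= drop_rate      # Bernoulli keep-mask
    h = h * keep / (1.0 - drop_rate)             # inverted dropout scaling
    return h @ W2

def mc_dropout_predict(x, n_samples=100, drop_rate=0.2, seed=0):
    """Return the MC mean and variance over n_samples stochastic passes."""
    rng = np.random.default_rng(seed)
    ys = np.stack([stochastic_forward(x, drop_rate, rng) for _ in range(n_samples)])
    y_hat = ys.mean(axis=0)                      # (1/N) sum of y_i
    var = ((ys - y_hat) ** 2).mean(axis=0)       # (1/N) sum of (y_i - y_hat)^2
    return y_hat, var

x = np.array([[0.5, -1.0], [2.0, 0.3]])
y_hat, var = mc_dropout_predict(x)
print(y_hat.ravel(), var.ravel())
```

The formula divides by N, whereas `torch.Tensor.std` applies Bessel's correction (divides by N − 1) by default, so the two differ slightly for small N.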

**Advantages:**
- ✅ Single model (cheap to train)
- ✅ Easy to implement
- ✅ Captures epistemic uncertainty
- ✅ Can be added to existing models

**Disadvantages:**
- ❌ Slower inference (N forward passes)
- ❌ Approximation quality depends on dropout rate
- ❌ May underestimate uncertainty

### 1.2 Implementation

In [None]:
class MCDropoutClassifier(nn.Module):
    """Neural network with MC Dropout for uncertainty estimation."""
    
    def __init__(self, input_dim=2, hidden_dim=100, num_classes=2, dropout_rate=0.2):
        super().__init__()
        self.dropout_rate = dropout_rate
        
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.dropout1 = nn.Dropout(dropout_rate)
        
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.dropout2 = nn.Dropout(dropout_rate)
        
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        self.dropout3 = nn.Dropout(dropout_rate)
        
        self.fc4 = nn.Linear(hidden_dim, num_classes)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        
        x = F.relu(self.fc3(x))
        x = self.dropout3(x)
        
        x = self.fc4(x)
        return x
    
    def predict_with_uncertainty(self, x, n_samples=50):
        """
        Predict with uncertainty using MC Dropout.
        
        Args:
            x: Input tensor
            n_samples: Number of MC samples
            
        Returns:
            mean_probs: Mean class probabilities
            std_probs: Standard deviation of probabilities (epistemic uncertainty)
            entropy: Predictive entropy (total uncertainty)
        """
        self.train()  # Enable dropout at inference (the model stays in train mode afterwards)
        
        predictions = []
        with torch.no_grad():
            for _ in range(n_samples):
                logits = self.forward(x)
                probs = F.softmax(logits, dim=1)
                predictions.append(probs)
        
        # Stack predictions: (n_samples, batch_size, num_classes)
        predictions = torch.stack(predictions)
        
        # Mean prediction
        mean_probs = predictions.mean(dim=0)
        
        # Epistemic uncertainty (standard deviation across MC samples)
        std_probs = predictions.std(dim=0)
        
        # Predictive entropy (total uncertainty)
        entropy = -torch.sum(mean_probs * torch.log(mean_probs + 1e-10), dim=1)
        
        return mean_probs, std_probs, entropy

print("MC Dropout Classifier defined!")

In [None]:
# Create dataset with OOD region
def create_classification_dataset():
    """Create 2D classification dataset."""
    # Training data: moons dataset
    X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
    
    # Create OOD region (far from training data)
    X_ood = np.random.uniform(-3, -1.5, size=(50, 2))
    
    return X, y, X_ood

X_train, y_train, X_ood = create_classification_dataset()

# Visualize
plt.figure(figsize=(10, 6))
scatter = plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='viridis', 
                     s=50, alpha=0.6, edgecolors='black', label='Training data')
plt.scatter(X_ood[:, 0], X_ood[:, 1], c='red', marker='x', s=100, 
           linewidths=3, label='OOD region')
plt.colorbar(scatter, label='Class')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Classification Dataset with OOD Region')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Train MC Dropout model
def train_model(model, X_train, y_train, epochs=500, lr=0.01):
    """Train a classification model."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    
    # Convert to tensors
    X_tensor = torch.FloatTensor(X_train).to(device)
    y_tensor = torch.LongTensor(y_train).to(device)
    
    model.to(device)
    model.train()
    
    losses = []
    for epoch in range(epochs):
        optimizer.zero_grad()
        outputs = model(X_tensor)
        loss = criterion(outputs, y_tensor)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        
        if (epoch + 1) % 100 == 0:
            print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")
    
    return losses

# Train MC Dropout model
print("Training MC Dropout model...")
mc_model = MCDropoutClassifier(input_dim=2, hidden_dim=100, num_classes=2, dropout_rate=0.2)
losses = train_model(mc_model, X_train, y_train, epochs=500)

plt.figure(figsize=(10, 4))
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('MC Dropout Training Loss')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Visualize MC Dropout uncertainty
def visualize_mc_dropout_uncertainty(model, X_train, y_train, X_ood, n_samples=50):
    """Visualize predictions and uncertainty from MC Dropout."""
    
    # Create grid
    x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
    y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                        np.linspace(y_min, y_max, 100))
    
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    grid_tensor = torch.FloatTensor(grid_points).to(device)
    
    # Get predictions with uncertainty
    mean_probs, std_probs, entropy = model.predict_with_uncertainty(grid_tensor, n_samples=n_samples)
    mean_probs = mean_probs.cpu().numpy()
    std_probs = std_probs.cpu().numpy()
    entropy = entropy.cpu().numpy()
    
    # Reshape
    pred_class = mean_probs[:, 1].reshape(xx.shape)
    uncertainty = std_probs[:, 1].reshape(xx.shape)
    entropy_map = entropy.reshape(xx.shape)
    
    # Plot
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # 1. Predictions
    ax = axes[0]
    contour = ax.contourf(xx, yy, pred_class, levels=20, cmap='viridis', alpha=0.6)
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='viridis', 
              s=50, edgecolors='black', linewidth=1.5)
    ax.scatter(X_ood[:, 0], X_ood[:, 1], c='red', marker='x', s=100, linewidths=3)
    plt.colorbar(contour, ax=ax, label='P(Class 1)')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_title(f'Mean Predictions ({n_samples} MC samples)')
    ax.grid(True, alpha=0.3)
    
    # 2. Epistemic uncertainty (std)
    ax = axes[1]
    contour = ax.contourf(xx, yy, uncertainty, levels=20, cmap='Reds', alpha=0.8)
    ax.scatter(X_train[:, 0], X_train[:, 1], c='blue', s=30, alpha=0.3)
    ax.scatter(X_ood[:, 0], X_ood[:, 1], c='black', marker='x', s=100, linewidths=3)
    plt.colorbar(contour, ax=ax, label='Std Dev')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_title('Epistemic Uncertainty (Std)\n🔥 High in OOD region!')
    ax.grid(True, alpha=0.3)
    
    # 3. Predictive entropy
    ax = axes[2]
    contour = ax.contourf(xx, yy, entropy_map, levels=20, cmap='OrRd', alpha=0.8)
    ax.scatter(X_train[:, 0], X_train[:, 1], c='blue', s=30, alpha=0.3)
    ax.scatter(X_ood[:, 0], X_ood[:, 1], c='black', marker='x', s=100, linewidths=3)
    plt.colorbar(contour, ax=ax, label='Entropy')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_title('Predictive Entropy (Total Uncertainty)')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\n🎯 Key Observations:")
    print("1. Uncertainty is HIGH in OOD region (red X's) ✅")
    print("2. Uncertainty is HIGH at decision boundary ✅")
    print("3. Uncertainty is LOW in well-covered regions ✅")
    print("\n→ MC Dropout successfully detects epistemic uncertainty!")

visualize_mc_dropout_uncertainty(mc_model, X_train, y_train, X_ood, n_samples=50)
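
The predictive entropy computed in `predict_with_uncertainty` mixes both uncertainty types. One common way to separate them, sketched here in NumPy as an illustration rather than as part of the notebook's model, is to subtract the average per-sample entropy (aleatoric part) from the entropy of the mean prediction; the remainder is the mutual information between the prediction and the sampled model (epistemic part). The function name `decompose_uncertainty` is an assumption made for this example.

```python
import numpy as np

def decompose_uncertainty(probs, eps=1e-10):
    """Split predictive entropy into aleatoric and epistemic parts.

    probs: array of shape (n_samples, batch_size, num_classes) holding the
           softmax outputs of each stochastic pass (or ensemble member).
    """
    mean_probs = probs.mean(axis=0)
    # Total uncertainty: entropy of the averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    # Aleatoric part: average entropy of the individual predictions.
    aleatoric = -np.sum(probs * np.log(probs + eps), axis=-1).mean(axis=0)
    # Epistemic part: mutual information between prediction and model sample.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Two passes that agree on a 50/50 prediction vs. two confident passes that disagree.
agree = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])
disagree = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
print(decompose_uncertainty(agree))     # high total, almost no epistemic part
print(decompose_uncertainty(disagree))  # high total, almost entirely epistemic
```

To apply this to the MC Dropout model, `predict_with_uncertainty` would need to return the stacked `predictions` tensor as well, converted with `.cpu().numpy()`.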

## 2. Deep Ensembles

### 2.1 Theory

**Key Idea (Lakshminarayanan et al., 2017):**
- Train M independent models with different initializations
- Each model captures different aspects of the data
- Disagreement between models = uncertainty

**Mathematical formulation:**
```
Train M models:     f₁(x), f₂(x), ..., fₘ(x)
Mean prediction:    ŷ = (1/M) Σ fᵢ(x)
Epistemic unc.:     σ²_epistemic = (1/M) Σ (fᵢ(x) - ŷ)²
Aleatoric unc.:     σ²_aleatoric = (1/M) Σ σᵢ²(x)  [if models predict variance]
```
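
As a worked instance of these formulas, the sketch below combines per-model Gaussian predictions in a regression setting, where each member outputs a mean and a variance. The numbers are made up for illustration, and `combine_ensemble` is a hypothetical helper, not part of the notebook's `DeepEnsemble` class (which handles classification only).

```python
import numpy as np

def combine_ensemble(means, variances):
    """Combine per-model Gaussian predictions.

    means, variances: arrays of shape (M, batch_size), one row per ensemble member.
    """
    y_hat = means.mean(axis=0)
    epistemic = ((means - y_hat) ** 2).mean(axis=0)   # spread of the member means
    aleatoric = variances.mean(axis=0)                # average predicted noise
    return y_hat, epistemic, aleatoric, epistemic + aleatoric

# Hypothetical outputs of M=3 models on two inputs (made-up numbers).
means = np.array([[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]])
variances = np.array([[0.1, 0.5], [0.1, 0.7], [0.1, 0.6]])
y_hat, epi, ale, total = combine_ensemble(means, variances)
print(y_hat, epi, ale, total)
```

On the first input the models disagree (epistemic ≈ 0.67) but each predicts little noise; on the second they agree exactly (epistemic = 0), so all remaining uncertainty is aleatoric. The sum of the two terms is the variance of the equally weighted mixture of the member Gaussians.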

**Advantages:**
- ✅ Strong empirical performance
- ✅ Can capture both aleatoric and epistemic
- ✅ Simple and robust
- ✅ State-of-the-art in many benchmarks

**Disadvantages:**
- ❌ Expensive (M times the training cost)
- ❌ M times the memory for deployment
- ❌ M times slower inference

**Production tip:** Often M=5 is enough for good results!

### 2.2 Implementation

In [None]:
class StandardClassifier(nn.Module):
    """Standard neural network for ensemble."""
    
    def __init__(self, input_dim=2, hidden_dim=100, num_classes=2):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, num_classes)
        
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return self.fc4(x)

class DeepEnsemble:
    """Deep Ensemble for uncertainty estimation."""
    
    def __init__(self, num_models=5, input_dim=2, hidden_dim=100, num_classes=2):
        self.num_models = num_models
        self.models = []
        
        for i in range(num_models):
            # Different random initialization for each model
            torch.manual_seed(42 + i)
            model = StandardClassifier(input_dim, hidden_dim, num_classes)
            self.models.append(model)
    
    def train_ensemble(self, X_train, y_train, epochs=500, lr=0.01):
        """Train all models in the ensemble."""
        print(f"Training ensemble of {self.num_models} models...")
        
        for i, model in enumerate(self.models):
            print(f"\nTraining model {i+1}/{self.num_models}...")
            losses = train_model(model, X_train, y_train, epochs=epochs, lr=lr)
            print(f"Final loss: {losses[-1]:.4f}")
    
    def predict_with_uncertainty(self, x):
        """
        Predict with uncertainty using ensemble.
        
        Args:
            x: Input tensor
            
        Returns:
            mean_probs: Mean class probabilities
            std_probs: Standard deviation (epistemic uncertainty)
            entropy: Predictive entropy
        """
        predictions = []
        
        for model in self.models:
            model.eval()
            with torch.no_grad():
                logits = model(x)
                probs = F.softmax(logits, dim=1)
                predictions.append(probs)
        
        # Stack predictions: (num_models, batch_size, num_classes)
        predictions = torch.stack(predictions)
        
        # Mean prediction
        mean_probs = predictions.mean(dim=0)
        
        # Epistemic uncertainty (standard deviation across models)
        std_probs = predictions.std(dim=0)
        
        # Predictive entropy
        entropy = -torch.sum(mean_probs * torch.log(mean_probs + 1e-10), dim=1)
        
        return mean_probs, std_probs, entropy

print("Deep Ensemble class defined!")

In [None]:
# Train ensemble
ensemble = DeepEnsemble(num_models=5, input_dim=2, hidden_dim=100, num_classes=2)
ensemble.train_ensemble(X_train, y_train, epochs=500, lr=0.01)

In [None]:
# Visualize ensemble uncertainty
def visualize_ensemble_uncertainty(ensemble, X_train, y_train, X_ood):
    """Visualize predictions and uncertainty from ensemble."""
    
    # Create grid
    x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
    y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                        np.linspace(y_min, y_max, 100))
    
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    grid_tensor = torch.FloatTensor(grid_points).to(device)
    
    # Get predictions with uncertainty
    mean_probs, std_probs, entropy = ensemble.predict_with_uncertainty(grid_tensor)
    mean_probs = mean_probs.cpu().numpy()
    std_probs = std_probs.cpu().numpy()
    entropy = entropy.cpu().numpy()
    
    # Reshape
    pred_class = mean_probs[:, 1].reshape(xx.shape)
    uncertainty = std_probs[:, 1].reshape(xx.shape)
    entropy_map = entropy.reshape(xx.shape)
    
    # Plot
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # 1. Predictions
    ax = axes[0]
    contour = ax.contourf(xx, yy, pred_class, levels=20, cmap='viridis', alpha=0.6)
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='viridis', 
              s=50, edgecolors='black', linewidth=1.5)
    ax.scatter(X_ood[:, 0], X_ood[:, 1], c='red', marker='x', s=100, linewidths=3)
    plt.colorbar(contour, ax=ax, label='P(Class 1)')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_title(f'Ensemble Mean Predictions ({ensemble.num_models} models)')
    ax.grid(True, alpha=0.3)
    
    # 2. Epistemic uncertainty
    ax = axes[1]
    contour = ax.contourf(xx, yy, uncertainty, levels=20, cmap='Reds', alpha=0.8)
    ax.scatter(X_train[:, 0], X_train[:, 1], c='blue', s=30, alpha=0.3)
    ax.scatter(X_ood[:, 0], X_ood[:, 1], c='black', marker='x', s=100, linewidths=3)
    plt.colorbar(contour, ax=ax, label='Std Dev')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_title('Epistemic Uncertainty (Model Disagreement)\n🔥 High in OOD region!')
    ax.grid(True, alpha=0.3)
    
    # 3. Predictive entropy
    ax = axes[2]
    contour = ax.contourf(xx, yy, entropy_map, levels=20, cmap='OrRd', alpha=0.8)
    ax.scatter(X_train[:, 0], X_train[:, 1], c='blue', s=30, alpha=0.3)
    ax.scatter(X_ood[:, 0], X_ood[:, 1], c='black', marker='x', s=100, linewidths=3)
    plt.colorbar(contour, ax=ax, label='Entropy')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_title('Predictive Entropy (Total Uncertainty)')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

visualize_ensemble_uncertainty(ensemble, X_train, y_train, X_ood)

## 3. Comparison: MC Dropout vs Ensembles

Let's compare both methods side-by-side.

In [None]:
def compare_uncertainty_methods(mc_model, ensemble, X_train, y_train, X_ood):
    """Compare MC Dropout and Ensemble uncertainty estimates."""
    
    # Create test points
    # 1. In-distribution points
    in_dist_idx = np.random.choice(len(X_train), 100, replace=False)
    X_in = X_train[in_dist_idx]
    
    # 2. OOD points
    X_out = X_ood
    
    # Get uncertainties
    # MC Dropout
    X_in_tensor = torch.FloatTensor(X_in).to(device)
    X_out_tensor = torch.FloatTensor(X_out).to(device)
    
    _, mc_std_in, mc_ent_in = mc_model.predict_with_uncertainty(X_in_tensor, n_samples=50)
    _, mc_std_out, mc_ent_out = mc_model.predict_with_uncertainty(X_out_tensor, n_samples=50)
    
    mc_unc_in = mc_std_in[:, 1].cpu().numpy()
    mc_unc_out = mc_std_out[:, 1].cpu().numpy()
    
    # Ensemble
    _, ens_std_in, ens_ent_in = ensemble.predict_with_uncertainty(X_in_tensor)
    _, ens_std_out, ens_ent_out = ensemble.predict_with_uncertainty(X_out_tensor)
    
    ens_unc_in = ens_std_in[:, 1].cpu().numpy()
    ens_unc_out = ens_std_out[:, 1].cpu().numpy()
    
    # Plot comparison
    fig, axes = plt.subplots(2, 2, figsize=(14, 12))
    
    # 1. Distribution comparison
    ax = axes[0, 0]
    ax.hist(mc_unc_in, bins=20, alpha=0.5, label='MC Dropout (In-Dist)', color='blue')
    ax.hist(mc_unc_out, bins=20, alpha=0.5, label='MC Dropout (OOD)', color='red')
    ax.set_xlabel('Uncertainty (Std Dev)')
    ax.set_ylabel('Count')
    ax.set_title('MC Dropout Uncertainty Distribution')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    ax = axes[0, 1]
    ax.hist(ens_unc_in, bins=20, alpha=0.5, label='Ensemble (In-Dist)', color='blue')
    ax.hist(ens_unc_out, bins=20, alpha=0.5, label='Ensemble (OOD)', color='red')
    ax.set_xlabel('Uncertainty (Std Dev)')
    ax.set_ylabel('Count')
    ax.set_title('Ensemble Uncertainty Distribution')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # 2. Box plots
    ax = axes[1, 0]
    data_mc = [mc_unc_in, mc_unc_out]
    bp = ax.boxplot(data_mc, labels=['In-Distribution', 'OOD'], patch_artist=True)
    bp['boxes'][0].set_facecolor('lightblue')
    bp['boxes'][1].set_facecolor('lightcoral')
    ax.set_ylabel('Uncertainty')
    ax.set_title('MC Dropout: In-Dist vs OOD')
    ax.grid(True, alpha=0.3, axis='y')
    
    ax = axes[1, 1]
    data_ens = [ens_unc_in, ens_unc_out]
    bp = ax.boxplot(data_ens, labels=['In-Distribution', 'OOD'], patch_artist=True)
    bp['boxes'][0].set_facecolor('lightblue')
    bp['boxes'][1].set_facecolor('lightcoral')
    ax.set_ylabel('Uncertainty')
    ax.set_title('Ensemble: In-Dist vs OOD')
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.show()
    
    # Statistics
    print("\n" + "="*70)
    print("UNCERTAINTY QUANTIFICATION COMPARISON")
    print("="*70)
    print(f"\n{'Method':<20} {'In-Dist Mean':<15} {'OOD Mean':<15} {'Separation':<15}")
    print("-"*70)
    
    mc_sep = mc_unc_out.mean() / (mc_unc_in.mean() + 1e-10)
    ens_sep = ens_unc_out.mean() / (ens_unc_in.mean() + 1e-10)
    
    print(f"{'MC Dropout':<20} {mc_unc_in.mean():<15.4f} {mc_unc_out.mean():<15.4f} {mc_sep:<15.2f}x")
    print(f"{'Deep Ensemble':<20} {ens_unc_in.mean():<15.4f} {ens_unc_out.mean():<15.4f} {ens_sep:<15.2f}x")
    
    print("\n" + "="*70)
    print("\n🎯 Key Insights:")
    print(f"1. Both methods detect OOD: uncertainty is {mc_sep:.1f}x and {ens_sep:.1f}x higher")
    print("2. Ensembles typically show better separation (more reliable)")
    print("3. MC Dropout is cheaper (single model) but may underestimate uncertainty")
    print("4. For safety-critical AVs: Ensembles preferred despite higher cost")

compare_uncertainty_methods(mc_model, ensemble, X_train, y_train, X_ood)
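
The ratio of mean uncertainties used as "Separation" above depends on the scale of the scores and can be dominated by a few outliers. A threshold-free alternative is the AUROC of uncertainty as an OOD detector. The sketch below computes it directly from pairwise comparisons; `ood_auroc` is a hypothetical helper, and the inputs would be arrays such as `mc_unc_in` and `mc_unc_out`, which are local to `compare_uncertainty_methods` and would have to be returned from it.

```python
import numpy as np

def ood_auroc(unc_in, unc_out):
    """Probability that a random OOD point gets higher uncertainty than a
    random in-distribution point (ties count as one half)."""
    unc_in = np.asarray(unc_in, dtype=float)
    unc_out = np.asarray(unc_out, dtype=float)
    diff = unc_out[:, None] - unc_in[None, :]    # all OOD / in-distribution pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

print(ood_auroc([0.1, 0.2, 0.3], [0.4, 0.5]))   # 1.0: perfectly separated
print(ood_auroc([0.1, 0.4], [0.2, 0.3]))        # 0.5: no better than chance
```

A value of 0.5 means the uncertainty score carries no information about whether a point is OOD; 1.0 means every OOD point scores above every in-distribution point.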

## 4. Computational Cost Analysis

In [None]:
def analyze_computational_costs():
    """Analyze training and inference costs."""
    
    # Create test dataset
    X_test = np.random.randn(1000, 2)
    X_test_tensor = torch.FloatTensor(X_test).to(device)
    
    results = []
    
    # 1. Single model (baseline)
    single_model = StandardClassifier(input_dim=2, hidden_dim=100, num_classes=2).to(device)
    single_model.eval()
    
    start = time.time()
    with torch.no_grad():
        _ = single_model(X_test_tensor)
    single_time = time.time() - start
    
    results.append({
        'method': 'Single Model',
        'training_cost': '1x',
        'memory_cost': '1x',
        'inference_time': single_time,
        'relative_time': 1.0,
        'uncertainty': 'No'
    })
    
    # 2. MC Dropout
    mc_samples = [10, 30, 50]
    for n in mc_samples:
        start = time.time()
        _ = mc_model.predict_with_uncertainty(X_test_tensor, n_samples=n)
        mc_time = time.time() - start
        
        results.append({
            'method': f'MC Dropout (N={n})',
            'training_cost': '1x',
            'memory_cost': '1x',
            'inference_time': mc_time,
            'relative_time': mc_time / single_time,
            'uncertainty': 'Yes (Epistemic)'
        })
    
    # 3. Ensemble
    start = time.time()
    _ = ensemble.predict_with_uncertainty(X_test_tensor)
    ens_time = time.time() - start
    
    results.append({
        'method': f'Ensemble (M={ensemble.num_models})',
        'training_cost': f'{ensemble.num_models}x',
        'memory_cost': f'{ensemble.num_models}x',
        'inference_time': ens_time,
        'relative_time': ens_time / single_time,
        'uncertainty': 'Yes (Both types)'
    })
    
    # Plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Inference time comparison
    methods = [r['method'] for r in results]
    times = [r['inference_time'] * 1000 for r in results]  # Convert to ms
    colors = ['blue'] + ['orange']*3 + ['red']
    
    bars = ax1.bar(methods, times, color=colors, alpha=0.7, edgecolor='black', linewidth=1.5)
    ax1.set_ylabel('Inference Time (ms)')
    ax1.set_title('Inference Speed Comparison (1000 samples)')
    ax1.tick_params(axis='x', rotation=45)
    ax1.grid(True, alpha=0.3, axis='y')
    
    # Add values on bars
    for bar, time_val in zip(bars, times):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height,
                f'{time_val:.1f}ms',
                ha='center', va='bottom', fontsize=9)
    
    # Relative cost visualization
    categories = ['Training Cost', 'Memory Cost', 'Inference Cost']
    mc_costs = [1, 1, 30]  # MC Dropout with N=30
    ens_costs = [5, 5, 5]  # Ensemble with M=5
    
    x = np.arange(len(categories))
    width = 0.35
    
    bars1 = ax2.bar(x - width/2, mc_costs, width, label='MC Dropout (N=30)', 
                    color='orange', alpha=0.7, edgecolor='black', linewidth=1.5)
    bars2 = ax2.bar(x + width/2, ens_costs, width, label='Ensemble (M=5)',
                   color='red', alpha=0.7, edgecolor='black', linewidth=1.5)
    
    ax2.set_ylabel('Relative Cost (vs Single Model)')
    ax2.set_title('Resource Cost Comparison')
    ax2.set_xticks(x)
    ax2.set_xticklabels(categories)
    ax2.axhline(y=1, color='blue', linestyle='--', linewidth=2, alpha=0.5, label='Single Model')
    ax2.legend()  # called after axhline so the 'Single Model' line appears in the legend
    ax2.grid(True, alpha=0.3, axis='y')
    
    # Add values
    for bars in [bars1, bars2]:
        for bar in bars:
            height = bar.get_height()
            ax2.text(bar.get_x() + bar.get_width()/2., height,
                    f'{height:.0f}x',
                    ha='center', va='bottom', fontsize=10, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    # Print table
    print("\n" + "="*90)
    print("COMPUTATIONAL COST ANALYSIS")
    print("="*90)
    print(f"\n{'Method':<25} {'Train':<10} {'Memory':<10} {'Infer (ms)':<12} {'Rel. time':<12} {'Uncertainty'}")
    print("-"*90)
    
    for r in results:
        print(f"{r['method']:<25} {r['training_cost']:<10} {r['memory_cost']:<10} "
              f"{r['inference_time']*1000:<12.2f} {r['relative_time']:<12.1f}x {r['uncertainty']}")
    
    print("\n" + "="*90)
    print("\n🎯 Trade-off Analysis:")
    print("\nMC Dropout:")
    print("  ✅ Same training cost as single model")
    print("  ✅ Same memory footprint")
    print("  ❌ Slower inference (N forward passes)")
    print("  ⚠️  May underestimate uncertainty")
    print("\nDeep Ensemble:")
    print("  ❌ M times training cost")
    print("  ❌ M times memory (critical for edge deployment!)")
    print("  ❌ M times slower inference")
    print("  ✅ Better uncertainty estimates")
    print("  ✅ Can capture both uncertainty types")
    print("\n💡 Recommendation for AVs:")
    print("  - Development/Research: Use Ensembles (M=5-10)")
    print("  - Production (cloud): Use Ensembles (M=3-5)")
    print("  - Production (edge): Use MC Dropout (N=10-30) or optimized ensembles")

analyze_computational_costs()
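
The timings above come from a single call each, so they include one-off costs such as lazy initialization and are noisy for sub-millisecond workloads. A more stable measurement, sketched below with the standard library only, discards warm-up runs and reports the median of several repeats. The helper name `time_inference` and its parameters are assumptions for this example; on a GPU, `torch.cuda.synchronize` could be passed as `sync` so that queued kernels are included in the measurement.

```python
import time
import statistics

def time_inference(fn, warmup=3, repeats=20, sync=None):
    """Median wall-clock time of fn() in milliseconds.

    sync: optional callable run before each clock read, e.g. a device
          synchronization function when timing GPU work.
    """
    for _ in range(warmup):          # warm-up runs are discarded
        fn()
    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Stand-in workload; in the notebook this would wrap a model call.
ms = time_inference(lambda: sum(i * i for i in range(10_000)))
print(f"median: {ms:.3f} ms")
```

In the notebook, `fn` could be something like `lambda: ensemble.predict_with_uncertainty(X_test_tensor)`, with the same helper used for every method so that the relative times are comparable.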

## 5. Other Uncertainty Quantification Methods (Brief Overview)

### 5.1 Bayesian Neural Networks (BNNs)

**Idea:** Place distributions over network weights instead of point estimates.

```python
# Conceptual (full implementation is complex)
# Instead of: w = optimize(data)
# We get: p(w|data) - full posterior distribution
```

**Challenges:**
- Computationally expensive
- Requires careful prior selection
- Approximate inference needed (variational inference, MCMC)

**Libraries:** PyTorch BNN extensions, TensorFlow Probability, Pyro
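
To make the idea of a weight posterior concrete, the sketch below uses a deliberately simple case: a linear model whose weights follow independent Gaussians. The means and standard deviations are made-up values standing in for the result of variational inference, and no BNN library is used. Predictions are obtained by sampling weight vectors and summarising the outputs.

```python
import numpy as np

# Hypothetical variational posterior over the weights of a linear model
# y = w . x: independent Gaussians with these means and standard deviations.
w_mean = np.array([1.0, -2.0, 0.5])
w_std = np.array([0.1, 0.3, 0.2])

def bnn_predict(x, n_samples=20000, seed=0):
    """Sample weight vectors from the posterior and summarise the predictions."""
    rng = np.random.default_rng(seed)
    w = rng.normal(w_mean, w_std, size=(n_samples, w_mean.size))  # w ~ q(w)
    y = w @ x                                                     # one prediction per sample
    return y.mean(), y.var()

x = np.array([1.0, 1.0, 2.0])
mean, var = bnn_predict(x)
# For this linear-Gaussian toy case the exact values are known in closed form.
print(mean, w_mean @ x)
print(var, np.sum((w_std * x) ** 2))
```

For a real network no closed form exists, which is why sampling (or an approximation such as MC Dropout) is needed at inference time.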

### 5.2 Evidential Deep Learning

**Idea:** The network directly predicts the parameters of a Dirichlet distribution over class probabilities, which acts as a prior on the categorical prediction.

**Advantages:**
- Single forward pass (fast!)
- Separates aleatoric and epistemic uncertainty
- Principled approach based on subjective logic

**Reference:** Sensoy et al. "Evidential Deep Learning to Quantify Classification Uncertainty" (NeurIPS 2018)
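
A minimal sketch of the output mapping used in this family of methods, assuming ReLU as the evidence function and a Dirichlet with parameters α = evidence + 1; the training loss is omitted, and `evidential_outputs` is a hypothetical name for this example.

```python
import numpy as np

def evidential_outputs(logits):
    """Map raw network outputs to Dirichlet parameters and an uncertainty mass.

    ReLU is assumed as the evidence function; other non-negative
    activations (softplus, exp) are also possible.
    """
    logits = np.asarray(logits, dtype=float)
    evidence = np.maximum(logits, 0.0)             # e_k >= 0
    alpha = evidence + 1.0                         # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True)   # S = sum_k alpha_k
    probs = alpha / strength                       # expected class probabilities
    num_classes = alpha.shape[-1]
    uncertainty = num_classes / strength[..., 0]   # u = K / S, in (0, 1]
    return probs, uncertainty

probs, u = evidential_outputs([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0]])
print(probs)
print(u)
```

With no evidence the prediction is uniform and the uncertainty mass is 1; with strong evidence for one class the uncertainty mass shrinks towards 0, all from a single forward pass.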

### 5.3 Comparison Summary

In [None]:
# Comparison table
import pandas as pd

comparison_data = {
    'Method': [
        'Single Model',
        'MC Dropout',
        'Deep Ensemble',
        'Bayesian NN',
        'Evidential DL'
    ],
    'Training Cost': ['1x', '1x', 'Mx', 'High', '1x'],
    'Inference Cost': ['1x', 'Nx', 'Mx', 'High', '1x'],
    'Memory': ['1x', '1x', 'Mx', '1-2x', '1x'],
    'Aleatoric': ['❌', '❌', '✅*', '✅', '✅'],
    'Epistemic': ['❌', '✅', '✅', '✅', '✅'],
    'Implementation': ['Easy', 'Easy', 'Easy', 'Hard', 'Medium'],
    'Quality': ['N/A', 'Good', 'Excellent', 'Excellent', 'Good'],
    'Production Ready': ['✅', '✅', '✅', '⚠️', '⚠️']
}

df = pd.DataFrame(comparison_data)

print("\n" + "="*100)
print("UNCERTAINTY QUANTIFICATION METHODS COMPARISON")
print("="*100)
print(df.to_string(index=False))
print("\n*Deep Ensemble can capture aleatoric if models predict variance")
print("N = number of MC samples, M = number of ensemble models")
print("="*100)

# Visualize
fig, ax = plt.subplots(figsize=(12, 6))

methods = df['Method'].tolist()
metrics = ['Training Cost', 'Inference Cost', 'Memory']

# Convert costs to numeric (rough estimates)
cost_map = {'1x': 1, 'Nx': 5, 'Mx': 5, 'High': 10, '1-2x': 1.5}
quality_map = {'N/A': 0, 'Good': 3, 'Excellent': 5}
impl_map = {'Easy': 1, 'Medium': 2, 'Hard': 3}

# Create cost score (lower is better)
cost_scores = []
for _, row in df.iterrows():
    cost = (cost_map[row['Training Cost']] + 
            cost_map[row['Inference Cost']] + 
            cost_map[row['Memory']]) / 3
    cost_scores.append(cost)

# Create quality score (higher is better)
quality_scores = [quality_map[q] for q in df['Quality']]

# Plot
x = np.arange(len(methods))
width = 0.35

bars1 = ax.bar(x - width/2, cost_scores, width, label='Computational Cost',
              color='red', alpha=0.6, edgecolor='black', linewidth=1.5)
bars2 = ax.bar(x + width/2, quality_scores, width, label='Uncertainty Quality',
              color='green', alpha=0.6, edgecolor='black', linewidth=1.5)

ax.set_ylabel('Score')
ax.set_title('Method Comparison: Cost vs Quality\n(Lower cost is better, Higher quality is better)')
ax.set_xticks(x)
ax.set_xticklabels(methods, rotation=15, ha='right')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Add values
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
               f'{height:.1f}',
               ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.show()

print("\n🎯 Bottom Line for AV Applications:")
print("\n1. **For Production AVs:**")
print("   - Deep Ensembles (M=3-5) → Best uncertainty, proven track record")
print("   - MC Dropout (N=10-30) → Good uncertainty, lower cost")
print("\n2. **For Research:**")
print("   - Deep Ensembles (M=5-10) → Benchmark")
print("   - Bayesian NNs → Theoretical rigor")
print("   - Evidential DL → Emerging method, watch this space")
print("\n3. **For Edge Deployment:**")
print("   - MC Dropout → Memory constrained")
print("   - Evidential DL → Fast inference needed")
print("   - Knowledge Distillation from Ensemble → Best of both worlds")
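
The distillation option in the last bullet can be illustrated with its core ingredient: a single student network is trained to match the ensemble's averaged probabilities (soft targets) instead of hard labels. The NumPy sketch below shows only that loss on made-up numbers; `distillation_loss` is a hypothetical helper, and a real setup would minimise it with an optimizer over the student's weights.

```python
import numpy as np

def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy between averaged ensemble probabilities (soft targets)
    and the student's softmax output, averaged over the batch."""
    z = student_logits - student_logits.max(axis=-1, keepdims=True)  # stable softmax
    log_q = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(teacher_probs * log_q).sum(axis=-1).mean())

# Hypothetical softmax outputs of M=3 ensemble members for one input.
member_probs = np.array([[[0.9, 0.1]], [[0.6, 0.4]], [[0.3, 0.7]]])
teacher = member_probs.mean(axis=0)          # soft target: [[0.6, 0.4]]

matched = np.log(teacher)                    # student that reproduces the teacher
overconfident = np.array([[5.0, -5.0]])      # student that ignores the disagreement
print(distillation_loss(matched, teacher), distillation_loss(overconfident, teacher))
```

The matched student attains the minimum possible loss (the entropy of the soft target), while the overconfident student is penalised heavily. Note that a student trained this way reproduces the ensemble's mean prediction but not the spread between members.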

## Summary and Key Takeaways

### What We Learned

1. **Monte Carlo Dropout:**
   - Enable dropout at test time
   - Multiple forward passes = uncertainty estimate
   - Cheap (single model) but slower inference
   - Good epistemic uncertainty estimation

2. **Deep Ensembles:**
   - Train multiple independent models
   - Model disagreement = uncertainty
   - Expensive but excellent uncertainty quality
   - State-of-the-art empirical performance

3. **Trade-offs:**
   - MC Dropout: Low memory, moderate quality
   - Ensembles: High memory, high quality
   - Choice depends on deployment constraints

4. **For AVs:**
   - Both methods successfully detect OOD scenarios
   - Ensembles preferred for safety-critical applications
   - MC Dropout useful for resource-constrained edge deployment

### Next Steps

- **Notebook 17:** Learn how to calibrate model confidence
- **Notebook 18:** Learn validation strategies for AV perception

---

## Interactive Exercise

**Try this:**
1. Change dropout rate in MC Dropout model (try 0.1, 0.3, 0.5)
2. Change number of ensemble models (try M=3, M=10)
3. Change number of MC samples (try N=10, N=100)
4. How do these affect uncertainty estimates?

**Challenge:**
- Implement a hybrid approach: Train an ensemble with MC Dropout
- Does combining both methods improve uncertainty?

**Discussion:**
- For a production AV, would you choose MC Dropout or Ensembles? Why?
- How would you decide the threshold for triggering fallback behavior?
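
As one possible starting point for the threshold question, the sketch below picks the threshold as a quantile of the uncertainty observed on in-distribution validation data, so that roughly a chosen fraction of normal inputs would trigger the fallback. The helper names, the 5% false-alarm rate, and the validation values are all assumptions for illustration; a real system would set the rate from its safety requirements.

```python
import numpy as np

def fallback_threshold(val_uncertainty, false_alarm_rate=0.05):
    """Uncertainty value exceeded by roughly `false_alarm_rate` of the
    in-distribution validation samples."""
    return float(np.quantile(val_uncertainty, 1.0 - false_alarm_rate))

def should_fall_back(uncertainty, threshold):
    """Boolean mask: True where the prediction should not be trusted."""
    return np.asarray(uncertainty) > threshold

# Made-up validation uncertainties (illustration only).
val_unc = np.linspace(0.0, 1.0, 101)
thr = fallback_threshold(val_unc, false_alarm_rate=0.05)
print(thr, should_fall_back([0.2, 0.99], thr))
```

This fixes the false-alarm rate but says nothing about how many OOD inputs are missed; that has to be checked separately on held-out OOD data.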