# Module 5.4: Multimodal Edge AI Model - **HANDS-ON VERSION**

## Combined Case Study: Cybersecurity, Edge AI and Autonomous Driving

---

## Objective

Build and train a **lightweight multimodal neural network** that processes:
- **Vehicle telemetry features** (autonomous driving sensors)
- **Network traffic features** (cybersecurity logs)

The model will classify each timestamp into:
- `0` = normal operation
- `1` = physical anomaly (vehicle sensor/behavior issue)
- `2` = network anomaly (cybersecurity threat)

**Key Learning Goals:**
- Design edge-optimized neural network architectures
- Implement multimodal data fusion techniques
- Train lightweight models suitable for real-time deployment
- Evaluate performance across different anomaly types

---

## Multimodal Architecture Overview

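Our model uses a **dual-branch architecture** of fully connected layers. With the layer widths shown below it has roughly 0.55M parameters plus 512 per input feature, about one sixth of MobileNetV2's ~3.4M: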

```
Vehicle Telemetry    Network Traffic
   Features             Features
      ↓                    ↓
   [512→256→128→         [512→256→128→
    64→32→16]             64→32→16]
      ↓                    ↓
      └──── Attention ────┘
              ↓
   [512→256→128→64→32→16→3]
```

This design offers:
1. **Compact capacity**: roughly 0.55M parameters plus 512 per input feature, well below MobileNetV2's ~3.4M
2. **Depth**: 6 fully connected layers per branch for complex pattern recognition
3. **Attention mechanism**: learned per-feature weights applied to the concatenated branch outputs before fusion
4. **Task fit**: intended for complex anomaly detection while staying small enough for edge deployment
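
The parameter budget of the diagram can be checked by hand. The sketch below is plain Python (no PyTorch). It assumes every hidden `Linear` layer is followed by a `BatchNorm1d` with a learnable scale and shift, as in the model code in Step 3, and that the attention block is 32→64→32. The feature counts passed in at the end are hypothetical placeholders for whatever the dataset provides.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n):
    """Learnable scale and shift of a BatchNorm1d layer."""
    return 2 * n


def mlp_params(widths, batchnorm_on_last=True):
    """Parameters of a Linear stack where widths[0] is the input size."""
    total = 0
    for i in range(len(widths) - 1):
        total += linear_params(widths[i], widths[i + 1])
        is_last = i == len(widths) - 2
        if batchnorm_on_last or not is_last:
            total += batchnorm_params(widths[i + 1])
    return total


def count_parameters(n_vehicle, n_network, num_classes=3):
    """Total learnable parameters of the dual-branch model in the diagram."""
    hidden = [512, 256, 128, 64, 32, 16]
    vehicle = mlp_params([n_vehicle] + hidden)
    network = mlp_params([n_network] + hidden)
    attention = linear_params(32, 64) + linear_params(64, 32)
    # The output layer of the fusion stack has no BatchNorm after it
    fusion = mlp_params([32] + hidden + [num_classes], batchnorm_on_last=False)
    return vehicle + network + attention + fusion


# Hypothetical feature counts: 8 vehicle columns and 8 network columns
total = count_parameters(8, 8)
print(f"{total:,} parameters, {total * 4 / 1024**2:.2f} MB as float32")
```

Under these assumptions the total is 553,475 plus 512 per input feature, so the weights occupy a little over 2 MB as 32-bit floats.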

---
**🔥 HANDS-ON PRACTICE**: This notebook contains code completion exercises marked with `# TODO:` comments. Fill in the missing code to build and train your multimodal Edge AI model!

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from pathlib import Path

# Machine Learning libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_score, recall_score
from sklearn.utils.class_weight import compute_class_weight

# Configuration
warnings.filterwarnings('ignore')
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

# Set plotting style
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)
sns.set_palette("husl")

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print("Multimodal Edge AI Model - Libraries Loaded Successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"Device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Step 1: Load and Examine the Preprocessed Dataset

Load the fused dataset from the previous notebook and examine its structure.
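
As a rough illustration of what the loading cell does, the sketch below groups columns by prefix and counts labels on a tiny in-memory CSV. It uses only the standard library rather than pandas, and the column names and values are hypothetical; the real `combined_dataset.csv` may use different ones.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for combined_dataset.csv; column names and values are hypothetical
SAMPLE_CSV = """veh_speed,veh_steering,net_packet_rate,net_failed_logins,label
22.5,0.01,120,0,0
23.1,0.02,118,0,0
5.0,0.90,121,0,1
22.8,0.01,950,14,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
columns = list(rows[0].keys())

# Modalities are identified purely by column-name prefix
vehicle_cols = [c for c in columns if c.startswith("veh_")]
network_cols = [c for c in columns if c.startswith("net_")]

# Count how many rows fall into each class
label_counts = Counter(int(r["label"]) for r in rows)

label_names = {0: "Normal", 1: "Physical Anomaly", 2: "Network Anomaly"}
for label in sorted(label_counts):
    share = label_counts[label] / len(rows) * 100
    print(f"{label}: {label_names[label]} - {label_counts[label]} samples ({share:.1f}%)")
```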

In [None]:
def load_and_examine_dataset(filename='combined_dataset.csv'):
    """
    Load the preprocessed dataset and examine its structure
    
    Parameters:
    - filename: Path to the CSV file created in previous notebook
    
    Returns:
    - Tuple of (DataFrame, vehicle feature column names, network feature column names)
    """
    
    # Check if file exists
    if not Path(filename).exists():
        print(f"ERROR: {filename} not found!")
        print("\nSOLUTION: Please run Notebook 01 (Data Fusion and Preprocessing) first.")
        print("   This will create the required 'combined_dataset.csv' file.")
        raise FileNotFoundError(f"Dataset file {filename} not found")
    
    print(f"Loading dataset from {filename}...")
    
    # Load the dataset
    df = pd.read_csv(filename)
    
    print(f"Dataset loaded successfully!")
    print(f"   Shape: {df.shape}")
    print(f"   Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    
    # Examine structure
    print(f"\nDataset Structure:")
    print(f"   Columns: {list(df.columns)}")
    
    # Identify feature categories
    vehicle_features = [col for col in df.columns if col.startswith('veh_')]
    network_features = [col for col in df.columns if col.startswith('net_')]
    
    print(f"\nVehicle Features ({len(vehicle_features)}):")
    for feature in vehicle_features:
        print(f"   • {feature}")
    
    print(f"\nNetwork Features ({len(network_features)}):")
    for feature in network_features:
        print(f"   • {feature}")
    
    # Label distribution (keyed by label id so a missing class cannot shift the names)
    label_counts = df['label'].value_counts().sort_index()
    print(f"\nLabel Distribution:")
    labels = {0: 'Normal', 1: 'Physical Anomaly', 2: 'Network Anomaly'}
    for label_id, count in label_counts.items():
        label_name = labels.get(label_id, 'Unknown')
        print(f"   {label_id}: {label_name} - {count} samples ({count/len(df)*100:.1f}%)")
    
    return df, vehicle_features, network_features

# Load the dataset
dataset, vehicle_cols, network_cols = load_and_examine_dataset()

# Display sample data
print(f"\nSample Data:")
print(dataset.head(3))

# Check for missing values
missing_data = dataset.isnull().sum().sum()
print(f"\nData Quality Check:")
print(f"   Missing values: {missing_data}")
print(f"   Data completeness: {(1 - missing_data/dataset.size)*100:.2f}%")

## Step 2: Prepare Features and Train/Test Split

Separate the multimodal features and prepare data for training.

**🔥 HANDS-ON PRACTICE**: Complete the data preparation functions to organize multimodal features for neural network training!
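
Two ideas in this step are easy to get wrong: splitting by shared indices so both modalities stay aligned, and fitting the scaler on training rows only. The sketch below shows both on toy data with the standard library. The helper names are hypothetical, and the functions are simplified stand-ins for `train_test_split(..., stratify=y)` and `StandardScaler`, not replacements for them.

```python
import random
import statistics


def stratified_split_indices(labels, test_size=0.2, seed=42):
    """Split row indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_size))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(values):
    """Return the mean and population standard deviation of the training values."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Toy data: one vehicle feature, one network feature, shared labels
vehicle = [20.0, 21.0, 22.0, 23.0, 24.0, 5.0, 6.0, 7.0, 8.0, 9.0]
network = [100.0, 110.0, 120.0, 130.0, 140.0, 900.0, 910.0, 920.0, 930.0, 940.0]
labels = [0, 0, 0, 0, 0, 2, 2, 2, 2, 2]

train_idx, test_idx = stratified_split_indices(labels, test_size=0.2)

# The same indices select rows from both modalities, so they stay aligned
veh_train = [vehicle[i] for i in train_idx]
veh_test = [vehicle[i] for i in test_idx]
net_train = [network[i] for i in train_idx]
net_test = [network[i] for i in test_idx]

# Statistics come from the training rows only; the test rows reuse them
veh_mean, veh_std = fit_scaler(veh_train)
veh_train_scaled = transform(veh_train, veh_mean, veh_std)
veh_test_scaled = transform(veh_test, veh_mean, veh_std)
```

Fitting the scaler on the full dataset instead would leak test-set statistics into training, which tends to make test metrics look better than they should.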

In [None]:
def prepare_multimodal_data(df, vehicle_features, network_features, test_size=0.2, random_state=42):
    """
    Prepare multimodal data for training
    
    Parameters:
    - df: Input dataset
    - vehicle_features: List of vehicle feature column names
    - network_features: List of network feature column names
    - test_size: Fraction of data for testing
    - random_state: Random seed for reproducibility
    
    Returns:
    - Prepared train/test splits for both modalities and labels
    """
    
    print("Preparing multimodal data for training...")
    
    # TODO: Extract features from dataframe
    # HINT: Use df[column_list].values to extract feature arrays
    X_vehicle = # TODO: Extract vehicle features as numpy array
    X_network = # TODO: Extract network features as numpy array
    y = # TODO: Extract labels as numpy array
    
    print(f"   Vehicle features shape: {X_vehicle.shape}")
    print(f"   Network features shape: {X_network.shape}")
    print(f"   Labels shape: {y.shape}")
    
    # TODO: Check for any remaining missing values
    # HINT: Use np.isnan(array).sum() to count NaN values
    vehicle_missing = # TODO: Count missing values in vehicle features
    network_missing = # TODO: Count missing values in network features
    
    if vehicle_missing > 0 or network_missing > 0:
        print(f"   Found missing values: Vehicle={vehicle_missing}, Network={network_missing}")
        print(f"   Filling missing values with median...")
        
        # TODO: Fill missing values with median
        # HINT: Use SimpleImputer with strategy='median'
        from sklearn.impute import SimpleImputer
        imputer_vehicle = # TODO: Create SimpleImputer for vehicle features
        imputer_network = # TODO: Create SimpleImputer for network features
        
        # TODO: Apply imputers to transform the data
        X_vehicle = # TODO: Fit and transform vehicle features
        X_network = # TODO: Fit and transform network features
    
    # TODO: Create train/test split
    print(f"\nCreating train/test split ({(1-test_size)*100:.0f}%/{test_size*100:.0f}%)...")
    
    # TODO: Generate indices and split them to maintain alignment
    # HINT: Use np.arange(len(y)) to create indices, then train_test_split with stratify=y
    indices = # TODO: Create array of indices from 0 to len(y)
    train_idx, test_idx = # TODO: Split indices using train_test_split with stratification
    
    # TODO: Split the data using the indices
    X_vehicle_train, X_vehicle_test = # TODO: Split vehicle features using indices
    X_network_train, X_network_test = # TODO: Split network features using indices  
    y_train, y_test = # TODO: Split labels using indices
    
    # TODO: Standardize features (important for stable gradient-based training)
    # HINT: Create StandardScaler instances and fit on training data
    scaler_vehicle = # TODO: Create StandardScaler for vehicle features
    scaler_network = # TODO: Create StandardScaler for network features
    
    # TODO: Fit scalers on training data and transform both train and test
    X_vehicle_train = # TODO: Fit and transform vehicle training data
    X_vehicle_test = # TODO: Transform vehicle test data (don't fit!)
    X_network_train = # TODO: Fit and transform network training data
    X_network_test = # TODO: Transform network test data (don't fit!)
    
    print(f"   Train set: {len(y_train)} samples")
    print(f"   Test set: {len(y_test)} samples")
    print(f"   Features standardized")
    
    # Check label distribution in splits
    train_dist = np.bincount(y_train) / len(y_train) * 100
    test_dist = np.bincount(y_test) / len(y_test) * 100
    
    print(f"\nLabel Distribution:")
    labels = ['Normal', 'Physical', 'Network']
    for i, label in enumerate(labels):
        print(f"   {label}: Train {train_dist[i]:.1f}%, Test {test_dist[i]:.1f}%")
    
    # TODO: Convert to PyTorch tensors
    # HINT: Use torch.FloatTensor for features and torch.LongTensor for labels
    X_vehicle_train_tensor = # TODO: Convert vehicle training data to FloatTensor
    X_vehicle_test_tensor = # TODO: Convert vehicle test data to FloatTensor
    X_network_train_tensor = # TODO: Convert network training data to FloatTensor
    X_network_test_tensor = # TODO: Convert network test data to FloatTensor
    y_train_tensor = # TODO: Convert training labels to LongTensor
    y_test_tensor = # TODO: Convert test labels to LongTensor
    
    return {
        'X_vehicle_train': X_vehicle_train_tensor,
        'X_vehicle_test': X_vehicle_test_tensor,
        'X_network_train': X_network_train_tensor,
        'X_network_test': X_network_test_tensor,
        'y_train': y_train_tensor,
        'y_test': y_test_tensor,
        'scaler_vehicle': scaler_vehicle,
        'scaler_network': scaler_network
    }

# TODO: Prepare the data
# HINT: Call prepare_multimodal_data with dataset and feature column lists
print("Step 2: Data Preparation for Neural Network Training")
print("=" * 60)

data_splits = # TODO: Call data preparation function

print(f"\n✅ Data preparation completed!")
print(f"   Vehicle training tensor: {data_splits['X_vehicle_train'].shape}")
print(f"   Network training tensor: {data_splits['X_network_train'].shape}")
print(f"   Training labels: {data_splits['y_train'].shape}")
print(f"   Output classes: {len(torch.unique(data_splits['y_train']))}")

## Step 3: Build Multimodal Neural Network Architecture

Design a dual-branch neural network with attention-based fusion. With the layer widths below, the model has roughly 0.55M parameters plus 512 per input feature, well under MobileNetV2's ~3.4M.

**🔥 HANDS-ON PRACTICE**: Complete the neural network architecture by implementing the multimodal fusion layers and forward pass!
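
The attention step in this architecture amounts to a softmax gate: the concatenated branch outputs are multiplied element-wise by weights that sum to 1. The plain-Python sketch below shows that arithmetic for a single sample with a 4-value fused vector instead of 32. The scores are hypothetical; in the model they would come from the Linear-Tanh-Linear stack.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_attention(fused, scores):
    """Scale each fused feature by its softmax weight (element-wise product)."""
    weights = softmax(scores)
    attended = [f * w for f, w in zip(fused, weights)]
    return attended, weights


# Concatenation of a 2-value vehicle embedding and a 2-value network embedding
vehicle_embedding = [0.5, 1.5]
network_embedding = [2.0, 0.1]
fused = vehicle_embedding + network_embedding

# Hypothetical scores that favor the third feature
scores = [0.0, 0.0, 2.0, 0.0]
attended, weights = apply_attention(fused, scores)
print([round(w, 3) for w in weights])
```

Because the weights sum to 1 across all fused features, the attended vector is much smaller in magnitude than the input; the `BatchNorm1d` layers in the fusion stack should renormalize the activations.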

In [None]:
class MultimodalEdgeAI(nn.Module):
    """
    OPTIMIZED dual-branch multimodal neural network
    
    Architecture with OPTIMIZED dropout rates (0.7 scale factor):
    - Vehicle Branch: 6 layers with optimized dropout
    - Network Branch: 6 layers with optimized dropout
    - Fusion: 6 hidden layers plus an output layer, with reduced dropout (prevent overfitting)
    Size: roughly 0.55M parameters plus 512 per input feature (MobileNetV2 has ~3.4M)
    """
    
    def __init__(self, vehicle_input_size, network_input_size, num_classes=3):
        super(MultimodalEdgeAI, self).__init__()
        
        # OPTIMIZED dropout rates (0.7 scale factor from hyperparameter search)
        dropout_scale = 0.7
        
        # Large Vehicle telemetry branch with OPTIMIZED dropout
        self.vehicle_branch = nn.Sequential(
            # Layer 1
            nn.Linear(vehicle_input_size, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.3 * dropout_scale),  # OPTIMIZED: 0.21
            # Layer 2
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.25 * dropout_scale),  # OPTIMIZED: 0.175
            # Layer 3
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Dropout(0.2 * dropout_scale),  # OPTIMIZED: 0.14
            # Layer 4
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.BatchNorm1d(64),
            nn.Dropout(0.15 * dropout_scale),  # OPTIMIZED: 0.105
            # Layer 5
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.BatchNorm1d(32),
            nn.Dropout(0.1 * dropout_scale),  # OPTIMIZED: 0.07
            # Layer 6
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.BatchNorm1d(16)
        )
        
        # Large Network traffic branch with OPTIMIZED dropout
        self.network_branch = nn.Sequential(
            # Layer 1
            nn.Linear(network_input_size, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.3 * dropout_scale),  # OPTIMIZED: 0.21
            # Layer 2
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.25 * dropout_scale),  # OPTIMIZED: 0.175
            # Layer 3
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Dropout(0.2 * dropout_scale),  # OPTIMIZED: 0.14
            # Layer 4
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.BatchNorm1d(64),
            nn.Dropout(0.15 * dropout_scale),  # OPTIMIZED: 0.105
            # Layer 5
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.BatchNorm1d(32),
            nn.Dropout(0.1 * dropout_scale),  # OPTIMIZED: 0.07
            # Layer 6
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.BatchNorm1d(16)
        )
        
        # TODO: Attention mechanism for feature fusion
        # HINT: Create a sequential layer that takes 32 inputs, projects to 64, applies Tanh, 
        #       then projects back to 32 and applies Softmax
        self.attention = nn.Sequential(
            # TODO: Add linear layer from 32 to 64
            # TODO: Add Tanh activation
            # TODO: Add linear layer from 64 to 32
            # TODO: Add Softmax activation with dim=1
        )
        
        # OPTIMIZED Fusion layers with reduced dropout (prevent overfitting)
        self.fusion = nn.Sequential(
            # Layer 1
            nn.Linear(32, 512),  # 16 + 16 from both branches
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.25 * dropout_scale),  # OPTIMIZED: 0.175 (reduced from 0.4)
            # Layer 2
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.2 * dropout_scale),  # OPTIMIZED: 0.14 (reduced from 0.35)
            # Layer 3
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Dropout(0.15 * dropout_scale),  # OPTIMIZED: 0.105 (reduced from 0.3)
            # Layer 4
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.BatchNorm1d(64),
            nn.Dropout(0.1 * dropout_scale),  # OPTIMIZED: 0.07 (reduced from 0.25)
            # Layer 5
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.BatchNorm1d(32),
            nn.Dropout(0.05 * dropout_scale),  # OPTIMIZED: 0.035 (reduced from 0.2)
            # Layer 6
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.BatchNorm1d(16),
            nn.Dropout(0.05),  # Minimal dropout for final layer
            # Layer 7 (Output)
            nn.Linear(16, num_classes)
        )
        
    def forward(self, vehicle_input, network_input):
        # TODO: Process each branch through their respective networks
        # HINT: Pass inputs through self.vehicle_branch and self.network_branch
        vehicle_features = # TODO: Process vehicle input through vehicle branch
        network_features = # TODO: Process network input through network branch
        
        # TODO: Concatenate features from both branches
        # HINT: Use torch.cat([tensor1, tensor2], dim=1) to concatenate along feature dimension
        fused_features = # TODO: Concatenate vehicle and network features
        
        # TODO: Apply attention mechanism
        # HINT: Pass fused_features through self.attention to get weights
        attention_weights = # TODO: Get attention weights from fused features
        
        # TODO: Apply attention weights to features (element-wise multiplication)
        # HINT: Multiply fused_features by attention_weights
        attended_features = # TODO: Apply attention weights to fused features
        
        # TODO: Final classification through fusion layers
        # HINT: Pass attended_features through self.fusion
        output = # TODO: Get final output through fusion layers
        
        return output

def build_multimodal_model(vehicle_input_shape, network_input_shape, num_classes=3):
    """
    Build the dual-branch multimodal neural network
    
    Parameters:
    - vehicle_input_shape: Number of vehicle features
    - network_input_shape: Number of network features
    - num_classes: Number of output classes
    
    Returns:
    - PyTorch model on the active device (roughly 0.55M parameters plus 512 per input feature)
    """
    
    print("Building dual-branch multimodal neural network...")
    
    # TODO: Create model instance
    # HINT: Use MultimodalEdgeAI class with the provided parameters
    model = # TODO: Create MultimodalEdgeAI instance
    
    # TODO: Move model to device (CPU or GPU)
    # HINT: Use model.to(device)
    model = # TODO: Move model to appropriate device
    
    print(f"Model built successfully!")
    
    return model

# TODO: Build the model
print("Step 3: Building Multimodal Neural Network")
print("=" * 60)

# TODO: Call build_multimodal_model with appropriate parameters
# HINT: Use shapes from data_splits tensors
model = # TODO: Build model with vehicle and network input shapes

# Display model architecture
print("\nModel Architecture Summary:")
print(model)

# TODO: Calculate model size (important for edge deployment)
print("\nModel Statistics:")
# TODO: Count total parameters in the model
# HINT: Use sum(p.numel() for p in model.parameters())
total_params = # TODO: Count total parameters
trainable_params = # TODO: Count trainable parameters (same calculation with p.requires_grad)

print(f"   Total parameters: {total_params:,}")
print(f"   Trainable parameters: {trainable_params:,}")
print(f"   Model size: {total_params * 4 / 1024**2:.2f} MB (32-bit floats)")
print(f"   Reference: MobileNetV2 has ~3.4M parameters; this model should be several times smaller")
print(f"   Suitable for edge deployment!")

## Step 4: Train the Model - HANDS-ON PRACTICE

### Step 4A: Define Focal Loss and Helper Functions

First, let's define the Focal Loss class and supporting components for training.
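
Before filling in the class, it may help to see the focal loss arithmetic for one sample: the cross-entropy term `-log(p_t)` is scaled by `(1 - p_t) ** gamma`, and optionally by a per-class weight. The sketch below is plain Python with hypothetical probabilities; the function name is illustrative and not part of the notebook.

```python
import math


def focal_loss_single(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample, given class probabilities and the true class."""
    p_t = probs[target]               # probability assigned to the true class
    ce = -math.log(p_t)               # standard cross-entropy for this sample
    loss = (1.0 - p_t) ** gamma * ce  # focusing term shrinks easy examples
    if alpha is not None:
        loss *= alpha[target]         # optional per-class weight
    return loss


# An easy example (true class predicted at 0.9) and a hard one (0.2)
easy = focal_loss_single([0.9, 0.05, 0.05], target=0)
hard = focal_loss_single([0.2, 0.5, 0.3], target=0)
print(f"easy: {easy:.5f}, hard: {hard:.5f}")

# With gamma=0 the focusing term disappears and plain cross-entropy remains
plain_ce = focal_loss_single([0.9, 0.05, 0.05], target=0, gamma=0.0)
```

The PyTorch version in this step recovers `p_t` as `torch.exp(-ce_loss)`, which is the same quantity computed from the per-sample cross-entropy.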

In [None]:
# TODO: Define Focal Loss class for handling class imbalance
class FocalLoss(nn.Module):
    def __init__(self, alpha=None, gamma=2.0, reduction='mean'):
        super(FocalLoss, self).__init__()
        # TODO: Store loss parameters
        # HINT: Assign alpha, gamma, and reduction to self attributes
        self.alpha = # TODO: Store alpha parameter
        self.gamma = # TODO: Store gamma parameter  
        self.reduction = # TODO: Store reduction parameter
        
    def forward(self, inputs, targets):
        # TODO: Calculate standard cross entropy loss (without reduction)
        # HINT: Use F.cross_entropy with reduction='none'
        ce_loss = # TODO: Calculate cross entropy loss
        
        # TODO: Calculate probabilities from cross entropy
        # HINT: Use torch.exp(-ce_loss) to get pt
        pt = # TODO: Calculate pt from ce_loss
        
        # TODO: Calculate focal loss with focusing term
        # HINT: focal_loss = (1 - pt) ** self.gamma * ce_loss
        focal_loss = # TODO: Apply focal loss formula
        
        # TODO: Apply alpha weighting if provided
        if self.alpha is not None:
            # TODO: Get alpha weights for current targets
            # HINT: Use self.alpha[targets] to index into alpha tensor
            alpha_t = # TODO: Get alpha weights for targets
            # TODO: Apply alpha weighting
            focal_loss = # TODO: Multiply focal_loss by alpha_t
            
        # TODO: Apply reduction (mean, sum, or none)
        if self.reduction == 'mean':
            return # TODO: Return mean of focal_loss
        elif self.reduction == 'sum':
            return # TODO: Return sum of focal_loss
        else:
            return # TODO: Return focal_loss without reduction

def calculate_class_weights(y_train):
    """Use optimized class weights found by hyperparameter search."""
    y_train_numpy = y_train.numpy()
    class_counts = np.bincount(y_train_numpy)
    
    # TODO: Set optimized class weights
    # HINT: Weights should be [0.6, 1.4, 1.0] for [Normal, Physical, Network]
    optimized_weights = # TODO: Define optimized weights list
    
    print(f"Class Weights (from hyperparameter search):")
    labels = ['Normal', 'Physical', 'Network']
    for i, (label, weight, count) in enumerate(zip(labels, optimized_weights, class_counts)):
        print(f"   {label:<12}: {weight:.2f} (samples: {count})")
    
    # TODO: Convert to PyTorch tensor
    # HINT: Use torch.FloatTensor(optimized_weights)
    return # TODO: Return FloatTensor of weights

print("Focal Loss and helper functions defined!")

### Step 4B: Main Training Function

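The core training loop combines Focal and cross-entropy loss, clips gradients, reduces the learning rate when validation loss plateaus, and stops early when validation loss stops improving.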
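
The early-stopping bookkeeping is independent of PyTorch and can be traced on a list of numbers. The sketch below replays a hypothetical sequence of validation losses with the same counter logic as the training function; the function name and the loss values are made up for illustration.

```python
def run_early_stopping(val_losses, patience):
    """Replay validation losses and report where training would stop.

    Returns (best_loss, best_epoch, last_epoch) with 0-based epoch indices.
    """
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_loss, best_epoch, epoch
    return best_loss, best_epoch, len(val_losses) - 1


# Hypothetical validation losses for six epochs
losses = [1.00, 0.80, 0.90, 0.85, 0.95, 0.70]
best_loss, best_epoch, last_epoch = run_early_stopping(losses, patience=3)
print(best_loss, best_epoch, last_epoch)
```

With a patience of 3, this run stops before reaching the later, lower loss of 0.70, which illustrates the trade-off behind the larger `patience_limit` used in the training function.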

In [None]:
def train_multimodal_model(model, data_splits, epochs=100, batch_size=32, validation_split=0.2, learning_rate=3e-4):
    """
    OPTIMIZED TRAINING with parameters found by hyperparameter search:
    - Learning rate: 3e-4 (optimal for stability)
    - Weight decay: 1e-4 (optimal regularization)
    - Focal gamma: 1.5 (balanced focus on hard examples)
    - Loss mix: 50/50 (balanced Focal/CrossEntropy)
    - Class weights: [0.6, 1.4, 1.0] (optimal class balance)
    """
    
    # TODO: Calculate enhanced class weights and move to device
    # HINT: Call calculate_class_weights and use .to(device)
    alpha_tensor = # TODO: Calculate class weights and move to device
    
    # TODO: Split training data for validation
    n_train = len(data_splits['X_vehicle_train'])
    # TODO: Calculate validation size
    # HINT: n_val = int(n_train * validation_split)
    n_val = # TODO: Calculate validation size
    n_train_actual = n_train - n_val
    
    # TODO: Create random indices for train/val split
    # HINT: Use torch.randperm(n_train)
    indices = # TODO: Create random permutation of indices
    train_indices = indices[:n_train_actual]
    val_indices = indices[n_train_actual:]
    
    # TODO: Split training data using indices
    # HINT: Use data_splits['X_vehicle_train'][train_indices], etc.
    X_vehicle_train = # TODO: Get vehicle training data
    X_network_train = # TODO: Get network training data  
    y_train = # TODO: Get training labels
    
    # TODO: Split validation data using indices
    X_vehicle_val = # TODO: Get vehicle validation data
    X_network_val = # TODO: Get network validation data
    y_val = # TODO: Get validation labels
    
    # TODO: Create PyTorch datasets
    # HINT: Use TensorDataset(X_vehicle_train, X_network_train, y_train)
    train_dataset = # TODO: Create training dataset
    val_dataset = # TODO: Create validation dataset
    
    # TODO: Create data loaders
    # HINT: Use DataLoader with batch_size and shuffle parameters
    train_loader = # TODO: Create training data loader (shuffle=True)
    val_loader = # TODO: Create validation data loader (shuffle=False)
    
    # TODO: Initialize loss functions with OPTIMIZED parameters
    # HINT: Use FocalLoss and CrossEntropyLoss with alpha_tensor
    focal_criterion = # TODO: Create FocalLoss with alpha=alpha_tensor, gamma=1.5
    ce_criterion = # TODO: Create CrossEntropyLoss with weight=alpha_tensor, label_smoothing=0.05
    
    # TODO: Initialize optimizer with OPTIMIZED parameters
    # HINT: Use optim.AdamW with model parameters, lr, and weight_decay=1e-4
    optimizer = # TODO: Create AdamW optimizer
    
    # TODO: Initialize learning rate scheduler
    # HINT: Use ReduceLROnPlateau with mode='min', factor=0.7, patience=20
    scheduler = # TODO: Create learning rate scheduler
    
    # Initialize training history dictionary
    history = {
        'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': [],
        'train_precision': [], 'val_precision': [], 'train_recall': [], 'val_recall': [],
        'train_f1': [], 'val_f1': [],
        'learning_rate': [], 'focal_loss': [], 'ce_loss': []
    }
    
    print(f"\nStarting fine-tuned training...")
    
    best_val_loss = float('inf')
    patience_counter = 0
    patience_limit = 40
    
    for epoch in range(epochs):
        # TODO: Set model to training mode
        # HINT: Use model.train()
        # TODO: Set model mode for training
        
        train_loss = 0.0
        train_focal_loss = 0.0
        train_ce_loss = 0.0
        train_correct = 0
        train_total = 0
        train_predictions = []
        train_targets = []
        
        for batch_vehicle, batch_network, batch_labels in train_loader:
            # TODO: Move batch data to device
            # HINT: Use .to(device) for each tensor
            batch_vehicle = # TODO: Move vehicle batch to device
            batch_network = # TODO: Move network batch to device
            batch_labels = # TODO: Move labels to device
            
            # TODO: Zero gradients
            # HINT: Use optimizer.zero_grad()
            # TODO: Clear gradients
            
            # TODO: Forward pass through model
            # HINT: Call model(batch_vehicle, batch_network)
            outputs = # TODO: Get model outputs
            
            # TODO: Calculate losses
            # HINT: Call focal_criterion and ce_criterion with outputs and labels
            focal_loss = # TODO: Calculate focal loss
            ce_loss = # TODO: Calculate cross entropy loss
            
            # TODO: Combine losses with 50/50 balance
            # HINT: combined_loss = 0.5 * focal_loss + 0.5 * ce_loss
            combined_loss = # TODO: Combine focal and CE losses
            
            # TODO: Backward pass and optimization
            # HINT: Use combined_loss.backward(), clip gradients, optimizer.step()
            # TODO: Compute gradients
            # TODO: Clip gradients with max_norm=0.5
            # TODO: Update parameters
            
            # Update training statistics
            train_loss += combined_loss.item()
            train_focal_loss += focal_loss.item()
            train_ce_loss += ce_loss.item()
            
            # TODO: Calculate predictions and accuracy
            # HINT: Use torch.max(outputs, 1) to get predicted classes
            _, predicted = # TODO: Get predicted classes from outputs
            train_total += batch_labels.size(0)
            train_correct += (predicted == batch_labels).sum().item()
            
            train_predictions.extend(predicted.cpu().numpy())
            train_targets.extend(batch_labels.cpu().numpy())
        
        # TODO: Validation phase - Set model to evaluation mode
        # HINT: Use model.eval()
        # TODO: Set model mode for evaluation
        val_loss = 0.0
        val_correct = 0
        val_total = 0
        val_predictions = []
        val_targets = []
        
        # TODO: Disable gradient computation for validation
        # HINT: Use torch.no_grad() context manager
        with # TODO: Create no_grad context:
            for batch_vehicle, batch_network, batch_labels in val_loader:
                # TODO: Move validation batch to device
                batch_vehicle = # TODO: Move vehicle batch to device
                batch_network = # TODO: Move network batch to device
                batch_labels = # TODO: Move labels to device
                
                # TODO: Forward pass (no gradients needed)
                outputs = # TODO: Get model outputs
                
                # TODO: Calculate validation losses
                focal_loss = # TODO: Calculate focal loss
                ce_loss = # TODO: Calculate cross entropy loss
                combined_loss = # TODO: Combine losses (50/50)
                
                val_loss += combined_loss.item()
                
                # TODO: Calculate validation predictions
                _, predicted = # TODO: Get predicted classes
                val_total += batch_labels.size(0)
                val_correct += (predicted == batch_labels).sum().item()
                
                val_predictions.extend(predicted.cpu().numpy())
                val_targets.extend(batch_labels.cpu().numpy())
        
        # TODO: Calculate average losses and accuracies
        train_loss_avg = # TODO: Calculate average training loss
        val_loss_avg = # TODO: Calculate average validation loss
        train_acc = # TODO: Calculate training accuracy
        val_acc = # TODO: Calculate validation accuracy
        
        # TODO: Calculate precision, recall, and F1 scores
        from sklearn.metrics import f1_score
        # HINT: Use precision_score, recall_score, f1_score with average='macro'
        train_precision = # TODO: Calculate training precision
        val_precision = # TODO: Calculate validation precision
        train_recall = # TODO: Calculate training recall
        val_recall = # TODO: Calculate validation recall
        train_f1 = # TODO: Calculate training F1
        val_f1 = # TODO: Calculate validation F1
        
        # TODO: Store metrics in history
        # HINT: Append each metric to corresponding history list
        history['train_loss'].append(train_loss_avg)
        history['val_loss'].append(val_loss_avg)
        # TODO: Append remaining metrics to history
        
        # TODO: Update learning rate scheduler
        # HINT: Call scheduler.step(val_loss_avg)
        # TODO: Update scheduler with validation loss
        
        # Print progress every 10 epochs
        if (epoch + 1) % 10 == 0 or epoch == epochs - 1:
            print(f"Epoch {epoch+1}/{epochs} - "
                  f"Train Loss: {train_loss_avg:.4f}, Acc: {train_acc:.4f}, F1: {train_f1:.3f} - "
                  f"Val Loss: {val_loss_avg:.4f}, Acc: {val_acc:.4f}, F1: {val_f1:.3f} - "
                  f"LR: {optimizer.param_groups[0]['lr']:.6f}")
        
        # TODO: Early stopping logic
        if val_loss_avg < best_val_loss:
            best_val_loss = val_loss_avg
            patience_counter = 0
            # TODO: Save best model
            # HINT: Use torch.save(model.state_dict(), 'best_improved_model.pth')
            # TODO: Save model state dict
        else:
            patience_counter += 1
            if patience_counter >= patience_limit:
                print(f"\nEarly stopping after {epoch+1} epochs (patience: {patience_limit})")
                # TODO: Load best model weights
                # HINT: Use model.load_state_dict(torch.load('best_improved_model.pth'))
                # TODO: Load best model state
                break
    
    print(f"\nFine-tuned Training Completed!")
    print(f"   Best validation loss: {best_val_loss:.4f}")
    print(f"   Final validation accuracy: {history['val_acc'][-1]:.4f}")
    
    return history

print("Complete training function defined!")

### Step 4C: Training Loop Execution

Run the training loop and collect the per-epoch metrics it returns in `training_history`.

In [None]:
# TODO: Rebuild model with optimized architecture
# HINT: Use build_multimodal_model with shapes from data_splits
model = # TODO: Build model with vehicle and network input shapes and 3 classes

# TODO: Start training with optimized parameters
# HINT: Call train_multimodal_model with model, data_splits, epochs=100, batch_size=32
training_history = # TODO: Train the model and get history

print("\n" + "="*70)
print("TRAINING COMPLETED SUCCESSFULLY!")
print("="*70)

## Step 5: Visualize Training Progress - HANDS-ON PRACTICE

In [None]:
def plot_training_history(history):
    """
    Plot comprehensive training history visualizations
    """
    
    # TODO: Create subplot figure with 2 rows and 3 columns
    # HINT: Use plt.subplots(2, 3, figsize=(18, 10))
    fig, axes = # TODO: Create subplot figure
    fig.suptitle('Enhanced Multimodal Edge AI Model - Training Progress', fontsize=16, fontweight='bold')
    
    # TODO: Create epochs range for x-axis
    # HINT: Use range(1, len(history['train_loss']) + 1)
    epochs = # TODO: Create epochs range
    
    # TODO: Plot 1 - Loss curves
    ax1 = axes[0, 0]
    # TODO: Plot training and validation loss
    # HINT: Use ax1.plot(epochs, history['train_loss'], 'b-', label='Training Loss', linewidth=2)
    # TODO: Plot training loss
    # TODO: Plot validation loss
    ax1.set_title('Model Loss (Enhanced Training)')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('CrossEntropy Loss')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # TODO: Plot 2 - Accuracy curves
    ax2 = axes[0, 1]
    # TODO: Plot training and validation accuracy
    # HINT: Use ax2.plot with 'train_acc' and 'val_acc' from history
    # TODO: Plot training accuracy
    # TODO: Plot validation accuracy
    ax2.set_title('Model Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # TODO: Plot 3 - Learning Rate
    ax3 = axes[0, 2]
    # TODO: Plot learning rate schedule
    # HINT: Use ax3.plot with 'learning_rate' from history and set_yscale('log')
    # TODO: Plot learning rate
    ax3.set_title('Learning Rate Schedule')
    ax3.set_xlabel('Epoch')
    ax3.set_ylabel('Learning Rate')
    # TODO: Set y-axis to log scale
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # TODO: Plot 4 - Precision curves
    ax4 = axes[1, 0]
    # TODO: Plot training and validation precision
    # HINT: Use 'train_precision' and 'val_precision' from history
    # TODO: Plot training precision
    # TODO: Plot validation precision
    ax4.set_title('Model Precision')
    ax4.set_xlabel('Epoch')
    ax4.set_ylabel('Precision')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    # TODO: Plot 5 - Recall curves
    ax5 = axes[1, 1]
    # TODO: Plot training and validation recall
    # HINT: Use 'train_recall' and 'val_recall' from history
    # TODO: Plot training recall
    # TODO: Plot validation recall
    ax5.set_title('Model Recall')
    ax5.set_xlabel('Epoch')
    ax5.set_ylabel('Recall')
    ax5.legend()
    ax5.grid(True, alpha=0.3)
    
    # TODO: Plot 6 - Combined validation metrics
    ax6 = axes[1, 2]
    # TODO: Plot multiple validation metrics on same plot
    # HINT: Plot val_acc, val_precision, val_recall from history
    # TODO: Plot validation accuracy
    # TODO: Plot validation precision
    # TODO: Plot validation recall
    ax6.set_title('Validation Metrics Combined')
    ax6.set_xlabel('Epoch')
    ax6.set_ylabel('Score')
    ax6.legend()
    ax6.grid(True, alpha=0.3)
    
    # TODO: Apply tight layout and show plot
    # HINT: Use plt.tight_layout() and plt.show()
    # TODO: Apply tight layout
    # TODO: Show the plot
    
    # TODO: Calculate training insights
    final_train_acc = # TODO: Get final training accuracy from history
    final_val_acc = # TODO: Get final validation accuracy from history
    # TODO: Find best validation accuracy
    # HINT: Use max(history['val_acc'])
    best_val_acc = # TODO: Get best validation accuracy
    # TODO: Find epoch with best validation accuracy
    # HINT: Use history['val_acc'].index(best_val_acc) + 1
    best_val_epoch = # TODO: Get epoch with best validation accuracy
    # TODO: Calculate overfitting gap
    overfitting = # TODO: Calculate training - validation accuracy gap
    
    print("\nEnhanced Training Insights:")
    print(f"   Final training accuracy: {final_train_acc:.3f}")
    print(f"   Final validation accuracy: {final_val_acc:.3f}")
    print(f"   Best validation accuracy: {best_val_acc:.3f} (epoch {best_val_epoch})")
    print(f"   Overfitting gap: {overfitting:.3f}")
    
    # TODO: Assess performance based on best validation accuracy
    if best_val_acc > 0.85:
        performance_status = # TODO: Set status for excellent performance
    elif best_val_acc > 0.75:
        performance_status = # TODO: Set status for good performance
    elif best_val_acc > 0.65:
        performance_status = # TODO: Set status for acceptable performance
    else:
        performance_status = # TODO: Set status for needs improvement
    
    print(f"   Performance assessment: {performance_status}")
    
    # TODO: Analyze overfitting level
    if overfitting > 0.15:
        print("   High overfitting - consider more regularization")
    elif overfitting > 0.05:
        print("   Moderate overfitting - acceptable, but keep monitoring the gap")
    else:
        print("   Low overfitting - good generalization")

# TODO: Plot training history
# HINT: Call plot_training_history with training_history
# TODO: Plot the training history
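
The summary statistics printed at the end of `plot_training_history` are plain list operations on the history dictionary. A minimal sketch on made-up numbers follows; `toy_history` and its values are hypothetical and do not come from an actual training run.

```python
# Hypothetical per-epoch accuracies, for illustration only
toy_history = {
    "train_acc": [0.60, 0.72, 0.81, 0.88],
    "val_acc": [0.58, 0.70, 0.76, 0.74],
}

final_train_acc = toy_history["train_acc"][-1]
final_val_acc = toy_history["val_acc"][-1]

# Best validation accuracy and the 1-based epoch where it first occurred
best_val_acc = max(toy_history["val_acc"])
best_val_epoch = toy_history["val_acc"].index(best_val_acc) + 1

# Gap between final training and final validation accuracy
overfitting_gap = final_train_acc - final_val_acc

print(f"Best validation accuracy: {best_val_acc:.3f} (epoch {best_val_epoch})")
print(f"Overfitting gap: {overfitting_gap:.3f}")
```

In this toy run the best validation accuracy occurs one epoch before the end, which is the pattern early stopping is meant to catch.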

## Step 6: Evaluate Model Performance - HANDS-ON PRACTICE

### Step 6A: Core Evaluation Function

Define the evaluation function for the class-balanced test set, with optional temperature scaling of the logits.

In [None]:
def evaluate_model_balanced(model, data_splits, temperature=1.0):
    """
    Streamlined evaluation for balanced datasets - no complex thresholding needed!
    
    Parameters:
    - model: Trained model
    - data_splits: Data splits dictionary  
    - temperature: Temperature scaling factor
    
    Returns:
    - Dictionary with evaluation results
    """
    
    print("Evaluating model on BALANCED dataset...")
    # TODO: Set model to evaluation mode
    # HINT: Use model.eval()
    # TODO: Set model mode for evaluation
    
    # TODO: Prepare test data and move to device
    # HINT: Get test data from data_splits and use .to(device)
    X_vehicle_test = # TODO: Get vehicle test data and move to device
    X_network_test = # TODO: Get network test data and move to device
    y_test = # TODO: Get test labels (keep on CPU for metrics)
    
    # TODO: Standard prediction (no gradients needed for evaluation)
    # HINT: Use torch.no_grad() context manager
    with # TODO: Create no_grad context:
        # TODO: Get model outputs
        # HINT: Call model(X_vehicle_test, X_network_test)
        outputs = # TODO: Get model predictions
        
        # TODO: Apply temperature scaling
        # HINT: Divide outputs by temperature
        calibrated_outputs = # TODO: Apply temperature scaling
        
        # TODO: Get class probabilities
        # HINT: Use F.softmax(calibrated_outputs, dim=1)
        y_pred_proba = # TODO: Get prediction probabilities
        
        # TODO: Get predicted classes
        # HINT: Use torch.max(calibrated_outputs, 1) and take second return value
        _, y_pred = # TODO: Get predicted class indices
    
    # TODO: Convert tensors to numpy for sklearn metrics
    # HINT: Use .cpu().numpy() for tensors
    y_pred_proba_np = # TODO: Convert probabilities to numpy
    y_pred_np = # TODO: Convert predictions to numpy
    y_true_np = # TODO: Convert true labels to numpy
    
    # TODO: Calculate evaluation metrics
    # HINT: Use sklearn.metrics functions with average='macro' for multiclass
    test_accuracy = # TODO: Calculate accuracy using accuracy_score
    test_precision = # TODO: Calculate precision using precision_score with average='macro'
    test_recall = # TODO: Calculate recall using recall_score with average='macro'
    # TODO: Calculate test loss
    # HINT: Use F.cross_entropy(calibrated_outputs.cpu(), y_test).item()
    test_loss = # TODO: Calculate cross entropy loss
    
    print("\nBalanced Dataset Results:")
    print(f"   Accuracy: {test_accuracy:.3f}")
    print(f"   Precision: {test_precision:.3f}")  
    print(f"   Recall: {test_recall:.3f}")
    print(f"   Loss: {test_loss:.3f}")
    
    # TODO: Return evaluation results dictionary
    # HINT: Include accuracy, precision, recall, loss, predictions, probabilities, true_labels
    return {
        'accuracy': # TODO: Add accuracy
        'precision': # TODO: Add precision 
        'recall': # TODO: Add recall
        'loss': # TODO: Add loss
        'predictions': # TODO: Add numpy predictions
        'probabilities': # TODO: Add numpy probabilities
        'true_labels': # TODO: Add numpy true labels
    }

print("Streamlined evaluation function defined!")
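
To see what the `temperature` parameter does, it can help to apply it to a few hand-written logits. This is a small sketch under stated assumptions: `toy_logits` and the helper `scaled_probabilities` are hypothetical names introduced for illustration, not part of the notebook.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for two samples and three classes
toy_logits = torch.tensor([[2.0, 1.0, 0.1],
                           [0.5, 2.5, 0.3]])


def scaled_probabilities(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    return F.softmax(logits / temperature, dim=1)


probs_t1 = scaled_probabilities(toy_logits, temperature=1.0)
probs_t2 = scaled_probabilities(toy_logits, temperature=2.0)

# Dividing by a positive constant never changes the argmax
pred_t1 = probs_t1.argmax(dim=1)
pred_t2 = probs_t2.argmax(dim=1)

print("T=1.0:", probs_t1)
print("T=2.0:", probs_t2)
print("Same predictions:", torch.equal(pred_t1, pred_t2))
```

A temperature above 1 flattens the probabilities and a temperature below 1 sharpens them, so accuracy, precision, and recall are unaffected while the reported loss and confidence values change.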

### Step 6B: Run Evaluation

Execute the streamlined evaluation on our balanced dataset.

In [None]:
# TODO: Run streamlined evaluation
print("DATASET EVALUATION")
print("="*50)

# TODO: Evaluate model on test set
# HINT: Call evaluate_model_balanced with model, data_splits, and temperature=1.0
results = # TODO: Run evaluation and get results

# TODO: Display detailed classification report
# HINT: Define class_names list and use classification_report
class_names = # TODO: Define list of class names ['Normal', 'Physical Anomaly', 'Network Anomaly']
print("\nDetailed Classification Report:")
# TODO: Print classification report
# HINT: Use classification_report(results['true_labels'], results['predictions'], target_names=class_names, digits=3)
print(# TODO: Generate and print classification report)
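
Macro averaging computes each metric per class and then takes the unweighted mean, so every class counts equally. The sketch below checks this on six hand-made labels; `toy_true`, `toy_pred`, and `toy_class_names` are hypothetical and unrelated to the notebook's test set.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_score, recall_score)

# Hypothetical labels: one 'Physical Anomaly' sample is predicted as 'Network Anomaly'
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 2, 2, 2]
toy_class_names = ['Normal', 'Physical Anomaly', 'Network Anomaly']

toy_accuracy = accuracy_score(toy_true, toy_pred)
toy_precision = precision_score(toy_true, toy_pred, average='macro')
toy_recall = recall_score(toy_true, toy_pred, average='macro')

toy_report = classification_report(
    toy_true, toy_pred, target_names=toy_class_names, digits=3
)
print(toy_report)
```

Here the per-class precisions are 1, 1, and 2/3 and the per-class recalls are 1, 1/2, and 1, so the single misclassification lowers precision for one class and recall for another.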

### Step 6C: Visualization

Create focused visualizations for the balanced dataset results.

In [None]:
# TODO: Create comprehensive visualization and analysis function
def visualize_confusion_matrix_and_metrics(results):
    """
    Create comprehensive visualizations for model evaluation results
    """
    
    # TODO: Create subplot figure with 1 row and 2 columns
    # HINT: Use plt.subplots(1, 2, figsize=(15, 6))
    fig, axes = # TODO: Create subplot figure
    fig.suptitle('Multimodal Edge AI Model - Evaluation Results', fontsize=16, fontweight='bold')
    
    # TODO: Plot 1 - Confusion Matrix
    ax1 = axes[0]
    class_names = ['Normal', 'Physical Anomaly', 'Network Anomaly']
    # TODO: Calculate confusion matrix
    # HINT: Use confusion_matrix(results['true_labels'], results['predictions'])
    cm = # TODO: Calculate confusion matrix
    
    # TODO: Create heatmap for confusion matrix
    # HINT: Use sns.heatmap with cm, annot=True, fmt='d', cmap='Blues'
    # TODO: Create confusion matrix heatmap
    ax1.set_title('Confusion Matrix')
    ax1.set_xlabel('Predicted Label')
    ax1.set_ylabel('True Label')
    
    # TODO: Plot 2 - Per-Class Performance
    ax2 = axes[1]
    # TODO: Calculate per-class precision, recall, and F1
    # HINT: Use precision_score and recall_score with average=None
    per_class_precision = # TODO: Calculate per-class precision
    per_class_recall = # TODO: Calculate per-class recall
    # TODO: Calculate F1 score manually
    # HINT: F1 = 2 * (precision * recall) / (precision + recall + 1e-8)
    per_class_f1 = # TODO: Calculate per-class F1 scores
    
    # TODO: Create bar chart positions
    x_pos = np.arange(len(class_names))
    width = 0.25
    
    # TODO: Create bar charts for metrics
    # HINT: Use ax2.bar with different x positions (x_pos - width, x_pos, x_pos + width)
    # TODO: Plot precision bars
    # TODO: Plot recall bars
    # TODO: Plot F1 bars
    
    ax2.set_xlabel('Classes')
    ax2.set_ylabel('Score')
    ax2.set_title('Per-Class Performance Metrics')
    ax2.set_xticks(x_pos)
    ax2.set_xticklabels(class_names, rotation=45, ha='right')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    ax2.set_ylim(0, 1.0)
    
    # TODO: Apply layout and show plot
    # HINT: Use plt.tight_layout() and plt.show()
    # TODO: Apply tight layout
    # TODO: Show the plot
    
    # TODO: Print detailed per-class analysis
    print("\nPer-Class Performance Analysis:")
    print(f"{'Class':<18} {'Precision':<10} {'Recall':<10} {'F1-Score':<10} {'Support':<10}")
    print("-" * 62)
    
    # TODO: Calculate class support (number of samples per class)
    # HINT: Use np.bincount(results['true_labels'])
    class_support = # TODO: Calculate class support
    
    for i, class_name in enumerate(class_names):
        if i < len(per_class_precision):
            print(f"{class_name:<18} {per_class_precision[i]:<10.3f} {per_class_recall[i]:<10.3f} "
                  f"{per_class_f1[i]:<10.3f} {class_support[i]:<10d}")

# TODO: Create comprehensive evaluation visualizations
# HINT: Call visualize_confusion_matrix_and_metrics with results
# TODO: Visualize evaluation results

# TODO: Advanced Network Anomaly Analysis
print("\n" + "="*70)
print("ADVANCED NETWORK ANOMALY DETECTION ANALYSIS")
print("="*70)

# TODO: Analyze probability distributions for each class
# HINT: Set model to eval mode and use no_grad context
# TODO: Set model to evaluation mode
with # TODO: Create no_grad context:
    # TODO: Get test data and move to device
    X_vehicle_test = # TODO: Get vehicle test data on device
    X_network_test = # TODO: Get network test data on device
    y_test = # TODO: Get test labels
    
    # TODO: Get model outputs and probabilities
    outputs = # TODO: Get model outputs
    # TODO: Calculate softmax probabilities
    # HINT: Use F.softmax(outputs, dim=1)
    y_pred_proba = # TODO: Get prediction probabilities

# TODO: Convert probabilities to numpy
y_pred_proba_np = # TODO: Convert probabilities to numpy
y_true_np = # TODO: Convert true labels to numpy

print("\nProbability Distribution Analysis:")
for class_idx in range(3):
    # TODO: Create mask for current class
    class_mask = # TODO: Create boolean mask for class_idx
    if class_mask.sum() > 0:
        class_name = ['Normal', 'Physical Anomaly', 'Network Anomaly'][class_idx]
        # TODO: Calculate confidence statistics for this class
        # HINT: Get probabilities for samples of this class: y_pred_proba_np[class_mask, class_idx]
        class_confidences = # TODO: Get confidence scores for this class
        print(f"   {class_name}:")
        print(f"      Mean confidence: {class_confidences.mean():.3f}")
        print(f"      Min confidence: {class_confidences.min():.3f}")
        print(f"      Max confidence: {class_confidences.max():.3f}")
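
The confidence reported in the loop above is the probability the model assigns to the true class of each sample, not the largest probability in the row. A minimal NumPy sketch of that masking step follows; `toy_proba` and `toy_true` are made-up arrays used only for illustration.

```python
import numpy as np

# Hypothetical softmax outputs (rows sum to 1) and true labels
toy_proba = np.array([[0.7, 0.2, 0.1],
                      [0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3],
                      [0.1, 0.2, 0.7]])
toy_true = np.array([0, 0, 1, 2])

mean_confidence = {}
for class_idx in range(3):
    # Boolean mask selecting the samples whose true label is class_idx
    class_mask = toy_true == class_idx
    if class_mask.sum() > 0:
        # Column class_idx holds the probability given to the true class
        class_confidences = toy_proba[class_mask, class_idx]
        mean_confidence[class_idx] = float(class_confidences.mean())

print(mean_confidence)
```

A low mean for one class alongside high means for the others suggests the model is systematically less certain about that class, even when overall accuracy looks fine.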