# Milestone 3: Model Training and Evaluation with PyTorch Lightning

Welcome to Milestone 3 of LIS 640 – Introduction to Applied Deep Learning. In this milestone, you'll build on your work from Milestones 1 and 2 by moving your neural network baseline to a more robust training framework, PyTorch Lightning, with TensorBoard logging. You will also explore the strengths of different neural architectures (recurrent and convolutional neural networks) and of different optimizers.

## Purpose

The goals of Milestone 3 are to:
- **Explore advanced architectures:** Strengthen your understanding of, and hands-on experience with, popular neural architectures, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). This is the main goal of the milestone.
- **Streamline your model development:** Implement your models as easy-to-maintain Lightning modules.
- **Enhance experiment tracking:** Integrate TensorBoard to log and visualize training metrics, making it easier to monitor performance and debug issues.
- **Investigate optimizer effects:** Experiment with different optimizers (such as Adam, SGD, and RMSprop) to understand their impact on model training and performance.


## Part 1: Benchmarking Feedforward NN vs. RNN on Sequence Data

In this step, you'll compare the performance of a Recurrent Neural Network (RNN) against a Feedforward Neural Network (FFNN) on a dataset that contains sequential data. **For this exercise, you must use PyTorch Lightning to build your models and manage the training loop, as well as TensorBoard for logging and visualizing your training metrics.**

### A. Choose Your Dataset

- **Option 1:**  
  Use one of the datasets from Milestone 1 **if it contains sequence data**.  
  *For example, if your dataset involves time series, text, or any ordered data, it qualifies for this comparison.* In that case, you have already completed Part B and can skip ahead to Part C.
  

- **Option 2:**  
  If your Milestone 1 dataset does not include sequence data, find and download a dataset with sequential structure (e.g., time series forecasting, text classification, or sensor data). Use your previous milestones as a guide for Part B (Data Preparation) on the new dataset.



### B. Data Preparation

1. **Create a Custom Dataset Class:**  
   - Implement a PyTorch `Dataset` class that loads your sequence data.
   - Include any necessary preprocessing steps (e.g., normalization, tokenization, padding for sequences).
   - Ensure that your `__getitem__` method returns the data in a format suitable for your models.

2. **Build DataLoaders:**  
   - Use `torch.utils.data.DataLoader` to create train, validation, and test loaders.
   - Choose an appropriate batch size, and shuffle the training loader (but not the validation or test loaders) so that each epoch sees the training data in a new order.
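
The two steps above can be sketched as follows. This is a minimal, hypothetical example: `PaddedSequenceDataset` and `make_loaders` are illustrative names, and it assumes each item is a variable-length float tensor of shape `(len_i, n_features)` with an integer class label. Sequences are normalized with a mean and standard deviation that you should compute on the training data only, then truncated or front-padded with zeros to a fixed `max_len` so that both the FFNN and the RNN receive the same input shape.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split


class PaddedSequenceDataset(Dataset):
    """Hypothetical sequence dataset: normalizes, truncates and pads each sequence."""

    def __init__(self, sequences, labels, max_len, mean, std):
        self.sequences = sequences  # list of float tensors, each (len_i, n_features)
        self.labels = labels        # list of int class indices
        self.max_len = max_len
        self.mean = mean            # statistics computed on the training split only
        self.std = std

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        seq = (self.sequences[idx] - self.mean) / self.std
        seq = seq[-self.max_len:]  # keep the most recent max_len time steps
        pad_rows = self.max_len - seq.shape[0]
        if pad_rows > 0:
            # Pre-pad with zeros so an LSTM's final hidden state is computed
            # right after the last real time step, not after padding.
            seq = torch.cat([seq.new_zeros(pad_rows, seq.shape[1]), seq], dim=0)
        return seq, torch.tensor(self.labels[idx], dtype=torch.long)


def make_loaders(dataset, batch_size=32, val_frac=0.15, test_frac=0.15, seed=42):
    """Randomly split into train/val/test and wrap each split in a DataLoader."""
    n = len(dataset)
    n_val, n_test = int(n * val_frac), int(n * test_frac)
    n_train = n - n_val - n_test
    generator = torch.Generator().manual_seed(seed)  # reproducible split
    train_set, val_set, test_set = random_split(
        dataset, [n_train, n_val, n_test], generator=generator
    )
    return (
        DataLoader(train_set, batch_size=batch_size, shuffle=True),   # shuffle training only
        DataLoader(val_set, batch_size=batch_size, shuffle=False),
        DataLoader(test_set, batch_size=batch_size, shuffle=False),
    )
```

A random split is only appropriate when items are independent. If your sequences are overlapping windows cut from the same recording (as with consecutive video frames), split by recording or by time instead, otherwise nearly identical windows end up in both the training and test sets.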

### C. Model Implementation with PyTorch Lightning

*Reuse implementations from Milestone 2 if that makes sense. The key difference now is that you should implement your models as PyTorch Lightning modules to take advantage of the built-in training loop and logging features.*

1. **Feedforward Neural Network (FFNN):**  
   - Implement a baseline feedforward network that treats the sequence data as independent features (e.g., by flattening the sequence).
   - Keep the architecture simple to establish a baseline for comparison.

2. **Recurrent Neural Network (RNN):**  
   - Implement an RNN model (using LSTM or GRU) to handle the sequential nature of the data.
   - Ensure that your model processes the sequence appropriately (e.g., using the final hidden state or an attention mechanism for prediction).

*Remember to use the PyTorch Lightning `Trainer` for model training, and configure the module to log metrics to TensorBoard.*

### D. Benchmarking and Evaluation

1. **Training Both Models:**  
   - Train both the FFNN and the RNN on your chosen dataset using similar training settings (e.g., number of epochs, learning rate, optimizer) to ensure a fair comparison.
   - Use PyTorch Lightning’s `Trainer` to manage the training process.

2. **Logging and Evaluation Metrics:**  
   - Leverage TensorBoard logging to visualize training and validation metrics in real time.
   - Compare the performance of both models using metrics such as loss, accuracy, or any task-specific metric.
   - Optionally, record additional statistics like training time or convergence behavior.

3. **Document Your Findings:**  
   - Summarize the dataset and preprocessing steps.
   - Describe the architectures used for the FFNN and RNN.
   - Provide a comparative analysis discussing which model performed better and why that might be the case.
   - Include TensorBoard screenshots or logged results to support your analysis.

```python
import os
import copy

import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# ----- Settings -----
resize_height, resize_width = 256, 512
seq_len = 5  # Number of consecutive frames per sequence

# ----- Data Transforms -----
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

class Rescale:
    """Resize a label image with nearest-neighbour interpolation so mask values stay discrete."""
    def __init__(self, output_size):
        assert isinstance(output_size, tuple)
        self.output_size = output_size  # (width, height): the order cv2.resize expects

    def __call__(self, sample):
        return cv2.resize(sample, dsize=self.output_size, interpolation=cv2.INTER_NEAREST)

target_transforms = transforms.Compose([
    Rescale((resize_width, resize_height)),
])

# ----- Dataset: Grouping Images into Sequences -----
class SequenceTusimpleData(Dataset):
    """
    This dataset groups consecutive images (and their binary masks)
    into sequences. The label is taken from the last frame in each sequence.
    """
    def __init__(self, dataset, seq_len=5, n_labels=3, transform=None, target_transform=None, training=True, optuna=False):
        self.seq_len = seq_len
        self.transform = transform
        self.target_transform = target_transform
        self.n_labels = n_labels
        self._gt_img_list = []
        self._gt_label_binary_list = []

        # First column: image path; second column: binary mask path.
        with open(dataset, 'r') as file:
            for line in file:
                info_tmp = line.strip().split()
                self._gt_img_list.append(info_tmp[0])
                self._gt_label_binary_list.append(info_tmp[1])

        # Sort by path to (hopefully) preserve temporal order. Windows can still
        # span the boundary between two different clips.
        self._gt_img_list, self._gt_label_binary_list = zip(*sorted(zip(self._gt_img_list, self._gt_label_binary_list)))

        # Optionally reduce dataset size for training (20%, or 1% for Optuna runs).
        purger = 0.2
        if optuna:
            purger = 0.01
        if purger < 1.0 and training:
            total_size = len(self._gt_img_list)
            subset_size = int(total_size * purger)
            self._gt_img_list = self._gt_img_list[:subset_size]
            self._gt_label_binary_list = self._gt_label_binary_list[:subset_size]

    def __len__(self):
        # Number of sequences = total images - seq_len + 1 (never negative)
        return max(0, len(self._gt_img_list) - self.seq_len + 1)

    def __getitem__(self, idx):
        # Build a sequence of images from idx to idx+seq_len.
        imgs = []
        for i in range(self.seq_len):
            # convert("RGB") guards against RGBA or palette images
            img = Image.open(self._gt_img_list[idx + i]).convert("RGB")
            if self.transform:
                img = self.transform(img)
            imgs.append(img)
        # Stack into a tensor with shape (seq_len, channels, H, W)
        imgs = torch.stack(imgs, dim=0)

        # Use the binary mask from the last frame as the target.
        label_img = cv2.imread(self._gt_label_binary_list[idx + self.seq_len - 1], cv2.IMREAD_COLOR)
        if self.target_transform:
            label_img = self.target_transform(label_img)
        # Convert to binary mask: 1 where any channel is non-zero, 0 for black pixels
        label_binary = (label_img != 0).any(axis=2).astype(np.uint8)
        label_binary = torch.from_numpy(label_binary).long()  # (H, W)

        return imgs, label_binary

# File paths for dataset text files.
train_dataset_file = 'archive/TUSimple/train_set/training/train.txt'
val_dataset_file = 'archive/TUSimple/train_set/training/val.txt'

train_dataset = SequenceTusimpleData(train_dataset_file, seq_len=seq_len,
                                     transform=data_transforms['train'],
                                     target_transform=target_transforms, training=True)
val_dataset = SequenceTusimpleData(val_dataset_file, seq_len=seq_len,
                                   transform=data_transforms['val'],
                                   target_transform=target_transforms, training=False)

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

dataloaders = {'train': train_loader, 'val': val_loader}
dataset_sizes = {'train': len(train_dataset), 'val': len(val_dataset)}

# ----- Model: RNN-based Lane Segmentation -----
class LaneLinesRNN(nn.Module):
    def __init__(self, hidden_dim=1024, seq_len=5):
        super().__init__()
        self.seq_len = seq_len
        # CNN encoder to extract features per frame.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU()
        # Pooling to reduce spatial dimensions.
        self.pool = nn.AdaptiveAvgPool2d((8, 16))  # Output: (batch, 128, 8, 16)

        # Flattened feature dimension.
        self.feature_dim = 128 * 8 * 16  # 16384

        # LSTM to process sequence of features (expects (seq_len, batch, feature_dim)).
        self.lstm = nn.LSTM(input_size=self.feature_dim, hidden_size=hidden_dim, num_layers=1, batch_first=False)

        # Map LSTM output back to CNN feature space.
        self.fc = nn.Linear(hidden_dim, self.feature_dim)

        # Decoder: Upsample back to segmentation mask.
        self.deconv1 = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.deconv2 = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.deconv3 = nn.ConvTranspose2d(32, 2, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.dropout = nn.Dropout(0.2)  # defined but not used in forward()

    def forward(self, x_seq):
        # x_seq shape: (batch, seq_len, channels, H, W)
        # Permute to (seq_len, batch, channels, H, W) for LSTM processing.
        x_seq = x_seq.permute(1, 0, 2, 3, 4)
        seq_len, batch_size, C, H, W = x_seq.size()

        encoded_features = []
        # Process each frame through the CNN encoder.
        for t in range(seq_len):
            x = x_seq[t]  # (batch, C, H, W)
            x = self.relu(self.conv1(x))
            x = self.relu(self.conv2(x))
            x = self.relu(self.conv3(x))
            x = self.pool(x)  # (batch, 128, 8, 16)
            x = torch.flatten(x, start_dim=1)  # (batch, feature_dim)
            encoded_features.append(x)

        features_seq = torch.stack(encoded_features, dim=0)  # (seq_len, batch, feature_dim)

        # Process the sequence with LSTM.
        lstm_out, (h_n, c_n) = self.lstm(features_seq)
        # Hidden state of the last LSTM layer after the final time step.
        last_hidden = h_n[-1]  # (batch, hidden_dim)

        # Map back to feature space.
        fc_out = self.fc(last_hidden)  # (batch, feature_dim)
        decoder_input = fc_out.view(batch_size, 128, 8, 16)

        # Decode to segmentation mask.
        x = self.relu(self.deconv1(decoder_input))  # (batch, 64, 16, 32)
        x = self.relu(self.deconv2(x))              # (batch, 32, 32, 64)
        x = self.deconv3(x)                         # (batch, 2, 64, 128)
        # Three stride-2 deconvolutions only reach 64x128, so upsample to the
        # input resolution to match the (H, W) label masks used by the loss.
        x = F.interpolate(x, size=(H, W), mode="bilinear", align_corners=False)
        binary_pred = torch.argmax(x, dim=1, keepdim=True)  # (batch, 1, H, W)
        return {"binary_seg_logits": x, "binary_seg_pred": binary_pred}

# ----- Loss Function & Training/Validation Loops -----
def compute_loss(net_output, binary_label):
    k_binary = 10  # weight applied to the binary segmentation loss
    loss_fn = nn.CrossEntropyLoss()
    binary_seg_logits = net_output["binary_seg_logits"]  # (batch, 2, H, W)
    binary_loss = loss_fn(binary_seg_logits, binary_label) * k_binary  # labels: (batch, H, W)
    total_loss = binary_loss
    out = net_output["binary_seg_pred"]
    return total_loss, binary_loss, out

def train_loop(model, dataloader, optimizer, scheduler, device):
    model.train()
    running_loss = 0.0
    running_loss_b = 0.0

    for inputs, binarys in dataloader:
        # inputs: (batch, seq_len, channels, H, W); binarys: (batch, H, W)
        inputs = inputs.float().to(device)
        binarys = binarys.long().to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        total_loss, binary_loss, _ = compute_loss(outputs, binarys)
        total_loss.backward()
        optimizer.step()

        batch_size = inputs.size(0)
        running_loss += total_loss.item() * batch_size
        running_loss_b += binary_loss.item() * batch_size

    if scheduler is not None:
        scheduler.step()  # StepLR is stepped once per epoch

    # Return per-sample averages rather than raw sums.
    n_samples = len(dataloader.dataset)
    return running_loss / n_samples, running_loss_b / n_samples

def test_loop(model, dataloader, device):
    model.eval()
    running_loss = 0.0
    running_loss_b = 0.0

    with torch.no_grad():
        for inputs, binarys in dataloader:
            inputs = inputs.float().to(device)
            binarys = binarys.long().to(device)

            outputs = model(inputs)
            total_loss, binary_loss, _ = compute_loss(outputs, binarys)
            batch_size = inputs.size(0)
            running_loss += total_loss.item() * batch_size
            running_loss_b += binary_loss.item() * batch_size

    n_samples = len(dataloader.dataset)
    return running_loss / n_samples, running_loss_b / n_samples

# ----- Training Loop -----
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LaneLinesRNN(hidden_dim=1024, seq_len=seq_len).to(DEVICE)
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

num_epochs = 100
best_model_wts = copy.deepcopy(model.state_dict())
best_loss = float("inf")
losses = {}

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    train_loss, train_loss_b = train_loop(model, dataloaders['train'], optimizer, scheduler, DEVICE)
    print(f"Training Loss: {train_loss:.4f} | Binary Loss: {train_loss_b:.4f}")
    val_loss, val_loss_b = test_loop(model, dataloaders['val'], DEVICE)
    print(f"Validation Loss: {val_loss:.4f} | Binary Loss: {val_loss_b:.4f}")

    losses[epoch] = val_loss

    if val_loss < best_loss:
        best_loss = val_loss
        best_model_wts = copy.deepcopy(model.state_dict())
        torch.save(best_model_wts, "best_model.pth")

model.load_state_dict(best_model_wts)

# ----- Testing Function -----
def load_test_data(img_path, transform):
    img = Image.open(img_path).convert("RGB")
    img = transform(img)
    return img

def test():
    # Create output directory if needed.
    os.makedirs('test_output', exist_ok=True)

    # For demonstration, we load a single image and duplicate it to form a sequence.
    img_path = '0001.png'
    data_transform = data_transforms['val']  # resize + normalize, no augmentation

    # Load and replicate the test image to create a sequence.
    img = load_test_data(img_path, data_transform)
    # Create a sequence: shape (seq_len, channels, H, W)
    img_seq = torch.stack([img for _ in range(seq_len)], dim=0)
    # Add batch dimension: (1, seq_len, channels, H, W)
    img_seq = torch.unsqueeze(img_seq, dim=0)

    # Load best model.
    model = LaneLinesRNN(hidden_dim=1024, seq_len=seq_len)
    state_dict = torch.load("best_model.pth", map_location=DEVICE)
    model.load_state_dict(state_dict)
    model.to(DEVICE)
    model.eval()

    with torch.no_grad():
        outputs = model(img_seq.to(DEVICE))

    # Process output: overlay prediction on the original image.
    input_img = Image.open(img_path).convert("RGB").resize((resize_width, resize_height))  # PIL takes (W, H)
    # PIL returns RGB but cv2.imwrite expects BGR, so convert before drawing.
    overlay = cv2.cvtColor(np.array(input_img), cv2.COLOR_RGB2BGR)
    binary_pred_np = outputs['binary_seg_pred'].cpu().numpy()  # (1, 1, H, W)

    # Overlay in red (BGR order) where prediction is positive.
    overlay[binary_pred_np[0, 0] > 0] = [0, 0, 255]
    cv2.imwrite(os.path.join('test_output', 'input_with_prediction_overlay.jpg'), overlay)
    print("Test output saved to 'test_output/input_with_prediction_overlay.jpg'.")

# ----- Run Testing -----
test()
```


```
OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.75 GiB total capacity; 945.97 MiB already allocated; 91.81 MiB free; 1.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

## Part 2: Benchmarking Feedforward NN vs. CNN on Image Data

In this step, you'll compare the performance of a Convolutional Neural Network (CNN) against a Feedforward Neural Network (FFNN) on an image-based dataset. **For this exercise, you must use PyTorch Lightning to implement your models and manage training, and use TensorBoard for logging and visualizing your training metrics.**

### A. Choose Your Dataset

- **Option 1:**  
  Use one of the datasets from Milestone 1 **if it contains image data**.  
  *For example, if your dataset involves images for classification, segmentation, or any visual task, it qualifies for this comparison.*

- **Option 2:**  
  If your Milestone 1 dataset does not include image data, find and download an image dataset (e.g., Fashion MNIST, CIFAR-10, or a domain-specific image dataset).

### B. Data Preparation

1. **Create a Custom Dataset Class:**  
   - Implement a PyTorch `Dataset` class that loads your image data.
   - Include any necessary preprocessing steps (e.g., normalization, resizing, data augmentation).
   - Ensure that your `__getitem__` method returns the data in a format suitable for your models.

2. **Build DataLoaders:**  
   - Use `torch.utils.data.DataLoader` to create train, validation, and test loaders.
   - Choose an appropriate batch size, and shuffle the training loader (but not the validation or test loaders).

### C. Model Implementation with PyTorch Lightning

*Reuse or adapt implementations from Milestone 2 as needed. The key requirement is to implement your models as PyTorch Lightning modules to take advantage of the built-in training loop and logging features.*

1. **Feedforward Neural Network (FFNN):**  
   - Implement a baseline FFNN that treats image data as a flat vector (i.e., by flattening the image).
   - Keep the architecture simple to serve as a baseline for comparison.

2. **Convolutional Neural Network (CNN):**  
   - Implement a CNN architecture that leverages convolutional layers to capture spatial hierarchies in the image data.
   - Typical layers might include convolution, activation (ReLU), pooling, and fully connected layers.
   - Ensure that your model architecture is designed to process image data effectively.

*Remember to use the PyTorch Lightning `Trainer` for training and to configure your Lightning module to log metrics to TensorBoard.*

### D. Benchmarking and Evaluation

1. **Training Both Models:**  
   - Train both the FFNN and the CNN on your chosen dataset using similar training settings (e.g., number of epochs, learning rate, optimizer) to ensure a fair comparison.
   - Use PyTorch Lightning’s `Trainer` to manage the training process.

2. **Logging and Evaluation Metrics:**  
   - Leverage TensorBoard to log and visualize training and validation metrics in real time.
   - Compare the performance of both models using metrics such as loss, accuracy, or any task-specific evaluation metric.
   - Optionally, record additional details like training time and convergence behavior.

3. **Document Your Findings:**  
   - Summarize the dataset and preprocessing steps.
   - Describe the architectures used for both the FFNN and the CNN.
   - Provide a comparative analysis discussing which model performed better and why, supported by TensorBoard screenshots or logged results.

## Part 3: Comparing Optimizers and Analyzing Training Curves

In this step, you'll experiment with different optimizers (SGD, Adam, and RMSprop) to understand how they affect model performance. You will compare them using evaluation metrics on held-out test data and by analyzing the training and validation curves logged in TensorBoard.
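
For reference when interpreting your curves, the per-step updates below follow PyTorch's default formulations (no momentum or weight decay for SGD and RMSprop). Here $g_t$ is the minibatch gradient at step $t$, $\eta$ the learning rate, and the defaults are $\alpha = 0.99$ for RMSprop, $(\beta_1, \beta_2) = (0.9, 0.999)$ for Adam, and $\epsilon = 10^{-8}$ for both.

$$
\begin{aligned}
\text{SGD:}\quad & \theta_{t+1} = \theta_t - \eta\, g_t \\
\text{RMSprop:}\quad & v_t = \alpha\, v_{t-1} + (1-\alpha)\, g_t^2, \qquad
\theta_{t+1} = \theta_t - \eta\, \frac{g_t}{\sqrt{v_t} + \epsilon} \\
\text{Adam:}\quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
& \hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_{t+1} = \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
$$

RMSprop and Adam divide each parameter's step by a running root-mean-square of its gradient, giving per-parameter step sizes; Adam additionally averages the gradient itself. These differences are one place to look when explaining why convergence speed and curve smoothness differ between optimizers at the same $\eta$.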

### A. Experiment Setup

1. **Maintain Consistent Training Settings:**  
   - Use the same model architecture (whether FFNN, CNN, or RNN from Parts 1 and 2) and dataset for all experiments.
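   - Keep the number of epochs, batch size, learning rate, and all other hyperparameters constant across optimizer runs, changing only the optimizer. If you deliberately use a different learning rate for one optimizer (the snippet below, for example, gives SGD a larger learning rate than Adam and RMSprop), state this in your report.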

2. **Implement Optimizer Switching:**  
   - Modify the `configure_optimizers` method in your PyTorch Lightning module to easily switch between optimizers:
     ```python
     def configure_optimizers(self):
         # Uncomment exactly one optimizer; Adam is active by default
         # return torch.optim.SGD(self.parameters(), lr=0.01)
         return torch.optim.Adam(self.parameters(), lr=1e-3)
         # return torch.optim.RMSprop(self.parameters(), lr=1e-3)
     ```
   - Train your model separately with each optimizer.
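
Instead of editing comments between runs, you could select the optimizer by name. The sketch below uses a hypothetical `build_optimizer` helper and `OPTIMIZERS` registry (neither is part of PyTorch or Lightning); the commented method shows how `configure_optimizers` could call it, assuming your module stores `optimizer_name` and `lr` via `self.save_hyperparameters()`.

```python
import torch
import torch.nn as nn

# Hypothetical registry: lowercase optimizer name -> torch.optim class
OPTIMIZERS = {
    "sgd": torch.optim.SGD,
    "adam": torch.optim.Adam,
    "rmsprop": torch.optim.RMSprop,
}


def build_optimizer(name, params, lr):
    """Create the optimizer called `name` with the shared learning rate `lr`."""
    try:
        optimizer_cls = OPTIMIZERS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown optimizer {name!r}; choose from {sorted(OPTIMIZERS)}") from None
    return optimizer_cls(params, lr=lr)


# Inside a LightningModule whose __init__ takes optimizer_name and lr and calls
# self.save_hyperparameters(), configure_optimizers could become:
#
#     def configure_optimizers(self):
#         return build_optimizer(self.hparams.optimizer_name, self.parameters(), self.hparams.lr)
```

You would then train one fresh model per name in `["sgd", "adam", "rmsprop"]`, giving each run its own TensorBoard logger name so the curves can be overlaid.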

### B. Evaluation Metrics and Analysis

1. **Held-Out Test Evaluation:**  
   - After training, evaluate each model on a held-out test set.
   - Record quantitative metrics such as loss, accuracy, or any other relevant task-specific metric for each optimizer.

2. **TensorBoard Analysis:**  
   - Use TensorBoard to review the training and validation curves during training.
   - Focus on:
     - **Convergence Behavior:** How quickly does each optimizer reduce the loss?
     - **Stability:** Are there noticeable fluctuations or instability in the curves?
     - **Overfitting/Underfitting:** Do you observe signs of overfitting or underfitting, and how do these behaviors differ across optimizers?
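
To put numbers next to your visual reading of the curves, you could summarize each optimizer's per-epoch losses (exported from TensorBoard or collected in your own code) with a small helper. `summarize_curve` and its threshold are hypothetical; each statistic is a rough proxy for one of the three questions above, not a standard metric.

```python
def summarize_curve(train_loss, val_loss, threshold):
    """Summarize per-epoch loss curves (lists of floats, one value per epoch).

    epochs_to_threshold: first epoch (1-based) with val loss <= threshold, or None
                         (convergence speed)
    val_increases:       number of epochs where val loss went up (stability)
    best_epoch:          epoch (1-based) with the lowest val loss; far earlier than
                         the last epoch hints at overfitting
    final_gap:           val_loss[-1] - train_loss[-1]; a large positive gap hints at
                         overfitting, while both losses staying high hints at underfitting
    """
    epochs_to_threshold = next(
        (i + 1 for i, v in enumerate(val_loss) if v <= threshold), None
    )
    val_increases = sum(1 for a, b in zip(val_loss, val_loss[1:]) if b > a)
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
    return {
        "epochs_to_threshold": epochs_to_threshold,
        "val_increases": val_increases,
        "best_epoch": best_epoch,
        "final_gap": val_loss[-1] - train_loss[-1],
    }
```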

### C. Document Your Findings

- **Summarize Performance:**  
  - Create a table or a brief report comparing the evaluation metrics for SGD, Adam, and RMSprop.
- **Include Visual Evidence:**  
  - Attach TensorBoard screenshots or summaries of the logged training/validation curves.
- **Provide a Comparative Analysis:**  
  - Discuss which optimizer provided the best performance on the test set.
  - Reflect on the convergence rates and stability differences you observed.
  - Explain potential reasons for these differences based on your results.
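
If you collect each run's test metrics in a dictionary, a small formatter can turn them into a Markdown table for the notebook or report. `results_table` is a hypothetical helper, and the values in the usage line are placeholders, not results.

```python
def results_table(results, metrics):
    """Render {optimizer: {metric: value}} as a Markdown table (4 decimal places)."""
    header = "| Optimizer | " + " | ".join(metrics) + " |"
    divider = "|" + " --- |" * (len(metrics) + 1)
    rows = [
        f"| {name} | " + " | ".join(f"{values[m]:.4f}" for m in metrics) + " |"
        for name, values in results.items()
    ]
    return "\n".join([header, divider, *rows])


# Usage with placeholder numbers:
# print(results_table({"SGD": {"test_loss": 0.5, "test_acc": 0.8}}, ["test_loss", "test_acc"]))
```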

By the end of this exercise, you will have a deeper understanding of how different optimizers affect model training dynamics and performance. This insight is essential for making informed decisions when tuning models in future projects.

## Submission Instructions

**What to Submit:**

1. Your complete Jupyter notebook for Milestone 3 (including all code, outputs, and markdown explanations).
2. A single PDF file that contains your entire report for the milestone, covering:
   - Part 1: Benchmarking FFNN vs. RNN on sequence data.
   - Part 2: Benchmarking FFNN vs. CNN on image data.
   - Part 3: Comparing optimizers and analyzing training curves.

**How to Submit:**

- Upload both your Jupyter notebook and the PDF report to Canvas.
- Name your files clearly, for example:
  - `YourName_Milestone3.ipynb`
  - `YourName_Milestone3_Report.pdf`

**Deadline:**

- All submissions are due **4/18/21**.

Happy Deep Learning!