## Milestone 2: Neural Network Baseline and Hyperparameter Optimization

LIS 640 - Introduction to Applied Deep Learning

Due 3/7/25

## **Overview**
In Milestone 1 you have:
1. **Defined a deep learning problem** where AI can make a meaningful impact.
2. **Identified three datasets** that fit your topic and justified their relevance.
3. **Explored and visualized** the datasets to understand their structure.
4. **Implemented a PyTorch Dataset class** to prepare data for deep learning.

In Milestone 2 we will take the next step and implement a neural network baseline based on what we have learned in class! For this milestone, please use one of the datasets you picked in the last milestone. If you pick a new one, make sure to do Steps 2 - 4 again. 


## **Step 1: Define Your Deep Learning Problem**

The first step is to be clear about what you want your model to predict. Is your goal a classification or a regression task? What are the input features and what are your prediction targets y? Make sure that you have a sensible choice of features and a sensible choice of prediction targets y in your dataloader.

**Write down one paragraph of justification for how you set up your DataLoader below. If it makes sense to change the DataLoader from Milestone 1, describe what you changed and why:** 
We decided to switch our dataset entirely to the TuSimple lane lines dataset. The Roboflow datasets we used before were not labelled uniformly, while the TuSimple dataset provides 10,000+ images with consistent labelling. Our DataLoader therefore wraps the TuSimple dataset, where each label is a binary mask (white or black) over the image indicating where the lane lines are. We downsize each image to 512x256 (width x height) to make the dataset more manageable, and we added a parameter that controls what fraction of the roughly 10,000 images we actually use, since training on all of them is very time consuming. To format the TuSimple dataset and set it up in a DataLoader, we took inspiration from this project (https://github.com/IrohXu/lanenet-lane-detection-pytorch) and studied how they modified the TuSimple dataset and which transformations they applied to make the data better suited for training. They also shuffle the data so that each epoch sees it in a different order, which we thought was a good idea, so we implemented that in our DataLoader as well. We then created a dictionary holding our train and validation DataLoaders for easy access while training and evaluating the model.
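
The subsetting and shuffling described above can be made reproducible by drawing the subset with a seeded random generator. The following is a minimal sketch, not the code used in this notebook: the helper name `sample_pairs`, its `fraction` and `seed` parameters, and the file names are all hypothetical.

```python
import random


def sample_pairs(image_paths, label_paths, fraction, seed=0):
    """Shuffle (image, label) pairs together and keep the first `fraction` of them."""
    if len(image_paths) != len(label_paths):
        raise ValueError("image and label lists must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    pairs = list(zip(image_paths, label_paths))
    # A dedicated Random instance keeps the subset identical across runs
    # without touching the global random state.
    random.Random(seed).shuffle(pairs)
    keep = max(1, int(len(pairs) * fraction))
    return pairs[:keep]


# Hypothetical file names, for illustration only.
images = [f"img_{i}.jpg" for i in range(10)]
labels = [f"mask_{i}.png" for i in range(10)]
subset = sample_pairs(images, labels, fraction=0.2, seed=42)
```

Because images and masks are zipped before shuffling, each image stays paired with its own mask, and the same seed always selects the same subset.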

## **Step 2: Train a Neural Network in PyTorch**

We learned in class how to implement and train a feed-forward neural network in PyTorch. You can find reference implementations [here](https://github.com/mariru/Intro2ADL/blob/main/Week5/Week5_Lab_Example.ipynb) and [here](https://www.kaggle.com/code/girlboss/mmlm2025-pytorch-lb-0-00000). Tip: Try to implement the neural network by yourself from scratch before looking at the reference.


In [None]:
# imports
import copy
import os
import random
import sys
import time

import cv2
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image
from torch.autograd import Variable
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms


# define dataloaders: make sure to have a train, validation and a test loader
train_dataset_file = 'archive/TUSimple/train_set/training/train.txt'
val_dataset_file = 'archive/TUSimple/train_set/training/val.txt'

resize_height, resize_width = 256, 512

class Rescale():
    def __init__(self, output_size):
        assert isinstance(output_size, (tuple))
        self.output_size = output_size

    def __call__(self, sample):
        sample = cv2.resize(sample, dsize=self.output_size, interpolation=cv2.INTER_NEAREST)
        return sample

class TusimpleData(Dataset):
    def __init__(self, dataset, n_labels=3, transform=None, target_transform=None, training=True, optuna=False):
        self._gt_img_list = []
        self._gt_label_binary_list = []
        self.transform = transform
        self.target_transform = target_transform
        self.n_labels = n_labels

        with open(dataset, 'r') as file:
            for _info in file:
                info_tmp = _info.strip(' ').split()

                self._gt_img_list.append(info_tmp[0])
                self._gt_label_binary_list.append(info_tmp[1])

        self._shuffle()

        # DECREASE AMOUNT OF DATA
        purger = 0.2
        if optuna:
            purger = 0.01
        if purger < 1.0 and training:
            total_size = len(self._gt_img_list)
            subset_size = int(total_size * purger)
            self._gt_img_list = self._gt_img_list[:subset_size]
            self._gt_label_binary_list = self._gt_label_binary_list[:subset_size]

    def _shuffle(self):
        c = list(zip(self._gt_img_list, self._gt_label_binary_list))
        random.shuffle(c)
        self._gt_img_list, self._gt_label_binary_list = zip(*c)

    def __len__(self):
        return len(self._gt_img_list)

    def __getitem__(self, idx):
        # Force 3 channels so grayscale or RGBA files still match the model input
        img = Image.open(self._gt_img_list[idx]).convert('RGB')
        label_img = cv2.imread(self._gt_label_binary_list[idx], cv2.IMREAD_COLOR)
        if self.transform:
            img = self.transform(img)
        if self.target_transform:
            label_img = self.target_transform(label_img)
        # Binary mask: 1 where the label pixel is non-zero in every channel, else 0
        label_binary = np.zeros([label_img.shape[0], label_img.shape[1]], dtype=np.uint8)
        mask = np.where((label_img[:, :, :] != [0, 0, 0]).all(axis=2))
        label_binary[mask] = 1
        return img, label_binary

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

target_transforms = transforms.Compose([
    Rescale((resize_width, resize_height)),
])

train_dataset = TusimpleData(train_dataset_file, transform=data_transforms['train'], target_transform=target_transforms, training=True)
val_dataset = TusimpleData(val_dataset_file, transform=data_transforms['val'], target_transform=target_transforms, training=False)
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=False)

dataloaders = {
    'train': train_loader,
    'val': val_loader
}
dataset_sizes = {
    'train': len(train_loader.dataset),
    'val': len(val_loader.dataset)
}

# define the model
class LaneLines(nn.Module):
    def __init__(self):
        super(LaneLines, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU()
        self.deconv1_binary = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.deconv2_binary = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.deconv3_binary = nn.ConvTranspose2d(32, 2, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.dropout = nn.Dropout(0.2)

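    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        # Apply the dropout layer defined in __init__ at the bottleneck (active only in train mode)
        x = self.dropout(self.relu(self.conv3(x)))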
        binary = self.relu(self.deconv1_binary(x))
        binary = self.relu(self.deconv2_binary(binary))
        binary = self.deconv3_binary(binary)
        binary_pred = torch.argmax(binary, dim=1, keepdim=True)
        return {
            "binary_seg_logits": binary,
            "binary_seg_pred": binary_pred
        }


# define the loss function and the optimizer
def compute_loss(net_output, binary_label):
    k_binary = 10
    loss_fn = nn.CrossEntropyLoss()
    binary_seg_logits = net_output["binary_seg_logits"]
    binary_loss = loss_fn(binary_seg_logits, binary_label)
    binary_loss = binary_loss * k_binary
    total_loss = binary_loss
    out = net_output["binary_seg_pred"]
    return total_loss, binary_loss, out

def train_loop(model, dataloader, optimizer, scheduler, device):
    model.train()
    running_loss = 0.0
    running_loss_b = 0.0

    for inputs, binarys in dataloader:
        inputs = inputs.float().to(device)
        binarys = binarys.long().to(device)
        optimizer.zero_grad()
        with torch.set_grad_enabled(True): 
            outputs = model(inputs)
            total_loss, binary_loss, out = compute_loss(outputs, binarys)
            total_loss.backward()
            optimizer.step()

        batch_size = inputs.size(0)
        running_loss += total_loss.item() * batch_size
        running_loss_b += binary_loss.item() * batch_size

    if scheduler is not None:
        scheduler.step()

    return running_loss, running_loss_b

def test_loop(model, dataloader, device):
    model.eval()  
    running_loss = 0.0
    running_loss_b = 0.0

    with torch.no_grad():
        for inputs, binarys in dataloader:
            inputs = inputs.float().to(device)
            binarys = binarys.long().to(device)

            outputs = model(inputs)
            total_loss, binary_loss, out = compute_loss(outputs, binarys)
            batch_size = inputs.size(0)
            running_loss += total_loss.item() * batch_size
            running_loss_b += binary_loss.item() * batch_size

    return running_loss, running_loss_b


# Note: this training loop takes a long time to run and is repeated in the last step

# train the model
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LaneLines().to(DEVICE)
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=False)
num_epochs = 100
best_model_wts = copy.deepcopy(model.state_dict())
best_loss = float("inf")
losses = {}

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    train_loss, train_loss_b = train_loop(model, train_dataloader, optimizer, scheduler, DEVICE)
    print(f"Training Loss: {train_loss} | Binary Loss: {train_loss_b}")
    val_loss, val_loss_b = test_loop(model, val_dataloader, DEVICE)
    print(f"Validation Loss: {val_loss:.4f} | Binary Loss: {val_loss_b:.4f}")

    losses[epoch] = val_loss

    if val_loss < best_loss:
        best_loss = val_loss
        best_model_wts = copy.deepcopy(model.state_dict())
        torch.save(best_model_wts, "best_model.pth")

model.load_state_dict(best_model_wts)

# test the model
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def load_test_data(img_path, transform):
    img = Image.open(img_path).convert('RGB')  # drop any alpha channel so Normalize sees 3 channels
    img = transform(img)
    return img

def test():
    if not os.path.exists('test_output'):
        os.mkdir('test_output')
    img_path = '0001.png'
    resize_height, resize_width = 256, 512
    model_path = 'best_model.pth'
    data_transform = transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    
    model = LaneLines()
    state_dict = torch.load(model_path, map_location=DEVICE)  # also works when the checkpoint was saved on GPU
    model.load_state_dict(state_dict)
    model.eval()
    model.to(DEVICE)
    inp = load_test_data(img_path, data_transform).to(DEVICE)
    inp = torch.unsqueeze(inp, dim=0)
    with torch.no_grad():
        outputs = model(inp)
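    input_img = Image.open(img_path).convert('RGB')
    input_img = input_img.resize((resize_width, resize_height))
    input_img = np.array(input_img)
    binary_pred = outputs['binary_seg_pred']
    binary_pred_np = binary_pred.detach().cpu().numpy()
    overlay = input_img.copy()
    # Paint predicted lane pixels blue (the array is in RGB order at this point)
    overlay[binary_pred_np[0, 0, :, :] > 0] = [0, 0, 255]
    # cv2.imwrite expects BGR, so convert from PIL's RGB order before saving
    overlay = cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR)
    cv2.imwrite(os.path.join('test_output', 'input_with_prediction_overlay.jpg'), overlay)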
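
To check that the three stride-2 convolutions and three transposed convolutions in `LaneLines` return a mask at the input resolution, the layer output sizes can be worked out by hand. The sketch below applies the size formulas from the PyTorch `Conv2d` and `ConvTranspose2d` documentation (assuming dilation 1) to the 256x512 input; the helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of nn.Conv2d along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Output length of nn.ConvTranspose2d along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace(size, n_layers=3, output_padding=1):
    """Sizes after each encoder layer, then after each decoder layer."""
    sizes = [size]
    for _ in range(n_layers):
        sizes.append(conv_out(sizes[-1]))
    for _ in range(n_layers):
        sizes.append(deconv_out(sizes[-1], output_padding=output_padding))
    return sizes


heights = trace(256)  # [256, 128, 64, 32, 64, 128, 256]
widths = trace(512)   # [512, 256, 128, 64, 128, 256, 512]
```

The `output_padding=1` argument is what lets each transposed convolution exactly double the size. With `output_padding=0` the decoder heights would come back as 63, 125, 249, and the logits would no longer line up with the 256-pixel-high label mask.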

## **Step 2 continued: Try Stuff**

Use your code above to try different architectures. Make sure to use early stopping! Try adding Dropout and BatchNorm, try different learning rates. How do they affect training and validation performance? 

 **Summarize your observations in a paragraph below:**
Since we have a large dataset, we decided to try this with only 10% of the data (1000 images). We found that deeper networks made training far more time consuming without much benefit to the final model's performance. Dropout did help with overfitting and slightly improved inference, but only when we trained on a larger portion of the dataset, so we added Dropout. BatchNorm did not make a big difference. The best learning rates were in the range of 0.001 to 0.004. We also implemented early stopping by tracking the validation loss for each epoch and stopping if the loss is worse than in the previous 5 epochs. We hope to use the extension to experiment further, since our biggest limitation has been training time.
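
One common way to express early stopping is a patience counter: training stops once the validation loss has failed to improve for a fixed number of consecutive epochs. The sketch below is an assumption about how this could look, not the code behind the observations above; the class name `EarlyStopping`, its parameters, and the loss values are hypothetical.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, for illustration only.
fake_val_losses = [0.90, 0.80, 0.75, 0.76, 0.77, 0.78, 0.79, 0.80, 0.81]
stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(fake_val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In a training loop like the one in Step 2, the check would sit right after the validation pass: `if stopper.step(val_loss): break`.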


## **Step 3: Hyperparameter Optimization with Optuna**

As you can see, hyperparameter optimization can be tedious. In class we used [optuna](https://optuna.org/#code_examples) to automate the process. Your next task is to wrap your code from Step 2 into an objective which you can then optimize with optuna. Under the [code examples](https://optuna.org/#code_examples) there is a tab *PyTorch* which should be helpful as it provides a minimal example on how to wrap PyTorch code inside an objective.

**Important: Make sure the model is evaluated on a validation set, not the training data!!**


In [2]:
import optuna
import torch
import torch.optim as optim
import torch.nn as nn
import copy
from torch.utils.data import DataLoader
import optuna.exceptions

train_dataset = TusimpleData(train_dataset_file, transform=data_transforms['train'], target_transform=target_transforms, training=True, optuna=True)
val_dataset = TusimpleData(val_dataset_file, transform=data_transforms['val'], target_transform=target_transforms, training=False)

# Define an objective function to be minimized (it returns the best validation loss).
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64, 128])
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'SGD'])

    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = LaneLines().to(device)

    if optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=lr)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

    # loss_fn = nn.CrossEntropyLoss() 

    num_epochs = 15

    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = float("inf")

    for epoch in range(num_epochs):
        print(f"Epoch {epoch + 1}/{num_epochs}")

        train_loss, train_loss_b = train_loop(model, train_dataloader, optimizer, scheduler, device)
        print(f"Training Loss: {train_loss:.4f} | Binary Loss: {train_loss_b:.4f}")

        val_loss, val_loss_b = test_loop(model, val_dataloader, device)
        print(f"Validation Loss: {val_loss:.4f} | Binary Loss: {val_loss_b:.4f}")

        trial.report(val_loss, epoch)

        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

        if val_loss < best_loss:
            best_loss = val_loss
            best_model_wts = copy.deepcopy(model.state_dict())

        # No scheduler.step() here: train_loop already steps the scheduler once per epoch

    model.load_state_dict(best_model_wts)

    return best_loss

# Create a study object
study = optuna.create_study(direction="minimize", pruner=optuna.pruners.MedianPruner(n_startup_trials=2, n_warmup_steps=3))

# Optimize the objective function
study.optimize(objective, n_trials=20)

# Print out the best parameters
print("Best hyperparameters:", study.best_params)

[I 2025-04-04 10:31:01,595] A new study created in memory with name: no-name-0d616d61-d37e-4f4a-839e-b2af6c9d7c85



Epoch 1/15
Training Loss: 853.4618 | Binary Loss: 853.4618
Validation Loss: 6357.5110 | Binary Loss: 6357.5110
Epoch 2/15


[W 2025-04-04 10:31:27,755] Trial 0 failed with parameters: {'lr': 0.00513443621337721, 'batch_size': 64, 'optimizer': 'SGD'} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/home/sriram/anaconda3/envs/yolov10/lib/python3.9/site-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_681933/3962289433.py", line 41, in objective
    train_loss, train_loss_b = train_loop(model, train_dataloader, optimizer, scheduler, device)
  File "/tmp/ipykernel_681933/1764026698.py", line 164, in train_loop
    for inputs, binarys in dataloader:
  File "/home/sriram/anaconda3/envs/yolov10/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/sriram/anaconda3/envs/yolov10/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIterat

KeyboardInterrupt: 

## **Step 3 continued: Insights**

The hyperparameter search was somewhat helpful, but we think we weren't able to harness much of its potential because of the long training times of the Optuna trials, even when using only a small percentage (5-10%) of the data. While increasing the number of trials did yield hyperparameters that improved results, we could not increase it by a significant amount. Additionally, we discovered the pruning feature on the Optuna website and found it useful for cutting off less promising trials early, which allowed us to bump up the number of trials. We plan to look into cloud compute so we can run more and longer trials for the hyperparameter search.
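
The idea behind median pruning can be illustrated without Optuna: a running trial is stopped when its best loss so far is worse than the median loss that finished trials had reached at the same epoch. The sketch below is a simplified illustration of that rule, not Optuna's implementation; the function name, its arguments, and all loss values are hypothetical.

```python
from statistics import median


def should_prune(current_history, completed_histories, step,
                 n_startup_trials=2, n_warmup_steps=3):
    """Simplified median rule for a loss that is being minimized.

    current_history: losses reported so far by the running trial (index = step)
    completed_histories: per-step loss lists from trials that already finished
    """
    if len(completed_histories) < n_startup_trials:
        return False  # not enough finished trials to compare against
    if step < n_warmup_steps:
        return False  # give every trial a few epochs before judging it
    peers = [h[step] for h in completed_histories if len(h) > step]
    if not peers:
        return False
    best_so_far = min(current_history[: step + 1])
    return best_so_far > median(peers)


# Made-up loss curves, for illustration only.
completed = [[5.0, 4.0, 3.0, 2.5, 2.0], [6.0, 5.0, 4.0, 3.5, 3.0]]
slow_trial = [9.0, 8.5, 8.0, 7.8]
fast_trial = [4.0, 3.0, 2.0, 1.5]
prune_slow = should_prune(slow_trial, completed, step=3)
prune_fast = should_prune(fast_trial, completed, step=3)
```

With these numbers the slow trial is cut at step 3 while the fast one continues, which is how pruning frees up time for additional trials.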

## **Step 4: Final Training**

Now that you have found a good hyperparameter setting, the validation set is no longer needed. The last step is to combine the training and validation set into a combined training set and retrain the model under the best parameter setting found. Report your final loss on your test data.
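
One way to merge two PyTorch datasets into a single training set is `torch.utils.data.ConcatDataset`. The sketch below uses small random `TensorDataset` objects as stand-ins for the real training and validation datasets; the variable names and tensor shapes are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Toy stand-ins for the real train and validation datasets (random data).
toy_train = TensorDataset(torch.randn(8, 3, 16, 32), torch.randint(0, 2, (8, 16, 32)))
toy_val = TensorDataset(torch.randn(4, 3, 16, 32), torch.randint(0, 2, (4, 16, 32)))

# ConcatDataset chains the two datasets end to end without copying the data.
combined = ConcatDataset([toy_train, toy_val])
combined_loader = DataLoader(combined, batch_size=4, shuffle=True)

# Each batch can now contain samples from either original dataset.
images, masks = next(iter(combined_loader))
```

Each sample keeps the transforms of the dataset it came from, so if the two datasets were built with different transform pipelines, the combined set will mix them.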

In [None]:
optuna_lr = study.best_params['lr']
optuna_batch_size = study.best_params['batch_size']
optuna_optimizer = study.best_params['optimizer']

In [None]:
# train the model
train_dataset = TusimpleData(train_dataset_file, transform=data_transforms['train'], target_transform=target_transforms, training=True)
val_dataset = TusimpleData(val_dataset_file, transform=data_transforms['val'], target_transform=target_transforms, training=False)


DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LaneLines().to(DEVICE)

if optuna_optimizer == 'SGD':
    optimizer = optim.SGD(model.parameters(), lr=optuna_lr, momentum=0.9)
else:
    optimizer = optim.Adam(model.parameters(), lr=optuna_lr)

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
train_dataloader = DataLoader(train_dataset, batch_size=optuna_batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=optuna_batch_size, shuffle=False)
num_epochs = 100
best_model_wts = copy.deepcopy(model.state_dict())
best_loss = float("inf")
losses = {}

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    train_loss, train_loss_b = train_loop(model, train_dataloader, optimizer, scheduler, DEVICE)
    print(f"Training Loss: {train_loss} | Binary Loss: {train_loss_b}")
    val_loss, val_loss_b = test_loop(model, val_dataloader, DEVICE)
    print(f"Validation Loss: {val_loss:.4f} | Binary Loss: {val_loss_b:.4f}")

    losses[epoch] = val_loss

    if val_loss < best_loss:
        best_loss = val_loss
        best_model_wts = copy.deepcopy(model.state_dict())
        torch.save(best_model_wts, "best_model.pth")

model.load_state_dict(best_model_wts)

# test the model
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def load_test_data(img_path, transform):
    img = Image.open(img_path).convert('RGB')  # drop any alpha channel so Normalize sees 3 channels
    img = transform(img)
    return img

def test():
    if not os.path.exists('test_output'):
        os.mkdir('test_output')
    img_path = '0001.png'
    resize_height, resize_width = 256, 512
    model_path = 'best_model.pth'
    data_transform = transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    
    model = LaneLines()
    state_dict = torch.load(model_path, map_location=DEVICE)  # also works when the checkpoint was saved on GPU
    model.load_state_dict(state_dict)
    model.eval()
    model.to(DEVICE)
    inp = load_test_data(img_path, data_transform).to(DEVICE)
    inp = torch.unsqueeze(inp, dim=0)
    with torch.no_grad():
        outputs = model(inp)
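    input_img = Image.open(img_path).convert('RGB')
    input_img = input_img.resize((resize_width, resize_height))
    input_img = np.array(input_img)
    binary_pred = outputs['binary_seg_pred']
    binary_pred_np = binary_pred.detach().cpu().numpy()
    overlay = input_img.copy()
    # Paint predicted lane pixels blue (the array is in RGB order at this point)
    overlay[binary_pred_np[0, 0, :, :] > 0] = [0, 0, 255]
    # cv2.imwrite expects BGR, so convert from PIL's RGB order before saving
    overlay = cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR)
    cv2.imwrite(os.path.join('test_output', 'input_with_prediction_overlay.jpg'), overlay)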

Epoch 1/100
Training Loss: 1879.4857931137085 | Binary Loss: 1879.4857931137085
Validation Loss: 604.4004 | Binary Loss: 604.4004
Epoch 2/100
Training Loss: 1405.152413368225 | Binary Loss: 1405.152413368225
Validation Loss: 564.0801 | Binary Loss: 564.0801
Epoch 3/100
Training Loss: 1327.1682257652283 | Binary Loss: 1327.1682257652283
Validation Loss: 531.2248 | Binary Loss: 531.2248
Epoch 4/100
Training Loss: 1260.9797005653381 | Binary Loss: 1260.9797005653381
Validation Loss: 508.2955 | Binary Loss: 508.2955
Epoch 5/100
Training Loss: 1218.2152633666992 | Binary Loss: 1218.2152633666992
Validation Loss: 480.7238 | Binary Loss: 480.7238
Epoch 6/100
Training Loss: 1162.5182104110718 | Binary Loss: 1162.5182104110718
Validation Loss: 469.5759 | Binary Loss: 469.5759
Epoch 7/100
Training Loss: 1126.847858428955 | Binary Loss: 1126.847858428955
Validation Loss: 458.5229 | Binary Loss: 458.5229
Epoch 8/100
Training Loss: 1112.384331703186 | Binary Loss: 1112.384331703186
Validation Loss:
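
The loss values printed above are sums over all samples, because `train_loop` and `test_loop` return `running_loss` without dividing by the number of samples. Training and validation totals are therefore on different scales whenever the two sets differ in size. A minimal sketch of per-sample averaging follows; the function name and the batch values are hypothetical and made up for illustration.

```python
def mean_loss(batch_losses, batch_sizes):
    """Per-sample mean from per-batch mean losses and their batch sizes."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    n_samples = sum(batch_sizes)
    if n_samples == 0:
        raise ValueError("no samples were processed")
    return total / n_samples


# Made-up numbers: three full batches of 32 and a final partial batch of 4.
batch_losses = [2.0, 1.5, 1.0, 3.0]
batch_sizes = [32, 32, 32, 4]
epoch_mean = mean_loss(batch_losses, batch_sizes)
```

In `train_loop` and `test_loop` the equivalent change would be to return `running_loss / len(dataloader.dataset)`, which makes the training and validation numbers directly comparable.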

## **Final Submission**
Upload your submission for Milestone 2 to Canvas. 
Happy Deep Learning! 🚀