## Milestone 2: Neural Network Baseline and Hyperparameter Optimization

LIS 640 - Introduction to Applied Deep Learning

Due 3/7/25

## **Overview**
In Milestone 1 you have:
1. **Defined a deep learning problem** where AI can make a meaningful impact.
2. **Identified three datasets** that fit your topic and justified their relevance.
3. **Explored and visualized** the datasets to understand their structure.
4. **Implemented a PyTorch Dataset class** to prepare data for deep learning.

In Milestone 2 we will take the next step and implement a neural network baseline based on what we have learned in class! For this milestone, please use one of the datasets you picked in the last milestone. If you pick a new one, make sure to do Steps 2 - 4 again. 


## **Step 1: Define Your Deep Learning Problem**

The first step is to be clear about what you want your model to predict. Is your goal a classification or a regression task? what are the input features and what are you prediction targets y? Make sure that you have a sensible choice of features and a sensible choice of prediction targets y in your dataloader.

**Write down one paragraph of justification for how you set up your DataLoader below. If it makes sense to change the DataLoader from Milestone 1, describe what you changed and why:** 
We decided to completely switch our dataset to the TuSimple lane lines dataset. The reason for this is because we found that our roboflow datasets we were using before were not uniform at all in the labelling, while the TuSimple dataset provided 10000+ images with consistent labelling. So our DataLoader consists of the TuSimple dataset, where the labels are a binary mask (white or black) over an image indicating where the lane lines are. We decrease the size of each image heavily down to 512x256 to make the dataset more manageable and also implemented a amount of dataset used percentage to control how much of the 10000 images we actually use since training on all 10000 images is very time consuming. To get the TuSimple dataset formatted and set up into a DataLoader, we took inspiration from this project we found (https://github.com/IrohXu/lanenet-lane-detection-pytorch) and learned how they modified the TuSimple dataset and what transformations they applied to make the data better for training. They also implemented data shuffling so that each epoch sees the data in a different order, which we thought would be a good idea so we implemented that into our DataLoader as well. We then created a dictionary with our train and test DataLoader for ease of accessing while training and testing the model.

## **Step 2: Train a Neural Network in PyTorch**

We learned in class how to implement and train a feed forward neural network in pytorch. You can find reference implementations [here](https://github.com/mariru/Intro2ADL/blob/main/Week5/Week5_Lab_Example.ipynb) and [here](https://www.kaggle.com/code/girlboss/mmlm2025-pytorch-lb-0-00000). Tip: Try to implement the neural network by yourself from scratch before looking at the reference.


In [4]:
# imports
import os
import torch
import cv2
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import torch.optim as optim
from torch.autograd import Variable
from torchvision import models
import random
from torch.optim import lr_scheduler
import time
import copy
import torch.nn as nn
from PIL import Image
import os
import sys
import torch
import numpy as np
from torchvision import transforms
from PIL import Image
import cv2


# define dataloaders: make sure to have a train, validation and a test loader
train_dataset_file = 'archive/TUSimple/train_set/training/train.txt'
val_dataset_file = 'archive/TUSimple/train_set/training/val.txt'

resize_height, resize_width = 256, 512

class Rescale():
    def __init__(self, output_size):
        assert isinstance(output_size, (tuple))
        self.output_size = output_size

    def __call__(self, sample):
        sample = cv2.resize(sample, dsize=self.output_size, interpolation=cv2.INTER_NEAREST)
        return sample

class TusimpleData(Dataset):
    def __init__(self, dataset, n_labels=3, transform=None, target_transform=None, training=True, optuna=False):
        self._gt_img_list = []
        self._gt_label_binary_list = []
        self.transform = transform
        self.target_transform = target_transform
        self.n_labels = n_labels

        with open(dataset, 'r') as file:
            for _info in file:
                info_tmp = _info.strip(' ').split()

                self._gt_img_list.append(info_tmp[0])
                self._gt_label_binary_list.append(info_tmp[1])

        self._shuffle()

        # DECREASE AMOUNT OF DATA
        purger = 0.2
        if optuna:
            purger = 0.01
        if purger < 1.0 and training:
            total_size = len(self._gt_img_list)
            subset_size = int(total_size * purger)
            self._gt_img_list = self._gt_img_list[:subset_size]
            self._gt_label_binary_list = self._gt_label_binary_list[:subset_size]

    def _shuffle(self):
        c = list(zip(self._gt_img_list, self._gt_label_binary_list))
        random.shuffle(c)
        self._gt_img_list, self._gt_label_binary_list = zip(*c)

    def __len__(self):
        return len(self._gt_img_list)

    def __getitem__(self, idx):
        img = Image.open(self._gt_img_list[idx])
        label_img = cv2.imread(self._gt_label_binary_list[idx], cv2.IMREAD_COLOR)
        if self.transform:
            img = self.transform(img)
        if self.target_transform:
            label_img = self.target_transform(label_img)
        label_binary = np.zeros([label_img.shape[0], label_img.shape[1]], dtype=np.uint8)
        mask = np.where((label_img[:, :, :] != [0, 0, 0]).all(axis=2))
        label_binary[mask] = 1
        return img, label_binary

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

target_transforms = transforms.Compose([
    Rescale((resize_width, resize_height)),
])

train_dataset = TusimpleData(train_dataset_file, transform=data_transforms['train'], target_transform=target_transforms, training=True)
val_dataset = TusimpleData(val_dataset_file, transform=data_transforms['val'], target_transform=target_transforms, training=False)
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=True)

dataloaders = {
    'train': train_loader,
    'val': val_loader
}
dataset_sizes = {
    'train': len(train_loader.dataset),
    'val': len(val_loader.dataset)
}

# define the model
class LaneLines(nn.Module):
    def __init__(self):
        super(LaneLines, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU()
        self.deconv1_binary = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.deconv2_binary = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.deconv3_binary = nn.ConvTranspose2d(32, 2, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        binary = self.relu(self.deconv1_binary(x))
        binary = self.relu(self.deconv2_binary(binary))
        binary = self.deconv3_binary(binary)
        binary_pred = torch.argmax(binary, dim=1, keepdim=True)
        return {
            "binary_seg_logits": binary,
            "binary_seg_pred": binary_pred
        }


# define the loss function and the optimizer
def compute_loss(net_output, binary_label):
    k_binary = 10
    loss_fn = nn.CrossEntropyLoss()
    binary_seg_logits = net_output["binary_seg_logits"]
    binary_loss = loss_fn(binary_seg_logits, binary_label)
    binary_loss = binary_loss * k_binary
    total_loss = binary_loss
    out = net_output["binary_seg_pred"]
    return total_loss, binary_loss, out

def train_loop(model, dataloader, optimizer, scheduler, device):
    model.train()
    running_loss = 0.0
    running_loss_b = 0.0

    for inputs, binarys in dataloader:
        inputs = inputs.float().to(device)
        binarys = binarys.long().to(device)
        optimizer.zero_grad()
        with torch.set_grad_enabled(True): 
            outputs = model(inputs)
            total_loss, binary_loss, out = compute_loss(outputs, binarys)
            total_loss.backward()
            optimizer.step()

        batch_size = inputs.size(0)
        running_loss += total_loss.item() * batch_size
        running_loss_b += binary_loss.item() * batch_size

    if scheduler is not None:
        scheduler.step()

    return running_loss, running_loss_b

def test_loop(model, dataloader, device):
    model.eval()  
    running_loss = 0.0
    running_loss_b = 0.0

    with torch.no_grad():
        for inputs, binarys in dataloader:
            inputs = inputs.float().to(device)
            binarys = binarys.long().to(device)

            outputs = model(inputs)
            total_loss, binary_loss, out = compute_loss(outputs, binarys)
            batch_size = inputs.size(0)
            running_loss += total_loss.item() * batch_size
            running_loss_b += binary_loss.item() * batch_size

    return running_loss, running_loss_b


# Commnted out training loop as it takes too long to run and repeating this in the last step

# # train the model
# DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = LaneLines().to(DEVICE)
# optimizer = optim.Adam(model.parameters(), lr=0.001)
# scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
# train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=False)
# num_epochs = 100
# best_model_wts = copy.deepcopy(model.state_dict())
# best_loss = float("inf")
# losses = {}

# for epoch in range(num_epochs):
#     print(f"Epoch {epoch + 1}/{num_epochs}")
#     train_loss, train_loss_b = train_loop(model, train_dataloader, optimizer, scheduler, DEVICE)
#     print(f"Training Loss: {train_loss} | Binary Loss: {train_loss_b}")
#     val_loss, val_loss_b = test_loop(model, val_dataloader, DEVICE)
#     print(f"Validation Loss: {val_loss:.4f} | Binary Loss: {val_loss_b:.4f}")

#     losses[epoch] = val_loss

#     if val_loss < best_loss:
#         best_loss = val_loss
#         best_model_wts = copy.deepcopy(model.state_dict())
#         torch.save(best_model_wts, "best_model.pth")

# model.load_state_dict(best_model_wts)

# # test the model
# DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# def load_test_data(img_path, transform):
#     img = Image.open(img_path)
#     img = transform(img)
#     return img

# def test():
#     if not os.path.exists('test_output'):
#         os.mkdir('test_output')
#     img_path = '0001.png'
#     resize_height, resize_width = 256, 512
#     model_path = 'best_model.pth'
#     data_transform = transforms.Compose([
#         transforms.Resize((resize_height, resize_width)),
#         transforms.ToTensor(),
#         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
#     ])
    
#     model = LaneLines()
#     state_dict = torch.load(model_path)
#     model.load_state_dict(state_dict)
#     model.eval()
#     model.to(DEVICE)
#     inp = load_test_data(img_path, data_transform).to(DEVICE)
#     inp = torch.unsqueeze(inp, dim=0)
#     with torch.no_grad():
#         outputs = model(inp)
#     input_img = Image.open(img_path)
#     input_img = input_img.resize((resize_width, resize_height))
#     input_img = np.array(input_img)
#     binary_pred = outputs['binary_seg_pred']
#     binary_pred_np = binary_pred.detach().cpu().numpy()
#     overlay = input_img.copy()
#     overlay[binary_pred_np[0, 0, :, :] > 0] = [0, 0, 255]
#     cv2.imwrite(os.path.join('test_output', 'input_with_prediction_overlay.jpg'), overlay)

## **Step 2 continued: Try Stuff**

Use your code above to try different architectures. Make sure to use early stopping! Try adding Dropout and BatchNorm, try different learning rates. How do they affect training and validation performance? 

 **Summarize your observations in a paragraph below:**
 Since we have a large dataset, we decided to try this with only 10% of the data (1000 images). We found that deeper networks made training way more time consuming without giving us much benefit in performance of the final model, and we also found that Dropout did help with overfitting and slightly improved inference of our model, but only when we trained the model on a larger portion of the dataset, so we added Dropout. We also found that BatchNorm did not make a big difference. We found that the best learning rate was in the range of 0.001 to 0.004. We also implemented early stopping by keeping track of each epoch's loss on validation and stopping if the loss is worse than the previous 5 epochs. We hope to use the extension to mess around with different things more, since our biggest limitation has been time to train.


## **Step 3: Hyperparameter Optimization with Optuna**

As you can see, hyperparameter optimization can be tedious. In class we used [optuna](https://optuna.org/#code_examples) to automate the process. Your next task is to wrap your code from Step 2 into an objective which you can then optimize with optuna. Under the [code exaples](https://optuna.org/#code_examples) there is a tab *PyTorch* which should be helpful as it provides a minimal example on how to wrap PyTorch code inside an objective.

**Important: Make sure the model is evaluated on a validation set, not the training data!!**


In [5]:
import optuna
import torch
import torch.optim as optim
import torch.nn as nn
import copy
from torch.utils.data import DataLoader
import optuna.exceptions

train_dataset = TusimpleData(train_dataset_file, transform=data_transforms['train'], target_transform=target_transforms, training=True, optuna=True)
val_dataset = TusimpleData(val_dataset_file, transform=data_transforms['val'], target_transform=target_transforms, training=False)

# Define an objective function to be maximized.
def objective(trial):
    lr = trial.suggest_loguniform('lr', 1e-5, 1e-2)
    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64, 128])
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'SGD'])

    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = LaneLines().to(device)

    if optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=lr)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

    # loss_fn = nn.CrossEntropyLoss() 

    num_epochs = 15

    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = float("inf")

    for epoch in range(num_epochs):
        print(f"Epoch {epoch + 1}/{num_epochs}")

        train_loss, train_loss_b = train_loop(model, train_dataloader, optimizer, scheduler, device)
        print(f"Training Loss: {train_loss:.4f} | Binary Loss: {train_loss_b:.4f}")

        val_loss, val_loss_b = test_loop(model, val_dataloader, device)
        print(f"Validation Loss: {val_loss:.4f} | Binary Loss: {val_loss_b:.4f}")

        trial.report(val_loss, epoch)

        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

        if val_loss < best_loss:
            best_loss = val_loss
            best_model_wts = copy.deepcopy(model.state_dict())

        scheduler.step()

    model.load_state_dict(best_model_wts)

    return best_loss

# Create a study object
study = optuna.create_study(direction="minimize", pruner=optuna.pruners.MedianPruner(n_startup_trials=2, n_warmup_steps=3))

# Optimize the objective function
study.optimize(objective, n_trials=20)

# Print out the best parameters
print("Best hyperparameters:", study.best_params)

[I 2025-03-13 22:10:14,450] A new study created in memory with name: no-name-7c3acc29-8e2a-4486-8052-ecc30c686499


  lr = trial.suggest_loguniform('lr', 1e-5, 1e-2)


Epoch 1/15
Training Loss: 457.6011 | Binary Loss: 457.6011
Validation Loss: 1284.1334 | Binary Loss: 1284.1334
Epoch 2/15
Training Loss: 185.2092 | Binary Loss: 185.2092
Validation Loss: 1125.5753 | Binary Loss: 1125.5753
Epoch 3/15
Training Loss: 119.3285 | Binary Loss: 119.3285
Validation Loss: 1064.7070 | Binary Loss: 1064.7070
Epoch 4/15
Training Loss: 110.8861 | Binary Loss: 110.8861
Validation Loss: 830.6626 | Binary Loss: 830.6626
Epoch 5/15
Training Loss: 102.8825 | Binary Loss: 102.8825
Validation Loss: 791.7606 | Binary Loss: 791.7606
Epoch 6/15
Training Loss: 93.8583 | Binary Loss: 93.8583
Validation Loss: 765.5646 | Binary Loss: 765.5646
Epoch 7/15
Training Loss: 91.0717 | Binary Loss: 91.0717
Validation Loss: 744.3363 | Binary Loss: 744.3363
Epoch 8/15
Training Loss: 89.0643 | Binary Loss: 89.0643
Validation Loss: 734.6345 | Binary Loss: 734.6345
Epoch 9/15
Training Loss: 88.4783 | Binary Loss: 88.4783
Validation Loss: 729.9181 | Binary Loss: 729.9181
Epoch 10/15
Training 

[I 2025-03-13 22:15:39,674] Trial 0 finished with value: 720.5514109134674 and parameters: {'lr': 0.0028488698756383777, 'batch_size': 32, 'optimizer': 'Adam'}. Best is trial 0 with value: 720.5514109134674.


Validation Loss: 720.5514 | Binary Loss: 720.5514
Epoch 1/15
Training Loss: 697.4196 | Binary Loss: 697.4196
Validation Loss: 5242.0920 | Binary Loss: 5242.0920
Epoch 2/15
Training Loss: 568.8501 | Binary Loss: 568.8501
Validation Loss: 3974.0038 | Binary Loss: 3974.0038
Epoch 3/15
Training Loss: 424.5943 | Binary Loss: 424.5943
Validation Loss: 2933.6669 | Binary Loss: 2933.6669
Epoch 4/15
Training Loss: 315.5001 | Binary Loss: 315.5001
Validation Loss: 2222.8325 | Binary Loss: 2222.8325
Epoch 5/15
Training Loss: 243.1270 | Binary Loss: 243.1270
Validation Loss: 1767.9155 | Binary Loss: 1767.9155
Epoch 6/15
Training Loss: 209.8668 | Binary Loss: 209.8668
Validation Loss: 1735.3577 | Binary Loss: 1735.3577
Epoch 7/15
Training Loss: 206.2001 | Binary Loss: 206.2001
Validation Loss: 1707.3242 | Binary Loss: 1707.3242
Epoch 8/15
Training Loss: 202.9805 | Binary Loss: 202.9805
Validation Loss: 1681.8404 | Binary Loss: 1681.8404
Epoch 9/15
Training Loss: 200.0235 | Binary Loss: 200.0235
Val

[I 2025-03-13 22:21:01,614] Trial 1 finished with value: 1624.5446063280106 and parameters: {'lr': 0.0007567340516525363, 'batch_size': 16, 'optimizer': 'SGD'}. Best is trial 0 with value: 720.5514109134674.


Validation Loss: 1624.5446 | Binary Loss: 1624.5446
Epoch 1/15
Training Loss: 741.5212 | Binary Loss: 741.5212
Validation Loss: 6200.9681 | Binary Loss: 6200.9681
Epoch 2/15
Training Loss: 740.7170 | Binary Loss: 740.7170
Validation Loss: 6194.3877 | Binary Loss: 6194.3877
Epoch 3/15
Training Loss: 739.9341 | Binary Loss: 739.9341
Validation Loss: 6187.8660 | Binary Loss: 6187.8660
Epoch 4/15
Training Loss: 739.1718 | Binary Loss: 739.1718


[I 2025-03-13 22:22:26,804] Trial 2 pruned. 


Validation Loss: 6181.4159 | Binary Loss: 6181.4159
Epoch 1/15
Training Loss: 570.6067 | Binary Loss: 570.6067
Validation Loss: 4632.1980 | Binary Loss: 4632.1980
Epoch 2/15
Training Loss: 543.2350 | Binary Loss: 543.2350
Validation Loss: 4280.1268 | Binary Loss: 4280.1268
Epoch 3/15
Training Loss: 497.8413 | Binary Loss: 497.8413
Validation Loss: 3848.6661 | Binary Loss: 3848.6661
Epoch 4/15
Training Loss: 445.5739 | Binary Loss: 445.5739


[I 2025-03-13 22:23:51,335] Trial 3 pruned. 


Validation Loss: 3409.5172 | Binary Loss: 3409.5172
Epoch 1/15
Training Loss: 737.2841 | Binary Loss: 737.2841
Validation Loss: 5970.7808 | Binary Loss: 5970.7808
Epoch 2/15
Training Loss: 713.3460 | Binary Loss: 713.3460
Validation Loss: 5354.3066 | Binary Loss: 5354.3066
Epoch 3/15
Training Loss: 639.3289 | Binary Loss: 639.3289
Validation Loss: 3838.0700 | Binary Loss: 3838.0700
Epoch 4/15
Training Loss: 458.8314 | Binary Loss: 458.8314


[I 2025-03-13 22:25:17,939] Trial 4 pruned. 


Validation Loss: 2267.2305 | Binary Loss: 2267.2305
Epoch 1/15
Training Loss: 703.7047 | Binary Loss: 703.7047
Validation Loss: 5887.3589 | Binary Loss: 5887.3589
Epoch 2/15
Training Loss: 703.2767 | Binary Loss: 703.2767
Validation Loss: 5883.7889 | Binary Loss: 5883.7889
Epoch 3/15
Training Loss: 702.8518 | Binary Loss: 702.8518
Validation Loss: 5880.2069 | Binary Loss: 5880.2069
Epoch 4/15
Training Loss: 702.4237 | Binary Loss: 702.4237


[I 2025-03-13 22:26:45,106] Trial 5 pruned. 


Validation Loss: 5876.5771 | Binary Loss: 5876.5771
Epoch 1/15
Training Loss: 721.9696 | Binary Loss: 721.9696
Validation Loss: 5521.2736 | Binary Loss: 5521.2736
Epoch 2/15
Training Loss: 548.6885 | Binary Loss: 548.6885
Validation Loss: 2701.5889 | Binary Loss: 2701.5889
Epoch 3/15
Training Loss: 233.4089 | Binary Loss: 233.4089
Validation Loss: 1578.5329 | Binary Loss: 1578.5329
Epoch 4/15
Training Loss: 199.0376 | Binary Loss: 199.0376


[I 2025-03-13 22:28:10,156] Trial 6 pruned. 


Validation Loss: 1588.5721 | Binary Loss: 1588.5721
Epoch 1/15
Training Loss: 731.7396 | Binary Loss: 731.7396
Validation Loss: 6032.6644 | Binary Loss: 6032.6644
Epoch 2/15
Training Loss: 714.0469 | Binary Loss: 714.0469
Validation Loss: 5799.7338 | Binary Loss: 5799.7338
Epoch 3/15
Training Loss: 683.6397 | Binary Loss: 683.6397
Validation Loss: 5499.6481 | Binary Loss: 5499.6481
Epoch 4/15
Training Loss: 646.6267 | Binary Loss: 646.6267


[I 2025-03-13 22:29:34,975] Trial 7 pruned. 


Validation Loss: 5172.4071 | Binary Loss: 5172.4071
Epoch 1/15
Training Loss: 649.5221 | Binary Loss: 649.5221
Validation Loss: 5429.1556 | Binary Loss: 5429.1556
Epoch 2/15
Training Loss: 647.8131 | Binary Loss: 647.8131
Validation Loss: 5409.0900 | Binary Loss: 5409.0900
Epoch 3/15
Training Loss: 645.1627 | Binary Loss: 645.1627
Validation Loss: 5384.2119 | Binary Loss: 5384.2119
Epoch 4/15
Training Loss: 642.0760 | Binary Loss: 642.0760


[I 2025-03-13 22:30:59,951] Trial 8 pruned. 


Validation Loss: 5357.1679 | Binary Loss: 5357.1679
Epoch 1/15
Training Loss: 693.7491 | Binary Loss: 693.7491
Validation Loss: 5798.0105 | Binary Loss: 5798.0105
Epoch 2/15
Training Loss: 692.0781 | Binary Loss: 692.0781
Validation Loss: 5775.6475 | Binary Loss: 5775.6475
Epoch 3/15
Training Loss: 689.1305 | Binary Loss: 689.1305
Validation Loss: 5745.5144 | Binary Loss: 5745.5144
Epoch 4/15
Training Loss: 685.3521 | Binary Loss: 685.3521


[I 2025-03-13 22:32:27,003] Trial 9 pruned. 


Validation Loss: 5710.4286 | Binary Loss: 5710.4286
Epoch 1/15
Training Loss: 729.3682 | Binary Loss: 729.3682
Validation Loss: 4226.1427 | Binary Loss: 4226.1427
Epoch 2/15
Training Loss: 505.4047 | Binary Loss: 505.4047
Validation Loss: 1764.4432 | Binary Loss: 1764.4432
Epoch 3/15
Training Loss: 210.2749 | Binary Loss: 210.2749
Validation Loss: 2477.3511 | Binary Loss: 2477.3511
Epoch 4/15
Training Loss: 295.8389 | Binary Loss: 295.8389


[I 2025-03-13 22:33:54,168] Trial 10 pruned. 


Validation Loss: 1735.7565 | Binary Loss: 1735.7565
Epoch 1/15
Training Loss: 597.6442 | Binary Loss: 597.6442
Validation Loss: 4045.5851 | Binary Loss: 4045.5851
Epoch 2/15
Training Loss: 406.9542 | Binary Loss: 406.9542
Validation Loss: 2473.8994 | Binary Loss: 2473.8994
Epoch 3/15
Training Loss: 248.6470 | Binary Loss: 248.6470
Validation Loss: 1574.5388 | Binary Loss: 1574.5388
Epoch 4/15
Training Loss: 166.2611 | Binary Loss: 166.2611
Validation Loss: 1159.6640 | Binary Loss: 1159.6640
Epoch 5/15
Training Loss: 128.8382 | Binary Loss: 128.8382
Validation Loss: 970.4685 | Binary Loss: 970.4685
Epoch 6/15
Training Loss: 115.6946 | Binary Loss: 115.6946
Validation Loss: 959.7126 | Binary Loss: 959.7126
Epoch 7/15
Training Loss: 114.5466 | Binary Loss: 114.5466
Validation Loss: 951.5045 | Binary Loss: 951.5045
Epoch 8/15
Training Loss: 113.6343 | Binary Loss: 113.6343
Validation Loss: 944.6121 | Binary Loss: 944.6121
Epoch 9/15
Training Loss: 112.8501 | Binary Loss: 112.8501
Validatio

[I 2025-03-13 22:39:21,937] Trial 11 finished with value: 930.0603764057159 and parameters: {'lr': 0.0017284707129619876, 'batch_size': 16, 'optimizer': 'SGD'}. Best is trial 0 with value: 720.5514109134674.


Validation Loss: 930.0604 | Binary Loss: 930.0604
Epoch 1/15
Training Loss: 832.4450 | Binary Loss: 832.4450
Validation Loss: 6714.7458 | Binary Loss: 6714.7458
Epoch 2/15
Training Loss: 791.0397 | Binary Loss: 791.0397
Validation Loss: 6138.7287 | Binary Loss: 6138.7287
Epoch 3/15
Training Loss: 718.1548 | Binary Loss: 718.1548
Validation Loss: 5404.5337 | Binary Loss: 5404.5337
Epoch 4/15
Training Loss: 628.9182 | Binary Loss: 628.9182


[I 2025-03-13 22:40:49,934] Trial 12 pruned. 


Validation Loss: 4623.2462 | Binary Loss: 4623.2462
Epoch 1/15
Training Loss: 342.0801 | Binary Loss: 342.0801
Validation Loss: 1319.6030 | Binary Loss: 1319.6030
Epoch 2/15
Training Loss: 120.1376 | Binary Loss: 120.1376
Validation Loss: 839.7959 | Binary Loss: 839.7959
Epoch 3/15
Training Loss: 98.7392 | Binary Loss: 98.7392
Validation Loss: 747.7483 | Binary Loss: 747.7483
Epoch 4/15
Training Loss: 90.4433 | Binary Loss: 90.4433
Validation Loss: 731.6487 | Binary Loss: 731.6487
Epoch 5/15
Training Loss: 88.1318 | Binary Loss: 88.1318
Validation Loss: 711.5677 | Binary Loss: 711.5677
Epoch 6/15
Training Loss: 85.4540 | Binary Loss: 85.4540
Validation Loss: 706.6779 | Binary Loss: 706.6779
Epoch 7/15
Training Loss: 84.9096 | Binary Loss: 84.9096
Validation Loss: 697.1443 | Binary Loss: 697.1443
Epoch 8/15
Training Loss: 84.1064 | Binary Loss: 84.1064
Validation Loss: 694.4857 | Binary Loss: 694.4857
Epoch 9/15
Training Loss: 84.0292 | Binary Loss: 84.0292
Validation Loss: 692.4810 | B

[I 2025-03-13 22:46:15,097] Trial 13 finished with value: 689.1264949440956 and parameters: {'lr': 0.0072306956704766895, 'batch_size': 16, 'optimizer': 'Adam'}. Best is trial 13 with value: 689.1264949440956.


Validation Loss: 689.1265 | Binary Loss: 689.1265
Epoch 1/15
Training Loss: 383.3054 | Binary Loss: 383.3054
Validation Loss: 1230.5810 | Binary Loss: 1230.5810
Epoch 2/15
Training Loss: 127.2585 | Binary Loss: 127.2585
Validation Loss: 961.0655 | Binary Loss: 961.0655
Epoch 3/15
Training Loss: 105.5372 | Binary Loss: 105.5372
Validation Loss: 801.7987 | Binary Loss: 801.7987
Epoch 4/15
Training Loss: 99.5582 | Binary Loss: 99.5582
Validation Loss: 752.7385 | Binary Loss: 752.7385
Epoch 5/15
Training Loss: 95.0911 | Binary Loss: 95.0911
Validation Loss: 791.4991 | Binary Loss: 791.4991
Epoch 6/15
Training Loss: 95.4446 | Binary Loss: 95.4446
Validation Loss: 755.8560 | Binary Loss: 755.8560
Epoch 7/15
Training Loss: 89.4965 | Binary Loss: 89.4965
Validation Loss: 720.0071 | Binary Loss: 720.0071
Epoch 8/15
Training Loss: 87.9359 | Binary Loss: 87.9359
Validation Loss: 737.2969 | Binary Loss: 737.2969
Epoch 9/15
Training Loss: 88.6393 | Binary Loss: 88.6393
Validation Loss: 719.3614 | B

[I 2025-03-13 22:51:40,721] Trial 14 finished with value: 710.9592831134796 and parameters: {'lr': 0.008417221799649469, 'batch_size': 32, 'optimizer': 'Adam'}. Best is trial 13 with value: 689.1264949440956.


Validation Loss: 710.9593 | Binary Loss: 710.9593
Epoch 1/15
Training Loss: 350.3881 | Binary Loss: 350.3881
Validation Loss: 1130.0716 | Binary Loss: 1130.0716
Epoch 2/15
Training Loss: 129.6255 | Binary Loss: 129.6255
Validation Loss: 1008.7328 | Binary Loss: 1008.7328
Epoch 3/15
Training Loss: 111.9180 | Binary Loss: 111.9180
Validation Loss: 784.2329 | Binary Loss: 784.2329
Epoch 4/15
Training Loss: 92.5128 | Binary Loss: 92.5128
Validation Loss: 709.8897 | Binary Loss: 709.8897
Epoch 5/15
Training Loss: 83.8103 | Binary Loss: 83.8103
Validation Loss: 680.5921 | Binary Loss: 680.5921
Epoch 6/15
Training Loss: 81.9260 | Binary Loss: 81.9260
Validation Loss: 669.8806 | Binary Loss: 669.8806
Epoch 7/15
Training Loss: 80.7326 | Binary Loss: 80.7326
Validation Loss: 666.5199 | Binary Loss: 666.5199
Epoch 8/15
Training Loss: 80.6233 | Binary Loss: 80.6233
Validation Loss: 664.6921 | Binary Loss: 664.6921
Epoch 9/15
Training Loss: 80.5440 | Binary Loss: 80.5440
Validation Loss: 663.0781 |

[I 2025-03-13 22:57:05,409] Trial 15 finished with value: 659.2741931676865 and parameters: {'lr': 0.009405886808874463, 'batch_size': 16, 'optimizer': 'Adam'}. Best is trial 15 with value: 659.2741931676865.


Validation Loss: 659.2742 | Binary Loss: 659.2742
Epoch 1/15
Training Loss: 646.1021 | Binary Loss: 646.1021
Validation Loss: 5358.5081 | Binary Loss: 5358.5081
Epoch 2/15
Training Loss: 634.4232 | Binary Loss: 634.4232
Validation Loss: 5203.9114 | Binary Loss: 5203.9114
Epoch 3/15
Training Loss: 608.8526 | Binary Loss: 608.8526
Validation Loss: 4861.6639 | Binary Loss: 4861.6639
Epoch 4/15
Training Loss: 555.7357 | Binary Loss: 555.7357


[I 2025-03-13 22:58:32,566] Trial 16 pruned. 


Validation Loss: 4251.4456 | Binary Loss: 4251.4456
Epoch 1/15
Training Loss: 251.1440 | Binary Loss: 251.1440
Validation Loss: 795.3431 | Binary Loss: 795.3431
Epoch 2/15
Training Loss: 96.6996 | Binary Loss: 96.6996
Validation Loss: 780.7080 | Binary Loss: 780.7080
Epoch 3/15
Training Loss: 90.3943 | Binary Loss: 90.3943
Validation Loss: 699.4892 | Binary Loss: 699.4892
Epoch 4/15
Training Loss: 84.1657 | Binary Loss: 84.1657
Validation Loss: 704.7021 | Binary Loss: 704.7021
Epoch 5/15
Training Loss: 84.6999 | Binary Loss: 84.6999
Validation Loss: 663.3749 | Binary Loss: 663.3749
Epoch 6/15
Training Loss: 80.2912 | Binary Loss: 80.2912
Validation Loss: 655.5919 | Binary Loss: 655.5919
Epoch 7/15
Training Loss: 79.5912 | Binary Loss: 79.5912
Validation Loss: 660.5322 | Binary Loss: 660.5322
Epoch 8/15
Training Loss: 79.3293 | Binary Loss: 79.3293
Validation Loss: 648.9534 | Binary Loss: 648.9534
Epoch 9/15
Training Loss: 78.8233 | Binary Loss: 78.8233
Validation Loss: 647.2829 | Binar

[I 2025-03-13 23:04:00,111] Trial 17 finished with value: 644.1527051925659 and parameters: {'lr': 0.007328191710575833, 'batch_size': 16, 'optimizer': 'Adam'}. Best is trial 17 with value: 644.1527051925659.


Validation Loss: 644.1527 | Binary Loss: 644.1527
Epoch 1/15
Training Loss: 599.3834 | Binary Loss: 599.3834
Validation Loss: 2262.5526 | Binary Loss: 2262.5526
Epoch 2/15
Training Loss: 202.0865 | Binary Loss: 202.0865
Validation Loss: 1655.2545 | Binary Loss: 1655.2545
Epoch 3/15
Training Loss: 152.5385 | Binary Loss: 152.5385
Validation Loss: 1064.3509 | Binary Loss: 1064.3509
Epoch 4/15
Training Loss: 122.5781 | Binary Loss: 122.5781


[I 2025-03-13 23:05:27,300] Trial 18 pruned. 


Validation Loss: 840.3817 | Binary Loss: 840.3817
Epoch 1/15
Training Loss: 462.1770 | Binary Loss: 462.1770
Validation Loss: 1379.7782 | Binary Loss: 1379.7782
Epoch 2/15
Training Loss: 137.2747 | Binary Loss: 137.2747
Validation Loss: 894.3344 | Binary Loss: 894.3344
Epoch 3/15
Training Loss: 109.0129 | Binary Loss: 109.0129
Validation Loss: 827.5910 | Binary Loss: 827.5910
Epoch 4/15
Training Loss: 97.5343 | Binary Loss: 97.5343


[I 2025-03-13 23:06:54,687] Trial 19 pruned. 


Validation Loss: 773.1251 | Binary Loss: 773.1251
Best hyperparameters: {'lr': 0.007328191710575833, 'batch_size': 16, 'optimizer': 'Adam'}


## **Step 3 continued: Insights**

The hyperparameter search was somewhat helpful, but we think we weren’t able to harness a lot of the potential benefits due to the significant training times when running the Optuna trials – even with only using a small percentage (5-10%) of the data. While increasing the number of trials did yield hyperparameters that improved results, we could not increase the number of trials by a significant amount. Additionally, we found the pruning feature on the Optuna website and found it to be useful in cutting off less promising trials early, which allowed us to bump up the number of trials. We plan to look into potentially using cloud compute so we can run more, longer trials for the hyperparameter search.

## **Step 4: Final Training**

Now that you have found a good hyperparameter setting the validation set is no longer needed. The last step is to combine the training and validation set into a combined training set and retrain the model under the best parameter setting found. Report your final loss on your test data.

In [6]:
optuna_lr = study.best_params['lr']
optuna_batch_size = study.best_params['batch_size']
optuna_optimizer = study.best_params['optimizer']

In [7]:
# train the model
train_dataset = TusimpleData(train_dataset_file, transform=data_transforms['train'], target_transform=target_transforms, training=True)
val_dataset = TusimpleData(val_dataset_file, transform=data_transforms['val'], target_transform=target_transforms, training=False)


DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LaneLines().to(DEVICE)

if optuna_optimizer == 'SGD':
    optimizer = optim.SGD(model.parameters(), lr=optuna_lr, momentum=0.9)
else:
    optimizer = optim.Adam(model.parameters(), lr=optuna_lr)

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
train_dataloader = DataLoader(train_dataset, batch_size=optuna_batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=optuna_batch_size, shuffle=False)
num_epochs = 100
best_model_wts = copy.deepcopy(model.state_dict())
best_loss = float("inf")
losses = {}

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    train_loss, train_loss_b = train_loop(model, train_dataloader, optimizer, scheduler, DEVICE)
    print(f"Training Loss: {train_loss} | Binary Loss: {train_loss_b}")
    val_loss, val_loss_b = test_loop(model, val_dataloader, DEVICE)
    print(f"Validation Loss: {val_loss:.4f} | Binary Loss: {val_loss_b:.4f}")

    losses[epoch] = val_loss

    if val_loss < best_loss:
        best_loss = val_loss
        best_model_wts = copy.deepcopy(model.state_dict())
        torch.save(best_model_wts, "best_model.pth")

model.load_state_dict(best_model_wts)

# test the model
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def load_test_data(img_path, transform):
    img = Image.open(img_path)
    img = transform(img)
    return img

def test():
    if not os.path.exists('test_output'):
        os.mkdir('test_output')
    img_path = '0001.png'
    resize_height, resize_width = 256, 512
    model_path = 'best_model.pth'
    data_transform = transforms.Compose([
        transforms.Resize((resize_height, resize_width)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    
    model = LaneLines()
    state_dict = torch.load(model_path)
    model.load_state_dict(state_dict)
    model.eval()
    model.to(DEVICE)
    inp = load_test_data(img_path, data_transform).to(DEVICE)
    inp = torch.unsqueeze(inp, dim=0)
    with torch.no_grad():
        outputs = model(inp)
    input_img = Image.open(img_path)
    input_img = input_img.resize((resize_width, resize_height))
    input_img = np.array(input_img)
    binary_pred = outputs['binary_seg_pred']
    binary_pred_np = binary_pred.detach().cpu().numpy()
    overlay = input_img.copy()
    overlay[binary_pred_np[0, 0, :, :] > 0] = [0, 0, 255]
    cv2.imwrite(os.path.join('test_output', 'input_with_prediction_overlay.jpg'), overlay)

Epoch 1/100
Training Loss: 1879.4857931137085 | Binary Loss: 1879.4857931137085
Validation Loss: 604.4004 | Binary Loss: 604.4004
Epoch 2/100
Training Loss: 1405.152413368225 | Binary Loss: 1405.152413368225
Validation Loss: 564.0801 | Binary Loss: 564.0801
Epoch 3/100
Training Loss: 1327.1682257652283 | Binary Loss: 1327.1682257652283
Validation Loss: 531.2248 | Binary Loss: 531.2248
Epoch 4/100
Training Loss: 1260.9797005653381 | Binary Loss: 1260.9797005653381
Validation Loss: 508.2955 | Binary Loss: 508.2955
Epoch 5/100
Training Loss: 1218.2152633666992 | Binary Loss: 1218.2152633666992
Validation Loss: 480.7238 | Binary Loss: 480.7238
Epoch 6/100
Training Loss: 1162.5182104110718 | Binary Loss: 1162.5182104110718
Validation Loss: 469.5759 | Binary Loss: 469.5759
Epoch 7/100
Training Loss: 1126.847858428955 | Binary Loss: 1126.847858428955
Validation Loss: 458.5229 | Binary Loss: 458.5229
Epoch 8/100
Training Loss: 1112.384331703186 | Binary Loss: 1112.384331703186
Validation Loss:

## **Final Submission**
Upload your submission for Milestone 2 to Canvas. 
Happy Deep Learning! 🚀