## Milestone 2: Neural Network Baseline and Hyperparameter Optimization

LIS 640 - Introduction to Applied Deep Learning

Due 3/7/25

## **Overview**
In Milestone 1 you have:
1. **Defined a deep learning problem** where AI can make a meaningful impact.
2. **Identified three datasets** that fit your topic and justified their relevance.
3. **Explored and visualized** the datasets to understand their structure.
4. **Implemented a PyTorch Dataset class** to prepare data for deep learning.

In Milestone 2 we will take the next step and implement a neural network baseline based on what we have learned in class! For this milestone, please use one of the datasets you picked in the last milestone. If you pick a new one, make sure to do Steps 2 - 4 again. 


## **Step 1: Define Your Deep Learning Problem**

The first step is to be clear about what you want your model to predict. Is your goal a classification or a regression task? What are the input features and what are your prediction targets y? Make sure that you have a sensible choice of features and a sensible choice of prediction targets y in your DataLoader.

**Write down one paragraph of justification for how you set up your DataLoader below. If it makes sense to change the DataLoader from Milestone 1, describe what you changed and why:**


The dataset I used is about lung cancer prediction, and the target is a binary variable. Most columns describe an evaluation in words such as 'High' or 'Limited', so I used one-hot encoding and LabelEncoder to convert these object columns into integers. To help the MLP train on the data, I standardized the Age and Country columns and applied min-max normalization to the label-encoded columns. One thing that fascinated me was learning how to turn country names into numeric values with target encoding.
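
To make the target-encoding idea concrete, here is a minimal sketch in plain pandas. It assumes a tiny, hypothetical DataFrame (the country codes and labels are made up) and uses unsmoothed per-country means, whereas `category_encoders.TargetEncoder` applies additional smoothing. The means are computed on the training rows only, so validation labels do not influence the encoding.

```python
import pandas as pd

# Hypothetical toy data; the real notebook uses lung_cancer_prediction.csv
train = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "B"],
    "Final_Prediction": [1, 0, 1, 1, 0],
})
val = pd.DataFrame({"Country": ["A", "B", "C"]})

# Fit: mean of the target per country, computed on training rows only
global_mean = train["Final_Prediction"].mean()                        # 0.6
country_means = train.groupby("Country")["Final_Prediction"].mean()   # A: 0.5, B: 2/3

# Transform: map each country to its mean; unseen countries fall back to the global mean
train_encoded = train["Country"].map(country_means)
val_encoded = val["Country"].map(country_means).fillna(global_mean)

print(val_encoded.tolist())
```

Country "C" never appears in the training rows, so it receives the global mean instead of a per-country value.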

## **Step 2: Train a Neural Network in PyTorch**

We learned in class how to implement and train a feedforward neural network in PyTorch. You can find reference implementations [here](https://github.com/mariru/Intro2ADL/blob/main/Week5/Week5_Lab_Example.ipynb) and [here](https://www.kaggle.com/code/girlboss/mmlm2025-pytorch-lb-0-00000). Tip: Try to implement the neural network by yourself from scratch before looking at the reference.


In [None]:
#pip install --upgrade category-encoders

In [1]:
# imports
import numpy as np
import pandas as pd
import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
import category_encoders as ce

lc = pd.read_csv("lung_cancer_prediction.csv")



lc['Mutation_Type'] = lc['Mutation_Type'].fillna(lc['Mutation_Type'].mode()[0])
lc['Treatment_Access'] = lc['Treatment_Access'].fillna(lc['Treatment_Access'].mode()[0])

encoder = LabelEncoder()
lc['Final_Prediction'] = lc['Final_Prediction'].map({'Yes': 1, 'No': 0})
lc['Gender'] = lc['Gender'].map({'Male': 1, 'Female': 0})
lc['Second_Hand_Smoke'] = lc['Second_Hand_Smoke'].map({'Yes': 1, 'No': 0})
lc['Occupation_Exposure'] = lc['Occupation_Exposure'].map({'Yes': 1, 'No': 0})
lc['Insurance_Coverage'] = lc['Insurance_Coverage'].map({'Yes': 1, 'No': 0})
lc['Screening_Availability'] = lc['Screening_Availability'].map({'Yes': 1, 'No': 0})
lc['Treatment_Access'] = lc['Treatment_Access'].map({'Full': 1, 'Partial': 0})
lc['Clinical_Trial_Access'] = lc['Clinical_Trial_Access'].map({'Yes': 1, 'No': 0})
lc['Language_Barrier'] = lc['Language_Barrier'].map({'Yes': 1, 'No': 0})
lc['Delay_in_Diagnosis'] = lc['Delay_in_Diagnosis'].map({'Yes': 1, 'No': 0})
lc['Family_History'] = lc['Family_History'].map({'Yes': 1, 'No': 0})
lc['Indoor_Smoke_Exposure'] = lc['Indoor_Smoke_Exposure'].map({'Yes': 1, 'No': 0})
lc['Tobacco_Marketing_Exposure'] = lc['Tobacco_Marketing_Exposure'].map({'Yes': 1, 'No': 0})

lc['Air_Pollution_Exposure'] = encoder.fit_transform(lc['Air_Pollution_Exposure'])
lc['Socioeconomic_Status'] = encoder.fit_transform(lc['Socioeconomic_Status'])
lc['Healthcare_Access'] = encoder.fit_transform(lc['Healthcare_Access'])
lc['Stage_at_Diagnosis'] = encoder.fit_transform(lc['Stage_at_Diagnosis'])

lc = pd.get_dummies(lc, columns=['Cancer_Type', 'Rural_or_Urban', 'Smoking_Status', 'Mutation_Type'], drop_first=True)
bool_cols = ['Cancer_Type_SCLC', 'Rural_or_Urban_Urban', 'Smoking_Status_Non-Smoker', 
             'Smoking_Status_Smoker', 'Mutation_Type_EGFR', 'Mutation_Type_KRAS']

lc[bool_cols] = lc[bool_cols].astype(int)

target_encoder = ce.TargetEncoder(cols=['Country'])
lc['Country'] = target_encoder.fit_transform(lc['Country'], lc['Final_Prediction'])

lc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 460292 entries, 0 to 460291
Data columns (total 27 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   Country                      460292 non-null  float64
 1   Age                          460292 non-null  int64  
 2   Gender                       460292 non-null  int64  
 3   Second_Hand_Smoke            460292 non-null  int64  
 4   Air_Pollution_Exposure       460292 non-null  int32  
 5   Occupation_Exposure          460292 non-null  int64  
 6   Socioeconomic_Status         460292 non-null  int32  
 7   Healthcare_Access            460292 non-null  int32  
 8   Insurance_Coverage           460292 non-null  int64  
 9   Screening_Availability       460292 non-null  int64  
 10  Stage_at_Diagnosis           460292 non-null  int32  
 11  Treatment_Access             460292 non-null  int64  
 12  Clinical_Trial_Access        460292 non-null  int64  
 13 

In [3]:
from sklearn.preprocessing import MinMaxScaler

mmscaler = MinMaxScaler() 
scaler = StandardScaler()
lc["Country"] = scaler.fit_transform(lc[["Country"]])
lc["Age"] = scaler.fit_transform(lc[["Age"]])

lc["Air_Pollution_Exposure"] = mmscaler.fit_transform(lc[["Air_Pollution_Exposure"]])
lc["Socioeconomic_Status"] = mmscaler.fit_transform(lc[["Socioeconomic_Status"]])
lc["Healthcare_Access"] = mmscaler.fit_transform(lc[["Healthcare_Access"]])
lc["Stage_at_Diagnosis"] = mmscaler.fit_transform(lc[["Stage_at_Diagnosis"]])

X = lc.drop('Final_Prediction', axis=1)  
Y = lc["Final_Prediction"]
#consider using robustscaler

In [5]:
# define dataloaders: make sure to have a train, validation and a test loader
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data import random_split

x_train_tensor = torch.tensor(X.values, dtype=torch.float32)  
y_train_tensor = torch.tensor(Y.values, dtype=torch.long)
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)

train_size = int(0.8 * len(train_dataset))  # 80% for training
val_size = len(train_dataset) - train_size  # Remaining 20% for validation
train_dataset, val_dataset = random_split(train_dataset, [train_size, val_size])

batch_size = 32  # Set batch size
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

In [7]:
# define the model
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self, d_in, d_out, d_hidden, n_layers = 3):
        super().__init__()
        layers = [nn.Linear(d_in, d_hidden), nn.BatchNorm1d(d_hidden),
            nn.ReLU()]
        for layer in range(n_layers):
            layers += [nn.Linear(d_hidden, d_hidden), nn.BatchNorm1d(d_hidden), nn.ReLU(), nn.Dropout(p=0.3)]
        layers += [nn.Linear(d_hidden, d_out)]
        self.linear_relu_stack = nn.Sequential(*layers)

    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits  # logits are returned without sigmoid
        
import torch.optim as optim

model = NeuralNetwork(X.shape[1], 1, 200)
print(model)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)  # Adjust learning rate as needed



NeuralNetwork(
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=26, out_features=200, bias=True)
    (1): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Linear(in_features=200, out_features=200, bias=True)
    (4): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): Dropout(p=0.3, inplace=False)
    (7): Linear(in_features=200, out_features=200, bias=True)
    (8): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU()
    (10): Dropout(p=0.3, inplace=False)
    (11): Linear(in_features=200, out_features=200, bias=True)
    (12): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (13): ReLU()
    (14): Dropout(p=0.3, inplace=False)
    (15): Linear(in_features=200, out_features=1, bias=True)
  )
)


In [9]:
# define the training and evaluation functions
# Training function
def train(model, train_loader, loss_fn, optimizer, device):
    model.train()  # Set model to training mode
    total_loss = 0
    
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device).float()  # Move data to GPU if available

        optimizer.zero_grad()  # Reset gradients
        outputs = model(batch_X).squeeze(1)  # Forward pass; drop the trailing dimension to match the labels
        loss = loss_fn(outputs, batch_y)  # Compute BCE-with-logits loss
        loss.backward()  # Backpropagation
        optimizer.step()  # Update weights
        
        total_loss += loss.item()  # Accumulate loss
    
    return total_loss / len(train_loader)  # Return average loss per batch

# Evaluation function
def evaluate(model, val_loader, loss_fn, device):
    model.eval()  # Set model to evaluation mode
    total_loss = 0
    
    with torch.no_grad():  # Disable gradient computation
        for batch_X, batch_y in val_loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device).float()
            outputs = model(batch_X).squeeze(1)  # Drop the trailing dimension to match the labels
            
            loss = loss_fn(outputs, batch_y)
            total_loss += loss.item()

    avg_loss = total_loss / len(val_loader)
    return avg_loss  # Return BCE loss (lower is better)


In [13]:
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Training loop
num_epochs = 10  # Adjust the number of epochs based on performance

for epoch in range(num_epochs):
    train_loss = train(model, train_loader, loss_fn, optimizer, device)
    val_loss = evaluate(model, val_loader, loss_fn, device)
    
    print(f"Epoch {epoch+1}/{num_epochs}:")
    print(f"  Train Loss (BCE): {train_loss:.4f}")
    print(f"  Validation Loss (BCE): {val_loss:.4f}")

Epoch 1/10:
  Train Loss (BCE): 0.5069
  Validation Loss (BCE): 0.5063
Epoch 2/10:
  Train Loss (BCE): 0.5032
  Validation Loss (BCE): 0.5037
Epoch 3/10:
  Train Loss (BCE): 0.5025
  Validation Loss (BCE): 0.5029
Epoch 4/10:
  Train Loss (BCE): 0.5023
  Validation Loss (BCE): 0.5031
Epoch 5/10:
  Train Loss (BCE): 0.5020
  Validation Loss (BCE): 0.5033
Epoch 6/10:
  Train Loss (BCE): 0.5018
  Validation Loss (BCE): 0.5021
Epoch 7/10:
  Train Loss (BCE): 0.5017
  Validation Loss (BCE): 0.5025
Epoch 8/10:
  Train Loss (BCE): 0.5015
  Validation Loss (BCE): 0.5021
Epoch 9/10:
  Train Loss (BCE): 0.5014
  Validation Loss (BCE): 0.5023
Epoch 10/10:
  Train Loss (BCE): 0.5013
  Validation Loss (BCE): 0.5032


In [15]:
#test model
model.eval() 
val_loss = 0.0
correct = 0
total = 0

with torch.no_grad():
    for inputs, targets in val_loader:  #use the validation to test, since I have no test loader
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)

        outputs = outputs.squeeze(1)
        targets=targets.float()
        
       
        loss = loss_fn(outputs, targets)
        val_loss += loss.item()

        
        preds = (torch.sigmoid(outputs) > 0.5).float()  # outputs are logits, so convert to probabilities before thresholding
        correct += (preds == targets).sum().item()
        total += targets.size(0)


val_loss /= len(val_loader)
accuracy = correct / total

print(f"Validation Loss: {val_loss:.4f}")
print(f"Validation Accuracy: {accuracy:.4f}")



Validation Loss: 0.5032
Validation Accuracy: 0.7994
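
Because `forward` returns raw logits (the sigmoid lives inside `BCEWithLogitsLoss`), the place where the threshold is applied matters when computing accuracy. The following is a minimal sketch with hypothetical logit values: a cut at probability 0.5 is the same as a cut at logit 0, while a cut at logit 0.5 corresponds to a probability of roughly 0.62.

```python
import torch

# Hypothetical raw model outputs (logits)
logits = torch.tensor([-1.0, 0.2, 0.5, 2.0])
probs = torch.sigmoid(logits)

preds_from_probs = (probs > 0.5).long()       # threshold on probabilities
preds_from_logits = (logits > 0).long()       # equivalent threshold on logits
preds_logit_cut_half = (logits > 0.5).long()  # stricter: probability of about 0.62

print(preds_from_probs.tolist(), preds_logit_cut_half.tolist())
```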


## **Step 2 continued: Try Stuff**

Use your code above to try different architectures. Make sure to use early stopping! Try adding Dropout and BatchNorm, try different learning rates. How do they affect training and validation performance? 

 **Summarize your observations in a paragraph below:**


Since my dataset has a binary target, I changed the loss function to BCEWithLogitsLoss. The loss decreased slowly, so I changed the learning rate, hoping to speed up training. The validation loss was still hard to lower, so I added more layers and epochs. With more training epochs the loss kept decreasing slightly, but it stayed just above 0.50. In the test cell, which reuses the validation set, the accuracy is 0.79, which is not very high. I hope to add more components to the model or try other methods to improve the training.
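
The early stopping that this step asks for could be sketched as a small helper that tracks the best validation loss and signals a stop after a number of epochs without improvement. The class name `EarlyStopping` and the `patience` and `min_delta` parameters are illustrative choices, not part of PyTorch. The sketch replays the validation losses printed in the training log above.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation losses from the 10-epoch run above
val_losses = [0.5063, 0.5037, 0.5029, 0.5031, 0.5033,
              0.5021, 0.5025, 0.5021, 0.5023, 0.5032]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

In a real training loop, `stopper.step(val_loss)` would be called once per epoch right after `evaluate(...)`, and the best model weights would typically be saved whenever the loss improves.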

## **Step 3: Hyperparameter Optimization with Optuna**

As you can see, hyperparameter optimization can be tedious. In class we used [optuna](https://optuna.org/#code_examples) to automate the process. Your next task is to wrap your code from Step 2 into an objective which you can then optimize with optuna. Under the [code examples](https://optuna.org/#code_examples) there is a tab *PyTorch* which should be helpful as it provides a minimal example of how to wrap PyTorch code inside an objective.

**Important: Make sure the model is evaluated on a validation set, not the training data!!**


In [21]:
#!pip install optuna

In [31]:
import optuna

class NeuralNetwork(nn.Module):
    def __init__(self, d_in, d_out, d_hidden, n_layers, dropout_p):
        super().__init__()
        layers = [nn.Linear(d_in, d_hidden), nn.BatchNorm1d(d_hidden), nn.ReLU()]
        for _ in range(n_layers):
            layers += [nn.Linear(d_hidden, d_hidden), nn.BatchNorm1d(d_hidden), nn.ReLU(), nn.Dropout(p=dropout_p)]
        layers += [nn.Linear(d_hidden, d_out)]
        self.linear_relu_stack = nn.Sequential(*layers)

    def forward(self, x):
        return self.linear_relu_stack(x) 

def objective(trial):
    
    d_hidden = trial.suggest_int("d_hidden", 64, 512, step=64)
    n_layers = trial.suggest_int("n_layers", 1, 5)
    dropout_p = trial.suggest_float("dropout_p", 0.1, 0.5)
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "SGD"])

    
    model = NeuralNetwork(x_train_tensor.shape[1], 1, d_hidden, n_layers,dropout_p)
    model = model.to(device)

    
    optimizer = optim.Adam(model.parameters(), lr=lr) if optimizer_name == "Adam" else optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()

   
    train_loss = train(model, train_loader, loss_fn, optimizer, device)  # one training epoch per trial

    
    model.eval()
    preds = []
    true_labels = []
    with torch.no_grad():
        for batch_X, batch_y in val_loader:  
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device).float()
            output = model(batch_X).squeeze(1).cpu().numpy()  # raw logits, shape (batch,); AUC depends only on their ranking
            preds.extend(output) 
            true_labels.extend(batch_y.cpu().numpy())

    auc = roc_auc_score(true_labels, preds)  
    return auc  

study = optuna.create_study(direction="maximize")  
study.optimize(objective, n_trials=10)  

print("The best parameter:", study.best_params)


[I 2025-03-06 14:24:32,590] A new study created in memory with name: no-name-3820a3d0-d7fe-4a29-8dcc-06fcb9468e35
[I 2025-03-06 14:25:59,473] Trial 0 finished with value: 0.5021956781183987 and parameters: {'d_hidden': 128, 'n_layers': 5, 'dropout_p': 0.4405594627998367, 'lr': 0.00011192254646357547, 'optimizer': 'SGD'}. Best is trial 0 with value: 0.5021956781183987.
[I 2025-03-06 14:27:05,825] Trial 1 finished with value: 0.5014343377381695 and parameters: {'d_hidden': 384, 'n_layers': 1, 'dropout_p': 0.17266511435810852, 'lr': 9.867612329104415e-05, 'optimizer': 'Adam'}. Best is trial 0 with value: 0.5021956781183987.
[I 2025-03-06 14:28:20,477] Trial 2 finished with value: 0.5048189806764775 and parameters: {'d_hidden': 64, 'n_layers': 3, 'dropout_p': 0.28876430627807925, 'lr': 0.0001568931733391693, 'optimizer': 'Adam'}. Best is trial 2 with value: 0.5048189806764775.
[I 2025-03-06 14:29:17,137] Trial 3 finished with value: 0.4971452393195883 and parameters: {'d_hidden': 256, 'n_l

The best parameter: {'d_hidden': 512, 'n_layers': 4, 'dropout_p': 0.415032311722213, 'lr': 0.0021586453079194693, 'optimizer': 'SGD'}


## **Step 3 continued: Insights**

Did you find the hyperparameter search helpful? Does it help to increase the number of trials in the optimization? Note that so far we have used the simplest version of optuna which has many nice features. Can you discover more useful features by browsing the optuna website? (Hint: try pruning)

The hyperparameter search with Optuna was helpful for finding better configurations and improving model performance. Increasing the number of trials can improve the search further, because more combinations are tested. It also does well at both optimizing and saving time: if I add pruning to the optimization, it should run faster than manual tuning.

## **Step 4: Final Training**

Now that you have found a good hyperparameter setting, the validation set is no longer needed. The last step is to combine the training and validation set into a combined training set and retrain the model under the best parameter setting found. Report your final loss on your test data.
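
Reporting a loss on test data presupposes a test split that was never used for training or tuning. A minimal sketch of such a setup, assuming hypothetical random data and an illustrative 70/15/15 split, might look like this:

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset, random_split

# Hypothetical stand-in data: 100 rows, 26 features, binary labels
features = torch.randn(100, 26)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# Fixed seed so the split is reproducible
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(dataset, [70, 15, 15], generator=generator)

# After tuning, merge train and validation; the test set stays untouched
full_train_set = ConcatDataset([train_set, val_set])

print(len(full_train_set), len(test_set))
```

The final model would then be trained on `full_train_set` and evaluated once on a loader built from `test_set`.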

In [50]:

best_params = {'d_hidden': 512, 'n_layers': 4, 'dropout_p': 0.415032311722213, 'lr': 0.0021586453079194693, 'optimizer': 'SGD'}


epochs = 10  


full_train_dataset = torch.utils.data.ConcatDataset([train_dataset, val_dataset])


train_loader = DataLoader(full_train_dataset, batch_size=batch_size, shuffle=True)


d_hidden = best_params['d_hidden']
n_layers = best_params['n_layers']
dropout_p = best_params['dropout_p']
lr = best_params['lr']
optimizer_choice = best_params['optimizer']


model = NeuralNetwork(d_in=26, d_out=1, d_hidden=d_hidden, n_layers=n_layers, dropout_p=dropout_p)
model.to(device)

if optimizer_choice == 'SGD':
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
else:
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)


loss_fn = torch.nn.BCEWithLogitsLoss()

model.train()
for epoch in range(epochs):
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_fn(outputs.squeeze(1), targets.float())
        loss.backward()
        optimizer.step()




In [54]:
model.eval() 
val_loss = 0.0
correct = 0
total = 0

with torch.no_grad():
    for inputs, targets in val_loader:  
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)

        outputs = outputs.squeeze(1)  
        targets = targets.float()
        
        loss = loss_fn(outputs, targets)
        val_loss += loss.item()

        preds = (torch.sigmoid(outputs) > 0.5).float()  # outputs are logits, so convert to probabilities before thresholding
        correct += (preds == targets).sum().item()  
        total += targets.size(0) 

val_loss /= len(val_loader)
accuracy = correct / total

print(f"Validation Loss: {val_loss:.4f}")
print(f"Validation Accuracy: {accuracy:.4f}")



Validation Loss: 0.5036
Validation Accuracy: 0.7994
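
One way to put an accuracy figure in context is to compare it with the accuracy of always predicting the most frequent class. The sketch below uses hypothetical labels, not the real dataset; on the notebook's data, `labels` would be the `Final_Prediction` column. If the baseline is close to the model's accuracy, the network may have learned little beyond the class frequencies.

```python
import torch

# Hypothetical labels with 80% of rows in class 0
labels = torch.tensor([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])

positive_rate = labels.float().mean().item()
baseline_accuracy = max(positive_rate, 1.0 - positive_rate)  # always predict the majority class

print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
```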


## **Final Submission**
Upload your submission for Milestone 2 to Canvas. 
Happy Deep Learning! 🚀