---
<h1 style="text-align: center;">
CSCI 4521: Applied Machine Learning (Fall 2024)
</h1>

<h1 style="text-align: center;">
Homework 5
</h1>

<h3 style="text-align: center;">
(Due Tue, Nov. 26, 11:59 PM CT)
</h3>

---

![nn.png](attachment:fde9d58f-62e0-4c07-aacb-8334e3ef1027.png)

Image from https://aibusiness.com/ml/how-neural-networks-can-think-like-humans-and-why-it-matters#close-modal

### In this homework, your task is to experiment with fully-connected, feed-forward neural networks to predict whether a sonar signal bounces off a metal cylinder or a cylindrical rock. The only data you have available is the sonar data in the dataset `sonar_csci4521_hw5.csv`. Each row is a sample and columns are the sonar features, and the last column is the label of metal ("M") or rock ("R").

### You do not need to clean or preprocess the data in this homework except encoding the label using the `LabelEncoder`; focus on building and training neural networks. You still need to determine what kind of neural network to use, which and how to tune any hyperparameters, how to measure performance, which models to select, and which final model to use. We do expect that you will try a few different architectures (e.g., number of layers, number of units in each layer), activation functions, and gradient descent algorithms (e.g., stochastic gradient descent, Adagrad, RMSprop, Adam). We also expect that you will tune hyperparameters (not necessarily with cross validation but definitely only using the training dataset) and measure the performance of the final model on a held-out test set. Additionally, we expect you to track the performance of your experiments using Tensorboard, for example, track the average loss and accuracy per epoch on the training and test sets.

### You must use **PyTorch** to build and train your neural network, no other packages will be accepted (for example, you cannot use Tensorflow). If you use anything other than PyTorch to build your network, you will receive no credit for this homework. Make sure to write and submit clean, working code. Reminder, you cannot use ChatGPT or similar technologies. Please see the syllabus for more details.

### You also need to submit a short report of your work describing all steps you took, explanations of why you took those steps, results, what you learned, how you might use what you learned in the future, and your conclusions. We expect the report to be well-written and clearly describe everything you've done and why.

---

### Write your code here

In [58]:
# PyTorch Imports
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.tensorboard import SummaryWriter

# Sklearn Imports
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Google Colab Imports
from google.colab import drive
drive.mount('/content/drive')

# Misc Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [59]:
# Step 1: Load Data

# Load dataset
df = pd.read_csv("/content/drive/MyDrive/colab_data/sonar_csci4521_hw5.csv", header=None)

# Number of samples (rows)
print("Number of samples = ", len(df))
# Number of features (all columns except the label in the last column)
print("Number of features = ", len(df.columns) - 1)

# Print the first 10 rows
df.head(10)

Number of samples =  208
Number of features =  60


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R
5,0.0286,0.0453,0.0277,0.0174,0.0384,0.099,0.1201,0.1833,0.2105,0.3039,...,0.0045,0.0014,0.0038,0.0013,0.0089,0.0057,0.0027,0.0051,0.0062,R
6,0.0317,0.0956,0.1321,0.1408,0.1674,0.171,0.0731,0.1401,0.2083,0.3513,...,0.0201,0.0248,0.0131,0.007,0.0138,0.0092,0.0143,0.0036,0.0103,R
7,0.0519,0.0548,0.0842,0.0319,0.1158,0.0922,0.1027,0.0613,0.1465,0.2838,...,0.0081,0.012,0.0045,0.0121,0.0097,0.0085,0.0047,0.0048,0.0053,R
8,0.0223,0.0375,0.0484,0.0475,0.0647,0.0591,0.0753,0.0098,0.0684,0.1487,...,0.0145,0.0128,0.0145,0.0058,0.0049,0.0065,0.0093,0.0059,0.0022,R
9,0.0164,0.0173,0.0347,0.007,0.0187,0.0671,0.1056,0.0697,0.0962,0.0251,...,0.009,0.0223,0.0179,0.0084,0.0068,0.0032,0.0035,0.0056,0.004,R


In [60]:
# Step 2: Make train and test data

# Separate features and labels
X = df.iloc[:, :-1].values
y = LabelEncoder().fit_transform(df.iloc[:, -1].values)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5782267, stratify=y)

# Standardize
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Training tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)

# Testing tensors
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Create PyTorch datasets and dataloaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [61]:
# Step 3: Helper function to build models

def build_model(input_size, hidden_sizes, output_size, activation_fn):
    '''
    Function to help build models of different architectures. Lets me set up
    models with different numbers of layers, input sizes, output sizes, and
    hidden sizes, as well as different activation functions.
    '''
    layers = []
    prev_size = input_size
    for next_size in hidden_sizes:
        # Add a linear layer followed by its activation function
        layers.append(nn.Linear(prev_size, next_size))
        layers.append(activation_fn())

        # Track what previous layer size was to connect the next NN layer
        prev_size = next_size

    # Add the output layer
    layers.append(nn.Linear(prev_size, output_size))
    return nn.Sequential(*layers)

In [62]:
# Step 4: Helper function for epoch training for model

def train_model(model, train_loader, optimizer_name, learning_rate, epochs=50):
    '''
    Helper function to train models. Uses different optimizers and learning rates, as
    well as modifiable epochs.
    '''
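    # Loss function: cross-entropy on the raw two-class logits
    criterion = nn.CrossEntropyLoss()
    if optimizer_name == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    elif optimizer_name == "SGD":
        optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
    elif optimizer_name == "RMSprop":
        optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)
    else:
        # Fail early instead of hitting an undefined `optimizer` later
        raise ValueError(f"Unsupported optimizer: {optimizer_name}")

    # TensorBoard writer. Note: runs that share an optimizer and learning rate
    # (but differ in architecture or activation) log to the same directory.
    writer = SummaryWriter(log_dir=f'runs/sonar_{optimizer_name}_lr{learning_rate}')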

    # Training loop
    for epoch in range(epochs):
        model.train()
        epoch_loss = 0
        correct = 0
        total = 0

        # Batch train the model per epoch
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            outputs = model(X_batch)
            loss = criterion(outputs, y_batch)
            loss.backward()
            optimizer.step()

            epoch_loss += loss.item()
            _, predicted = outputs.max(1)
            correct += (predicted == y_batch).sum().item()
            total += y_batch.size(0)

        train_accuracy = 100 * correct / total
        writer.add_scalar('Loss/Train', epoch_loss / len(train_loader), epoch)
        writer.add_scalar('Accuracy/Train', train_accuracy, epoch)

    writer.close()
    return model


In [63]:
# Step 5: Helper function to evaluate each created model on test data

def evaluate_model(model, test_loader):
    '''
    Evaluates model accuracy for the test data.
    '''
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for X_batch, y_batch in test_loader:
            outputs = model(X_batch)
            _, predicted = outputs.max(1)
            correct += (predicted == y_batch).sum().item()
            total += y_batch.size(0)

    test_accuracy = 100 * correct / total
    print(f'Test Accuracy: {test_accuracy:.2f}%')
    return test_accuracy

In [64]:
# Step 6: "Main" loop, code that runs the tests for different architectures,
# activation functions, optimizers, and learning rates

# Possible Architectures
architectures = [[64, 32], [128, 64, 32], [256, 128, 64, 32]]

# Possible activation functions
activations = [nn.ReLU, nn.Tanh, nn.Sigmoid]

# Possible optimizers
optimizers = ["Adam", "SGD", "RMSprop"]

# Possible learning rates
learning_rates = [0.001, 0.01]

# Reasoning in report
epochs = 40
# Number of input features and number of classes (metal vs. rock)
input_size = X_train.shape[1]
output_size = 2

# Reporting variables
best_model = None
best_accuracy = 0
params = []

# Testing loop
for arch in architectures:
    for activation_fn in activations:
        for optimizer in optimizers:
            for lr in learning_rates:
                print(f"Training with Architecture={arch}, Activation={activation_fn.__name__}, Optimizer={optimizer}, LR={lr}")
                model = build_model(input_size, arch, output_size, activation_fn)
                model = train_model(model, train_loader, optimizer, lr, epochs)
                accuracy = evaluate_model(model, test_loader)
                if accuracy > best_accuracy:
                    best_accuracy = accuracy
                    best_model = model
                    params = [arch, activation_fn, optimizer, epochs]

print(f"Best Model Accuracy: {best_accuracy:.2f}%")
print(f"Best Model Parameters: {params}")
torch.save(best_model.state_dict(), 'best_sonar_model.pth')

Training with Architecture=[64, 32], Activation=ReLU, Optimizer=Adam, LR=0.001
Test Accuracy: 78.57%
Training with Architecture=[64, 32], Activation=ReLU, Optimizer=Adam, LR=0.01
Test Accuracy: 83.33%
Training with Architecture=[64, 32], Activation=ReLU, Optimizer=SGD, LR=0.001
Test Accuracy: 66.67%
Training with Architecture=[64, 32], Activation=ReLU, Optimizer=SGD, LR=0.01
Test Accuracy: 80.95%
Training with Architecture=[64, 32], Activation=ReLU, Optimizer=RMSprop, LR=0.001
Test Accuracy: 83.33%
Training with Architecture=[64, 32], Activation=ReLU, Optimizer=RMSprop, LR=0.01
Test Accuracy: 80.95%
Training with Architecture=[64, 32], Activation=Tanh, Optimizer=Adam, LR=0.001
Test Accuracy: 76.19%
Training with Architecture=[64, 32], Activation=Tanh, Optimizer=Adam, LR=0.01
Test Accuracy: 78.57%
Training with Architecture=[64, 32], Activation=Tanh, Optimizer=SGD, LR=0.001
Test Accuracy: 73.81%
Training with Architecture=[64, 32], Activation=Tanh, Optimizer=SGD, LR=0.01
Test Accuracy: 

---

### Write your report here

1. Step 1: Loading the data
* After setting up the imports, I simply loaded the data.
* Since the assignment did not specify a loader, I used pandas because I am most comfortable with it.
* It also helped me confirm that the data wasn't filled with nulls.
* I skipped all preprocessing and cleaning except label encoding (as per the assignment instructions) and standardizing the features.

2. Step 2: Splitting the data
* Since no cleaning was required, I moved on to splitting the data with a standard, stratified 80-20 train-test split.
* Then I converted the arrays to PyTorch tensors and wrapped them in data loaders so I can use the data in neural network training.
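
The assignment asks for hyperparameters to be tuned using only the training data. One possible way to do that, sketched here on synthetic data of the same shape as the sonar set, is to carve a validation set out of the training portion. The random features, the alternating labels, and the `X_val`/`y_val` names are assumptions for illustration and are not part of my notebook.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sonar data: 208 samples, 60 features, 2 classes
rng = np.random.default_rng(0)
X = rng.random((208, 60))
y = np.arange(208) % 2

# First split: hold out 20% as the final test set
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
# Second split: hold out 20% of the remainder for model selection
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=0, stratify=y_trainval
)

# Fit the scaler on the training part only, then reuse it everywhere else
scaler = StandardScaler().fit(X_train)
X_train_tensor = torch.tensor(scaler.transform(X_train), dtype=torch.float32)
X_val_tensor = torch.tensor(scaler.transform(X_val), dtype=torch.float32)
X_test_tensor = torch.tensor(scaler.transform(X_test), dtype=torch.float32)

print(len(X_train), len(X_val), len(X_test))  # 132 34 42
```

With a split like this, the candidate models would be compared on the validation set, and the test set would be touched only once, for the final model.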

3. Step 3: Building model function
* Since I need to test multiple types of networks, I decided to turn the building, training, and evaluating portions into functions.
* The function takes in an input size, architecture, output size and activation function.
* It then constructs one linear layer per entry in the architecture list (each entry is the number of nodes in that layer) and adds the specified activation function after each hidden layer.
* It returns a model of the defined architectural configuration, with the specified activation function and output size.
* Input size is feature length, and output size is 2, since we're predicting between 2 possible outcomes.
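
As a quick check of what the builder produces, the sketch below repeats the helper from Step 3 and runs a random batch through it. The input size of 60 assumes the 60 sonar feature columns, and the random batch is only there to confirm the output shape.

```python
import torch
from torch import nn

def build_model(input_size, hidden_sizes, output_size, activation_fn):
    """Same construction as the Step 3 helper: Linear + activation per hidden layer."""
    layers = []
    prev_size = input_size
    for next_size in hidden_sizes:
        layers.append(nn.Linear(prev_size, next_size))
        layers.append(activation_fn())
        prev_size = next_size
    # The output layer has no activation; it returns raw logits
    layers.append(nn.Linear(prev_size, output_size))
    return nn.Sequential(*layers)

# 60 features in, two hidden layers, 2 class logits out
model = build_model(60, [64, 32], 2, nn.ReLU)
print(model)

# Random batch of 4 samples, used only to check the output shape
dummy_batch = torch.randn(4, 60)
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([4, 2])

# Weights plus biases: (60*64 + 64) + (64*32 + 32) + (32*2 + 2)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 6050
```

The output layer returns raw logits, which is what `nn.CrossEntropyLoss` expects, so no softmax is added at the end of the network.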

4. Step 4: Training Models
* Inputs: Takes in the previously built model, the training data loader to perform batch-wise training, and an optimizer as well as a learning rate.
* Epochs: I found that around 40 epochs performed best; beyond that, performance started degrading, and 40 epochs also kept training time low.
* I used CrossEntropyLoss as the loss function, computed on each batch.
* This function allows for three optimizers to be used with it (Adam, SGD, or RMSprop), and this could probably be extended in the future.
* I also added TensorBoard logging here.
* In the loop itself, for each epoch, the function:
  * Sets the model to training mode.
  * Resets the temporary variables.
  * Processes each batch:
    * Computes the model predictions.
    * Calculates the loss.
    * Backpropagates the loss and updates the model parameters.
  * Tracks cumulative loss and accuracy metrics.
  * Logs the average loss and training accuracy to TensorBoard.
* This procedure lets me train different models efficiently, since every configuration (optimizer, architecture, etc.) goes through the same function.
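
One way the optimizer choice could be extended, for example to cover Adagrad, which the assignment text also mentions, is a small lookup table instead of an `if`/`elif` chain. This is only a sketch: `OPTIMIZERS` and `make_optimizer` are hypothetical names that do not appear in my notebook, and the `nn.Linear` model is a stand-in.

```python
from torch import nn, optim

# Hypothetical registry: optimizer name -> function that builds the optimizer
OPTIMIZERS = {
    "Adam": lambda params, lr: optim.Adam(params, lr=lr),
    "SGD": lambda params, lr: optim.SGD(params, lr=lr, momentum=0.9),
    "RMSprop": lambda params, lr: optim.RMSprop(params, lr=lr),
    "Adagrad": lambda params, lr: optim.Adagrad(params, lr=lr),
}

def make_optimizer(name, params, lr):
    """Build the named optimizer, failing loudly on an unknown name."""
    if name not in OPTIMIZERS:
        raise ValueError(f"Unsupported optimizer: {name}")
    return OPTIMIZERS[name](params, lr)

# Stand-in model, used only to have some parameters to optimize
model = nn.Linear(60, 2)
optimizer = make_optimizer("Adagrad", model.parameters(), 0.01)
print(type(optimizer).__name__)  # Adagrad
```

Adding another optimizer then only needs one more entry in the table.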


5. Step 5: Evaluating Model

* Simple function to test the trained model's performance on the test data.
* Returns a percentage of accuracy.
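
The accuracy calculation itself can be illustrated on a tiny hand-made batch. The logits and labels below are made up for the example; the steps mirror what the evaluation function does for each batch.

```python
import torch

# Made-up logits for 4 samples and 2 classes, plus their true labels
logits = torch.tensor([[2.0, -1.0],
                       [0.3, 0.9],
                       [-0.5, 0.1],
                       [1.2, 1.1]])
labels = torch.tensor([0, 1, 0, 0])

# The predicted class is the index of the largest logit in each row
predicted = logits.argmax(dim=1)
print(predicted.tolist())  # [0, 1, 1, 0]

# Count matches and convert to a percentage
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(f"Accuracy: {accuracy:.2f}%")  # Accuracy: 75.00%
```

Here `logits.argmax(dim=1)` returns the same class indices as the second value of `outputs.max(1)` used in the notebook.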




6. Step 6: Main Loop
* Finally, I put these functions together for a multitude of combinations to see which model performs best.
* Reasoning for architectures: to see whether layer size and number of layers influence performance, I increased both together to check for a positive correlation.
* Reasoning for activation functions:
 * ReLU: reduces the risk of vanishing gradients and can converge quickly.
 * Tanh: zero-centered, so it keeps the sign of any negative values, which might be present since I didn't examine the entirety of the dataset.
 * Sigmoid: a standard activation function for binary classification.
* Reasoning for optimizers: these were readily available optimizers, all of which make sense within the context of the task.
 * SGD (with momentum 0.9): good generalisation because of stable descent.
 * RMSprop: something I wanted to test as an option.
 * Adam: generally accepted as a good optimizer that works well for most use cases.
* Reasoning for learning rates: similar to the architectures, I wanted to see which power of 10 (0.001 or 0.01) was most effective.
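
To make the size of the search concrete, the same grid can be enumerated with `itertools.product`. The sketch below uses activation names as plain strings, and `run_name` is a hypothetical naming scheme (not what my notebook uses) that would give every configuration its own TensorBoard directory.

```python
from itertools import product

architectures = [[64, 32], [128, 64, 32], [256, 128, 64, 32]]
activation_names = ["ReLU", "Tanh", "Sigmoid"]
optimizer_names = ["Adam", "SGD", "RMSprop"]
learning_rates = [0.001, 0.01]

def run_name(arch, activation, optimizer, lr):
    """Hypothetical scheme: one unique log directory name per configuration."""
    arch_tag = "-".join(str(n) for n in arch)
    return f"sonar_{arch_tag}_{activation}_{optimizer}_lr{lr}"

# Every combination of architecture, activation, optimizer, and learning rate
grid = list(product(architectures, activation_names, optimizer_names, learning_rates))
names = [run_name(*config) for config in grid]

print(len(grid))        # 54
print(names[0])         # sonar_64-32_ReLU_Adam_lr0.001
print(len(set(names)))  # 54
```

The grid therefore trains 3 x 3 x 3 x 2 = 54 models, each for 40 epochs.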


Lessons:
* I learned how to track metrics with TensorBoard, how efficient neural networks can be, and why it is necessary to try different optimizers, activation functions, and architectures to find a good combination of variables.

Future Applications
* I have known about neural networks for a while, so I have used them before. I am currently working on using neural networks for financial applications. It also seems evident to me that a lot of research occurs in this area, especially on understanding what is being learned in order to reduce the black box effect. That would be a great area to read into as well.

Conclusions
* Over multiple iterations, I found that the best model accuracy was 88.10%, with this configuration:
* Best Model Parameters: [[256, 128, 64, 32], <class 'torch.nn.modules.activation.ReLU'>, 'Adam', 40] (architecture, activation function, optimizer, epochs; the learning rate was not recorded)
* This means that the neural network with the above settings correctly predicted whether the sonar signal bounced off metal or rock for about 88% of the held-out test samples (37 of 42).
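
The reported percentages can be tied back to sample counts with a little arithmetic. This is a sketch that assumes the 208 samples and `test_size=0.2` from Step 2, where scikit-learn rounds the test set size up.

```python
import math

n_samples = 208
test_size = 0.2

# 0.2 * 208 = 41.6, which is rounded up to 42 test samples
n_test = math.ceil(test_size * n_samples)
print(n_test)  # 42

# Each reported accuracy corresponds to a whole number of correct predictions
for correct in (37, 33, 28):
    print(f"{correct}/{n_test} = {100 * correct / n_test:.2f}%")
# 37/42 = 88.10%
# 33/42 = 78.57%
# 28/42 = 66.67%
```

With only 42 test samples, a single prediction changes the accuracy by about 2.4 percentage points, so small gaps between configurations should be read with care.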

---