# Project 6: Neural Networks with PyTorch
---

### Names: Keiran Berry
### Course Level: Undergraduate
---
## Note: Because the ANN models take a long time to train (there are 2-3 of them, depending on the course level in which you are enrolled), please submit a PDF of your solution along with your .ipynb file
---

**Introduction:**
* In this project, we explore Artificial Neural Networks for dimensionality reduction and classification using PyTorch.

<u>**Note:** The project will be graded by me running your notebook from top to bottom (choosing the "run all" option) - if it errors out at any point - this is where I stop grading and you'll lose ALL points after the error - Even if they are correct!</u>

* <u>Moral of the story is, **Make sure your entire notebook executes from top to bottom and you're happy with the results BEFORE you submit to the drop box!**</u>

**Objectives:**
* The objective of this project is to use PyTorch to investigate ANN-based dimensionality reduction techniques and classifiers, then evaluate their performance on image data.

# Let's grab the data and have a look at the dataset

## All Students

---
**Problem A (60pts)**

We'll be looking at the Fashion MNIST data that we investigated in project 5, so go ahead and grab your project 5 code to read in the data.

* Unlike project 5, we do NOT need to normalize our data to [-1,1]; instead, we want our data normalized to [0,1]
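
As a small illustration of the two scalings mentioned above, the sketch below applies both to a made-up array of 8-bit pixel values. The array `pixels` and the names `unit_range` and `signed_range` are hypothetical and not taken from the dataset.

```python
import numpy as np

# Hypothetical 8-bit pixel values (Fashion MNIST pixels are stored as 0..255)
pixels = np.array([0, 51, 102, 204, 255], dtype=np.float32)

# [0, 1] scaling, the range wanted for this project
unit_range = pixels / 255.0

# [-1, 1] scaling, shown only for comparison with project 5
signed_range = 2.0 * unit_range - 1.0

print(unit_range)
print(signed_range)
```

The [0,1] range pairs naturally with a Sigmoid output layer, which can only produce values in that interval.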

In [7]:
# imports, parameters, data, etc.
# Fashion MNIST #
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import kagglehub # need to install this: pip install kagglehub
# !pip install kagglehub -U # update when needed
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Define Hyperparameters (can do this later if preferred)
batch_size = 100 # Number of images we use before updating the gradients
learning_rate = 0.001 # Learning rate for gradient updates
epochs = 10 # Number of times to run through the entire training loop and all batches
latent_dim = 64  # Dimension of the latent space

# Download latest version
path = kagglehub.dataset_download("zalando-research/fashionmnist")

# The data is stored in a path on your hard drive in .csv files #
print("Path to dataset files:", path)

# Get the data from the .csv files #
data_train = pd.read_csv(path + "/fashion-mnist_train.csv")
data_test = pd.read_csv(path + "/fashion-mnist_test.csv")

Path to dataset files: C:\Users\101080740\.cache\kagglehub\datasets\zalando-research\fashionmnist\versions\4


1 (10pts). Separate out the data between training and testing for both labels and data

* Normalize between [0,1], convert to a torch tensor, and set up your dataloaders (one for training and one for testing)
* **Recall:** We need our data to be of type torch tensor for using the pytorch modules

In [9]:
from torch.utils.data import TensorDataset

# Grab the training data and testing data and store for model development #
# same code as Project 5
trainX = data_train.iloc[:, 1:]  # first column is labels so we want the rest
trainY = data_train.iloc[:, 0]   # now we want the labels

testX = data_test.iloc[:, 1:] # rest are data
testY = data_test.iloc[:, 0]  # first contains labels

# Normalize the pixel values to [0,1] for better scaling and convert to torch tensors for processing #
trainX = trainX / 255.0
testX = testX / 255.0


# We don't need the labels to be tensors (yet) since we don't need them for an autoencoder (it's unsupervised)
trainXTensor = torch.tensor(trainX.values, dtype = torch.float32)
testXTensor = torch.tensor(testX.values, dtype = torch.float32)

# set up the dataloaders (batch_size comes from the hyperparameters defined above) #
trainData = TensorDataset(trainXTensor)
testData = TensorDataset(testXTensor)

trainLoader = DataLoader(trainData, batch_size = batch_size, shuffle = True)
testLoader = DataLoader(testData, batch_size = batch_size, shuffle = False)  # no need to shuffle for evaluation



2 (20pts). Define an Autoencoder Model

* You can use the nn.Sequential module to define both the encoder and decoder.  
* Use the following for dimensions:
    - Encoder (single hidden layer):
        + input_dim = 28*28 (size of image vector)
        + hidden_dim = 256 (number of hidden neurons)
        + latent_dim = 64 (dimension of the latent space)
        + Note: the only dimension that is "fixed" is the input dimension (dictated by the number of features)
            - Similar to PCA, the latent dimension controls how "much" information is kept
        + Use linear activation for the hidden layer and ReLU for the latent output
    - Decoder (single hidden layer):
        + input_dim = latent_dim
        + hidden_dim = 256 (number of hidden neurons)
        + output_dim = 28*28 (same size as the image vectors)
        + Note: the only dimension that is "free" is the hidden dimension (the others are dictated by the encoder and number of features)
            - Output has to be the same as the input because we're "comparing" the reconstruction of each image as the loss
        + Use ReLU activation for the hidden layer and Sigmoid for the decoder output (pixel range is [0,1])
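
To get a feel for the sizes implied by the dimensions above, the following sketch counts the trainable parameters of the four linear layers. It is only a sanity check under the listed dimensions, not the model definition itself; the list `layers` and the variable `total_params` are illustrative names.

```python
import torch.nn as nn

input_dim, hidden_dim, latent_dim = 28 * 28, 256, 64

# Only the linear layers carry parameters; activations add none
layers = [
    nn.Linear(input_dim, hidden_dim),   # encoder hidden
    nn.Linear(hidden_dim, latent_dim),  # encoder latent
    nn.Linear(latent_dim, hidden_dim),  # decoder hidden
    nn.Linear(hidden_dim, input_dim),   # decoder output
]

for layer in layers:
    n_params = sum(p.numel() for p in layer.parameters())
    print(f"{layer.in_features:>4} -> {layer.out_features:<4} {n_params} parameters")

total_params = sum(p.numel() for layer in layers for p in layer.parameters())
print("total:", total_params)
```

Each layer contributes `in_features * out_features` weights plus `out_features` biases, so the two layers touching the 784-dimensional image account for most of the total.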

In [None]:
# Define the Autoencoder Model
input_dim = 28 * 28
hidden_dim = 256
latent_dim = 64

# defining autoencoder based on the specifications above

class Autoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim, hidden_dim=256):
        super().__init__()
        # Encoder: linear hidden layer (no activation), ReLU on the latent output
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Linear(hidden_dim, latent_dim),
            nn.ReLU()
        )
        # Decoder: ReLU hidden layer, Sigmoid output so pixels land in [0,1]
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )

    def forward(self, x):
        # flatten each image to a vector in case it arrives as 28x28
        x = x.view(x.size(0), -1)

        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

3 (10pts). Train the model

* Use MSELoss, and the Adam optimizer.  The learning rate, epochs, batch size, etc. are all defined above
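
For reference, `nn.MSELoss` with its default `reduction='mean'` averages the squared differences over every element of the batch. The sketch below checks this by hand on two tiny made-up tensors; `original` and `reconstructed` are hypothetical stand-ins for an input batch and the autoencoder output.

```python
import torch
import torch.nn as nn

# Made-up "original" and "reconstructed" batches: 2 samples, 3 pixels each
original = torch.tensor([[0.0, 0.5, 1.0], [1.0, 0.0, 0.5]])
reconstructed = torch.tensor([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])

criterion = nn.MSELoss()  # default reduction='mean'
loss = criterion(reconstructed, original)

# The same quantity computed by hand: mean of squared element-wise differences
manual = ((reconstructed - original) ** 2).mean()

print(loss.item(), manual.item())
```

For an autoencoder the target passed to the loss is the input itself, which is why no labels are needed during training.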

In [None]:
# Define the training function (can also just wrap this into the training loop)
# here the loop is wrapped in a function

# Initialize Model, Loss, and Optimizer
autoencoder = Autoencoder(input_dim, latent_dim)
criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=learning_rate)

# Function to Train the Autoencoder
def trainAutoencoder(model, loader, criterion, optimizer, epochs):
    model.train()
    # keep the batches on the same device as the model (it stays on the CPU unless moved)
    device = next(model.parameters()).device

    for epoch in range(epochs):
        runningLoss = 0.0
        for data in loader:
            inputs = data[0].to(device)

            optimizer.zero_grad() # gradients accumulate across backward() calls, so reset them each batch
            outputs = model(inputs)
            loss = criterion(outputs, inputs) # reconstruction error against the input itself

            loss.backward()
            optimizer.step()
            runningLoss += loss.item()

        # print the average batch loss for the epoch
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {runningLoss/len(loader):.4f}")
    model.eval()

In [None]:
# Train the model (e.g., write the Training Loop)
trainAutoencoder(autoencoder, trainLoader, criterion, optimizer, epochs)

4 (10pts). Test your model for reconstruction of a few sample images

* Create a 2 x 10 plot that shows one sample from each class with the top row containing the original images and the bottom row containing the reconstruction of these images
    - e.g., grab one image from each class and plot it, then, run each image through your autoencoder to get the reconstructed version, and plot it to compare how "well" your autoencoder reconstructs the image

* Your plot should look something like the image below:

<img src="Figures/Recon.png" alt="Autoencoder Reconstruction" style="width:800px;"/>
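
One possible way to pick a single sample per class, as asked above, is `np.unique` with `return_index=True`. The sketch below uses a made-up label vector (`example_labels` is hypothetical and stands in for the test labels) rather than the real data.

```python
import numpy as np

# Hypothetical label vector standing in for the test labels (classes 0..9)
rng = np.random.default_rng(0)
example_labels = rng.permutation(np.repeat(np.arange(10), 5))

# np.unique with return_index=True gives the first position of each class,
# with the classes returned in sorted order
classes, first_idx = np.unique(example_labels, return_index=True)

print(classes)
print(first_idx)
```

Indexing the image tensor with `first_idx` would then give ten images already ordered by class, which keeps the columns of the 2 x 10 plot in a predictable order.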

In [None]:
# Test the Autoencoder (write a quick function to grab the original images from the torch tensor, run them through the model, and generate the reconstructed images)
# Your function should return both the original images you used, and their reconstruction
def testAutoencoder(model, loader, numClasses = 10):
    model.eval()
    
    classesSeen = set()
    originalImages = []
    reconstructedImages = []
    
    with torch.no_grad():
        for data in loader:
            inputs, labels = data
            
            inputsFlat = inputs.view(inputs.size(0), -1)
            outputs = model(inputsFlat)
            
            for i in range(inputs.size(0)):
                label = labels[i].item()
                
                if label not in classesSeen:
                    originalImages.append(inputs[i].view(28, 28)) 
                    reconstructedImages.append(outputs[i].view(28, 28))
                    
                    classesSeen.add(label)
                    
                    if len(classesSeen) >= numClasses:
                        break 
            if len(classesSeen) >= numClasses:
                break
    
    return originalImages, reconstructedImages

In [None]:
# Visualize Original and Reconstructed Images (use your function to create the subplot, etc.)
# Recall, your images are vectorized (784 dimensional) so you need to reshape them to visualize
# You might consider looking at the .view() method in torch - works similar to the .reshape method in numpy

# need the labels now 
testYTensor = torch.tensor(testY.values, dtype = torch.long)
testData = TensorDataset(testXTensor, testYTensor)
testLoader = DataLoader(testData, batch_size = 64, shuffle = False)

originalImages, reconstructedImages = testAutoencoder(autoencoder, testLoader)

fig, axes = plt.subplots(2, len(originalImages), figsize=(15, 5))
    
for i in range(len(originalImages)):
    # top row
    axes[0, i].imshow(originalImages[i].squeeze(), cmap='gray')
    axes[0, i].axis('off')
    
    # bottom row
    axes[1, i].imshow(reconstructedImages[i].squeeze(), cmap='gray')
    axes[1, i].axis('off')
    
plt.suptitle('Original Images (Top) vs Reconstructed Images (Bottom)')
plt.show()

5 (10pts). Build a shallow classifier (logistic regression, tree, SVC, etc.) on the reduced dimensional data and evaluate the classification performance (classification report and a confusion matrix)

* Note that if we just use the encoder, we can effectively reduce the dimensionality of our data from 784 dimensions to the dimension of the latent space (64 in this case)
    - This is "similar" to PCA, however, recall PCA assumes a linear model, here we've introduced some nonlinear activation so we're effectively doing nonlinear dimensionality reduction
    - Note that the size of the latent space dimension will have an effect on classification accuracy
        + Recall that for PCA we needed around 80 components to capture approx. 85% of the variance

#### **Important Note:** Your data (both input and output) will be torch tensors - numpy and matplotlib have no clue how to deal with that
* You need to detach your data from the torch graph, then convert from a tensor to a numpy array
    - this is done via: my_numpy_data = my_tensor_data.detach().numpy() as an example
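
A minimal sketch of that conversion, using a small made-up tensor in place of real model output. The GPU remark in the comments is an addition that only matters if the tensors have been moved off the CPU.

```python
import numpy as np
import torch

# A tensor that is part of the autograd graph (it tracks gradients)
my_tensor_data = torch.ones(2, 3, requires_grad=True) * 2.0

# Calling .numpy() directly on this tensor would raise an error because it
# requires grad; detach() returns a tensor that no longer tracks gradients
my_numpy_data = my_tensor_data.detach().numpy()

# If the tensor lived on a GPU, .cpu() would also be needed before .numpy():
# my_numpy_data = my_tensor_data.detach().cpu().numpy()

print(type(my_numpy_data), my_numpy_data.shape)
```

Inside a `torch.no_grad()` block the outputs do not track gradients in the first place, so `detach()` is harmless there but not strictly required.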

In [None]:
# Project the training data onto the latent space, then
# Convert the latent data from a torch tensor to a numpy array to build the sklearn classifier
# Note: you must first detach the data from the torch tensor network (e.g., my_torch_data.detach().numpy() returns the numpy equivalent)

autoencoder.eval()
latentData = []
labels = []
trainYTensor = torch.tensor(trainY.values, dtype = torch.long)
trainData = TensorDataset(trainXTensor, trainYTensor)
# no shuffling needed here: we only run the encoder over every sample once
trainLoader = DataLoader(trainData, batch_size = batch_size, shuffle = False)

with torch.no_grad():
    for inputs, targets in trainLoader:
        inputs = inputs.view(inputs.size(0), -1)
        encoded = autoencoder.encoder(inputs)

        latentData.append(encoded.detach().numpy())  # detaching as shown in the example
        labels.append(targets.numpy())

latentData = np.vstack(latentData)
labels = np.hstack(labels)

# training data should now be projected onto latent space
# did labels as well just to be safe

In [None]:
# Build the shallow classifier
from sklearn.linear_model import LogisticRegression

logisticRegression = LogisticRegression(max_iter=1000)  # extra iterations to help the solver converge
logisticRegression.fit(latentData, labels)

# need to do the same projection as before for the test set
autoencoder.eval()
latentTest = []
testLabels = []

# already have the test loader (with labels) from before

with torch.no_grad():
    for inputs, targets in testLoader:
        inputs = inputs.view(inputs.size(0), -1)
        encoded = autoencoder.encoder(inputs)

        latentTest.append(encoded.detach().numpy())  # detaching as shown in the example
        testLabels.append(targets.numpy())  # collect the test labels, not the training labels

latentTest = np.vstack(latentTest)
testLabels = np.hstack(testLabels)

testPredictions = logisticRegression.predict(latentTest)

In [None]:
# Let's look at the prediction accuracy
# Get entire report of results #
from sklearn.metrics import classification_report

print("Classification Report:\n", classification_report(testLabels, testPredictions))

In [None]:
# plot the confusion matrix from the predictions
from sklearn.metrics import confusion_matrix

print("Confusion Matrix:\n", confusion_matrix(testLabels, testPredictions))

---
**Problem B (40pts)**

1 (20pts). Let's repeat the above experiment, but instead of doing any dimensionality reduction, we use the MLP directly for classification. The first step is to build the MLP model

* Although we can't "perfectly" match the autoencoder network topology, let's try to stay somewhat close for an apples-to-apples comparison:
* **Hyperparameters**
    - Three hidden layers: input_dim -> 256 -> 128 -> 64 -> num_classes
    - Same learning rate, epochs, etc.
    - Activation: use ReLU for all hidden units, and softmax for the output unit
    - There are several loss functions you can use:  I used the CrossEntropy loss and it seems to do a pretty good job
    - Use the Adam optimizer since we have many nonlinearities over the three layers
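
A minimal sketch of the topology listed above, run on a random stand-in batch rather than the real data. The class name `MLP` and the variables `fake_images` and `fake_labels` are illustrative. One deliberate choice that departs from a literal reading of the list: the model returns raw logits, because `nn.CrossEntropyLoss` applies log-softmax internally, and softmax is applied only when probabilities are wanted.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Illustrative MLP: input_dim -> 256 -> 128 -> 64 -> num_classes."""
    def __init__(self, input_dim=28 * 28, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),  # raw logits; no softmax here
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten images to vectors
        return self.net(x)

torch.manual_seed(0)
model = MLP()
criterion = nn.CrossEntropyLoss()  # expects logits and integer class labels
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Random stand-in batch (NOT real Fashion MNIST data)
fake_images = torch.rand(100, 28 * 28)
fake_labels = torch.randint(0, 10, (100,))

# One optimization step
optimizer.zero_grad()
logits = model(fake_images)
loss = criterion(logits, fake_labels)
loss.backward()
optimizer.step()

# Softmax is applied only when probabilities are wanted
probs = torch.softmax(logits.detach(), dim=1)
print(logits.shape, loss.item())
```

If a softmax layer were placed inside the model, a loss that expects probabilities would be needed instead; feeding softmax outputs into `nn.CrossEntropyLoss` would apply the normalization twice.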

In [None]:
# Need to modify our dataloader since we now need the labels (supervised learning)
from torch.utils.data import TensorDataset # useful to join the data and labels for the dataloader

# Now we need to convert the labels to torch tensors as well (supervised)


# Use the TensorDataset method to create a supervised training dataset (join the data)


# set up the dataloaders (only need the dataloader for the training set) #



In [None]:
# Define the MLP Model

    

In [None]:
# Define the training function (can also just wrap this into the training loop)

# Initialize Model, Loss, and Optimizer


# Function to Train the MLP


In [None]:
# Train the model (e.g., write the Training Loop)


* Note: the network will produce a vector of 10 probabilities, you'll want to convert that to a class label, e.g., [0,1,2,3,4...] to match your testing labels and do the comparison
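
The conversion described in the note can be sketched with `np.argmax` on a made-up probability matrix; `probabilities` here is hypothetical and stands in for the network output on three samples.

```python
import numpy as np

# Made-up network outputs for 3 samples over 10 classes (each row sums to 1)
probabilities = np.full((3, 10), 0.05)
probabilities[0, 7] = 0.55
probabilities[1, 2] = 0.55
probabilities[2, 9] = 0.55

# argmax along the class axis picks the most probable class for each sample
predicted_labels = np.argmax(probabilities, axis=1)

print(predicted_labels)  # [7 2 9]
```

Because softmax preserves ordering, taking the argmax of raw logits gives the same labels as taking it over the probabilities.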

In [None]:
# Predict for all samples (print the classification report, etc)#


# Need to convert your predictions from a probabilistic output to a class label - np.argmax can help here


In [None]:
# Plot the confusion matrix


---

## Problem A and B Discussion (provide a detailed discussion of the differences and results from each problem above - to include which categories are difficult to classify for each method, etc):

---

---

## CSC 549 Students Only!

# Redo problem B but using a convolutional neural network (i.e., don't vectorize the images)

* You can play with the model architecture, hyperparameters, activation, strides, pooling, etc.
* Then compare and contrast the three different variations of the model with regard to classification of the Fashion MNIST dataset
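
As a starting point, here is a hedged sketch of a small convolutional network for 1 x 28 x 28 inputs. The class name `SmallCNN`, the channel counts, and the kernel sizes are arbitrary illustrative choices, and the batch is random stand-in data, not Fashion MNIST.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Illustrative CNN for 1x28x28 inputs; layer sizes are arbitrary choices."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # -> 16 x 28 x 28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16 x 14 x 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32 x 14 x 14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32 x 7 x 7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # raw logits

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(start_dim=1)
        return self.classifier(x)

# The flat 784-vectors are reshaped to (batch, channels, height, width)
flat_batch = torch.rand(8, 28 * 28)           # random stand-in data
image_batch = flat_batch.view(-1, 1, 28, 28)

cnn = SmallCNN()
cnn_logits = cnn(image_batch)
print(cnn_logits.shape)
```

The key difference from the MLP is the input shape: the images keep their 2D layout, so the data tensors need a channel dimension before they are passed to the first `nn.Conv2d` layer.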