# P1: Nifty Neural Networks!


## Table of Contents

1. Introduction
2. Preliminaries
3. Software Setup
4. Implementation
5. Grading Rubric
6. Report Guidelines

## 1. Introduction

Neural networks, at their core, function like any other mathematical function that can be evaluated. The process of evaluating a neural network is referred to as the forward pass. During this step, inputs are passed through the network layers, and outputs are generated.

To optimize the network's performance, its weights and biases need to be adjusted. This is done through a process called backward propagation (or backpropagation), which computes the gradient of the loss function with respect to each parameter. An optimizer then uses these gradients to update the parameters (in plain gradient descent, by subtracting each gradient, scaled by a learning rate, from the corresponding weight or bias), allowing the network to learn and improve its predictions.
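As a rough illustration of the forward pass, backward pass and parameter update described above, here is a minimal sketch in plain Python. The toy loss `L(w) = (w - 3)^2`, the learning rate `lr` and the function names are assumptions made for this example only; they are not part of the assignment code.

```python
def loss(w: float) -> float:
    """Forward pass: evaluate the toy loss L(w) = (w - 3)^2."""
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    """Backward pass: analytic derivative dL/dw = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0: float, lr: float = 0.1, steps: int = 100) -> float:
    """Repeatedly subtract the scaled gradient from the parameter."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # update step of plain gradient descent
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))  # w_final approaches the minimiser w = 3
```

In PyTorch the derivative is not written by hand for built-in layers; for the custom layers in this assignment you supply it yourself.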

In this assignment, you will dive into the implementation of custom layers in PyTorch. Specifically, you will focus on coding the forward pass and computing the gradients necessary for the backward pass. Before you begin, make sure to review the grading rubric to understand the criteria for evaluation.

## 2. Preliminaries

### CIFAR-10 Dataset

CIFAR-10 is a dataset of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. More details about the dataset can be found [here](http://www.cs.toronto.edu/~kriz/cifar.html).

Sample images from each class of the CIFAR-10 dataset are shown below:

![CIFAR 10](./artifacts/cifar10.png)

In this project, you will classify images into these 10 classes using the provided pipeline, loaders and helper classes.

Additionally, you are expected to generate a confusion matrix to evaluate your model's performance. For guidance on plotting a confusion matrix in PyTorch, please refer to this [resource](https://stackoverflow.com/questions/74020233/how-to-plot-confusion-matrix-in-pytorch).
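To make the idea concrete, the following is a minimal sketch of how a confusion matrix can be accumulated from integer class labels. It is written in plain Python so it runs anywhere; the function names and the toy labels are hypothetical, and how you wire this into the provided pipeline is up to you.

```python
def confusion_matrix(labels, preds, num_classes):
    """Return a num_classes x num_classes matrix of counts.

    Rows index the true class, columns index the predicted class.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_cls, pred_cls in zip(labels, preds):
        matrix[true_cls][pred_cls] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy example with 3 classes (made-up data, not CIFAR-10 results).
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, preds, num_classes=3)
for row in cm:
    print(row)
print("accuracy:", accuracy(cm))
```

The diagonal holds the correctly classified samples, so off-diagonal entries show which classes are confused with each other.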

### Linear Layer
A linear layer in a neural network performs a linear transformation of the input data. It is defined by the following components:

1. Weights
2. Biases

For more details, see the [PyTorch Linear layer documentation](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html).

The dimensions of the weights and biases are described in custom_layers.py.
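For reference, the forward and backward passes of a linear layer can be written as follows. This sketch assumes the `torch.nn.Linear` shape convention (weight of shape out × in) and a batch of N row vectors; the symbols X, W, b, G and the dimension names are notation introduced here, so check them against the shapes stated in custom_layers.py.

```latex
% Forward pass for a batch X \in \mathbb{R}^{N \times d_{in}}
Y = X W^{\top} + \mathbf{1} b^{\top},
\qquad W \in \mathbb{R}^{d_{out} \times d_{in}},\quad b \in \mathbb{R}^{d_{out}}

% Backward pass, given the upstream gradient
% G = \partial L / \partial Y \in \mathbb{R}^{N \times d_{out}}
\frac{\partial L}{\partial X} = G W,
\qquad
\frac{\partial L}{\partial W} = G^{\top} X,
\qquad
\frac{\partial L}{\partial b} = G^{\top} \mathbf{1}
```

Each gradient has the same shape as the quantity it belongs to, which is a quick sanity check when implementing the backward pass.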

### Softmax
The Softmax function is commonly used in neural networks for multi-class classification problems. It converts a vector of raw scores (logits) into probabilities, making it possible to interpret the output as the likelihood of each class.

[Sample implementation](https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python)

More details [here](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html).
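The sketch below shows one common way to compute softmax for a single vector of logits, together with the matching gradient rule. It is plain Python for illustration only: the function names are hypothetical, subtracting the maximum logit is a standard numerical-stability trick rather than something this assignment prescribes, and a PyTorch version would operate on tensors along a chosen dimension.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)  # shifting by the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(probs, grad_out):
    """Gradient w.r.t. the logits, given dL/d(probs).

    Uses dL/dz_i = s_i * (g_i - sum_j g_j * s_j), where s = softmax(z).
    """
    dot = sum(g * s for g, s in zip(grad_out, probs))
    return [s * (g - dot) for s, g in zip(probs, grad_out)]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
print(softmax_backward(probs, [1.0, 0.0, 0.0]))
```

Because the outputs always sum to one, the gradient with respect to the logits always sums to zero, which is a useful check for your own implementation.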

### Convolutional Layer

A convolutional layer is a fundamental building block in Convolutional Neural Networks (CNNs) used primarily for processing grid-like data such as images. It applies convolution operations to detect local features in the input.

Although it is called a convolutional layer, the PyTorch implementation of conv2d does not actually perform a convolution in the mathematical sense. Instead, it performs a cross-correlation operation, where the kernel is not flipped. This distinction is important to note, but for most deep learning projects including this one, cross-correlation is perfectly fine as the weights will automatically adjust during training.
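The difference between the two operations can be seen on a tiny example. The sketch below is plain Python for a single-channel image with no padding; the function names are hypothetical and this is not the layer you are asked to implement, which also handles batches, channels and a bias.

```python
def cross_correlate2d(image, kernel, stride=1):
    """Slide the kernel over the image without flipping it (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


def convolve2d(image, kernel, stride=1):
    """True convolution: flip the kernel in both axes, then cross-correlate."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return cross_correlate2d(image, flipped, stride)


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
print(cross_correlate2d(image, kernel))  # [[-4.0, -4.0], [-4.0, -4.0]]
print(convolve2d(image, kernel))         # [[4.0, 4.0], [4.0, 4.0]]
```

With no padding, an input of size n, kernel size k and stride s give an output of size `(n - k) // s + 1`, so a 5×5 input with a 3×3 kernel and stride 2 produces a 2×2 output.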

For more details, refer to [P0](https://rbe549.github.io/rbe474x/fall2024a/proj/p0/).


## 3. Software Setup

Use a code editor like VSCode and open this entire folder.

For each part, you will implement the corresponding layers in custom_layers.py.

The code will be tested automatically with test.py.

To run the tests, open a terminal in the current folder and run:

`pytest -s -v test.py`

## 4. Implementation 


### Part 1: Implement Your Custom Layers for Multi-Layer Perceptron (MLP)

Open custom_layers.py and implement a fully connected layer, a ReLU layer and a softmax layer.

Verify your implementation by running the code below. Feel free to modify the snippet, but do not modify my test.py.

For more information about supplying gradients, please refer to [examples_autograd](https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html).
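As a minimal sketch of the mechanism, the example below supplies a hand-written gradient for a deliberately simple operation, `y = x^2`, using `torch.autograd.Function`. The class name `Square` is hypothetical and is not one of the layers in this assignment; it only illustrates the `forward` / `backward` / `save_for_backward` pattern and how `torch.autograd.gradcheck` can compare an analytic gradient against a numerical one.

```python
import torch


class Square(torch.autograd.Function):
    """y = x^2 with a hand-written backward pass (illustrative only)."""

    @staticmethod
    def forward(ctx, x):
        # Save whatever the backward pass will need.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        # Chain rule: dL/dx = dL/dy * dy/dx = grad_output * 2x.
        (x,) = ctx.saved_tensors
        return grad_output * 2.0 * x


x = torch.tensor([1.0, -2.0, 3.0], dtype=torch.double, requires_grad=True)
y = Square.apply(x)
y.sum().backward()
print(x.grad)

# gradcheck compares the analytic gradient with finite differences;
# it expects double-precision inputs that require gradients.
x_check = torch.randn(4, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(Square.apply, (x_check,)))
```

`backward` must return one gradient per input of `forward`, in the same order, which becomes relevant once a layer takes weights and biases as inputs as well.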

In [None]:
import importlib
import torch
import torch.nn as nn

import networks as net
importlib.reload(net)

print("\nLinear")
u = torch.rand((1, 10))
customLayer = net.CustomLinear(10, 2)
inbuiltLayer = nn.Linear(in_features=10, out_features=2)

inbuiltLayer.weight.data.copy_(customLayer.weight.data)
inbuiltLayer.bias.data.copy_(customLayer.bias.data)

y_custom = customLayer(u)
y_inbuilt = inbuiltLayer(u)
print("Inference for linear layer")
print(y_custom)
print(y_inbuilt)

lossFunc = nn.MSELoss()

loss_custom = lossFunc(y_custom, torch.zeros_like(y_custom))
loss_in = lossFunc(y_inbuilt, torch.zeros_like(y_inbuilt))

loss_custom.backward()
loss_in.backward()

print("\ngradients for linear layer")
print(customLayer.weight.grad)
print(inbuiltLayer.weight.grad)

print(customLayer.bias.grad)
print(inbuiltLayer.bias.grad)

# RELU
print("\nRELU")
u1 = torch.rand((1, 10), requires_grad=True)
u2 = u1.detach().clone()
u2.requires_grad_()

customLayer = net.CustomReLU()
inbuiltLayer = nn.ReLU()

y_custom = customLayer(u1)
y_inbuilt = inbuiltLayer(u2)

loss_custom = lossFunc(y_custom, torch.zeros_like(y_custom))
loss_in = lossFunc(y_inbuilt, torch.zeros_like(y_inbuilt))

loss_custom.backward()
loss_in.backward()

print("inference")
print(y_custom, y_inbuilt)

print("gradients of loss relative to the input")
print(u1.grad)
print(u2.grad)

# SOFTMAX
print("\n SoftMax")

u1 = torch.rand((1, 3), requires_grad=True)
u2 = u1.detach().clone()
u2.requires_grad_()
customLayer = net.CustomSoftmax(1)
inbuiltLayer = nn.Softmax(dim=1)  # set dim explicitly; omitting it is deprecated

y_custom = customLayer(u1)
y_inbuilt = inbuiltLayer(u2)

print(y_custom)

loss_custom = lossFunc(y_custom, torch.zeros_like(y_custom))
loss_in = lossFunc(y_inbuilt, torch.zeros_like(y_inbuilt))

loss_custom.backward()
loss_in.backward()

print("gradients of loss relative to the input")
print(u1.grad)
print(u2.grad)


### Part 2: MLP Network Training

Now that you have implemented an MLP from scratch, it's time to train it and verify its ability to classify objects. This network is expected to achieve an accuracy of approximately 40%.

Additionally, you are required to save one of your best model checkpoints as mlp.pth in the current folder. This file will be used for automated testing.

Furthermore, please implement a confusion matrix in the utils file, specifically within the val_step method of the Pipeline class. You may use any available implementation of the confusion matrix, but ensure that all tests continue to pass.
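One possible way to pick the checkpoint to submit is to take the epoch with the highest validation accuracy and copy its file to the expected name. The sketch below assumes checkpoints are saved as `<epoch>.pth` in a single folder and that you kept a list of per-epoch validation accuracies; the helper names and the demo files are hypothetical.

```python
import os
import shutil
import tempfile


def best_epoch(val_accuracies):
    """Index of the epoch with the highest validation accuracy."""
    return max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])


def export_best_checkpoint(checkpoint_dir, val_accuracies, target_path):
    """Copy '<best epoch>.pth' from checkpoint_dir to target_path."""
    epoch = best_epoch(val_accuracies)
    source = os.path.join(checkpoint_dir, f"{epoch}.pth")
    shutil.copyfile(source, target_path)
    return epoch


# Demo with placeholder files standing in for real checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    for e in range(3):
        with open(os.path.join(tmp, f"{e}.pth"), "wb") as f:
            f.write(f"weights-{e}".encode())
    target = os.path.join(tmp, "mlp.pth")
    chosen = export_best_checkpoint(tmp, [0.31, 0.42, 0.39], target)
    with open(target, "rb") as f:
        copied = f.read()

print(chosen, copied)
```

The same helper could be reused for the CNN checkpoints by changing the folder and the target file name.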

In [None]:
# Let's train a CIFAR-10 image classifier
import importlib
import torch
import torch.nn as nn
import numpy as np
import networks as net
import os
importlib.reload(net)

pipeline = net.Pipeline()
model = net.CustomMLP().to(pipeline.device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

home_path = os.path.expanduser("~")
JOB_FOLDER = os.path.join(home_path, "outputs/")
TRAINED_MDL_PATH = os.path.join(JOB_FOLDER, "cifar/mlp/")

os.makedirs(JOB_FOLDER, exist_ok=True)
os.makedirs(TRAINED_MDL_PATH, exist_ok=True)

epochs = 40
trainLossList = []
valAccList = []
for eIndex in range(epochs):
    # print("Epoch count: ", eIndex)
    
    train_epochloss = pipeline.train_step(model, optimizer)
    val_acc = pipeline.val_step(model)

    print(eIndex, train_epochloss, val_acc)

    valAccList.append(val_acc)
    trainLossList.append(train_epochloss)

    trainedMdlPath = TRAINED_MDL_PATH + f"{eIndex}.pth"
    torch.save(model.state_dict(), trainedMdlPath)

trainLosses = np.array(trainLossList)
testAccuracies = np.array(valAccList)

np.savetxt("train.log", trainLosses)
np.savetxt("test.log", testAccuracies)

### Part 3: Implement Convolutional Neural Networks (CNN) Using PyTorch layers

CNNs excel in capturing local patterns and spatial hierarchies through convolutional filters, which makes them more effective for image and spatial data. They also use parameter sharing, reducing the number of parameters and computational cost compared to MLPs. Additionally, CNNs offer translation invariance and hierarchical feature learning, enabling them to recognize features across different spatial locations and build complex patterns efficiently.
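To give a feel for the effect of parameter sharing, the sketch below counts the learnable parameters of one fully connected layer and one convolutional layer. The layer sizes (256 hidden units, 16 filters of size 3×3) are arbitrary values chosen for illustration, not the architecture used in this project, and the two layers do not produce outputs of the same size.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel 2D convolution."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


# A flattened 32x32 RGB image fed into 256 hidden units.
fc = linear_params(32 * 32 * 3, 256)
# A 3x3 convolution mapping 3 input channels to 16 feature maps.
conv = conv2d_params(3, 16, 3)

print(fc)    # 786688
print(conv)  # 448
```

The convolution reuses the same small kernel at every spatial location, so its parameter count does not depend on the image size.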

Open networks.py and implement `RefCNN` using the built-in layers in PyTorch. Make sure it is similar to `CustomCNN`, which uses custom layers.

Train the network and compare its training loss and validation accuracy against the MLP.

Please copy the best checkpoint file into the current folder as cnn_inbuilt.pth for the automated tests. Its accuracy is expected to be higher than 50%.

In [None]:
# Let's train a CIFAR-10 image classifier
import importlib
import torch
import torch.nn as nn
import numpy as np
import networks as net
import os
importlib.reload(net)

pipeline = net.Pipeline()
model = net.RefCNN().to(pipeline.device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

home_path = os.path.expanduser("~")
JOB_FOLDER = os.path.join(home_path, "outputs/")
TRAINED_MDL_PATH = os.path.join(JOB_FOLDER, "cifar/cnn_inbuilt_layers/")

os.makedirs(JOB_FOLDER, exist_ok=True)
os.makedirs(TRAINED_MDL_PATH, exist_ok=True)

epochs = 40
trainLossList = []
valAccList = []
for eIndex in range(epochs):
    # print("Epoch count: ", eIndex)
    
    train_epochloss = pipeline.train_step(model, optimizer)
    print("train complete")
    val_acc = pipeline.val_step(model)

    print(eIndex, train_epochloss, val_acc)

    valAccList.append(val_acc)
    trainLossList.append(train_epochloss)

    trainedMdlPath = TRAINED_MDL_PATH + f"{eIndex}.pth"
    torch.save(model.state_dict(), trainedMdlPath)

trainLosses = np.array(trainLossList)
testAccuracies = np.array(valAccList)

np.savetxt("train.log", trainLosses)
np.savetxt("test.log", testAccuracies)


### Part 4: Implement Your Custom Layers for Convolutional Neural Networks (CNN)

Open custom_layers.py and implement the CustomConvLayer.

Verify your implementation by running the code below. Feel free to modify the snippet, but do not modify my test.py.

For more information about supplying gradients, please refer to [examples_autograd](https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html).

In [None]:
import importlib
import torch
import torch.nn as nn
import numpy as np
import networks as net
import os
importlib.reload(net)

inbuiltLayer = nn.Conv2d(2, 3, 3, stride=2, padding='valid')
customLayer = net.CustomConv2d(2, 3, 3, 2)

inbuiltLayer.weight.data.copy_(customLayer.weight.data)
inbuiltLayer.bias.data.copy_(customLayer.bias.data)

u1 = torch.rand((1, 2, 5, 5), requires_grad=True)
u2 = u1.detach().clone()
u2.requires_grad_()

y1 = inbuiltLayer(u1)
y2 = customLayer(u2)

print("Conv. Inference")
print(y1)
print(y2)

lossFunc = nn.MSELoss()
loss_custom = lossFunc(y2, torch.zeros_like(y2))
loss_in = lossFunc(y1, torch.zeros_like(y1))

loss_in.backward()
loss_custom.backward()

print("gradients of loss relative to the weights")
print(inbuiltLayer.weight.grad)
print(customLayer.weight.grad)

print("gradients of loss relative to the bias")
print(inbuiltLayer.bias.grad)
print(customLayer.bias.grad)

print("gradients of loss relative to the input")
print(u1.grad)
print(u2.grad)


### Part 5: CNN Network Training

Train the network and compare its training loss and validation accuracy against the MLP and the CNN built from PyTorch's built-in conv layers.

Please copy the best checkpoint file into the current folder as `cnn_custom.pth` for the automated tests. Its accuracy is expected to be higher than 50%.

In [None]:
# Let's train a CIFAR-10 image classifier
import importlib
import torch
import torch.nn as nn
import numpy as np
import networks as net
import os
importlib.reload(net)

pipeline = net.Pipeline()
model = net.CustomCNN().to(pipeline.device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

home_path = os.path.expanduser("~")
JOB_FOLDER = os.path.join(home_path, "outputs/")
TRAINED_MDL_PATH = os.path.join(JOB_FOLDER, "cifar/cnn_custom_layer/")

os.makedirs(JOB_FOLDER, exist_ok=True)
os.makedirs(TRAINED_MDL_PATH, exist_ok=True)

epochs = 40
trainLossList = []
valAccList = []
for eIndex in range(epochs):
    # print("Epoch count: ", eIndex)
    
    train_epochloss = pipeline.train_step(model, optimizer)
    print("train complete")
    val_acc = pipeline.val_step(model)

    print(eIndex, train_epochloss, val_acc)

    valAccList.append(val_acc)
    trainLossList.append(train_epochloss)

    trainedMdlPath = TRAINED_MDL_PATH + f"{eIndex}.pth"
    torch.save(model.state_dict(), trainedMdlPath)

trainLosses = np.array(trainLossList)
testAccuracies = np.array(valAccList)

np.savetxt("train.log", trainLosses)
np.savetxt("test.log", testAccuracies)

## 5. Grading Rubric

- Part 1: 60
- Part 2: 10
- Part 3: 10
- Part 4: 10
- Part 5: 10

For RBE474X: Part 1 + Part 2 + Part 3 = 100% of the grade (80/80).

For RBE595-A01-SP: you are expected to implement Parts 1 to 5 to receive full credit (100/100).

Your code will be evaluated with test.py. Please run it and ensure that the tests pass before submitting. Instructions are in the Software Setup section.

Please note that I will replace your test.py with my original test.py before evaluating.

Please do not submit the data folder that is downloaded while training the network. It is over 300 MB. Anyone submitting data will be penalized! Your submission should not be more than 20 MB.

## 6. Report Guidelines

The report must be written in LaTeX.

Include the following:

1. Training loss curve (loss vs epoch count)
2. Confusion matrix for the validation set (val_step)
3. Accuracy comparison between MLP, CNN (torch layers) and CNN (custom layers)
