# Homework 4: Perceptrons and Deep Learning [25 pts]
## Comp562 Summer II 2023

### Due 11:59pm July 25, 2023

In this assignment, you will use the perceptron and deep learning models discussed in class and experiment with some toy data. To avoid unexpected behavior with cached variables, always test your code with a fresh kernel. For hardware acceleration, use Colab with GPU enabled.

### Problem 1: Perceptron and MLP Model [5 pts]
**(1a)** Compare and contrast the perceptron model with SVM and linear regression. [3 pts]

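The perceptron is the simplest neural network, a single-layer linear classifier, and a linear SVM is also a linear classifier. Both find a hyperplane that separates linearly separable data, but they choose it differently: the perceptron updates its weights iteratively on misclassified points and stops at any separating hyperplane, while the SVM finds the hyperplane that maximizes the margin, which is determined by the support vectors. SVMs can handle non-linearly separable data with the kernel trick, whereas the perceptron will not converge on such data and training must be terminated manually. Linear regression also fits a linear function of the inputs, but it predicts a continuous value by minimizing squared error instead of predicting a class label.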
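As a rough illustration of the iterative update, here is a minimal sketch of the classic perceptron rule in plain Python. The function name `train_perceptron`, the toy data, and the `max_epochs` cap are illustrative assumptions and are not taken from the course code. The cap plays the role of the forced stop needed when the data is not linearly separable.

```python
def train_perceptron(points, labels, max_epochs=100):
    """Classic perceptron rule: on each mistake, w += y * x and b += y."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):  # cap so non-separable data still terminates
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: data is separated
            return w, b, True
    return w, b, False  # hit the cap without separating the data


# Hypothetical toy data with labels in {-1, +1}; linearly separable.
points = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
w, b, converged = train_perceptron(points, labels)

# XOR is not linearly separable, so the loop only stops at the cap.
xor_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_labels = [-1, 1, 1, -1]
_, _, xor_converged = train_perceptron(xor_points, xor_labels, max_epochs=50)
```

Note that the rule stops at the first hyperplane with zero training mistakes, with no preference for a large margin, which is the main contrast with the SVM.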

**(1b)** What is the purpose of nonlinear activations? [2 pts]

Without nonlinear activations, the entire network collapses to a single linear transformation and can only learn linear relationships in the data. Nonlinear activations between the layers let the network represent more complex relationships.
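The collapse can be checked numerically. The sketch below uses small hand-picked weight matrices (hypothetical values, biases omitted for brevity) to show that two stacked linear layers equal one linear layer with weights `W2 @ W1`, while inserting a ReLU between them changes the result.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu(v):
    return [max(0.0, vi) for vi in v]


# Hypothetical weights for a two-layer network without biases.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 2.0]

stacked_linear = matvec(W2, matvec(W1, x))   # layer 2 applied to layer 1
collapsed = matvec(matmul(W2, W1), x)        # one layer with weights W2 @ W1
with_relu = matvec(W2, relu(matvec(W1, x)))  # nonlinearity in between

print(stacked_linear, collapsed, with_relu)
```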

### Problem 2: LSTM [10 pts]
**(2a)** Why are long-term dependencies hard to model with standard recurrent neural networks? [2 pts]

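The main problems are vanishing and exploding gradients. Backpropagation through time multiplies the gradient by the recurrent Jacobian once per time step, so over a long sequence the product tends either to shrink toward zero (vanishing), in which case early inputs barely affect the weight updates, or to grow without bound (exploding), which destabilizes training.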
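A scalar simplification makes the effect easy to see. The sketch below assumes the gradient is scaled by the same factor at every time step, which is a deliberate simplification of the per-step Jacobian in a real RNN; the function name `gradient_scale` is illustrative.

```python
def gradient_scale(per_step_factor, n_steps):
    """Scale applied to a gradient after flowing back through n_steps steps,
    assuming (as a simplification) the same scalar factor at every step."""
    scale = 1.0
    for _ in range(n_steps):
        scale *= per_step_factor
    return scale


vanishing = gradient_scale(0.9, 100)  # factor slightly below 1
exploding = gradient_scale(1.1, 100)  # factor slightly above 1
print(f"0.9 per step over 100 steps: {vanishing:.2e}")
print(f"1.1 per step over 100 steps: {exploding:.2e}")
```

Even factors close to 1 change the gradient by several orders of magnitude over 100 steps.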

**(2b)** Submit your modified sonnet generation code in a separate Gradescope assignment [4 pts]. (Completion, no answer required here.)

The following questions relate to your sonnet generation model.

**(2c)** Justify your choice of model architectures and the changes you made to the model architecture from the provided code. [2 pts]

I increased the size of the hidden layers so the model can capture more complex dependencies in the data, and increased the number of recurrent layers, hoping to make the output more coherent. I also added a scheduler that changes the learning rate during training, since I expected it to improve convergence.

**(2d)** Justify any other changes you made to the sonnet generation code. [1 pt]

I increased the number of epochs to 1000 so that the model trains for much longer and further reduces the error.

**(2e)** Analyze the sample sonnets generated from your model. Are they realistic? Given unlimited computing power, what modifications would you make to improve their quality? [1 pt]

The sonnets are reasonably good: apart from some words that are not real, the grammar and flow are fairly good and the lines are coherent. To reduce the number of made-up words, I would adjust the randomness of the character generation. With unlimited computing power, I would also make the network deeper and train for many more epochs, assuming I had enough training data to avoid overfitting.
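One common way to control the randomness of character generation is temperature scaling of the output distribution. The sketch below assumes the generator produces one logit per character and samples from their softmax; the sonnet code itself is not shown here, so the function name `sample_with_temperature` and the example logits are hypothetical.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature).

    Lower temperatures sharpen the distribution (fewer unlikely characters);
    higher temperatures flatten it (more variety, more made-up words).
    """
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return index, probs


# Hypothetical logits for a three-character vocabulary.
logits = [2.0, 1.0, 0.1]
_, probs_default = sample_with_temperature(logits, temperature=1.0)
_, probs_cold = sample_with_temperature(logits, temperature=0.5)
print(probs_default)
print(probs_cold)  # more mass on the most likely character
```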

### Problem 3: Convolutional Neural Networks [10 pts]

**(3a)** Load the torchvision [CIFAR10 dataset](https://pytorch.org/vision/stable/generated/torchvision.datasets.CIFAR10.html#torchvision.datasets.CIFAR10). Design and train a model to perform 10-class classification on this dataset. Implementations that use models loaded from torch hub will not receive full credit. Your model should improve over a baseline accuracy of 40%. [6 pts]

In [None]:
# TODO: your code here
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)  
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1) 
        self.pool = nn.MaxPool2d(2, 2)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(64 * 8 * 8, 512) 
        self.fc2 = nn.Linear(512, 10)
        # No softmax layer here: nn.CrossEntropyLoss expects raw logits
        # and applies log-softmax internally.

    # DO NOT CALL THIS FUNCTION DIRECTLY
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 3x32x32 -> 32x16x16
        x = self.pool(F.relu(self.conv2(x)))  # 32x16x16 -> 64x8x8
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # raw class logits

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# instantiate model on selected device
classifier = MyModel().to(device)
print(classifier)

import torch.optim as optim

# since this is a multi-class classification problem, cross entropy is appropriate
criterion = nn.CrossEntropyLoss()
# optimize with stochastic gradient descent, setting learning rate and momentum
optimizer = optim.SGD(classifier.parameters(), lr=0.001, momentum=0.9)

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize(mean=0.5, std=0.5)])

# define batch size
batch_size = 4

# download data
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
# wrap in iterable dataloader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

n_epochs = 10

train_loss = []
val_loss = []


for epoch in range(n_epochs):
    running_loss = 0.0  # loss over the current 2000-batch logging window
    epoch_loss = 0.0    # loss over the whole epoch, used for plotting

    classifier.train()
    # trainloader yields batched data
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)  # put on GPU if applicable
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = classifier(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        epoch_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

    # average over all batches; running_loss cannot be used here because
    # it is reset every 2000 mini-batches
    avg_train_loss = epoch_loss / len(trainloader)
    train_loss.append(avg_train_loss)

    # validation pass: no gradients needed
    classifier.eval()
    with torch.no_grad():
        running_val_loss = 0.0
        for i, data in enumerate(testloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = classifier(inputs)
            loss = criterion(outputs, labels)
            running_val_loss += loss.item()

        avg_val_loss = running_val_loss / len(testloader)
        val_loss.append(avg_val_loss)

print('Finished Training')

**(3b)** Plot your training and validation losses. (Hint: provided code demonstrates usage of labels in plt plotting. You will likely want to modify the plotting function to scale validation appropriately.) [1 pt]

In [None]:
import matplotlib.pyplot as plt

# TODO: track losses for plotting
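# added above (train_loss and val_loss are filled in the training loop)

plt.figure()
plt.plot(train_loss, label='train')
plt.plot(val_loss, label='validation')  # plotted as recorded, no rescaling applied
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()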


**(3c)** Evaluate your model on the test partition. [1 pt]

In [None]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = classifier(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the test partition: {(100 * correct / total):.2f}%')

**(3d)** Justify your choice of model architecture. [2 pts]

I used a convolutional neural network because CNNs are well suited to image classification. The model has two 3x3 convolutional layers (32 and 64 filters), each followed by ReLU and 2x2 max pooling, and then two fully connected layers. The first convolutional layer detects simple local patterns, and the second combines them into more complex ones. Because the same filters are applied across the whole image, the network recognizes a feature regardless of its position, and pooling reduces the spatial resolution while keeping the strongest responses.
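The input size of the first fully connected layer, `64 * 8 * 8`, follows from the layer settings in `MyModel`. The sketch below traces the spatial size through the network using the standard output-size formulas for convolution and pooling (dilation 1); the helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width/height of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output width/height of a pooling layer without padding."""
    return (size - kernel) // stride + 1


size = 32                                   # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, padding=1)  # conv1: padding=1 keeps 32
size = pool_out(size, kernel=2, stride=2)   # pool: 32 -> 16
size = conv_out(size, kernel=3, padding=1)  # conv2: padding=1 keeps 16
size = pool_out(size, kernel=2, stride=2)   # pool: 16 -> 8

flattened = 64 * size * size                # 64 channels after conv2
print(size, flattened)
```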