# Lab 4 - Math 178, Spring 2024

You are encouraged to work in groups of up to 3 total students, but each student should make a submission on Canvas. (It's fine for everyone in the group to submit the same link.)

Put the full names of everyone in your group (even if you're working alone) here. This makes grading easier.

**Names**: Seth Abuhamdeh 34937889

## Train an XOR network using PyTorch

* Use PyTorch to train a neural network which produces a perfect (4 out of 4) prediction rate for XOR.  (This is similar to what you did "by hand" on question 2d on Homework 3.  You should not be manually setting the weights, but instead, should be using PyTorch to find weights.  Use a Binary Cross Entropy loss function.  Feel free to use a more complex Neural Network architecture than what you did by hand in the homework.  I was able to eventually get a small neural network architecture to work, but I had to re-run the code numerous times.)
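For reference, the Binary Cross Entropy loss named above averages $-\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$ over the samples, where $p_i$ is the predicted probability and $y_i$ is the 0/1 label. The following is a minimal plain-Python sketch of that formula, not PyTorch's implementation; the function name `bce` and the `eps` guard are illustrative assumptions.

```python
import math

def bce(probabilities, labels, eps=1e-12):
    """Mean binary cross entropy; `eps` is an illustrative guard against log(0)."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

xor_labels = [0, 1, 1, 0]
uninformed = bce([0.5, 0.5, 0.5, 0.5], xor_labels)     # every output is 0.5
confident = bce([0.01, 0.99, 0.99, 0.01], xor_labels)  # close to the XOR targets
print(f"uninformed: {uninformed:.4f}, confident: {confident:.4f}")
```

A network that outputs 0.5 for every input has loss $\log 2 \approx 0.693$, so a training loss stuck near that value suggests the network has not yet separated the XOR classes.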

Recommended references:
1. I primarily used the attached University of Washington notebook, which I downloaded from [Google Colab](https://colab.research.google.com/drive/1up-BwDyjNLISMtXKCMjNomyIOW96JlJh?usp=sharing).
2. I personally solved this exercise before reading through the [PyTorch tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) which will be used below.  If I had started with that tutorial, maybe I would have used a different approach.  But that tutorial is fancier than what we need here, because of the data loaders etc.

Comment:
1.  The MNIST portion below is probably easier, in terms of what you need to do, but I'm putting this part first because the resulting neural network here is conceptually simpler.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim



In [None]:
# Build the XOR dataset: four input pairs and their XOR labels
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)


In [None]:
#Neural Network
class XORNN(nn.Module):
    def __init__(self):
        super(XORNN,self).__init__()
        self.hidden1 = nn.Linear(2,4)
        self.hidden2 = nn.Linear(4,4)
        self.output = nn.Linear(4,1)
        self.sigmoid = nn.Sigmoid()
        self.relu = nn.ReLU()
        nn.init.xavier_uniform_(self.hidden1.weight)
        nn.init.xavier_uniform_(self.hidden2.weight)
        nn.init.xavier_uniform_(self.output.weight)

    def forward(self, x):
        x = self.hidden1(x)
        x = self.relu(x)
        x = self.hidden2(x)
        x = self.relu(x)
        x = self.output(x)
        x = self.sigmoid(x)
        return x

model = XORNN()

criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(),lr=0.1)

# Train the model
epochs = 10000
for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = model(X)
    loss =  criterion(outputs, y)
    loss.backward()
    optimizer.step()

    if (epoch+1) % 1000 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

with torch.no_grad():
    predictions = model(X)
    predicted = (predictions > 0.5).float()
    print(f'Predictions:\n{predicted}')
    print(f'Actual:\n{y}')


Epoch [1000/10000], Loss: 0.0000
Epoch [2000/10000], Loss: 0.0000
Epoch [3000/10000], Loss: 0.0000
Epoch [4000/10000], Loss: 0.0000
Epoch [5000/10000], Loss: 0.0000
Epoch [6000/10000], Loss: 0.0000
Epoch [7000/10000], Loss: 0.0000
Epoch [8000/10000], Loss: 0.0000
Epoch [9000/10000], Loss: 0.0000
Epoch [10000/10000], Loss: 0.0000
Predictions:
tensor([[0.],
        [1.],
        [1.],
        [0.]])
Actual:
tensor([[0.],
        [1.],
        [1.],
        [0.]])


## Train an MNIST network using PyTorch

* Adapt the code at [the PyTorch tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) to train an MNIST neural network.  Adjust parameters as necessary to reach at least a 91% test accuracy.  (Be sure you're using `datasets.MNIST` rather than what's in the tutorial: `datasets.FashionMNIST`.  Most other parts of the tutorial should adapt easily.  I deleted the GPU parts such as `if torch.cuda.is_available()` because I don't think they will work on Deepnote, but they may still be useful here.)
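The test accuracy mentioned above is the fraction of test images whose highest-scoring class matches the true label. The following is a minimal sketch of that calculation on a made-up batch; the logits and labels are illustrative values, not MNIST data.

```python
import torch

# Made-up scores for 4 samples and 3 classes (illustrative, not MNIST output)
logits = torch.tensor([
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 3.0, 0.5],
    [1.0, 0.9, 0.8],
])
labels = torch.tensor([0, 2, 1, 2])

# The predicted class is the index of the largest score in each row
predicted_classes = logits.argmax(dim=1)

# Accuracy is the mean of the per-sample "prediction == label" indicator
accuracy = (predicted_classes == labels).float().mean().item()
print(f"Accuracy: {100 * accuracy:.1f}%")
```

No softmax is needed for this step, because softmax preserves the ordering of the scores and so does not change which index `argmax` returns.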

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download training data from open datasets
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

batch_size = 64

# Create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

# Check the data loader
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")


Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
Epoch 1
-------------------------------
loss: 2.306282  [   64/60000]
loss: 0.499791  [ 6464/60000]
loss: 0.341661  [12864/60000]
loss: 0.145393  [19264/60000]
loss: 0.105162  [25664/60000]
loss: 0.150128  [32064/60000]
loss: 0.045713  [38464/60000]
loss: 0.113160  [44864/60000]
loss: 0.095903  [51264/60000]
loss: 0.073280  [57664/60000]
Test Error: 
 Accuracy: 96.8%, Avg loss: 0.108078 

Epoch 2
-------------------------------
loss: 0.044254  [   64/60000]
loss: 0.039624  [ 6464/60000]
loss: 0.043773  [12864/60000]
loss: 0.219417  [19264/60000]
loss: 0.027558  [25664/60000]
loss:

## Submission

* Using the `Share` button at the top right, enable public sharing, and enable Comment privileges. Then submit the created link on Canvas.

## Possible extensions

* My code for the small XOR neural network only works about one out of ten times.  Can you produce code that always works, for example, using a `while` loop or some change to the hyperparameters (not including making the hidden layer bigger)?
* Neural networks are very prone to overfitting.  Does that happen with your MNIST code?  Can you plot a train and test error curve to demonstrate?  How can you combat overfitting, for example, using `nn.Dropout`?
* Can you use Python (don't try to do this by hand) to draw the decision boundary for the XOR classifier from part 1?  Here is some sample code from Math 10 for drawing decision boundaries: https://christopherdavisuci.github.io/UCI-Math-10-S23/Week8/Week8-Wednesday.html
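One way to approach the first extension is sketched below: retrain from a fresh random initialization inside a `while` loop until all four XOR predictions are correct. This is a sketch under stated assumptions, not a reference solution. The 2-2-1 sigmoid architecture is an assumed stand-in for the "small" network, and the names `train_once` and `max_attempts` are hypothetical. Because the loop is capped, it stops after `max_attempts` tries even if none succeeds.

```python
import torch
import torch.nn as nn
import torch.optim as optim

X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

def train_once(seed, epochs=1000, lr=0.1):
    """Train one small network from a seeded random start; return it and its score."""
    torch.manual_seed(seed)  # a different seed gives different initial weights
    # Assumed architecture: 2 inputs, 2 hidden units, 1 output
    model = nn.Sequential(
        nn.Linear(2, 2), nn.Sigmoid(),
        nn.Linear(2, 1), nn.Sigmoid(),
    )
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        predicted = (model(X) > 0.5).float()
    return model, int((predicted == y).sum().item())

max_attempts = 20  # cap so the loop always terminates
attempt = 0
num_correct = 0
while num_correct < 4 and attempt < max_attempts:
    model, num_correct = train_once(seed=attempt)
    attempt += 1

print(f"{num_correct} out of 4 correct after {attempt} attempt(s)")
```

Seeding each attempt makes a successful run reproducible: once a working seed is found, calling `train_once` with that seed again should give the same initial weights on the same setup.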
