<a href="https://colab.research.google.com/github/mlej8/ECSE552/blob/main/Tutorial2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-Layer Perceptron Using PyTorch

Goals:
 - Learn PyTorch's Data Utilities
 - Learn PyTorch's built-in neural network functions instead of implementing them from scratch
 - Learn how to use the GPU for computation
 - Learn how to access the model's parameters

**NOTE:** to use the GPU in Colab, click Edit > Notebook Settings > GPU

## Part 0: Preparing Datasets in PyTorch

Many tutorials start by teaching you how to classify the MNIST digits dataset with a neural network. However, it is often unclear how to apply the same steps to your own data, because most tutorials jump straight to loading data that is already split into train-test or train-validation-test sets.

So here, we first create a simple dataset with NumPy and then convert it into tensors.

In [None]:
import numpy as np
import torch

We will create two classes, each drawn from a normal distribution with a standard deviation of 0.7: class 0 is centered at (-1, 1) and class 1 at (1, -1).

In [None]:
x1 = np.random.normal(loc=(-1,1), scale=0.7, size=(100,2))
x2 = np.random.normal(loc=(1,-1), scale=0.7, size=(100,2))

Visualize the dataset just so we know we have created what we had in mind.

In [None]:
import matplotlib.pyplot as plt
plt.scatter(x1[:,0], x1[:,1], label='class0')
plt.scatter(x2[:,0], x2[:,1], label='class1', marker='s')
plt.legend()
plt.show()

In [None]:
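# Stack both classes into one (200, 2) array; the first 100 rows are class 0
x = np.concatenate([x1, x2], axis=0)
# Labels: 0 for the first 100 points, 1 for the remaining 100
y = np.ones(200)
y[:100] = 0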

Now we are ready to create the training, validation, and test splits using PyTorch.

In [None]:
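# torch.Tensor converts the NumPy arrays to float32 tensors
data = torch.utils.data.TensorDataset(
            torch.Tensor(x),
            torch.Tensor(y))
data[2]  # indexing returns a (features, label) pair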

In [None]:
train, val, test = torch.utils.data.random_split(data, lengths=[100, 50, 50])
print(len(train), len(val), len(test))
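
``random_split`` shuffles the indices randomly, so the split changes from run to run. If you need a repeatable split, one option is to pass a seeded generator. The sketch below uses a toy dataset and hypothetical names (``toy_data``, ``split_a``, ``split_b``) rather than the variables above.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the dataset above: 200 two-dimensional points with binary labels.
features = torch.randn(200, 2)
labels = torch.cat([torch.zeros(100), torch.ones(100)])
toy_data = TensorDataset(features, labels)

# A generator seeded with the same value produces the same permutation of indices.
split_a = random_split(toy_data, [100, 50, 50], generator=torch.Generator().manual_seed(0))
split_b = random_split(toy_data, [100, 50, 50], generator=torch.Generator().manual_seed(0))

# Each split is a Subset that stores indices into the original dataset.
same_indices = all(a.indices == b.indices for a, b in zip(split_a, split_b))
print(same_indices)
print([len(s) for s in split_a])
```

Because each subset only stores indices, the three splits never overlap and together cover the whole dataset.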

You could now iterate through these datasets in batches by hand, but there is an easier way that takes care of indexing, shuffling, and epochs for you: the ``DataLoader``.

In [None]:
train_loader = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
val_loader = torch.utils.data.DataLoader(val, batch_size=25, shuffle=False)

There are two common ways of iterating through a dataloader:
 - epoch-based (most common)
 - manual next() trigger

In [None]:
num_epoch = 3
for epoch in range(num_epoch):
    for i, (x, y) in enumerate(train_loader):
        print(i, x.shape)

In [None]:
num_steps = 30
trigger_steps = len(train_loader) # number of batches in one epoch
print(trigger_steps)
for step in range(num_steps):
    if step % trigger_steps == 0:
        print('trigger')
        tl = iter(train_loader)
        
    x, y = next(tl)
    print(step, x.shape)
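
The number of batches per epoch, ``len(loader)``, depends on whether the dataset size is a multiple of the batch size. The following sketch uses a toy dataset and hypothetical names (``loader``, ``loader_drop``) to show what happens when it is not.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

toy_data = TensorDataset(torch.randn(100, 2), torch.zeros(100))

# 100 samples with batch_size=32 do not divide evenly,
# so the final batch of the epoch is smaller than the others.
loader = DataLoader(toy_data, batch_size=32, shuffle=True)
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(len(loader), batch_sizes)

# drop_last=True discards the incomplete final batch.
loader_drop = DataLoader(toy_data, batch_size=32, shuffle=True, drop_last=True)
print(len(loader_drop))
```

This matters for the manual ``next()`` pattern: calling ``next()`` more than ``len(loader)`` times on the same iterator raises ``StopIteration``, which is why the iterator is re-created every ``trigger_steps`` steps.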

## Part 1: Learning XOR

The XOR problem is a well-known example of a function that a single perceptron cannot learn, because the two classes are not linearly separable.
However, stacking perceptrons into layers solves the problem easily.
In this tutorial we will show how to create a 2-layer perceptron and learn the XOR function.
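
To see why two layers are enough, here is a small sketch with hand-picked weights (not learned, and not necessarily what training will find) for a network with two ReLU hidden units. The names ``xor_inputs``, ``W1``, ``b1``, ``W2``, and ``b2`` are chosen for illustration only.

```python
import torch

xor_inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# Hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W1 = torch.tensor([[1., 1.], [1., 1.]])   # shape (n_hidden, n_in)
b1 = torch.tensor([0., -1.])
# Output layer: y = h1 - 2 * h2
W2 = torch.tensor([[1., -2.]])            # shape (1, n_hidden)
b2 = torch.tensor([0.])

hidden = torch.relu(xor_inputs @ W1.T + b1)
xor_out = hidden @ W2.T + b2
print(xor_out.squeeze(1).tolist())  # [0.0, 1.0, 1.0, 0.0]
```

The second hidden unit only activates for the input (1, 1), and subtracting it twice cancels the contribution of the first unit, which gives the XOR truth table.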

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

torch.manual_seed(1)

x_data = [[0,0], [0,1], [1,0], [1,1]]
y_data = [[0], [1], [1], [0]]

First, we prepare the dataset. We are trying to learn the XOR function which only has 4 possible datapoints. In this example we train and test using the same datapoints.

In [None]:
x_data = torch.Tensor(x_data)
y_data = torch.Tensor(y_data)
data = torch.utils.data.TensorDataset(x_data, y_data)
data = torch.utils.data.DataLoader(data, batch_size=1, shuffle=True)

Next, we create a model. Instead of implementing the initialization of the parameters and the matrix multiplications ourselves, we can simply create a module. There are multiple ways of defining a model, but subclassing ``nn.Module`` gives you a lot of flexibility in the forward function.

In [None]:
class MLP(nn.Module):
    def __init__(self, n_in, n_hidden):
        super(MLP, self).__init__()
        self.layer1 = nn.Linear(n_in, n_hidden)
        self.layer2 = nn.Linear(n_hidden, 1)
    
    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))

model = MLP(n_in=2, n_hidden=4)
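
As one of the other ways of defining a model, the same architecture can be sketched with ``nn.Sequential``. This is less flexible than a custom ``forward`` but shorter for a plain stack of layers. The name ``seq_model`` is chosen for illustration.

```python
import torch
import torch.nn as nn

# Same layer sizes as the two-layer MLP: 2 inputs, 4 hidden units, 1 output.
seq_model = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
)

seq_out = seq_model(torch.zeros(3, 2))  # a batch of 3 inputs
print(seq_out.shape)  # torch.Size([3, 1])
```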

Then, we create an optimizer. For this example, we will use stochastic gradient descent (SGD).

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1) # the lr is typically smaller than this
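
To make the optimizer less of a black box, the sketch below compares one step of ``torch.optim.SGD`` (without momentum or weight decay) against the update rule written by hand, ``p <- p - lr * grad``. The layers and data here are toy stand-ins with hypothetical names, not the XOR model above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer_a = nn.Linear(2, 1)
layer_b = nn.Linear(2, 1)
layer_b.load_state_dict(layer_a.state_dict())  # identical starting weights

inputs = torch.tensor([[0., 1.], [1., 1.]])
targets = torch.tensor([[1.], [0.]])
lr = 0.1

# (a) one step with the built-in optimizer
opt = torch.optim.SGD(layer_a.parameters(), lr=lr)
opt.zero_grad()
F.mse_loss(layer_a(inputs), targets).backward()
opt.step()

# (b) the same step written by hand: p <- p - lr * grad
F.mse_loss(layer_b(inputs), targets).backward()
with torch.no_grad():
    for p in layer_b.parameters():
        p -= lr * p.grad

sgd_match = all(torch.allclose(pa, pb)
                for pa, pb in zip(layer_a.parameters(), layer_b.parameters()))
print(sgd_match)
```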

Now, we are ready to train the model.

In [None]:
n_epoch = 50

for epoch in range(n_epoch):
    # training
    model.train()
    for x,y in data:
        
        optimizer.zero_grad()
        y_hat = model(x)
        loss = F.mse_loss(y_hat, y)
        loss.backward()
        optimizer.step()

    # validation/test
    model.eval()
    with torch.no_grad():
        y_hat = model(x_data)
        loss = F.mse_loss(y_hat, y_data)
        acc = ((y_hat > 0.5) == y_data).float().mean()

    print('%d: XOR(0,0)=%.4f XOR(0,1)=%.4f XOR(1,0)=%.4f XOR(1,1)=%.4f cost=%.4f, accuracy=%.2f'\
        %(epoch+1, y_hat[0], y_hat[1], y_hat[2], y_hat[3], loss.item(), acc.item()))

## Part 2: MNIST

In [None]:
from torchvision import datasets
import torchvision.transforms as transforms

transform = transforms.Compose([transforms.ToTensor()])

mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
mnist_testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

len(mnist_trainset), len(mnist_testset)

Let's split our training data into training and validation sets.

In [None]:
mnist_trainset, mnist_valset = torch.utils.data.random_split(mnist_trainset, lengths=[50000, 10000])

train_data = torch.utils.data.DataLoader(mnist_trainset, batch_size=256, shuffle=True)
val_data = torch.utils.data.DataLoader(mnist_valset, batch_size=256, shuffle=False)
test_data = torch.utils.data.DataLoader(mnist_testset, batch_size=256, shuffle=False)

Let's take a look at our data

In [None]:
samples = iter(test_data)
(x, y) = next(samples)
y

In [None]:
x.shape

In [None]:
fig = plt.figure()
for i in range(6):
  plt.subplot(2,3,i+1)
  plt.tight_layout()
  plt.imshow(x[i][0], cmap='gray', interpolation='none')
  plt.title("y = {}".format(y[i]))
  plt.xticks([])
  plt.yticks([])

## DIY \#1
Create a ``nn.Module`` class with the following architecture:
input &rightarrow; hidden &rightarrow; output

The model will have 10 outputs, and the sizes of the input and hidden layers are parameters set at initialization (i.e., the model will be created with ``model = MnistMLP(n_in=28*28, n_hidden=500)``).

In this example we also try to use the GPU (if it is available).

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
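
The model and every batch fed to it must live on the same device. A minimal sketch of the pattern, using a toy layer and hypothetical names (``demo_layer``, ``demo_batch``); it falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

demo_layer = nn.Linear(4, 2).to(device)      # moves the module's parameters to the device
demo_batch = torch.randn(8, 4).to(device)    # returns a tensor on the target device

demo_out = demo_layer(demo_batch)
print(demo_out.device.type, next(demo_layer.parameters()).device.type)
```

Note that ``.to(device)`` on a tensor does not modify it in place, so the result has to be assigned, as in ``x, y = x.to(device), y.to(device)``.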

In [None]:
model = MnistMLP(n_in=28*28, n_hidden=500).to(device)

Create the loss function and optimizer. Since our model has no ``LogSoftmax()`` or ``Softmax()`` layer, it outputs raw scores (logits), so we use ``CrossEntropyLoss()``. This combines ``LogSoftmax`` and ``NLLLoss`` and is the usual choice for classification.

In [None]:
loss_func = nn.CrossEntropyLoss()
val_loss_func = nn.CrossEntropyLoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
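
The relationship between these loss functions can be checked numerically. The sketch below uses random logits as a stand-in for model outputs (the names ``logits`` and ``targets`` are illustrative) and also shows what ``reduction='sum'`` changes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)              # 5 samples, 10 classes (raw scores)
targets = torch.tensor([3, 0, 9, 1, 7])  # integer class labels

# CrossEntropyLoss on logits equals NLLLoss on log-softmax outputs.
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))

# reduction='sum' adds the per-sample losses instead of averaging them.
ce_sum = nn.CrossEntropyLoss(reduction='sum')(logits, targets)
print(torch.allclose(ce_sum / len(targets), ce))
```

Summing per batch and dividing by the dataset size at the end gives the exact mean loss over the whole set, even when the last batch is smaller than the others.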

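## DIY \#2
Create the training loop. The validation loop is already filled in, and the training loop will look somewhat similar. Note that the final ``print`` expects your training loop to define ``batch_loss`` and ``batch_acc``.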

In [None]:
n_epoch = 10

for epoch in range(n_epoch):
    
    # training loop here
        
    model.eval() # signal evaluation phase
    with torch.no_grad():
        val_loss = 0
        val_acc = 0
        for x, y in val_data:
            x, y = x.to(device), y.to(device) # move the data to the selected device
            y_hat = model(x.view(-1,28*28))
            val_loss += val_loss_func(y_hat, y)
            
            y_hat = torch.argmax(y_hat, axis=1)
            val_acc += (y_hat == y).float().sum()
        # average over the validation set (10000 samples)
        val_loss /= len(mnist_valset)
        val_acc /= len(mnist_valset)
    
    print("%d\tbatch-loss: %.4f\tbatch-acc: %.4f\tval-loss: %.4f\tval-acc: %.4f"%(
        epoch, batch_loss, batch_acc, val_loss, val_acc))
            
    

## Accessing Parameters and Plotting Embeddings

In [None]:
for params in model.parameters():
    print(params.shape)
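
``named_parameters()`` additionally gives the name of each tensor, which is handy for counting or selecting parameters. The sketch below uses a hypothetical stand-in (``demo_net``) with the same layer sizes as the MNIST model described above, not the trained ``model`` itself.

```python
import torch.nn as nn

# Hypothetical stand-in: 784 inputs, 500 hidden units, 10 outputs.
demo_net = nn.Sequential(nn.Linear(28 * 28, 500), nn.ReLU(), nn.Linear(500, 10))

# Each entry is a (name, tensor) pair; weights have shape (out_features, in_features).
for name, p in demo_net.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

# Total number of trainable values: 784*500 + 500 + 500*10 + 10
n_params = sum(p.numel() for p in demo_net.parameters())
print(n_params)  # 397510
```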

In [None]:
class MnistMLPEmbed(nn.Module):
    def __init__(self, n_in, n_hidden):
        super(MnistMLPEmbed, self).__init__()
        self.layer1 = nn.Linear(n_in, n_hidden)
        # self.layer2 = nn.Linear(n_hidden, 10)
    
    def forward(self, x):
        return torch.relu(self.layer1(x))

In [None]:
embed_model = MnistMLPEmbed(28*28, 500)
embed_model.load_state_dict(model.state_dict(), strict=False)
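
With ``strict=False``, ``load_state_dict`` copies the entries whose names match and reports the rest instead of raising an error. The sketch below assumes the full model names its layers ``layer1`` and ``layer2`` (your ``MnistMLP`` may differ, in which case nothing would be copied). ``Full`` and ``Trunk`` are hypothetical classes for illustration.

```python
import torch
import torch.nn as nn

class Full(nn.Module):      # hypothetical two-layer model
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(784, 500)
        self.layer2 = nn.Linear(500, 10)

class Trunk(nn.Module):     # keeps only the first layer
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(784, 500)

full, trunk = Full(), Trunk()
result = trunk.load_state_dict(full.state_dict(), strict=False)

print(result.missing_keys)      # keys Trunk expects but the state dict lacks
print(result.unexpected_keys)   # keys in the state dict that Trunk does not have
print(torch.equal(trunk.layer1.weight, full.layer1.weight))
```

Checking the returned keys is a quick way to confirm that the first-layer weights were actually transferred.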

In [None]:
samples = iter(test_data)
(x, y) = next(samples)
y = y.detach().numpy()

In [None]:
embedding = embed_model(x.view(-1,28*28))

In [None]:
embedding.shape

In [None]:
from sklearn.decomposition import PCA

x = PCA(n_components=2).fit_transform(embedding.detach().numpy())

In [None]:
# one scatter series per digit, selected with a boolean mask
for i in range(10):
    idx = (y == i)
    plt.scatter(x[idx, 0], x[idx, 1], label=i)
plt.legend()
plt.show()