# MLP with PyTorch

At this point, I know what MLPs are. We built one from scratch in one of the previous folders. In this notebook, I want to implement an MLP again, but using PyTorch. This MLP will be trained and evaluated on the MNIST dataset.

# The MNIST dataset

Conveniently, the MNIST dataset is provided in PyTorch through the `torchvision` package, specifically through the `torchvision.datasets` module.

In the following cell, I import the `torchvision` and `transforms` modules. The second module, as the name suggests, lets us perform **common transformations on image data**. According to the [documentation](https://pytorch.org/vision/stable/transforms.html), transforms are common image transformations available in the `torchvision.transforms` module.

Another interesting feature is that transform operations can be **chained** together using `Compose`. We will use it a couple of cells below.

In [6]:
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

With the modules loaded, I want to load the dataset itself, select the training and testing splits, and set hyperparameters such as the mini-batch size.

In [2]:
image_path = './'

transform = transforms.Compose([
    transforms.ToTensor()
])

mnist_train_dataset = torchvision.datasets.MNIST(
    root=image_path, train=True,
    transform=transform, download=True 
)

mnist_test_dataset = torchvision.datasets.MNIST(
    root=image_path, train=False,
    transform=transform, download=True  
)

batch_size = 64
torch.manual_seed(1)

<torch._C.Generator at 0x1db9b400eb0>

Okay, what just happened? Since I want to download the dataset, I created an `image_path` variable holding the directory where the images should be stored. The images are saved there when they are downloaded, and read from there when they are already on disk.

Then I move on to create a `transform` pipeline. Ours has only one operation: `transforms.ToTensor()`. The `ToTensor()` transform (1) converts the pixel features into a floating-point tensor and (2) scales the pixel values from the range [0, 255] to the range [0, 1].

After that, I create the actual training and testing datasets from MNIST. Since I do not have it on my machine, I ask PyTorch to download it for me using the `download` parameter. I also want PyTorch to apply the `transform` we created earlier to each image as it is loaded. I specify which operations to perform using the `transform` parameter.

I finish by specifying the batch size and manually setting the seed for random number generation.
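
As a small sketch of why the seed matters: resetting it reproduces the same random numbers. As far as I understand, operations that rely on PyTorch's global random number generator (such as the default weight initialization) become repeatable in the same way.

```python
import torch

torch.manual_seed(1)
first = torch.rand(3)

torch.manual_seed(1)   # reset the generator to the same state
second = torch.rand(3)

print(torch.equal(first, second))  # True
```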

With that done, we cannot use the datasets just yet. We must first wrap the `Dataset` objects (`mnist_train_dataset` and `mnist_test_dataset`) in `DataLoader` objects. Remember that a `DataLoader` lets us iterate over a dataset in mini-batches. Okay, let's do it:

In [3]:
from torch.utils.data import DataLoader
train_dl = DataLoader(mnist_train_dataset,
                      batch_size, shuffle=True)
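
To see what iterating over a loader actually yields, here is a sketch that does not need the MNIST download. It assumes a stand-in `TensorDataset` of 200 random "images" with MNIST-like shapes; the names `toy_dataset` and `toy_dl` are made up for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST-like shapes: 200 random "images" and labels.
images = torch.rand(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
toy_dataset = TensorDataset(images, labels)

toy_dl = DataLoader(toy_dataset, batch_size=64, shuffle=True)

batch_sizes = []
for x_batch, y_batch in toy_dl:
    # x_batch has shape (batch, 1, 28, 28); y_batch has shape (batch,)
    batch_sizes.append(x_batch.shape[0])

print(batch_sizes)  # [64, 64, 64, 8]
```

The last batch is smaller because 200 is not a multiple of 64 and incomplete batches are kept by default.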

We successfully created the data loader, with batches of 64 samples. Let's move on :)

# Building the Model

This section of the notebook is concerned with building the MLP to classify digits from the dataset we downloaded earlier.

Our MLP will have:

- an input layer
- a hidden layer (32 activation units)
- a hidden layer (16 activation units)
- an output layer

Let's define the above layers in code:

In [8]:
hidden_units = [32, 16]  # number of activation units in EACH hidden layer
# mnist_train_dataset[0] is a tuple (image tensor, label)
image_size = mnist_train_dataset[0][0].shape
# number of channels * image height * image width
input_size = image_size[0] * image_size[1] * image_size[2]

# all the layers in the network
all_layers = [nn.Flatten()]
for hidden_unit in hidden_units:
    layer = nn.Linear(input_size, hidden_unit)
    all_layers.append(layer)
    all_layers.append(nn.ReLU())
    input_size = hidden_unit  # the next layer's input is this layer's output

all_layers.append(nn.Linear(hidden_units[-1], 10))  # output layer: one unit per digit

We successfully created all the layers in the network and stored them in a list called `all_layers`.

Let's now create a model containing the layers we just created.

In [9]:
model = nn.Sequential(*all_layers)

That's it. Done! Since each layer comes one after the other in an MLP, we use the `torch.nn.Sequential` module to place those layers *sequentially*.
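
Here is a small sketch of what "sequentially" means in practice. It uses a shortened, hypothetical model (`small_model`) and a random stand-in batch, and checks that calling the container gives the same result as pushing the data through each layer by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

layers = [nn.Flatten(), nn.Linear(784, 32), nn.ReLU()]
small_model = nn.Sequential(*layers)

x = torch.rand(4, 1, 28, 28)  # a batch of 4 random stand-in images

# Calling the container ...
out_sequential = small_model(x)

# ... matches applying each layer in order by hand.
out_manual = x
for layer in layers:
    out_manual = layer(out_manual)

print(torch.allclose(out_sequential, out_manual))  # True
print(out_sequential.shape)                        # torch.Size([4, 32])
```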

Let's print the model and see all the layers we inserted.

In [10]:
model

Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=32, bias=True)
  (2): ReLU()
  (3): Linear(in_features=32, out_features=16, bias=True)
  (4): ReLU()
  (5): Linear(in_features=16, out_features=10, bias=True)
)
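
As a sanity check on the printed architecture, we can count the trainable parameters. The sketch below rebuilds the same layers under the hypothetical name `mlp` so that it runs on its own. Each `Linear` layer holds `in_features * out_features` weights plus `out_features` biases.

```python
import torch.nn as nn

# Same architecture as the printed model, rebuilt so the snippet is self-contained.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 10),
)

# Flatten and ReLU have no parameters, so only the Linear layers show up here.
for name, param in mlp.named_parameters():
    print(name, tuple(param.shape), param.numel())

total = sum(p.numel() for p in mlp.parameters())
print(total)  # 25818 = (784*32 + 32) + (32*16 + 16) + (16*10 + 10)
```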

When an image $(1 \times 28 \times 28)$ gets fed into the network, the `Flatten` layer flattens it to $(1 \times 784)$.
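
The shapes can be checked directly. With its default `start_dim=1`, `Flatten` keeps the first dimension and merges the rest, so it behaves the same way on a whole mini-batch. The tensors below are random stand-ins for real images.

```python
import torch
import torch.nn as nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1

single = torch.rand(1, 28, 28)     # one image: (channels, height, width)
batch = torch.rand(64, 1, 28, 28)  # a mini-batch: (batch, channels, height, width)

flat_single = flatten(single)
flat_batch = flatten(batch)

print(flat_single.shape)  # torch.Size([1, 784])
print(flat_batch.shape)   # torch.Size([64, 784])
```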

This flattened image goes through the first `Linear` layer, which computes the net input: a weighted sum of the inputs plus a bias. It turns the $(1 \times 784)$ input into a $(1 \times 32)$ matrix.
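
Written out, the net input is $z = xW^\top + b$, where $x$ is $(1 \times 784)$, $W$ is $(32 \times 784)$ and $b$ has 32 entries. The sketch below, using a random stand-in input, compares the layer's output with that formula computed by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

linear = nn.Linear(784, 32)
x = torch.rand(1, 784)  # a flattened stand-in image

z_layer = linear(x)
# linear.weight has shape (32, 784); linear.bias has shape (32,)
z_manual = x @ linear.weight.T + linear.bias

print(z_layer.shape)                                 # torch.Size([1, 32])
print(torch.allclose(z_layer, z_manual, atol=1e-5))  # True
```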

Right after that, the values go through a `ReLU` activation function.
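
ReLU simply computes $\max(0, z)$ element-wise: negative values become zero and positive values pass through unchanged. A minimal illustration with hand-picked values:

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
z = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

a = relu(z)
print(a.tolist())  # [0.0, 0.0, 0.0, 0.5, 2.0]
```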

T