# Constructing a Simple Neural Network with PyTorch
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.

## Set-up

In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from torchvision import transforms

## Get Train and Test Datasets

All torchvision datasets are subclasses of `torch.utils.data.Dataset`. Hence, they can all be passed to a `torch.utils.data.DataLoader`, which can load multiple samples in parallel using `torch.multiprocessing` workers.

The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST.
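
As a rough illustration of the `Dataset` interface mentioned above, the sketch below defines a tiny in-memory dataset and wraps it in a `DataLoader`. The class name `RandomDigits` and its random contents are made up for this example; they only mimic the layout that `transforms.PILToTensor()` produces for MNIST (a `uint8` tensor of shape 1 x 28 x 28), so nothing needs to be downloaded.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomDigits(Dataset):
    """Hypothetical stand-in for MNIST: random uint8 images with random labels."""

    def __init__(self, n_samples=100):
        generator = torch.Generator().manual_seed(0)
        # same layout as PILToTensor output for MNIST: (channels, height, width)
        self.images = torch.randint(0, 256, (n_samples, 1, 28, 28),
                                    dtype=torch.uint8, generator=generator)
        self.labels = torch.randint(0, 10, (n_samples,), generator=generator)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # return a (sample, target) pair, as the torchvision datasets do
        return self.images[idx], int(self.labels[idx])


toy_dataset = RandomDigits()
toy_loader = DataLoader(toy_dataset, batch_size=32)
X_batch, y_batch = next(iter(toy_loader))
print(X_batch.shape, y_batch.shape)
```

Any object that implements `__len__` and `__getitem__` in this way can be batched by a `DataLoader`, which is why the MNIST objects created below can be used the same way.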

In [None]:
%%capture
mnist_train = datasets.MNIST('mnist', download=True, transform=transforms.PILToTensor(), train=True)
mnist_test = datasets.MNIST('mnist', download=True, transform=transforms.PILToTensor(), train=False)

In [None]:
mnist_train[0][0], mnist_train[0][1]

In [None]:
transforms.functional.to_pil_image(mnist_train[0][0])

## Create Neural Network Model with `nn.Sequential`

`nn.Sequential` is a sequential container: modules are added to it in the order they are passed to the constructor. Alternatively, an `OrderedDict` of modules can be passed in.

To make it easier to understand, here is a small example:

```
from collections import OrderedDict  # needed for the second example

# Example of using Sequential
model = nn.Sequential(
          nn.Linear(32, 16),
          nn.ReLU(),
          nn.Linear(16, 1),
          nn.ReLU()
        )

# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('linear1', nn.Linear(32, 16)),
          ('relu1', nn.ReLU()),
          ('linear2', nn.Linear(16, 1)),
          ('relu2', nn.ReLU())
        ]))
```

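As an aside, note that `nn.Linear` applies a linear transformation to the incoming data: $y = xA^T + b$. Here $xA^T$ is a matrix product: $A$ is the layer's weight matrix, of shape (output features x input features), and $b$ is its bias vector.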

In [None]:
m = nn.Linear(5, 10)
input_ = torch.randn(32, 5)  # batch size == 32; number of features for each sample == 5
output = m(input_)
output.size()
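
The formula $y = xA^T + b$ can be checked numerically. The sketch below is only a sanity check, assuming that `layer.weight` plays the role of $A$ and `layer.bias` the role of $b$; the variable names are chosen for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(5, 10)
x = torch.randn(32, 5)  # batch of 32 samples with 5 features each

# layer.weight has shape (10, 5) and layer.bias has shape (10,)
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)
print(layer.weight.shape, layer.bias.shape, matches)
```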

Create *your own model* for the **MNIST** dataset here:

In [None]:
seq_model = nn.Sequential(
    # TODO: Add one or more nn.Linear layer arguments each followed by a
    # non-linearity (such as nn.ReLU). Your first `nn.Linear` layer should
    # accept 28 * 28 input features. Your final `nn.Linear` layer should 
    # have 10 output features (one for each number) and be followed by
    # `nn.Softmax(dim=-1)`, which will convert the `nn.Linear` outputs to
    # probabilities. (~4 lines of code or more)
    
).cuda()

In [None]:
seq_model  # what does your model look like?

Test your linear model on a flattened image to make sure all shapes are correct:

In [None]:
flattened_tensor = mnist_train[0][0].view(1, 1, -1) / 256.
seq_model(flattened_tensor.cuda())
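
If you get stuck, here is one possible way to fill in the exercise, offered as a sketch rather than the intended answer. It follows the TODO's instructions, runs on the CPU so that no GPU is required, and uses a hidden width of 128, which is an arbitrary choice. The random `fake_image` stands in for a flattened, rescaled MNIST digit.

```python
import torch
import torch.nn as nn

# One possible layout; the hidden width of 128 is an arbitrary choice.
example_seq_model = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
    nn.Softmax(dim=-1),
)

# random stand-in for a flattened image with values in [0, 1)
fake_image = torch.rand(1, 1, 28 * 28)
probs = example_seq_model(fake_image)
print(probs.shape, probs.sum().item())
```

The output should have 10 entries in its last dimension, and because of the final `nn.Softmax` those entries should sum to roughly 1.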

## Create More Flexible Neural Network Models with `nn.Module`

`nn.Module` is the base class for all neural network modules. Your models should subclass this class. Modules can contain other Modules, allowing nesting in a tree structure. You can assign the submodules as regular instance attributes:

```
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
```

Submodules assigned in this way will be registered and will have their parameters included in the top-level Module's parameters. For example:

```
>>> model = Model()
>>> for name, param in model.named_parameters():
    print(name, '=>', param.shape)
conv1.weight => torch.Size([20, 1, 5, 5])   # a model.conv1 parameter
conv1.bias => torch.Size([20])              # another model.conv1 parameter
conv2.weight => torch.Size([20, 20, 5, 5])  # a model.conv2 parameter
conv2.bias => torch.Size([20])              # another model.conv2 parameter
```
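
Because registered parameters are all reachable through `named_parameters()`, they can also be counted. The sketch below repeats the two convolution layers from the example so that it runs on its own; the class name `ConvModel` is used here only to avoid clashing with `Model`, and `forward` is omitted because it is not needed for counting.

```python
import torch.nn as nn


class ConvModel(nn.Module):
    """Same two convolution layers as the example above."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)


conv_model = ConvModel()
# numel() gives the number of scalar entries in each parameter tensor
counts = {name: p.numel() for name, p in conv_model.named_parameters()}
total = sum(counts.values())
print(counts)
print(total)  # 500 + 20 + 10000 + 20 = 10540
```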

*Now can you replicate the `seq_model` model you created above with `nn.Module`?*

In [None]:
seq_model  # show again what the `seq_model` looks like for replication

In [None]:
class Model(nn.Module):

    def __init__(self):
        """
        In the constructor, you declare all the layers you want to use.
        """
        super(Model, self).__init__()
        # TODO: Add linear layers to instance attributes here (~2 lines or more)

 
    def forward(self, x):
        """
        In the forward function, you define how your model is going to be run,
        from input to output. This method is run automatically when the instance
        is called.
        """
        # TODO: Use the linear layer (and other) attributes assigned above to 
        # calculate the `outputs`. (~4 lines)

        return outputs

In [None]:
m_model = Model().cuda()
m_model
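
For comparison, one way the `nn.Module` version might look is sketched below. It is not the only valid answer: the attribute names `fc1` and `fc2`, the class name `ExampleModel`, and the hidden width of 128 are assumptions made for this example, and it runs on the CPU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExampleModel(nn.Module):

    def __init__(self, hidden=128):
        super().__init__()
        # layers are declared once here and reused on every forward pass
        self.fc1 = nn.Linear(28 * 28, hidden)
        self.fc2 = nn.Linear(hidden, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        outputs = F.softmax(self.fc2(x), dim=-1)
        return outputs


example_m_model = ExampleModel()
batch = torch.rand(4, 28 * 28)  # 4 random stand-ins for flattened images
out = example_m_model(batch)
print(out.shape)
```

Calling the instance (`example_m_model(batch)`) is what triggers `forward`; it is not usually called directly.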

## Create Loss Function, Optimizer, and DataLoader

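**Criterion**: `nn.CrossEntropyLoss` is useful when training a classification problem with C classes. The *input* is expected to contain raw, unnormalized scores (logits) *for each class*; the criterion applies log-softmax to them internally. The *target* is a 1D tensor of size minibatch in which each value is *a class index* in the range [0, C-1]. Note that a model which already ends in `nn.Softmax` therefore has its outputs normalized a second time inside the loss: it will still train, but typically less effectively than a model that returns the raw `nn.Linear` outputs.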
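
To make the expected shapes concrete, here is a minimal sketch with random scores and made-up targets. It also recomputes the loss by hand, assuming the default `reduction='mean'`, to show what the criterion does with the raw scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores: minibatch of 4, C = 10 classes
targets = torch.tensor([3, 0, 9, 1])  # one class index per sample

criterion_example = nn.CrossEntropyLoss()
loss = criterion_example(logits, targets)

# the same quantity by hand: mean negative log-softmax score of the true class
log_probs = F.log_softmax(logits, dim=-1)
manual_loss = -log_probs[torch.arange(4), targets].mean()
print(loss.item(), manual_loss.item())
```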

**Optimizer**: `torch.optim` implements various optimization algorithms. To use `torch.optim`, you have to construct an optimizer object, which will hold the current state and will update the parameters based on the computed gradients. To construct an Optimizer you have to give it an iterable containing the parameters to optimize. You can also specify learning rate, weight decay, etc. We will use `optim.Adam` here, a good default.
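
A single optimizer update, sketched on a toy layer: `toy_model`, the random input, and the squared-output loss are placeholders chosen for illustration and have nothing to do with MNIST.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
toy_model = nn.Linear(3, 2)
toy_optimizer = optim.Adam(toy_model.parameters(), lr=1e-3)

before = toy_model.weight.detach().clone()

loss = toy_model(torch.randn(8, 3)).pow(2).mean()  # arbitrary scalar loss
toy_optimizer.zero_grad()   # clear gradients left over from any earlier step
loss.backward()             # compute d(loss)/d(parameter) for every parameter
toy_optimizer.step()        # update the parameters in place

after = toy_model.weight.detach().clone()
max_change = (after - before).abs().max().item()
print(max_change)
```

On its first step Adam moves each parameter by at most about the learning rate, so the printed change should be on the order of `1e-3`.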

**DataLoader**: At the heart of PyTorch's data loading utility is the `torch.utils.data.DataLoader` class. It represents a Python iterable over a dataset. A DataLoader performs automatic batching: it fetches a minibatch of samples and collates them into batched Tensors, with one dimension being the batch dimension (usually the first dimension).
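
Automatic batching is easiest to see on a tiny dataset. In this sketch, `TensorDataset` wraps ten made-up samples; the names `features`, `labels`, and `batch_loader` are only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake samples with 4 features each, plus integer labels
features = torch.arange(40, dtype=torch.float32).view(10, 4)
labels = torch.arange(10)
batch_loader = DataLoader(TensorDataset(features, labels), batch_size=4)

# collect the shape of every batch the loader yields
batch_shapes = [(tuple(X.shape), tuple(y.shape)) for X, y in batch_loader]
for X_shape, y_shape in batch_shapes:
    print(X_shape, y_shape)
```

With 10 samples and a batch size of 4, the loader yields two full batches and a final batch of 2; the batch dimension comes first in each tensor.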

In [None]:
criterion = # TODO: enter the loss function here (`nn.CrossEntropyLoss`).
optimizer = # TODO: enter the optimizer here (`optim.Adam`, learning rate of 1e-3).
dataloader = # TODO: enter the `DataLoader` here (with batch size 128 or so).

## Train the Model

Here is where our network organizes and optimizes itself. We simply have to loop over our data iterator, feed the inputs to the network, and optimize the network's weights (parameters).

In [None]:
def train(model, criterion, optimizer, dataloader, epochs=30, flatten=True):
    for epoch in range(1, epochs + 1):
        running_loss = 0.
        for batch, (X, y) in enumerate(dataloader):
            X = X.cuda() / 256.
            if flatten:
                X = X.view(X.shape[0], -1)  # flatten the images
            y = y.cuda()

            # TODO: Get the outputs of the model and assign them to the name `outputs`,
            # and then get the loss of the model using `criterion` and assign it the 
            # name `loss`. (2 lines of code)
            

            running_loss += loss.item()  # .item() yields a float, so no autograd graph is kept alive

            # TODO: Use `zero_grad` to zero the gradient of the optimizer. Then
            # calculate the gradients using the `backward` method of loss, and
            # finally have the optimizer take a step. (3 lines of code)
            

            print('Epoch: {} Batch: {} Loss: {} Running Loss: {}'
                  .format(epoch, batch, loss.item(), running_loss / (batch + 1)))
    return model
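
For reference, a filled-in variant of the loop is sketched below. It is not the reference solution: `train_sketch` and the toy tensors are hypothetical, and it runs on the CPU with random data so that it can be tried without a GPU or the MNIST download. Unlike `train`, it returns the mean loss per epoch instead of the model.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train_sketch(model, criterion, optimizer, dataloader, epochs=3):
    """Return the mean loss of each epoch (CPU only, inputs already flattened)."""
    epoch_losses = []
    for epoch in range(1, epochs + 1):
        running_loss = 0.
        for X, y in dataloader:
            outputs = model(X)
            loss = criterion(outputs, y)
            running_loss += loss.item()  # plain float, keeps no autograd graph

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        epoch_losses.append(running_loss / len(dataloader))
    return epoch_losses


torch.manual_seed(0)
X_toy = torch.rand(64, 28 * 28)          # random stand-ins for flattened images
y_toy = torch.randint(0, 10, (64,))      # random class labels
toy_loader = DataLoader(TensorDataset(X_toy, y_toy), batch_size=16)

# this toy model returns raw scores, which nn.CrossEntropyLoss accepts directly
toy_model = nn.Sequential(nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10))
toy_optimizer = optim.Adam(toy_model.parameters(), lr=1e-3)
losses = train_sketch(toy_model, nn.CrossEntropyLoss(), toy_optimizer, toy_loader)
print(losses)
```

Since the labels here are random, the loss values carry no meaning beyond showing that the loop runs end to end.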

Now train our model by calling the `train` function. Feel free to use your `m_model` or your `seq_model`.

In [None]:
model = train(m_model, criterion, optimizer, dataloader, epochs=10, flatten=True)
# model = train(seq_model, criterion, optimizer, dataloader, epochs=10, flatten=True)

## Check the Model's Performance on Each Class

A confusion matrix is a table layout that allows visualization of the performance of a classification algorithm. By one convention, each row represents the instances of an actual class and each column the instances of a predicted class; other sources swap the two. The code below uses the rows-are-actual convention. The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e., commonly mislabeling one as another).
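
A small hand-checkable example may help. The labels below are made up, and the matrix is built with `torch.bincount` instead of a Python loop, which is a different technique from the function that follows but gives the same layout: rows are actual classes, columns are predicted classes.

```python
import torch

y_true = torch.tensor([0, 0, 1, 1, 2, 2])   # actual classes
y_pred = torch.tensor([0, 1, 1, 1, 2, 0])   # predicted classes
num_classes = 3

# encode each (actual, predicted) pair as one integer, count, then reshape
pair_index = y_true * num_classes + y_pred
toy_matrix = torch.bincount(pair_index, minlength=num_classes ** 2)
toy_matrix = toy_matrix.view(num_classes, num_classes)
print(toy_matrix)
# tensor([[1, 1, 0],
#         [0, 2, 0],
#         [1, 0, 1]])
```

Reading the first row: of the two samples whose actual class is 0, one was predicted as 0 and one was confused with class 1.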

In [None]:
def generate_confusion_matrix(model, dataloader, num_classes=10, flatten=True):
    confusion_matrix = torch.zeros(num_classes, num_classes)
    with torch.no_grad():
        for batch, (X, y) in enumerate(dataloader):
            X = X.cuda() / 256.
            if flatten:
                X = X.view(X.shape[0], -1)  # flatten the images
            y = y.cuda()

            outputs = model(X)
            
            preds = torch.argmax(outputs, dim=-1)
            for t, p in zip(y.view(-1), preds.view(-1)):
                confusion_matrix[t.long(), p.long()] += 1

    return pd.DataFrame(confusion_matrix.numpy(), columns=range(num_classes))

In [None]:
# get dataloader for the test set, which the model has never seen before
test_dataloader = DataLoader(mnist_test, batch_size=128)
# now generate confusion matrix
results = generate_confusion_matrix(model, test_dataloader, num_classes=10)  # `model` is the network returned by `train`
results

Now check the accuracy for each class label, i.e. the fraction of each actual class that was predicted correctly (we could use F1 score etc. as well):

In [None]:
print(np.diag(results.values) / results.sum(1))

And finally check the overall accuracy score when each class is equally weighted:

In [None]:
print((np.diag(results.values) / results.sum(1)).mean())  # sum(1) sums each row, i.e. each actual class
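
The class-weighted score is not the same thing as plain accuracy when classes differ in size. The sketch below uses hypothetical counts for two imbalanced classes (rows are actual classes, columns are predicted classes) to show the difference; the variable names are chosen for this example.

```python
import numpy as np

# hypothetical counts: rows = actual class, columns = predicted class
cm = np.array([[8, 2],
               [1, 1]])

per_class_recall = np.diag(cm) / cm.sum(axis=1)   # correct / actual count, per class
macro_recall = per_class_recall.mean()            # every class weighted equally
overall_accuracy = np.trace(cm) / cm.sum()        # every sample weighted equally
print(per_class_recall, macro_recall, overall_accuracy)
```

Here the per-class values are 0.8 and 0.5, so the equally weighted mean is 0.65, while the sample-weighted accuracy is 9/12 = 0.75. On MNIST the classes are close to balanced, so the two numbers should be close.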