# Build Model Tutorial

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
%matplotlib inline

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from torchsummary import summary  # third-party package: pip install torchsummary
```

# Build the Neural Network

The [`torch.nn`](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks needed to create our own neural network. Every module in PyTorch subclasses [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.

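To make the nesting concrete, here is a minimal sketch with two hypothetical modules, `TinyBlock` and `TinyNet` (illustrative names, not part of this tutorial). `named_modules()` walks the whole tree of submodules, starting with the root module itself, whose name is the empty string.

```python
import torch
from torch import nn


class TinyBlock(nn.Module):
    """Hypothetical block: a Linear layer followed by ReLU."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))


class TinyNet(nn.Module):
    """Hypothetical network that uses a TinyBlock as one of its layers."""

    def __init__(self):
        super().__init__()
        self.block = TinyBlock()  # a module nested inside another module
        self.head = nn.Linear(4, 2)

    def forward(self, x):
        return self.head(self.block(x))


net = TinyNet()
names = [name for name, _ in net.named_modules()]
for name, module in net.named_modules():
    print(repr(name), type(module).__name__)
print(net(torch.rand(3, 4)).shape)  # torch.Size([3, 2])
```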
# Check GPU availability

We want to train our model on a hardware accelerator such as a GPU, if one is available. Let's check whether [`torch.cuda`](https://pytorch.org/docs/stable/notes/cuda.html) is available; otherwise, we continue to use the CPU.

```python
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))
```

```text
Using cpu device
```

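A minimal sketch of what "using" the selected device looks like: tensors and modules have to be placed on `device` explicitly, and inputs and parameters must live on the same device. Note the asymmetry: `Tensor.to()` returns a new tensor that must be reassigned, while `nn.Module.to()` moves the module's parameters in place.

```python
import torch
from torch import nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

x = torch.rand(2, 3)                  # tensors are created on the CPU by default
x = x.to(device)                      # returns a tensor on `device`; reassign it
y = torch.rand(2, 3, device=device)   # or create the tensor directly on `device`

layer = nn.Linear(3, 1).to(device)    # moves the module's parameters in place
out = layer(x + y)                    # inputs and parameters share a device
print(x.device.type, next(layer.parameters()).device.type, out.device.type)
```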

# Understand Model Layers

Let's take a sample minibatch of 3 images of size 28 x 28 and see what happens to it as we pass it through the network.

## Define input

```python
input_image = torch.rand(3, 28, 28)
print(f"Size of input_image is {input_image.size()}")
```

```text
Size of input_image is torch.Size([3, 28, 28])
```


## Flatten layer

The [`nn.Flatten`](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) layer converts each 2D 28 x 28 image into a contiguous array of 784 pixel values, while the minibatch dimension (dim 0) is maintained.

```python
flatten = nn.Flatten()  # Define flatten layer
flat_image = flatten(input_image)  # Get output of flatten layer
print(f"Shape of flat_image is {flat_image.size()}")
```

```text
Shape of flat_image is torch.Size([3, 784])
```

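The batch dimension survives because `nn.Flatten` defaults to `start_dim=1`. As a quick check, the sketch below compares it with the functional `torch.flatten` and with a plain `reshape`. Flattening from dim 0 instead would merge all images into one long vector.

```python
import torch
from torch import nn

images = torch.rand(3, 28, 28)
flat_a = nn.Flatten()(images)                 # defaults: start_dim=1, end_dim=-1
flat_b = torch.flatten(images, start_dim=1)   # functional equivalent
flat_c = images.reshape(images.size(0), -1)   # same result via reshape

print(flat_a.shape)                                          # torch.Size([3, 784])
print(torch.equal(flat_a, flat_b), torch.equal(flat_a, flat_c))  # True True
print(torch.flatten(images).shape)  # torch.Size([2352]): batch dimension lost
```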

## Linear layer

The [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layer is a module that applies a linear transformation, $y = xW^T + b$, to the input using its stored weights $W$ and biases $b$.

```python
layer1 = nn.Linear(in_features=28 * 28, out_features=20)
hidden1 = layer1(flat_image)
print(f"Shape of hidden1 is {hidden1.size()}")
```

```text
Shape of hidden1 is torch.Size([3, 20])
```

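To check the formula $y = xW^T + b$, we can recompute the layer's output by hand from its `weight` and `bias` attributes. `nn.Linear` stores its weight with shape `(out_features, in_features)`, which is why the transpose appears. A small tolerance absorbs floating-point rounding differences.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=784, out_features=20)
x = torch.rand(3, 784)

print(layer.weight.shape, layer.bias.shape)  # (out_features, in_features), (out_features,)
manual = x @ layer.weight.T + layer.bias     # y = x W^T + b
print(torch.allclose(layer(x), manual, atol=1e-5))
```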

## ReLU

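[`nn.ReLU`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) applies the rectified linear unit function element-wise, $\text{ReLU}(x) = \max(0, x)$. Negative values become zero and positive values pass through unchanged. Non-linear activations like this are applied after linear layers so that the network can learn non-linear mappings between its inputs and outputs.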

```python
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
```

```text
Before ReLU: tensor([[-0.3560,  0.4836, -0.1674, -0.1033,  0.3106, -0.3254,  0.3303,  0.2531,
         -0.2452, -0.1073, -0.3930,  0.3560,  0.4534, -0.1133, -0.0222,  0.1346,
         -0.3885, -0.3978, -0.2721, -0.1094],
        [-0.3520,  0.4087, -0.2760, -0.0027,  0.1285, -0.3497,  0.0465,  0.6129,
         -0.3857, -0.2560, -0.6055,  0.1396,  0.6784, -0.0702,  0.2206,  0.1016,
         -0.2497, -0.1468, -0.3845,  0.1320],
        [-0.0162,  0.5943,  0.3098,  0.0744,  0.4060, -0.6053,  0.3440,  0.3050,
         -0.4018, -0.2303, -0.2049, -0.0671,  0.6359, -0.3214,  0.3185, -0.1236,
         -0.5617,  0.0919, -0.1434,  0.0260]], grad_fn=<AddmmBackward>)


After ReLU: tensor([[0.0000, 0.4836, 0.0000, 0.0000, 0.3106, 0.0000, 0.3303, 0.2531, 0.0000,
         0.0000, 0.0000, 0.3560, 0.4534, 0.0000, 0.0000, 0.1346, 0.0000, 0.0000,
         0.0000, 0.0000],
        [0.0000, 0.4087, 0.0000, 0.0000, 0.1285, 0.0000, 0.0465, 0.6129, 0.0000,
         0.0000, 0.0000, 0.1396, 0.6784, 0.0000, 0.220
```

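Since ReLU is just $\max(0, x)$ applied element-wise, the module form, the functional form `torch.relu`, and clamping at zero all give the same result. The small hand-picked tensor below is illustrative data, not from the run above.

```python
import torch
from torch import nn

x = torch.tensor([[-1.5, 0.0, 2.0], [0.3, -0.2, -4.0]])
relu_out = nn.ReLU()(x)

print(relu_out)
print(torch.equal(relu_out, x.clamp(min=0)))  # True: negatives clamped to zero
print(torch.equal(relu_out, torch.relu(x)))   # True: functional form
```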
# Understand Sequential model

[`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered container of modules. The data is passed through all the modules in the same order as defined. We can use sequential containers to put together a quick network like ``seq_modules``.

```python
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3, 28, 28)
logits = seq_modules(input_image)
print(f"Shape of logits is {logits.shape}")
```

```text
Shape of logits is torch.Size([3, 10])
```

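`nn.Sequential` also accepts an `OrderedDict`, which gives each layer a readable name. The sketch below rebuilds a network of the same shape as `seq_modules` with fresh layers. The names `hidden`, `relu`, and `output` are arbitrary choices for illustration. Layers stay accessible by position as well as by name.

```python
from collections import OrderedDict

import torch
from torch import nn

named_seq = nn.Sequential(OrderedDict([
    ("flatten", nn.Flatten()),
    ("hidden", nn.Linear(28 * 28, 20)),
    ("relu", nn.ReLU()),
    ("output", nn.Linear(20, 10)),
]))

x = torch.rand(3, 28, 28)
print(named_seq(x).shape)                # torch.Size([3, 10])
print(named_seq.hidden)                  # access a layer by name...
print(named_seq[1] is named_seq.hidden)  # True: ...or by position
```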

## Softmax layer

The last linear layer of the neural network returns `logits`, raw values in $(-\infty, \infty)$, which are passed to the [`nn.Softmax`](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) module. The logits are scaled to values in $[0, 1]$ representing the model's predicted probabilities for each class. The ``dim`` parameter indicates the dimension along which the values must sum to 1. With `dim=1`, each row (one sample) sums to 1.

```python
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
print(f"Shape of pred_probab is {pred_probab.shape}")
print(f"Value of pred_probab[0] is {pred_probab[0]}")
```

```text
Shape of pred_probab is torch.Size([3, 10])
Value of pred_probab[0] is tensor([0.0986, 0.0830, 0.0724, 0.1274, 0.1200, 0.0969, 0.1057, 0.0987, 0.1036,
        0.0938], grad_fn=<SelectBackward>)
```

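Softmax computes $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$ along `dim`. The sketch below recomputes it by hand on random stand-in logits. Subtracting each row's maximum first is a common numerical-stability trick that leaves the result unchanged. Because softmax is monotonic, it never changes which class has the highest score.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(3, 10)   # stand-in logits: 3 samples, 10 classes
probs = nn.Softmax(dim=1)(logits)

# Manual softmax along dim=1, shifted by the row max for numerical stability
shifted = logits - logits.max(dim=1, keepdim=True).values
manual = shifted.exp() / shifted.exp().sum(dim=1, keepdim=True)

print(torch.allclose(probs, manual, atol=1e-6))
print(probs.sum(dim=1))  # each row sums to 1 (up to rounding)
print(torch.equal(probs.argmax(dim=1), logits.argmax(dim=1)))  # ordering preserved
```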

# Define the NN Class

We define our neural network by subclassing ``nn.Module``, and initialize the neural network layers in ``__init__``. Every ``nn.Module`` subclass implements the operations on input data in the ``forward`` method. 

```python
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),  # First hidden layer with 512 output neurons
            nn.ReLU(),  # ReLU activation for first hidden layer
            nn.Linear(512, 512),  # Second hidden layer with 512 output neurons
            nn.ReLU(),  # ReLU activation for second hidden layer
            nn.Linear(512, 10),  # Output layer with 10 output neurons (one per class)
            nn.ReLU()  # Note: clamps the logits to >= 0; raw logits are usually left unactivated
        )

    def forward(self, x):
        x = self.flatten(x)  # (N, 28, 28) -> (N, 784)
        logits = self.linear_relu_stack(x)
        return logits
```

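The final `nn.ReLU()` in `NeuralNetwork` means the returned "logits" can never be negative. Losses such as `nn.CrossEntropyLoss` expect raw, unnormalized logits, so a common variant simply ends with the last `nn.Linear`. The sketch below is a hypothetical `NeuralNetworkNoFinalReLU` (not part of this tutorial) with the same layer sizes. Only the trailing activation is dropped. ReLU has no parameters, so the parameter count is unchanged.

```python
import torch
from torch import nn


class NeuralNetworkNoFinalReLU(nn.Module):
    """Hypothetical variant of NeuralNetwork that returns raw logits."""

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),  # no activation: outputs are unbounded logits
        )

    def forward(self, x):
        x = self.flatten(x)
        return self.linear_relu_stack(x)


torch.manual_seed(0)
raw_logit_model = NeuralNetworkNoFinalReLU()
logits = raw_logit_model(torch.rand(4, 28, 28))
n_params = sum(p.numel() for p in raw_logit_model.parameters())
print(logits.shape)  # torch.Size([4, 10])
print(n_params)      # 669706

# CrossEntropyLoss applies log-softmax internally, so it takes raw logits directly
targets = torch.tensor([0, 3, 7, 9])  # illustrative class labels
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())
```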
# Define model

We create an instance of ``NeuralNetwork``, move it to ``device``, and print a summary of its structure.

```python
model = NeuralNetwork().to(device)
# summary() prints the table itself and returns None, so it is not wrapped in print()
summary(model, input_size=(1, 28, 28), device=device)
```

```text
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                  [-1, 512]         401,920
              ReLU-3                  [-1, 512]               0
            Linear-4                  [-1, 512]         262,656
              ReLU-5                  [-1, 512]               0
            Linear-6                   [-1, 10]           5,130
              ReLU-7                   [-1, 10]               0
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.02
Params size (MB): 2.55
Estimated Total Size (MB): 2.58
----------------------------------------------------------------
```

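The counts in the summary can be reproduced without `torchsummary`. Each `nn.Linear(in, out)` contributes `in * out` weights plus `out` biases. The sketch below rebuilds the same layer sizes as a plain `nn.Sequential` so it runs on its own. It then checks the total against that formula. The size estimate assumes 4 bytes per `float32` parameter and 1 MB = 1024² bytes.

```python
import torch
from torch import nn

# Same layer sizes as NeuralNetwork (Flatten and ReLU have no parameters)
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10), nn.ReLU(),
)

total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
by_formula = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)

print(total, trainable, by_formula)                     # 669706 669706 669706
print(f"{total * 4 / 1024**2:.2f} MB of float32 params")  # 2.55 MB of float32 params
```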

# Forward pass on model

To use the model, we pass it the input data. This executes the model's ``forward`` method, along with some [`background operations`](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866) such as running any registered hooks.

Do not call ``model.forward()`` directly, because doing so skips these background operations.

Calling the model on the input returns a tensor of shape `[batch_size, 10]` holding the raw predicted values (logits) for each of the 10 classes. We get the prediction probabilities by passing it through an instance of the ``nn.Softmax`` module, and the predicted class is the index with the highest probability.

```python
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
print(f"Shape of logits is {logits.shape}")
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
```

```text
Shape of logits is torch.Size([1, 10])
Predicted class: tensor([5])
```

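One concrete reason not to call `forward()` directly is hooks. A forward hook registered on a module runs when the module is called as `model(x)`, but calling `model.forward(x)` bypasses `__call__`, so the hook is silently skipped. The small `demo_model` and the `record_call` hook below are illustrative.

```python
import torch
from torch import nn

demo_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
calls = []


def record_call(module, inputs, output):
    # Forward hook signature: hook(module, input, output)
    calls.append(tuple(output.shape))


handle = demo_model.register_forward_hook(record_call)
x = torch.rand(1, 28, 28)

demo_model(x)          # runs forward() and the registered hook
demo_model.forward(x)  # bypasses __call__, so the hook does not run
print(len(calls))      # 1
handle.remove()        # detach the hook when it is no longer needed
```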

# Model Parameters

Many layers inside a neural network are *parameterized*, i.e., they have associated weights and biases that are optimized during training. Subclassing ``nn.Module`` automatically tracks all submodules and parameters assigned as attributes of the model object, and makes all parameters accessible using our model's ``parameters()`` or ``named_parameters()`` methods.

In the section below, we first print the model structure, then iterate over each parameter and print its name and size.

```python
# First print model structure
print("Model Structure:")
print(model)
```

```text
Model Structure:
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)
```


```python
for name, param in model.named_parameters():
    print(f"Name of layer.parameter is {name}")
    print(f"Size of layer.parameter is {param.size()}")
    print()
```

```text
Name of layer.parameter is linear_relu_stack.0.weight
Size of layer.parameter is torch.Size([512, 784])

Name of layer.parameter is linear_relu_stack.0.bias
Size of layer.parameter is torch.Size([512])

Name of layer.parameter is linear_relu_stack.2.weight
Size of layer.parameter is torch.Size([512, 512])

Name of layer.parameter is linear_relu_stack.2.bias
Size of layer.parameter is torch.Size([512])

Name of layer.parameter is linear_relu_stack.4.weight
Size of layer.parameter is torch.Size([10, 512])

Name of layer.parameter is linear_relu_stack.4.bias
Size of layer.parameter is torch.Size([10])
```

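To also preview parameter values, you can slice each parameter. The sketch below uses a small illustrative `demo_model` so it runs on its own. Names follow the pattern `<submodule index>.<weight|bias>`, and ReLU/Flatten contribute no entries. `detach()` just keeps the autograd bookkeeping out of the printout.

```python
import torch
from torch import nn

torch.manual_seed(0)
demo_model = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10)
)

names = []
for name, param in demo_model.named_parameters():
    names.append(name)
    print(f"{name} | size {tuple(param.size())} | requires_grad={param.requires_grad}")
    # First two rows of a weight matrix, or first two entries of a bias vector
    print(param.detach()[:2])
```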


# Further Reading

- [`torch.nn API`](https://pytorch.org/docs/stable/nn.html)