
# Introduction to Building the Neural Network

Neural networks consist of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.
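As a minimal sketch of this nesting, the example below composes two small modules into a larger one. The class names `Block` and `TinyNet` and the layer sizes are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn


class Block(nn.Module):  # hypothetical inner module: a linear layer plus an activation
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))


class TinyNet(nn.Module):  # hypothetical outer module built from two Blocks
    def __init__(self):
        super().__init__()
        self.block1 = Block(8, 4)
        self.block2 = Block(4, 2)

    def forward(self, x):
        return self.block2(self.block1(x))


tiny = TinyNet()
# named_children() lists the directly nested submodules in registration order
child_names = [name for name, _ in tiny.named_children()]
print(child_names)

out = tiny(torch.randn(3, 8))  # a batch of 3 inputs with 8 features each
print(out.shape)
```

Because `Block` is itself an nn.Module, assigning it as an attribute is enough for `TinyNet` to register it and its parameters.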

In [1]:
import torch # for all things PyTorch
import torch.nn as nn # for torch.nn.Module, the parent object for PyTorch models;
# contains the neural network layers that we are going to compose into our model as well as the parent class of the model itself
import torch.nn.functional as F # for the activation functions and max pooling functions we can use to connect the layers

# LeNet-5

LeNet-5 is one of the earliest convolutional neural networks and one of the drivers of the explosion in deep learning. It was built to read small images of handwritten numbers (the MNIST dataset) and correctly classify which digit was represented in the image.

Here is an abridged version of how it works:



*   Layer C1 is a convolutional layer, meaning that it scans the input image for features it learned during training. It outputs a map of where it saw each of its learned features in the image. This "activation map" is downsampled in layer S2.
*   Layer C3 is another convolutional layer, this time scanning C1's activation map for combinations of features. It also puts out an activation map describing the spatial locations of these feature combinations, which is downsampled in layer S4.
*   Finally, the fully-connected layers at the end, F5, F6, and OUTPUT, are a classifier that takes the final activation map and classifies it into one of ten bins representing the 10 digits.
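To make the layer-by-layer description concrete, the sketch below traces tensor shapes through two convolution and pooling stages. It assumes a 32x32 single-channel input and 3x3 kernels, matching the LeNet class defined in this notebook; the names `conv_c1` and `conv_c3` are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 1, 32, 32)   # (batch, channels, height, width)

conv_c1 = nn.Conv2d(1, 6, 3)    # stands in for layer C1
conv_c3 = nn.Conv2d(6, 16, 3)   # stands in for layer C3

c1 = F.relu(conv_c1(x))         # a 3x3 kernel without padding trims 2 pixels: 32 -> 30
s2 = F.max_pool2d(c1, 2)        # 2x2 pooling halves each spatial side: 30 -> 15
c3 = F.relu(conv_c3(s2))        # 15 -> 13
s4 = F.max_pool2d(c3, 2)        # 13 -> 6 (pooling floors the odd size)

for name, t in [("C1", c1), ("S2", s2), ("C3", c3), ("S4", s4)]:
    print(name, tuple(t.shape))

flat_features = s4.shape[1] * s4.shape[2] * s4.shape[3]
print(flat_features)            # number of inputs the first fully-connected layer sees
```

The final count, 16 * 6 * 6, is the input size the first fully-connected layer must accept under these assumptions.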








In [2]:
class LeNet(nn.Module):
  def __init__(self):
    super(LeNet, self).__init__()
    # 1 input image channel (black & white), 6 output channels, 3x3 square convolution
    # kernel
    self.conv1 = nn.Conv2d(1,6,3)
    self.conv2 = nn.Conv2d(6,16,3)
    # an affine operation: y = Wx + b
    self.fc1 = nn.Linear(16*6*6, 120) # 6*6 from image dimension
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

  def forward(self, x):
    # Max pooling over a (2,2) window
    x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
    # If the window is square, you can specify just a single number
    x = F.max_pool2d(F.relu(self.conv2(x)), 2)
    x = x.view(-1, self.num_flat_features(x))
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

  def num_flat_features(self, x):
    size = x.size()[1:] # all dimensions except the batch dimension
    num_features = 1
    for s in size:
      num_features *= s
    return num_features

The code above represents the structure of a typical PyTorch model:
1. It inherits from torch.nn.Module and modules may be nested. In fact, even the Conv2d and Linear layers are subclasses of torch.nn.Module.
2. Every model will have an __init__ where it constructs the layers that it will compose into its computation graph and loads any data artifacts it might need (e.g. an NLP model might load a vocabulary).
3. A model will have a forward() function. This is where the actual computation happens. An input is passed through the network layers and various functions to generate an output (prediction).
4. In addition, you can build your model class like any other Python class, adding whatever properties and methods you need to support your model's computation.
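The `num_flat_features` helper in LeNet multiplies the non-batch dimensions by hand before reshaping. As a hedged alternative, `torch.flatten(x, 1)` should give the same result; the sketch below compares the two on a random tensor whose shape is assumed to match the output of the second pooling step.

```python
import torch

x = torch.randn(4, 16, 6, 6)  # assumed shape: a batch of 4 activation maps

# Manual approach, as in num_flat_features: multiply all non-batch dimensions
num_features = 1
for s in x.size()[1:]:
    num_features *= s
manual = x.view(-1, num_features)

# Built-in approach: flatten everything from dimension 1 onward
flat = torch.flatten(x, 1)

print(manual.shape, flat.shape)
print(torch.equal(manual, flat))
```

Either form keeps the batch dimension intact, which is what the fully-connected layers expect.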

In [3]:
# Running a sample input through the model above
net = LeNet()
print(net)

input = torch.randn(1,1,32,32)
print('\nImage batch shape:')
print(input.shape)

output = net(input)
print('\nRaw output:')
print(output)
print(output.shape)

LeNet(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Image batch shape:
torch.Size([1, 1, 32, 32])

Raw output:
tensor([[-0.0749, -0.0541, -0.1712, -0.1167, -0.0448, -0.0576, -0.1349,  0.0034,
          0.1703, -0.0719]], grad_fn=<AddmmBackward0>)
torch.Size([1, 10])


First, we create an instance of LeNet, and we print the net object. A subclass of torch.nn.Module will report the layers it has created and their shapes and parameters. This can provide a handy overview of a model if you want to get the gist of its processing.

Then, we create a dummy input representing a 32x32 image with 1 color channel. Normally, you would load an image tile and convert it to a tensor of this shape. There is an extra dimension in the tensor: the *batch* dimension. PyTorch models assume they are working on *batches* of data. For example, a batch of 16 of our image tiles would have the shape (16, 1, 32, 32). Since we are only using one image, we create a batch of 1 with shape (1, 1, 32, 32).

We ask the model for an inference by calling it like a function: net(input). The output of this call represents the model's confidence that the input represents a particular digit. (Since this instance of the model has not learned anything yet, we should not expect to see any signal in the output.) Looking at the shape of the output, we can see that it also has a batch dimension, whose size should always match the input batch dimension.
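The batch dimension can be illustrated in isolation. The sketch below adds a batch dimension to a single image with `unsqueeze(0)` and checks that the output batch size follows the input batch size. The `stand_in` model is a hypothetical placeholder, not the LeNet class above.

```python
import torch
import torch.nn as nn

image = torch.randn(1, 32, 32)          # one image: (channels, height, width)
batch_of_one = image.unsqueeze(0)       # add a leading batch dimension
batch_of_16 = torch.randn(16, 1, 32, 32)

# Hypothetical stand-in model: flatten each image, then map it to 10 scores
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 10))

out_one = stand_in(batch_of_one)
out_16 = stand_in(batch_of_16)

print(batch_of_one.shape, out_one.shape)
print(batch_of_16.shape, out_16.shape)
```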

# Neural Network for FashionMNIST

In [4]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Get Device for Training

We want to be able to train our model on a hardware accelerator like the GPU or MPS, if available. Let’s check to see if torch.cuda or torch.backends.mps are available, otherwise we use the CPU.

In [5]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device


## Define the Class

We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

In [11]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [12]:
# We create an instance of NeuralNetwork, and move it to the device, and print its structure.
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


**Note: To use the model, we pass it the input data. This executes the model’s forward, along with some background operations. Do not call model.forward() directly!**
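One of those background operations is running any registered hooks. The sketch below, which uses a small nn.Linear layer as a stand-in for a full model, suggests why calling forward() directly is discouraged: a forward hook fires when the module is called, but not when forward() is invoked by hand. The `record_call` function and `calls` list are illustrative names.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)  # stand-in for a full model
calls = []


def record_call(module, inputs, output):
    # Forward hooks receive the module, its positional inputs, and its output
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_call)
x = torch.rand(3, 4)

layer(x)                       # goes through __call__, so the hook runs
n_after_call = len(calls)

layer.forward(x)               # bypasses __call__, so the hook is skipped
n_after_forward = len(calls)

handle.remove()                # detach the hook when it is no longer needed
print(n_after_call, n_after_forward)
```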

In [13]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([4])


Calling the model on the input returns a 2-dimensional tensor with dim=0 corresponding to each output of 10 raw predicted values for each class, and dim=1 corresponding to the individual values of each output. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.

# Model Layers

In [14]:
# We will take a sample minibatch of 3 images of size 28x28
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten

In [15]:
# We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])


### nn.Linear

In [16]:
# The linear layer is a module that applies a linear transformation on the input using its stored weights and biases
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
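The linear transformation can be checked by hand. As a small sketch with assumed toy sizes (4 input features, 3 output features), the layer's output should match `x @ W.T + b` computed from its stored weight and bias.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=4, out_features=3)
x = torch.rand(2, 4)  # a batch of 2 inputs

# nn.Linear stores weight with shape (out_features, in_features)
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)

print(layer.weight.shape, layer.bias.shape)
print(matches)
```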


### nn.ReLU

Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

In this model, we use nn.ReLU between our linear layers, but there are other activations that can introduce non-linearity in your model.
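For comparison, the sketch below applies a few of the other activation modules available in torch.nn to the same small tensor. The selection and the input values are arbitrary choices for illustration.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

activations = {
    "ReLU": nn.ReLU(),                 # zeroes out negative values
    "LeakyReLU": nn.LeakyReLU(0.01),   # scales negative values by a small slope
    "Tanh": nn.Tanh(),                 # squashes values into (-1, 1)
    "Sigmoid": nn.Sigmoid(),           # squashes values into (0, 1)
}

results = {name: fn(x) for name, fn in activations.items()}
for name, value in results.items():
    print(f"{name}: {value}")
```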

In [17]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-0.2255,  0.3069, -0.4549,  0.1189,  0.2371,  0.9187,  0.5445, -0.1907,
          0.3097,  0.3343, -0.4944, -0.0820, -0.3132,  0.5316,  0.0838, -0.0755,
         -0.2348, -0.2486,  0.2889, -0.7953],
        [ 0.1777,  0.4944, -0.7440,  0.0620,  0.0016,  0.4225,  0.0955, -0.2760,
         -0.2505,  0.3827, -0.3239, -0.0269, -0.7322,  0.8209,  0.2649, -0.0144,
         -0.3590, -0.0383,  0.2358, -0.5863],
        [ 0.1774,  0.6033, -0.7095,  0.1282,  0.1001,  0.3616,  0.1147, -0.0727,
          0.0123,  0.2914, -0.1166, -0.1367, -0.3130,  0.6286,  0.1271, -0.0160,
         -0.3586, -0.2379,  0.1306, -0.5698]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.3069, 0.0000, 0.1189, 0.2371, 0.9187, 0.5445, 0.0000, 0.3097,
         0.3343, 0.0000, 0.0000, 0.0000, 0.5316, 0.0838, 0.0000, 0.0000, 0.0000,
         0.2889, 0.0000],
        [0.1777, 0.4944, 0.0000, 0.0620, 0.0016, 0.4225, 0.0955, 0.0000, 0.0000,
         0.3827, 0.0000, 0.0000, 0.0000, 0.8209, 0.26

### nn.Sequential

In [18]:
# nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined.
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

### nn.Softmax

The last linear layer of the neural network returns logits, raw values in (-∞, ∞), which are passed to the nn.Softmax module. Softmax rescales the logits to values in [0, 1] that represent the model’s predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.

In [20]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
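As a quick check of what dim=1 means, the sketch below applies softmax to a small hand-written logits tensor (the values are made up) and verifies that each row sums to 1 and that the most likely class is the one with the largest logit.

```python
import torch
from torch import nn

# Made-up logits for a batch of 2 samples and 3 classes
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, -1.0]])

probs = nn.Softmax(dim=1)(logits)   # normalize across classes within each row
row_sums = probs.sum(dim=1)
predicted = probs.argmax(dim=1)

print(probs)
print(row_sums)
print(predicted)
```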

# Model Parameters

Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model’s parameters() or named_parameters() methods.
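A common use of parameters() is counting how many values a model has to learn. The sketch below rebuilds the same layer stack as a plain nn.Sequential (an assumption made so the example is self-contained) and sums the element counts of its parameters.

```python
from torch import nn

# Same layer sizes as the NeuralNetwork class, rebuilt here for a standalone example
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
names = [name for name, _ in stack.named_parameters()]

print(names)  # Flatten and ReLU contribute no parameters
print(f"total={total}, trainable={trainable}")
```

Each linear layer contributes in_features * out_features weights plus out_features biases, so the total here is (784*512 + 512) + (512*512 + 512) + (512*10 + 10).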

In [21]:
# Iterating over each parameter and printing its size and a preview of its values
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0212,  0.0320, -0.0260,  ..., -0.0205,  0.0037, -0.0315],
        [ 0.0030, -0.0328,  0.0037,  ...,  0.0293, -0.0042, -0.0185]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0096,  0.0020], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0116, -0.0165, -0.0098,  ...,  0.0404, -0.0134, -0.0062],
        [ 0.0307, -0.0307,  0.0260,  ..., -0.0268, -0.0049, -0.0070]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | 