# Implementing the Forward Pass

Let's use the ```__call__()``` method of the network and its layers to implement the forward pass.
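As a rough illustration of that mechanism, the sketch below uses a hypothetical ```TinyModule``` class (not part of PyTorch) whose ```__call__()``` simply delegates to ```forward()```. The real ```nn.Module.__call__()``` does more than this, for example it also runs any registered hooks, so treat this as a simplified picture only.

```python
class TinyModule:
    """Hypothetical, heavily simplified stand-in for nn.Module."""

    def __call__(self, *args, **kwargs):
        # Calling the instance hands its arguments over to forward().
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses define the computation")


class Doubler(TinyModule):
    """Toy 'layer' that doubles its input."""

    def forward(self, x):
        return 2 * x


layer = Doubler()
result = layer(21)  # equivalent to layer.forward(21) in this sketch
print(result)  # 42
```

With real PyTorch modules it is generally preferable to call the instance rather than ```forward()``` directly, so that the extra work done inside ```__call__()``` is not skipped.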

In [126]:
import torch
import torchvision
import numpy as np

from torch import nn as nn
from torch.nn import functional as F
from torchvision import transforms as transforms
from torch.utils.data import Dataset, DataLoader

torch.set_printoptions(linewidth=120)

Let's subclass ```nn.Module``` and initialise the parent class by calling its constructor through ```super()```. We will define a convolutional network with:
- Convolutional layer with 6 output channels
- Convolutional layer with 12 output channels
- Dense layer with 120 output nodes
- Dense layer with 60 output nodes
- Dense layer with 10 output nodes

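Let us also equip each hidden layer with the ReLU activation function, which is available from ```torch.nn.functional```, and follow each convolutional layer with a max-pooling operation. The output layer has no activation, because the softmax is applied within the loss function.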

In [127]:
class ConvNetwork(nn.Module):
    
    def __init__(self):
        super(ConvNetwork, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.dense1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.dense2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
        
    def forward(self, x):
        # (1) hidden conv layer
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        
        # (2) hidden conv layer
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        
        # (3) hidden linear layer
        x = x.reshape(-1, 12*4*4)
        x = self.dense1(x)
        x = F.relu(x)
        
        # (4) hidden linear layer
        x = self.dense2(x)
        x = F.relu(x)
        
        # (5) output layer
        x = self.out(x)
        # x = F.softmax(x, dim=1) # This will be implemented within the loss function
        
        return x

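Notice that we use ```reshape()``` to flatten the output of the convolutional layers into the size expected by the first linear layer. That size, $12\times 4\times 4$, is obtained as follows.

Suppose
- the input to the convolutional layer is of size $n\times n$,
- each filter of the convolutional layer is of size $k\times k$,
- the convolution has padding $p$ and stride $s$.

Each output channel of the convolutional layer then has size $m\times m$, where:
$$
    m = \left\lfloor \frac{n-k+2p}{s} \right\rfloor + 1
$$

The same formula applies to max pooling, with $k=2$, $s=2$ and $p=0$ here. For a $28\times 28$ input, the first convolution gives $24\times 24$, pooling gives $12\times 12$, the second convolution gives $8\times 8$, and pooling gives $4\times 4$. With $12$ output channels, the flattened size is $12\times 4\times 4 = 192$.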
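The arithmetic behind the ```12*4*4``` figure can be checked with a few lines of plain Python. The helper name ```conv_output_size``` is an assumption made for this sketch, not a PyTorch function; it applies the formula above with integer division, and treats max pooling as the same formula with kernel size $2$ and stride $2$.

```python
def conv_output_size(n, k, p=0, s=1):
    """Side length of the output for an n x n input, k x k kernel,
    padding p and stride s (hypothetical helper for illustration)."""
    return (n - k + 2 * p) // s + 1


size = 28                                # FashionMNIST images are 28 x 28
size = conv_output_size(size, k=5)       # conv1, 5 x 5 kernel   -> 24
size = conv_output_size(size, k=2, s=2)  # max pool, 2 x 2, s=2  -> 12
size = conv_output_size(size, k=5)       # conv2, 5 x 5 kernel   -> 8
size = conv_output_size(size, k=2, s=2)  # max pool, 2 x 2, s=2  -> 4

flattened = 12 * size * size             # 12 output channels from conv2
print(size, flattened)  # 4 192
```

The result, $192$, agrees with ```in_features=12*4*4``` in the definition of ```dense1```.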

Let's use the ```forward()``` method to see what happens to an input to the network. We start by importing the FashionMNIST dataset.

In [128]:
train_set = torchvision.datasets.FashionMNIST(
    root='./../datasets',
    train=True,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

We will create an instance of the ```ConvNetwork``` class and pass ```images``` to it as input. Before instantiating it, we call ```torch.set_grad_enabled(False)``` to turn off gradient tracking globally, which stops PyTorch from dynamically building the computational graph used for backpropagation. We will see backpropagation later.
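A more local alternative to a global switch is the ```torch.no_grad()``` context manager, which disables gradient tracking only inside its block. The sketch below is an aside using a made-up tensor ```w```; it wraps everything in ```torch.enable_grad()``` so that it behaves the same regardless of the global setting.

```python
import torch

w = torch.ones(3, requires_grad=True)  # made-up parameter tensor

with torch.enable_grad():
    tracked = (w * 2).sum()            # recorded in the computational graph
    with torch.no_grad():
        untracked = (w * 2).sum()      # not recorded

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
```

Both context managers restore the previous gradient mode when their block ends.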

In [129]:
torch.set_grad_enabled(False)

<torch.autograd.grad_mode.set_grad_enabled at 0x7fa4a87603d0>

In [130]:
network = ConvNetwork()
network

ConvNetwork(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (dense1): Linear(in_features=192, out_features=120, bias=True)
  (dense2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)

We pick one sample (image, label) pair to see the output prediction of the network. The ```network``` instance is callable, and calling it invokes the ```forward()``` method that we defined within the class.

In [131]:
sample = next(iter(train_set))
images, labels = sample

print(images.shape)

torch.Size([1, 28, 28])


Notice that ```images``` is a rank-$3$ tensor: a single image with one channel of size $28\times 28$. The network expects a rank-$4$ tensor whose leading dimension is the batch size, so we use ```unsqueeze(0)``` to add a batch dimension of size $1$.
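The effect of ```unsqueeze(0)``` on the shape can be seen in isolation with a dummy tensor. In this sketch, ```image``` is a tensor of zeros standing in for one FashionMNIST sample; it is not taken from the dataset.

```python
import torch

image = torch.zeros(1, 28, 28)  # dummy (channels, height, width) tensor
batch = image.unsqueeze(0)      # insert a new axis of size 1 at position 0

print(image.shape)  # torch.Size([1, 28, 28])
print(batch.shape)  # torch.Size([1, 1, 28, 28])

# Indexing with None adds the same leading axis.
also_batch = image[None]
print(also_batch.shape)  # torch.Size([1, 1, 28, 28])
```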

In [132]:
prediction = network(images.unsqueeze(0))
print(F.softmax(prediction, dim=1).sum()) # Check distribution
print(F.softmax(prediction, dim=1))

tensor(1.0000)
tensor([[0.0928, 0.1121, 0.0983, 0.0834, 0.1159, 0.0981, 0.1035, 0.0883, 0.0902, 0.1173]])


Notice that the predicted probabilities are all around $10\%$. This is reasonable: the weights are randomly initialised, so the network has no real preference for any class and its output is close to uniform over the $10$ classes.
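To see why small, unstructured outputs lead to probabilities near $10\%$, here is a plain-Python softmax applied to ten made-up logits close to zero. The values in ```small_logits``` are invented for illustration and are not the network's actual outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Invented logits near zero, standing in for an untrained network's output.
small_logits = [0.05, -0.02, 0.01, 0.00, -0.04, 0.03, -0.01, 0.02, -0.03, 0.04]
probs = softmax(small_logits)

print([round(p, 4) for p in probs])  # every entry is close to 0.1
print(round(sum(probs), 6))          # 1.0
```

When all logits are nearly equal, the exponentials are nearly equal too, so each class receives roughly $1/10$ of the probability mass.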

In [133]:
print(labels)
print(F.softmax(prediction, dim=1).argmax(dim=1))

9
tensor([9])


We can now do the same with a batch of images that can be created using the ```DataLoader```.

In [134]:
batch_size = 10
train_loader = DataLoader(
    train_set,
    batch_size=batch_size
)

In [135]:
batch = next(iter(train_loader))
images, labels = batch
print(images.shape)

torch.Size([10, 1, 28, 28])


In [136]:
predictions = network(images)
print(F.softmax(predictions, dim=1).sum(dim=1)) # Check distribution
print(F.softmax(predictions, dim=1))

tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
tensor([[0.0928, 0.1121, 0.0983, 0.0834, 0.1159, 0.0981, 0.1035, 0.0883, 0.0902, 0.1173],
        [0.0931, 0.1121, 0.0989, 0.0829, 0.1160, 0.0983, 0.1047, 0.0886, 0.0890, 0.1165],
        [0.0924, 0.1108, 0.0974, 0.0847, 0.1133, 0.0986, 0.1041, 0.0901, 0.0917, 0.1168],
        [0.0926, 0.1111, 0.0975, 0.0845, 0.1142, 0.0987, 0.1042, 0.0898, 0.0905, 0.1169],
        [0.0919, 0.1118, 0.0977, 0.0838, 0.1155, 0.0996, 0.1039, 0.0900, 0.0893, 0.1165],
        [0.0924, 0.1112, 0.0989, 0.0831, 0.1166, 0.0980, 0.1047, 0.0891, 0.0887, 0.1172],
        [0.0929, 0.1123, 0.0982, 0.0838, 0.1146, 0.0982, 0.1041, 0.0880, 0.0909, 0.1170],
        [0.0930, 0.1116, 0.0990, 0.0834, 0.1171, 0.0973, 0.1042, 0.0889, 0.0882, 0.1172],
        [0.0924, 0.1114, 0.0979, 0.0846, 0.1137, 0.0976, 0.1046, 0.0887, 0.0926, 0.1164],
        [0.0929, 0.1125, 0.0992, 0.0837, 0.1139, 0.0974, 0.1055, 0.0875, 0.0912, 0.1162]])


In [137]:
F.softmax(predictions, dim=1).argmax(dim=1)

tensor([9, 9, 9, 9, 9, 9, 9, 9, 9, 9])

In [138]:
labels

tensor([9, 0, 0, 3, 0, 2, 7, 2, 5, 5])

Let's check how the predictions match with the labels.

In [139]:
F.softmax(predictions, dim=1).argmax(dim=1).eq(labels)*1

tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [140]:
def get_num_correct(predictions, labels):
    return F.softmax(predictions, dim=1).argmax(dim=1).eq(labels).sum().item()

In [141]:
get_num_correct(predictions, labels)

1
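The count returned by ```get_num_correct()``` can be turned into an accuracy by dividing by the batch size. The sketch below repeats the comparison in plain Python, using the predicted classes and labels printed above; the variable names are chosen for this example only.

```python
# Values copied from the outputs shown above.
predicted_classes = [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
true_labels = [9, 0, 0, 3, 0, 2, 7, 2, 5, 5]

# Count positions where the predicted class equals the label.
num_correct = sum(int(p == t) for p, t in zip(predicted_classes, true_labels))
accuracy = num_correct / len(true_labels)

print(num_correct)  # 1
print(accuracy)     # 0.1
```

An accuracy of $10\%$ is what one would expect from guessing among $10$ classes, which is consistent with an untrained network.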