In [0]:
import numpy as np
import torch
import torch.nn as nn

# The Simplest Neural Net

Let's implement a simple fully connected neural network with one hidden layer using the torch.nn package. For our example, we use a 10-dimensional input, a hidden layer of 50 neurons, and a 2-dimensional output.

In [2]:
input_dim = 10
output_dim = 2
hidden_dim = 50

# Sequential is a container that can hold sequentially occurring layers
model = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim))

# Let's try to print the model and see what turns up
print(model)

# Let's also see the 'learnable' network parameters. This will be used later
for param in model.parameters():
    print(param.shape)

Sequential(
  (0): Linear(in_features=10, out_features=50, bias=True)
  (1): ReLU()
  (2): Linear(in_features=50, out_features=2, bias=True)
)
torch.Size([50, 10])
torch.Size([50])
torch.Size([2, 50])
torch.Size([2])


Notice that each linear layer has a weight matrix of shape (out_features x in_features) and a bias vector of length out_features: (50 x 10) and 50 for the first layer, (2 x 50) and 2 for the second.
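
As a quick check of these shapes, the total number of learnable parameters can be counted by hand and compared with what the model reports. This is a minimal sketch that rebuilds the same model; the variable names `expected` and `total` are only illustrative.

```python
import torch.nn as nn

input_dim, hidden_dim, output_dim = 10, 50, 2

model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, output_dim),
)

# Count by hand: each Linear layer has (out_features x in_features) weights
# plus out_features biases. ReLU has no parameters.
expected = (hidden_dim * input_dim + hidden_dim) + (output_dim * hidden_dim + output_dim)

# Count from the model: numel() is the number of elements in a parameter tensor.
total = sum(p.numel() for p in model.parameters())

print(expected, total)  # 652 652
```
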
Let's pass some inputs to the model and check the outputs. When defining the model, we do not need to specify the batch (or rather minibatch) size of the input. The first dimension of the input is always treated as the batch dimension.

In [3]:
batch_size = 32
input = torch.rand(batch_size, input_dim)
output = model(input)

# Let's check the size of the output
print(output.size())

torch.Size([32, 2])
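
Since the batch size is not part of the model definition, the same model should accept inputs with any number of rows. A small sketch, assuming the same layer sizes as above (the list `shapes` is just for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 2))

shapes = []
for batch_size in (1, 8, 32):
    x = torch.rand(batch_size, 10)        # first dimension is the batch dimension
    shapes.append(tuple(model(x).shape))  # only the first dimension should change

print(shapes)  # [(1, 2), (8, 2), (32, 2)]
```
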


## **Intro to nn.Module of PyTorch**

In this module we'll introduce PyTorch's nn.Module class and show how you can use it to build complicated architectures. <br>

Not every architecture can be built with nn.Sequential: many architectures, such as ResNet and Inception, do not pass data through a single linear sequence of layers. <br>

For example, the following is a ResNet block, which cannot be expressed as a plain nn.Sequential of layers.
![image](resnet_block.png)

However, it's easy to implement such blocks using nn.Module. <br>

In this class, we'll first show how to build a simple fully connected architecture and a CNN using nn.Module.

Then we'll use the power of nn.Module to show how more complex architectures can be formed, like the ResNet block above.

In [0]:
## Let's start by constructing a fully connected nn.Module, equivalent to the one we constructed in the previous class using
## nn.Sequential

## When defining a model using nn.Module, two methods need to be implemented: __init__ and forward.
## The backward pass is handled for you (by autograd).
class TwoLayerNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        '''
        D_in: Dimensionality of the input
        H: Hidden layer dimensionality
        D_out: Dimensionality of the output
        '''
        super(TwoLayerNet, self).__init__()
        
        # Define all layers which have weights here.
        self.linear1 = nn.Linear(D_in, H)  
        self.linear2 = nn.Linear(H, D_out)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        '''
        x: input tensor of dimensionality (b, D_in), where b is batch size and D_in is input dimension.
        
        returns: y_pred, of shape (b, D_out), where D_out is output dimension.
        '''
        h_relu = self.relu(self.linear1(x)) # nn.Module layers can be called as a function on the input
        
        # Used only to display the shape of the hidden layer
        # DON'T WRITE THIS when actually implementing your models
        print("\nHidden Activation Layer has shape: {}".format(h_relu.shape)) 
        
        y_pred = self.linear2(h_relu) 
        return y_pred

In [5]:
## In the block above we defined a simple 2-layer fully connected network. Now let's pass an input to it.

# N is batch size; D_in is input dimension;
# H is the dimension of the hidden layer; D_out is output dimension.
N, D_in, H, D_out = 32, 100, 50, 10

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Let's check the layers of the model
print(model)


# Create a random Tensor to hold the input
x = torch.randn(N, D_in)  # dim: 32 x 100

# Let's check the shape of x
print("\nx has shape {}".format(x.shape))




# Forward pass: Compute predicted y by passing x to the model
# nn.Module model can be used on input by directly calling the model on x.
y_pred = model(x)   # dim: 32 x 10

# Let's check the shape of y_pred, the output

print("\ny_pred has shape: {}".format(y_pred.shape))

TwoLayerNet(
  (linear1): Linear(in_features=100, out_features=50, bias=True)
  (linear2): Linear(in_features=50, out_features=10, bias=True)
  (relu): ReLU()
)

x has shape torch.Size([32, 100])

Hidden Activation Layer has shape: torch.Size([32, 50])

y_pred has shape: torch.Size([32, 10])


Notice how easy it is to construct a network using nn.Module: we only need to define what the forward pass looks like, and we can use the data from intermediate layers as well. Next, let's construct a basic convolutional network.
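
To illustrate using data from intermediate layers, here is a minimal sketch of a variant that returns the hidden activation together with the prediction. The class name `TwoLayerNetWithHidden` and the `sparsity` measurement are hypothetical additions for illustration only.

```python
import torch
import torch.nn as nn

class TwoLayerNetWithHidden(nn.Module):  # hypothetical name, for illustration
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)
        self.relu = nn.ReLU()

    def forward(self, x):
        h_relu = self.relu(self.linear1(x))
        y_pred = self.linear2(h_relu)
        # forward is ordinary Python, so it can return intermediate tensors too
        return y_pred, h_relu

model = TwoLayerNetWithHidden(100, 50, 10)
y_pred, hidden = model(torch.randn(32, 100))

print(y_pred.shape, hidden.shape)

# Example use of the intermediate activation: fraction of hidden units zeroed by ReLU
sparsity = (hidden == 0).float().mean().item()
print("Fraction of inactive hidden units: {:.2f}".format(sparsity))
```
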

## Basic Convolutional Network using nn.Module

Now that we know how to implement a simple neural network using nn.Module, we'll construct a basic convnet and also show how backpropagation through an nn.Module is handled automatically.

In [6]:
class BasicConvNet(nn.Module):

    def __init__(self):
        super(BasicConvNet, self).__init__()
        
       
        # Again define all layers with weights here.
        
        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(2)

    def forward(self, x):
        
        # Input-> Conv1 -> ReLU
        x = self.relu(self.conv1(x))
        print("Conv1 Activation has output size: {}\n".format(x.shape))
        
        # Max pooling over a (2, 2) window.
        # If the window is square you can specify just a single number.
        x = self.max_pool2d(x)  
        print("MaxPool1 has output size: {}\n".format(x.shape))
        
        # MaxPool1 -> Conv2 -> ReLU
        x = self.relu(self.conv2(x))
        print("Conv2 Activation has output size: {}\n".format(x.shape))
        
        x = self.max_pool2d(x)
        print("MaxPool2 has output size: {}\n".format(x.shape))
        
        # x.view flattens everything except the batch dimension; the dimension given as -1 is inferred.
        
        x = x.view(x.size(0), -1) 
        print("Flattened output has shape: {}\n".format(x.shape))
        
        x = self.relu(self.fc1(x))
        print("Linear1 Activation has shape: {}\n".format(x.shape))
        
        x = self.relu(self.fc2(x))
        print("Linear2 Activation has shape: {}\n".format(x.shape))
        
        x = self.fc3(x)
        return x


net = BasicConvNet()
print(net)

BasicConvNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
  (relu): ReLU()
  (max_pool2d): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)


In [7]:
params = list(net.parameters()) # Retrieve the parameters of net.
print("Conv1's weights: {}".format(params[0].size()))  # conv1's .weight

Conv1's weights: torch.Size([6, 1, 5, 5])


In [8]:
input = torch.randn(1, 1, 32, 32) # Input shape is (batch_size, number_of_channels, height, width)
out = net(input) # Get the output

print("\nShape of output: {}\n".format(out.shape))
print("Output is : \n{}".format(out)) 

Conv1 Activation has output size: torch.Size([1, 6, 28, 28])

MaxPool1 has output size: torch.Size([1, 6, 14, 14])

Conv2 Activation has output size: torch.Size([1, 16, 10, 10])

MaxPool2 has output size: torch.Size([1, 16, 5, 5])

Flattened output has shape: torch.Size([1, 400])

Linear1 Activation has shape: torch.Size([1, 120])

Linear2 Activation has shape: torch.Size([1, 84])


Shape of output: torch.Size([1, 10])

Output is : 
tensor([[-0.0241,  0.0893, -0.0318,  0.0329,  0.0392, -0.0038, -0.0329, -0.0719,
         -0.0481,  0.0742]], grad_fn=<AddmmBackward>)
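
The printed sizes explain why fc1 expects 16 * 5 * 5 inputs. With no padding, a convolution or pooling layer maps a side length `size` to `(size - kernel) // stride + 1`. The sketch below traces a 32 x 32 input through the network using a small helper, `conv_out`, which is a hypothetical function written for this example and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer (dilation 1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
size = conv_out(size, kernel=5)            # conv1, 5x5 kernel       -> 28
size = conv_out(size, kernel=2, stride=2)  # max pool, 2x2, stride 2 -> 14
size = conv_out(size, kernel=5)            # conv2, 5x5 kernel       -> 10
size = conv_out(size, kernel=2, stride=2)  # max pool, 2x2, stride 2 -> 5

flat_features = 16 * size * size           # 16 channels from conv2
print(size, flat_features)  # 5 400
```
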


In [9]:
# Let's now see how the backward pass through an nn.Module can be computed directly.

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

out.backward(torch.randn(1, 10)) # out is not a scalar, so backward needs a gradient w.r.t. out; here we pass a random one

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([-0.0455, -0.0152,  0.0326,  0.0486, -0.0575, -0.0077])
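
The reason for calling zero_grad() before backward() is that gradients accumulate across backward calls. The following sketch shows this on a single small linear layer. The layer sizes, the seed, and the tensor named `upstream` are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)
x = torch.randn(2, 4)
upstream = torch.randn(2, 3)  # stands in for the gradient of a loss w.r.t. the output

# First backward pass: .grad is populated
layer(x).backward(upstream)
first = layer.bias.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
layer(x).backward(upstream)
accumulated = layer.bias.grad.clone()

# After zero_grad(), a backward pass gives the single-pass gradient again
layer.zero_grad()
layer(x).backward(upstream)
fresh = layer.bias.grad.clone()

print(torch.allclose(accumulated, 2 * first))       # True
print(torch.allclose(fresh, first))                 # True
# For a linear layer, the bias gradient is the upstream gradient summed over the batch
print(torch.allclose(first, upstream.sum(dim=0)))   # True
```
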


## ResNet Block

Now that we have seen how to construct basic architectures using nn.Module, we will look at modules that cannot be constructed using nn.Sequential. Because the data flow within these architectures is not linear, we need to reuse intermediate activations directly. We will now implement a ResNet block using nn.Module, as shown below.
<br>
![ResNet Block](https://miro.medium.com/max/1140/1*D0F3UitQ2l5Q0Ak-tjEdJg.png)


In a single ResNet block, a skip connection adds the block's input to its output.

In [10]:
class BasicBlock(nn.Module):

    def __init__(self, in_planes, out_planes, stride=1, downsample=None):
        '''
        in_planes: number of input channels
        out_planes: number of output channels
        stride: stride to set for conv layers
        downsample: function used to downsample before skip-connection 
        '''
        super(BasicBlock, self).__init__()
        
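        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
        
        
        self.relu = nn.ReLU(inplace=True)
        
        # Only the first conv uses `stride`; the second keeps the spatial size unchanged
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1,
                     padding=1, bias=False)
                               
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.relu(out)
        out = self.conv2(out)

        # If the block changes the shape, downsample the input so it can be added to `out`
        if self.downsample is not None:
            identity = self.downsample(x)
               
        # Notice the skip connection! Such a connection isn't possible with nn.Sequential.
        out += identity 
        out = self.relu(out)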

        return out

    
block = BasicBlock(2, 2)
print(block)

BasicBlock(
  (conv1): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(2, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
)


In [11]:
input = torch.randn(1, 2, 5, 5)

output = block(input)


print("Shape of the output layer: {}\n".format(output.shape)) 

Shape of the output layer: torch.Size([1, 2, 5, 5])
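
The `downsample` argument matters when a block changes the spatial size or the number of channels, because the input then no longer matches the output and cannot be added to it directly. Below is a minimal, self-contained sketch of that case. The class name `StridedBlock` and the choice of a 1x1 strided convolution as the downsample function are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class StridedBlock(nn.Module):  # hypothetical name, for illustration
    def __init__(self, in_planes, out_planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.downsample = downsample

    def forward(self, x):
        # Bring the input to the output's shape before adding it back
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity  # skip connection
        return self.relu(out)

# Assumed downsample: a 1x1 conv that halves the resolution and doubles the channels
downsample = nn.Conv2d(2, 4, kernel_size=1, stride=2, bias=False)
block = StridedBlock(2, 4, stride=2, downsample=downsample)

out = block(torch.randn(1, 2, 8, 8))
print(out.shape)  # torch.Size([1, 4, 4, 4])
```
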



Here we have shown how to construct complex architectures using nn.Module, which lets you access the internal activations of the network and pass them on to other layers. We demonstrated this with a ResNet block. <br>
As you become more advanced with PyTorch