<a href="https://colab.research.google.com/github/satyajitghana/TSAI-DeepVision-EVA4.0/blob/master/NeuralArchitecture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# call upon all the gods
from __future__ import print_function
import torch
import torch.nn as nn # torch neural network
import torch.nn.functional as F # torch functions
import torch.optim as optim # optimizer
from torchvision import datasets, transforms # datasets and transforms

In [0]:
"""nn.Module: Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:
"""

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()                     # format - <channels> x <rows> x <cols>
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)     # input - 1x28x28   | output - 32x28x28     | RF - 3x3
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)    # input - 32x28x28  | output - 64x28x28     | RF - 5x5
        self.pool1 = nn.MaxPool2d(2, 2)                 # input - 64x28x28  | output - 64x14x14     | RF - 10x10
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)   # input - 64x14x14  | output - 128x14x14    | RF - 12x12
        self.conv4 = nn.Conv2d(128, 256, 3, padding=1)  # input - 128x14x14 | output - 256x14x14    | RF - 14x14
        self.pool2 = nn.MaxPool2d(2, 2)                 # input - 256x14x14 | output - 256x7x7      | RF - 28x28
        self.conv5 = nn.Conv2d(256, 512, 3)             # input - 256x7x7   | output - 512x5x5      | RF - 30x30
        self.conv6 = nn.Conv2d(512, 1024, 3)            # input - 512x5x5   | output - 1024x3x3     | RF - 32x32
        self.conv7 = nn.Conv2d(1024, 10, 3)             # input - 1024x3x3  | output - 10x1x1       | RF - 34x34

# code that i added
        # self.fc1 = nn.Linear(9216, 128)
        # self.fc2 = nn.Linear(128, 10)

    """forward
    Defines the computation performed at every call.

    Args:
        x: the input

    Returns:
        log_softmax(x)
    """
    def forward(self, x):
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x))))) # performs conv1 -> relu -> conv2 -> relu -> pool1
        x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x))))) # performs conv3 -> relu -> conv4 -> relu -> pool2
        x = F.relu(self.conv6(F.relu(self.conv5(x))))             # performs conv5 -> relu -> conv6 -> relu
        x = F.sigmoid(self.conv7(x))                                 # performs conv7 -> relu
        x = x.view(-1, 10)                                        # similar to reshape in numpy

# code that i added
        # x = torch.flatten(x, 1)
        # x = F.relu(self.fc1(x))
        # x = self.fc2(x)
        return F.log_softmax(x, dim=0)                            # perform a softmax along the 10 outputs

In [0]:
class NetV2(nn.Module):
    def __init__(self):
        super(NetV2, self).__init__()                   # format - <channels> x <rows> x <cols>
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)     # input - 1x28x28   | output - 32x28x28     | RF - 3x3
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)    # input - 32x28x28  | output - 64x28x28     | RF - 5x5
        self.pool1 = nn.MaxPool2d(2, 2)                 # input - 64x28x28  | output - 64x14x14     | RF - 10x10
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)   # input - 64x14x14  | output - 128x14x14    | RF - 12x12
        self.pool2 = nn.MaxPool2d(2, 2)                 # input - 128x14x14 | output - 128x7x7      | RF - 24x24
        self.conv4 = nn.Conv2d(128, 256, 3)             # input - 128x7x7   | output - 256x5x5      | RF - 26x26
        self.conv5 = nn.Conv2d(256, 10, 5)              # input - 256x5x5   | output - 10x1x1       | RF - 28x38

    def forward(self, x):
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
        x = self.pool2(F.relu(self.conv3(x)))
        x = F.relu(self.conv4(x))
        x = F.relu(self.conv5(x))
        x = x.view(-1, 10)
        return F.log_softmax(x, dim=0)

In [21]:
!pip install torchsummary



In [22]:
from torchsummary import summary

# check if nvidia cuda gpu is available
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
# transfer the model to the device chosen above
model = Net().to(device)

# print the summary of the model
summary(model, input_size=(1, 28, 28), batch_size=-1, device='cuda')

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 28, 28]             320
            Conv2d-2           [-1, 64, 28, 28]          18,496
         MaxPool2d-3           [-1, 64, 14, 14]               0
            Conv2d-4          [-1, 128, 14, 14]          73,856
            Conv2d-5          [-1, 256, 14, 14]         295,168
         MaxPool2d-6            [-1, 256, 7, 7]               0
            Conv2d-7            [-1, 512, 5, 5]       1,180,160
            Conv2d-8           [-1, 1024, 3, 3]       4,719,616
            Conv2d-9             [-1, 10, 1, 1]          92,170
Total params: 6,379,786
Trainable params: 6,379,786
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.51
Params size (MB): 24.34
Estimated Total Size (MB): 25.85
-------------------------------------



In [0]:
# seed the model to obtain consistent results
torch.manual_seed(1)

# set the batch size, preferably to 2^x values
batch_size = 128

# this makes sure that the data stays in the memory
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

# load the train data and perform standard normalization
# Normalize does the following for each channel:
# image = (image - mean) / std
# The parameters mean, std

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)

# load the test data and perform standard normalization
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


In [0]:
from tqdm import tqdm
# from tqdm.auto import tqdm, trange

"""trains the model

Args
    model: the model to be trained
    device: the device on which to be trained, cpu/gpu
    train_loader: the train data loader from torch.utils.data.DataLoader
    optimizer: the optimizer to use for training
    epoch: the number of epoch to run for

Returns
    None
"""
def train(model, device, train_loader, optimizer, epoch):
    # set the model on train mode
    model.train()
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):

        # move the data to the device
        data, target = data.to(device), target.to(device)

        # zero the gradients
        optimizer.zero_grad()

        # get the model output for the data
        output = model(data)

        # loss is negative log likeli-hood
        loss = F.nll_loss(output, target)

        # flow the gradients backward
        loss.backward()

        # optimizer.step is performs a parameter update based on the current gradient (stored in .grad attribute of a parameter) and the update rule
        optimizer.step()

        # this is just for pretty printing
        pbar.set_description(desc= f'loss={loss.item():.9f} batch_id={batch_idx:05d}')

"""tests the model

Args
    model: the model to test
    device: the device to use
    test_loader: the test data loader from torch.utils.data.DataLoader
"""
def test(model, device, test_loader):

    # set the model on eval mode
    model.eval()

    # set the test loss to zero
    test_loss = 0

    # number of correct classifications
    correct = 0

    # turn off gradients, since we are in test mode
    with torch.no_grad():
        for data, target in test_loader:
            # move the data to device
            data, target = data.to(device), target.to(device)

            # get the model output
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [25]:
# move the model to device
model = Net().to(device)
# stochastic gradient descent with model parameters, learning rate and momentum
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

# run the model for range number of times
for epoch in range(1, 2):
    # train the model
    train(model, device, train_loader, optimizer, epoch)

    # test the model
    test(model, device, test_loader)

loss=3.763615608 batch_id=00468: 100%|██████████| 469/469 [00:18<00:00, 25.67it/s]



Test set: Average loss: 4.0605, Accuracy: 9566/10000 (96%)



# Memory Pinning

Host to GPU copies are much faster when they originate from pinned (page-locked) memory. See Use pinned memory buffers for more details on when and how to use pinned memory generally.

For data loading, passing ``pin_memory=True`` to a DataLoader will automatically put the fetched data Tensors in pinned memory, and thus enables faster data transfer to CUDA-enabled GPUs

Read on how a FC layer can be converted to a CONV layer : http://cs231n.github.io/convolutional-networks/#convert

Converting FC layers to CONV layers
It is worth noting that the only difference between FC and CONV layers is that the neurons in the CONV layer are connected only to a local region in the input, and that many of the neurons in a CONV volume share parameters. However, the neurons in both layers still compute dot products, so their functional form is identical. Therefore, it turns out that it’s possible to convert between FC and CONV layers:

For any CONV layer there is an FC layer that implements the same forward function. The weight matrix would be a large matrix that is mostly zero except for at certain blocks (due to local connectivity) where the weights in many of the blocks are equal (due to parameter sharing).
Conversely, any FC layer can be converted to a CONV layer. For example, an FC layer with K=4096 that is looking at some input volume of size 7×7×512 can be equivalently expressed as a CONV layer with F=7,P=0,S=1,K=4096. In other words, we are setting the filter size to be exactly the size of the input volume, and hence the output will simply be 1×1×4096 since only a single depth column “fits” across the input volume, giving identical result as the initial FC layer.
FC->CONV conversion. Of these two conversions, the ability to convert an FC layer to a CONV layer is particularly useful in practice. Consider a ConvNet architecture that takes a 224x224x3 image, and then uses a series of CONV layers and POOL layers to reduce the image to an activations volume of size 7x7x512 (in an AlexNet architecture that we’ll see later, this is done by use of 5 pooling layers that downsample the input spatially by a factor of two each time, making the final spatial size 224/2/2/2/2/2 = 7). From there, an AlexNet uses two FC layers of size 4096 and finally the last FC layers with 1000 neurons that compute the class scores. We can convert each of these three FC layers to CONV layers as described above:

Replace the first FC layer that looks at [7x7x512] volume with a CONV layer that uses filter size F=7, giving output volume [1x1x4096].
Replace the second FC layer with a CONV layer that uses filter size F=1, giving output volume [1x1x4096]
Replace the last FC layer similarly, with F=1, giving final output [1x1x1000]
Each of these conversions could in practice involve manipulating (e.g. reshaping) the weight matrix W in each FC layer into CONV layer filters. It turns out that this conversion allows us to “slide” the original ConvNet very efficiently across many spatial positions in a larger image, in a single forward pass.

For example, if 224x224 image gives a volume of size [7x7x512] - i.e. a reduction by 32, then forwarding an image of size 384x384 through the converted architecture would give the equivalent volume in size [12x12x512], since 384/32 = 12. Following through with the next 3 CONV layers that we just converted from FC layers would now give the final volume of size [6x6x1000], since (12 - 7)/1 + 1 = 6. Note that instead of a single vector of class scores of size [1x1x1000], we’re now getting an entire 6x6 array of class scores across the 384x384 image.

Evaluating the original ConvNet (with FC layers) independently across 224x224 crops of the 384x384 image in strides of 32 pixels gives an identical result to forwarding the converted ConvNet one time.

Naturally, forwarding the converted ConvNet a single time is much more efficient than iterating the original ConvNet over all those 36 locations, since the 36 evaluations share computation. This trick is often used in practice to get better performance, where for example, it is common to resize an image to make it bigger, use a converted ConvNet to evaluate the class scores at many spatial positions and then average the class scores.