#### **Welcome to Assignment 3 on Deep Learning for Computer Vision.**
<!-- This assignment consists of three parts. Part-1 is based on the content you learned in Week-3 of course and Part-2 is based on the content you learned in Week-4 of the course. Part-3 is **un-graded** and mainly designed to help you flex the Deep Learning muscles grown in Part-2. 

Unlike the first two parts, you'll have to implement everything from scratch in Part-3. If you find answers to questions in Part-3, feel free to head out to the forums and discuss them with your classmates! -->

#### **Instructions**
1. Use Python 3.x to run this notebook
2. Write your code only in between the lines 'YOUR CODE STARTS HERE' and 'YOUR CODE ENDS HERE'.
you should not change anything else in the code cells, if you do, the answers you are supposed to get at the end of this assignment might be wrong.
3. Read documentation of each function carefully.
4. All the Best!


### Part-1: Resnet-18 from scratch

In this question, you'll have to code Resnet-18 from scratch (we have provided a lot of starter code), this'll help you get a hold on how to code an architecture with skip connections and blocks of layers.

It's suggested you first briefly understand how the Resnet architecture is defined originally before you start with this question. We do take inspiration from the original Pytorch implementation, but if you try peeking into the original source code in the library, it'll confuse you more than helping!

**Sidenote:** As this assignment is mainly focused on learning things, we train the models only for a small number of epochs and don't focus on hyper-parameter tuning. When you start using deep learning in real-world applications and competitions, hyper-parameter tuning plays a decent role!

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import torchvision
import torch.nn.functional as F
import timeit
import unittest

## Please DONOT remove these lines. 
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(0)

In [7]:
# check availability of GPU and set the device accordingly
device = torch.device("cuda")
#print(device)

# define a set of transforms for preparing the dataset
transform_train = transforms.Compose([
        # use random crop with image size fo 32 and padding of 8 
        # flip the image horizontally (use pytorch random horizontal flip)
        # convert the image to a pytorch tensor
        # normalise the images with mean and std of the dataset 
        # mean: (0.4914, 0.4822, 0.4465) std: (0.2023, 0.1994, 0.2010)
        transforms.RandomCrop(32, padding=8),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                              std=[0.2023, 0.1994, 0.2010])])


# define transforms for the test data: Should they be same as the one used for train? 
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                         std=[0.2023, 0.1994, 0.2010]) #sequence of means and stds for each channel
])
        
    
        # convert the image to a pytorch tensor
        # normalise the images with mean and std of the dataset 
        # mean: (0.4914, 0.4822, 0.4465) std: (0.2023, 0.1994, 0.2010)

use_cuda = True # if you have acess to a GPU, enable it to speed the training 

cuda


In [11]:
# Load the CIFAR-10 training, test datasets using `torchvision.datasets.CIFAR10`
#### YOUR CODE STARTS HERE ####
train_dataset = torchvision.datasets.CIFAR10(root = "./data", train = True, download=True, transform=transform_train)
test_dataset = torchvision.datasets.CIFAR10(root = "./data", train = False, download=True, transform=transform_test)

print(len(train_dataset))
print(len(test_dataset))
print(train_dataset[0][0].size())
#### YOUR CODE ENDS HERE ####

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data\cifar-10-python.tar.gz



HBox(children=(HTML(value=''), FloatProgress(value=1.0, bar_style='info', layout=Layout(width='20px'), max=1.0…

Extracting ./data\cifar-10-python.tar.gz to ./data
Files already downloaded and verified
50000
10000
torch.Size([3, 32, 32])


In [26]:
# create dataloaders for training and test datasets
# use a batch size of 32 and set shuffle=True for the training set
#### YOUR CODE STARTS HERE ####
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=True)
#### YOUR CODE ENDS HERE ####

In [32]:
def conv3x3(in_planes, out_planes, stride=1, groups=1):
    # define a convolutional layer with a kernel size of 3x3
    # use stride, groups values passed to the function along with a padding of 1 and dilation of 1
    # set bias to False
    #### YOUR CODE STARTS HERE ####
    layer = nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, dilation=1, bias = False)
    #### YOUR CODE ENDS HERE ####
    return layer


def conv1x1(in_planes, out_planes, stride=1):
    # define a convolutional layer with a kernel size of 1x1
    # use stride value passed to the function
    # set bias to False
    # leave all other parameters to default values
    #### YOUR CODE STARTS HERE ####
    layer = nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
    #### YOUR CODE ENDS HERE ####
    return layer

class BasicBlock(nn.Module):
    expansion = 1
    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64):
        super(BasicBlock, self).__init__()
        #### YOUR CODE STARTS HERE ####
        # define batch-norm layer to for easy use (you don't have to call it here)
        norm_layer = nn.BatchNorm2d
        # define a 3x3 convolution layer with inplanes as in-channels and planes and out_channels, use the passed value of stride
        self.conv1 = conv3x3(inplanes, planes, stride, groups)
        # define a batchnorm layer (use the norm_layer defined above)
        self.bn1 = norm_layer(planes)
        # define a relu layer with inplace set to True
        self.relu = nn.ReLU(inplace=True) # modifies the input directly without allocating any output
        # define a 3x3 convolution layer with inplanes as in-channels and planes and out_channels
        self.conv2 = conv3x3(planes, planes)
        # define a batchnorm layer (use the norm_layer defined above)
        self.bn2 = norm_layer(planes)
        #### YOUR CODE ENDS HERE ####
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        #### YOUR CODE STARTS HERE ####
        # make a copy of nput (for using them in skip connections)
        identity = x

        # pass the input through, conv1->bn1->relu->conv2->bn2
        x1 = self.conv1(x)
        x1 = self.bn1(x1)
        x1 = self.relu(x1)
        x1 = self.conv2(x1)
        x1 = self.bn2(x1)
        #### YOUR CODE ENDS HERE ####

        if self.downsample is not None:
            identity = self.downsample(x)

        #### YOUR CODE STARTS HERE ####
        # add the skip connection
        out = x1 + identity
        out = self.relu(out)
        # use a relu activation on `out`

        #### YOUR CODE ENDS HERE ####

        return out

In [24]:
class ResNet18(nn.Module):
  # first start with make_layer method followed by __init__, forward methods
    def __init__(self, block, num_classes=10, groups=1):
        super(ResNet18, self).__init__()
        
        # define batch-norm layer to for easy use (you don't have to call it here)
        #### YOUR CODE STARTS HERE ####
        norm_layer = nn.BatchNorm2d
        #### YOUR CODE ENDS HERE ####
        self._norm_layer = norm_layer
        self.inplanes = 64
        self.dilation = 1

        self.groups = groups
        self.base_width = 64
        #### YOUR CODE STARTS HERE ####
        # define a conv layer with number of image channels as in-channels and inplanes ans out-channles,
        # use a kernel size of 7, stride of 2, padding of 3 and set bias to False 
        self.conv1 = nn.Conv2d(3, self.inplanes, 7, stride=2, padding=3, bias=False)
        # define a batchnorm layer (use the norm_layer defined above)
        self.bn1 = norm_layer(self.inplanes)
        # define a relu layer with inplace set to True
        self.relu = nn.ReLU(inplace=True)
        # define a maxpool layer with kernel size of 3, stride of 2, padding of 1
        self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
        # complete the make layer method below and use it with the block value passed to init
        # with 64 planes and 2 blocks
        self.layer1 = self._make_layer(block, 64, 2, stride =2)
        # use  make layer method to define a second set of layers with the block value passed to init
        # with 128 planes and 2 blocks and a stride value of 2
        self.layer2 = self._make_layer(block, 128, 2, stride=2)
        # use  make layer method to define a second set of layers with the block value passed to init
        # with 256 planes and 2 blocks and a stride value of 2
        self.layer3 = self._make_layer(block, 256, 2, stride=2)
        # use  make layer method to define a second set of layers with the block value passed to init
        # with 512 planes and 2 blocks and a stride value of 2
        self.layer4 = self._make_layer(block, 512, 2, stride=2)
        # define  adaptive avergae pooling layer with output size (1, 1)
        self.avgpool = nn.AdaptiveAvgPool2d((1,1))
        #### YOUR CODE ENDS HERE ####
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        #### YOUR CODE STARTS HERE ####        
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # initialise the weights with kaiming normal, set mode to fan out and 
                # non_linearity to the activation function you used above
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                # initialise weights with a value of 1 and bias with a value of 0
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
         #### YOUR CODE ENDS HERE ####

    def _make_layer(self, block, planes, blocks, stride=1):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            #### YOUR CODE STARTS HERE ####
            # append the blocks to layers, leave stride and downsample to default values
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width))
            #### YOUR CODE ENDS HERE ####
        
        return nn.Sequential(*layers)

    def forward(self, x):
        #### YOUR CODE STARTS HERE ####
        # complete the forward pass
        # order of layers: conv1->bn1->relu->maxpool->layer1->layer2->layer3->layer4->avgpool->fc
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        #### YOUR CODE ENDS HERE ####
        return x

In [36]:
def train(model, device, train_loader, criterion, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
      #### YOUR CODE STARTS HERE ####
        # send the image, target to the device
        data = data.to(device)
        target = target.to(device)
        # flush out the gradients stored in optimizer
        model.zero_grad()
        # pass the image to the model and assign the output to variable named output
        output = model(data)
        # calculate the loss (use cross entropy in pytorch)
        loss = nn.CrossEntropyLoss()(output, target)
        # do a backward pass
        loss.backward()
        # update the weights
        optimizer.step()
      #### YOUR CODE ENDS HERE ####
        if batch_idx % 20 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

In [37]:
def test(model, device, test_loader, criterion):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
          ### YOUR CODE STARTS HERE ####
            # send the image, target to the device
            data = data.to(device)
            target = target.to(device)
            # pass the image to the model and assign the output to variable named output
            output = model(data)
            test_loss += nn.CrossEntropyLoss()(output, target).item()
          #### YOUR CODE ENDS HERE ####
            test_loss += criterion(output, target).item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [39]:
model = ResNet18(BasicBlock, num_classes=10).to(device)
criterion = nn.CrossEntropyLoss().cuda()
## Define Adam Optimiser with a learning rate of 0.01
optimizer = optim.Adam(model.parameters(), lr=0.01)

start = timeit.default_timer()
for epoch in range(1, 11):
    train(model, device, train_dataloader, criterion, optimizer, epoch)
    test(model, device, test_dataloader, criterion)
stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )


Test set: Average loss: 0.1218, Accuracy: 3371/10000 (34%)


Test set: Average loss: 0.0941, Accuracy: 4323/10000 (43%)




Test set: Average loss: 0.0831, Accuracy: 5170/10000 (52%)


Test set: Average loss: 0.0734, Accuracy: 5816/10000 (58%)




Test set: Average loss: 0.0709, Accuracy: 6073/10000 (61%)


Test set: Average loss: 0.0628, Accuracy: 6532/10000 (65%)




Test set: Average loss: 0.0558, Accuracy: 6873/10000 (69%)


Test set: Average loss: 0.0527, Accuracy: 7098/10000 (71%)




Test set: Average loss: 0.0524, Accuracy: 7163/10000 (72%)


Test set: Average loss: 0.0478, Accuracy: 7341/10000 (73%)

Total time taken: 390 seconds


#### Question-1

Report the final test accuracy displayed above (If you are not getting the exact number shown in options, please report the closest number).
1. 94%
2. 76%
3. 48%
4. 85%

### Part-2: Transfer Learning - ResNet50



### Download and prepare the Tiny-Imagenet dataset


In [None]:
!ls

In [None]:
!wget http://cs231n.stanford.edu/tiny-imagenet-200.zip && unzip -qq tiny-imagenet-200.zip && rm tiny-imagenet-200.zip

In [None]:
## DONOT modify the code in this cell!
## For the curiosu: We're re-organising the files into standard format for easier dataloading

import pandas as pd
import os
import shutil
import glob
import copy

categories = os.listdir('tiny-imagenet-200/train/')
assert len(categories) == 200
for each in categories:
    os.mkdir(f'tiny-imagenet-200/val/{each}')

df = pd.read_csv('tiny-imagenet-200/val/val_annotations.txt', delimiter='\t', header=None)

label_to_cat = dict(zip(df[0], df[1]))

for each in glob.glob('tiny-imagenet-200/val/images/*.JPEG'):
    src = copy.copy(each)
    fl_name = each.split('/')[-1]
    dest = each.replace('images', label_to_cat[fl_name])
    shutil.move(src, dest)

In [None]:
!rm -rf tiny-imagenet-200/val/images/ tiny-imagenet-200/val/val_annotations.txt tiny-imagenet-200/test/

In [None]:
# This is shold return 10000
!cd tiny-imagenet-200/val/ && find . -type f | wc -l

### Model building


In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import torchvision
import torch.nn.functional as F
import timeit
import unittest

## Please DONOT remove these lines. 
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(0)

In [None]:
# check availability of GPU and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# define a set of transforms for preparing the dataset
# use mean and std of imagenet dataset
normalize = transforms.Normalize()

transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=8),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                              std=[0.2023, 0.1994, 0.2010])]) 
          # use random-resized-crop with a image size of 224
          # flip the image horizontally (use pytorch random horizontal flip)
          # convert the image to a pytorch tensor
          # normalise the image
            
# define transforms for the test data: Should they be same as the one used for train? 
transform_test = transforms.Compose([
          transforms.Resize((256,256)),
          transforms.CenterCrop((224,224)),
          transforms.ToTensor(),
          transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                              std=[0.2023, 0.1994, 0.2010])])
          # re-size the images to 256x256
          # center-crop the 256 images to 224x224
          # convert the image to a pytorch tensor
          # normalise the image
  

use_cuda = torch.cuda.is_available() # if you have acess to a GPU, enabble it to speed the training 

In [None]:
!ls # You should see tiny-imagenet-200 folder 

In [None]:
# Load the training, test datasets using `torchvision.datasets.ImageFolder`
#### YOUR CODE STARTS HERE ####
train_dataset =  
test_dataset = 
#### YOUR CODE ENDS HERE ####

In [None]:
# create dataloaders for training and test datasets
# use a batch size of 32 and set shuffle=True for the training set
#### YOUR CODE STARTS HERE ####
train_dataloader = 
test_dataloader = 
#### YOUR CODE ENDS HERE ####

In [None]:
def train(model, device, train_loader, criterion, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
      #### YOUR CODE STARTS HERE ####
        # send the image, target to the device

        # flush out the gradients stored in optimizer

        # pass the image to the model and assign the output to variable named output

        # calculate the loss (use cross entropy in pytorch)

        # do a backward pass

        # update the weights

      #### YOUR CODE ENDS HERE ####
        if batch_idx % 20 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

In [None]:
def test(model, device, test_loader, criterion):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
          ### YOUR CODE STARTS HERE ####
            # send the image, target to the device

            # pass the image to the model and assign the output to variable named output

          #### YOUR CODE ENDS HERE ####
            test_loss += criterion(output, target).item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

### Question-2

What are the number of input features for the final FC layer in Resnet-50? (Hint: Use the code below)

1. 1024
2. 512
3. 784
4. 2048


In [None]:
# use the resnet50 model provided by pytorch with pre-trained parameter set to true
# detach the final FC layer of Resnet-50 and attach a layer with 200 output nodes (number of classes in tiny-imagenet)
### YOUR CODE STARTS HERE ####
model = 

### YOUR CODE ENDS HERE ####
model = model.to(device)

criterion = nn.CrossEntropyLoss().cuda()
## Define Adam Optimiser with a learning rate of 0.01 (You should add the FC layer parameters only)
### YOUR CODE STARTS HERE ####
optimizer = 
### YOUR CODE ENDS HERE ####

start = timeit.default_timer()
for epoch in range(1, 5):
    train(model, device, train_dataloader, criterion, optimizer, epoch)
    test(model, device, test_dataloader, criterion)
stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )

#### Question-3

Report the final test accuracy displayed above (If you are not getting the exact number shown in options, please report the closest number).

1. 83%
2. 35%
3. 70%
4. 94%