
#### **Welcome to Assignment 3 on Deep Learning for Computer Vision.**
This notebook consists of two parts. In Part-1 you will code a Siamese network. In Part-2 you will go through an official PyTorch tutorial, understand it, and answer some questions.
  
#### **Instructions**
1. Use Python 3.x to run this notebook
2. Write your code only between the lines 'YOUR CODE STARTS HERE' and 'YOUR CODE ENDS HERE'. Do not change anything else in the code cells; if you do, the answers you get at the end of this assignment might be wrong.
3. Read the documentation of each function carefully.
4. All the Best!

# Part-1

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import torch.nn.functional as F
from torch.utils.data import Dataset
from torch.utils.data.sampler import BatchSampler
from torch.optim import lr_scheduler
from PIL import Image
import timeit

## Please DO NOT remove these lines.
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(0)
########################

#### YOUR CODE STARTS HERE ####
# Check availability of GPU and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#### YOUR CODE ENDS HERE ####


#### Prepare the dataset for Siamese Network

In [11]:
class SiameseDataset(Dataset):
    def __init__(self, train=True):
        
        self.train = train
        #### YOUR CODE STARTS HERE ####
        # Define a set of transforms for preparing the dataset
        mean, std = 0.1307, 0.3081
        self.transform =  transforms.Compose([
                                              transforms.ToTensor(),
                                              transforms.Normalize(mean=mean, std=std)
        ])# convert the image to a pytorch tensor
                          # normalise the images with mean and std of the dataset
        
        # Load the MNIST training and test datasets using `torchvision.datasets.MNIST`
        # Set the train parameter to self.train and transform parameter to self.transform
        self.dataset = datasets.MNIST('./DL4CV/Week4/', train = self.train, transform = self.transform, download=True)

        #### YOUR CODE ENDS HERE ####
        if self.train:
            #### YOUR CODE STARTS HERE ####
            # assign input (x-values) of training data
            # (`data` / `targets` replace the deprecated `train_data` / `train_labels` attributes)
            self.train_data = self.dataset.data
            # assign labels of training data
            self.train_labels = self.dataset.targets
            # get the set of all the labels in the dataset
            self.labels_all = set(self.train_labels.numpy())
            # map each label to the array of sample indices that carry that label
            self.label_to_idx = {label: np.where(self.train_labels.numpy() == label)[0]
                                     for label in self.labels_all}

            #### YOUR CODE ENDS HERE ####
        else:
            #### YOUR CODE STARTS HERE ####
            # assign input (x-values) of test data
            # (`data` / `targets` replace the deprecated `test_data` / `test_labels` attributes)
            self.test_data = self.dataset.data
            # assign labels of test data
            self.test_labels = self.dataset.targets
            # get the set of all the labels in the dataset
            self.labels_all = set(self.test_labels.numpy())
            # map each label to the array of sample indices that carry that label
            self.label_to_idx = {label: np.where(self.test_labels.numpy() == label)[0]
                                     for label in self.labels_all}

            #### YOUR CODE ENDS HERE ####
            # DO NOT change this line of code
            random_state = np.random.RandomState(0)

            positive_samples = [] # this will be a list of lists
            for ind in range(0, len(self.test_data), 2):
              positive_samples.append([ind, random_state.choice(self.label_to_idx[self.test_labels[ind].item()]), 1])
            
            negative_samples = []
            for ind in range(1, len(self.test_data), 2):
              negative_samples.append([ind, random_state.choice(self.label_to_idx[np.random.choice(
                                                           list(self.labels_all - set([self.test_labels[ind].item()])))]), 0])
            
            # combine both positive and negative samples into a single variable
            #### YOUR CODE STARTS HERE ####
            self.test_samples = positive_samples + negative_samples 
            #### YOUR CODE ENDS HERE ####
    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        if self.train:
            target = np.random.randint(0, 2)
            first_image, first_label = self.train_data[index], self.train_labels[index].item()
            if target == 1:
                siamese_index = index
                while siamese_index == index:
                    siamese_index = np.random.choice(self.label_to_idx[first_label])
            else:
                siamese_label = np.random.choice(list(self.labels_all - set([first_label])))
                siamese_index = np.random.choice(self.label_to_idx[siamese_label])
            second_image = self.train_data[siamese_index]
        else:
            first_image = self.test_data[self.test_samples[index][0]]
            second_image = self.test_data[self.test_samples[index][1]]
            target = self.test_samples[index][2]
        first_image = Image.fromarray(first_image.numpy(), mode='L')
        second_image = Image.fromarray(second_image.numpy(), mode='L')
        first_image = self.transform(first_image)
        second_image = self.transform(second_image)
        return (first_image, second_image), target
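
The pairing logic in `__getitem__` can be easier to follow on a toy label list. The sketch below is a standard-library stand-in, not the dataset class itself: `build_label_index` and `sample_pair` are hypothetical helper names, and it assumes every class has at least two samples so that a positive partner always exists.

```python
import random
from collections import defaultdict


def build_label_index(labels):
    """Map each label to the list of positions where it occurs."""
    label_to_idx = defaultdict(list)
    for position, label in enumerate(labels):
        label_to_idx[label].append(position)
    return dict(label_to_idx)


def sample_pair(index, labels, label_to_idx, rng):
    """Return (index, partner_index, target) for one training pair."""
    first_label = labels[index]
    target = rng.randint(0, 1)  # 1 = same class, 0 = different class
    if target == 1:
        # any other sample of the same class
        candidates = [i for i in label_to_idx[first_label] if i != index]
    else:
        # pick a different class first, then any sample from it
        other_label = rng.choice(sorted(set(label_to_idx) - {first_label}))
        candidates = label_to_idx[other_label]
    return index, rng.choice(candidates), target


toy_labels = [0, 1, 0, 2, 1, 2]
toy_index = build_label_index(toy_labels)
rng = random.Random(0)
pairs = [sample_pair(i, toy_labels, toy_index, rng) for i in range(len(toy_labels))]
```

Each entry of `pairs` has target 1 exactly when both indices share a label, which is the invariant the contrastive loss relies on.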


In [35]:
class EmbeddingNet(nn.Module):
    def __init__(self):
        super(EmbeddingNet, self).__init__()
        #### YOUR CODE STARTS HERE ####
        # Define a sequential block as per the instructions below:
        # Build three blocks with each block containing: Conv->PReLU->Maxpool layers
        # Three conv layers should have 16, 32, 64 output channels respectively
        # Use convolution kernel size 3
        # For maxpool use a kernel size of 2 and stride of 2

        self.convnet = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3),
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            nn.PReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )


        # Define linear->PReLU->linear->PReLU->linear
        # The first two linear layers should have 256 and 128 output nodes
        # The final FC layer should have 2 nodes
        self.fc =nn.Sequential(
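            # a 28x28 input shrinks 28 -> 26 -> 13 -> 11 -> 5 -> 3 -> 1 over the three
            # (conv 3x3, maxpool 2x2) blocks, so the flattened vector has 64 * 1 * 1 entries
            nn.Linear(in_features=64 * 1 * 1, out_features=256),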
            nn.PReLU(),
            nn.Linear(in_features=256, out_features=128),
            nn.PReLU(),
            nn.Linear(in_features=128, out_features=2),
        )

        #### YOUR CODE ENDS HERE ####

    def forward(self, x):
      #### YOUR CODE STARTS HERE ####
        # Define the forward pass, convnet -> fc
        output = self.convnet(x)
        output = output.view(output.size()[0], -1)
        output = self.fc(output)
        #### YOUR CODE ENDS HERE ####
        return output
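
The input size of the first linear layer has to match the flattened output of the convolutional block. A minimal sketch of that arithmetic follows, assuming 28x28 MNIST inputs, unpadded 3x3 convolutions with stride 1, and 2x2 max pooling with stride 2; the helper names are hypothetical and not part of the assignment.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (floor division, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


def flattened_features(input_size=28, channels=(16, 32, 64), kernel=3):
    """Number of values left after every (conv, pool) block, once flattened."""
    size = input_size
    for _ in channels:
        size = conv_out(size, kernel)
        size = pool_out(size)
    return channels[-1] * size * size


features = flattened_features()  # 28 -> 26 -> 13 -> 11 -> 5 -> 3 -> 1
```

Under these assumptions the spatial size collapses to 1x1, so `features` is 64. A different kernel size or padding would change the result, which is why it is worth recomputing whenever the block changes.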

In [36]:
class SiameseNetwork(nn.Module):
    def __init__(self, embedding_net):
        super(SiameseNetwork, self).__init__()
        self.embedding_net = embedding_net

    def forward(self, x1, x2):
        # Call the embedding network for both the inputs and return the output
        #### YOUR CODE STARTS HERE ####
        op1 = self.embedding_net(x1)
        op2 = self.embedding_net(x2)
        #### YOUR CODE ENDS HERE ####
        return op1, op2

$$
L\left(x_{0}, x_{1}, y\right)=\frac{1}{2} y\left\|f\left(x_{0}\right)-f\left(x_{1}\right)\right\|_{2}^{2}+\frac{1}{2}(1-y)\left\{\max \left(0, m-\left\|f\left(x_{0}\right)-f\left(x_{1}\right)+\epsilon\right\|_{2}\right)\right\}^{2}
$$
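
To make the two terms concrete, here is a small scalar sketch of the loss for a single pair of 2-D embeddings. It is plain Python rather than the batched PyTorch version, the function name is hypothetical, and it assumes $\epsilon$ is added under the square root purely for numerical stability.

```python
import math


def contrastive_loss(emb0, emb1, y, margin=1.0, eps=1e-9):
    """Contrastive loss for one pair; y = 1 for similar, 0 for dissimilar."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(emb0, emb1))
    distance = math.sqrt(squared_distance + eps)  # eps keeps sqrt well-behaved at 0
    positive_term = 0.5 * y * squared_distance
    negative_term = 0.5 * (1 - y) * max(0.0, margin - distance) ** 2
    return positive_term + negative_term


# the embeddings below are 0.5 apart (a 3-4-5 triangle scaled by 0.1)
similar = contrastive_loss([0.0, 0.0], [0.3, 0.4], y=1)
dissimilar_close = contrastive_loss([0.0, 0.0], [0.3, 0.4], y=0)
# these are 5.0 apart, well beyond the margin of 1.0
dissimilar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], y=0)
```

A similar pair is penalised by its squared distance, a dissimilar pair is penalised only while it sits inside the margin, and a dissimilar pair beyond the margin contributes nothing.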

In [37]:
class ContrastiveLossSiamese(nn.Module):

    def __init__(self, margin):
        super(ContrastiveLossSiamese, self).__init__()
        self.margin = margin
        self.eps = 1e-9

    def forward(self, output1, output2, target):
        # Use the equation mentioned above to define the loss
        #### YOUR CODE STARTS HERE ####
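        # squared Euclidean distance between the two embeddings, one value per pair
        d = (output2 - output1).pow(2).sum(1)
        target = target.float()
        # F.relu computes max(0, .) element-wise; Python's built-in max() fails on batched tensors
        loss_value = 0.5 * target * d + 0.5 * (1 - target) * F.relu(self.margin - (d + self.eps).sqrt()).pow(2)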
        #### YOUR CODE ENDS HERE ####
        loss_value = loss_value.mean()

        return loss_value


In [38]:
def train(model, train_loader, device, optimizer, criterion, epoch):
    model.train()
    losses = []
    total_loss = 0

    for batch_idx, (data, target) in enumerate(train_loader):
        target = target if len(target) > 0 else None
        #### YOUR CODE STARTS HERE ####
        # send the image, target to the device
        # data is not a single value here,
        # ensure datatype of variable `data` is tuple
        target = target.to(device) if target is not None else None
        if not isinstance(data, (tuple, list)):
            data = (data,)
        data = tuple(d.to(device) for d in data)

        # flush out the gradients stored in optimizer
        optimizer.zero_grad()

        # pass the image to the model and assign the output to variable named outputs
        # python star operator will be useful here
        # if the datatype of outputs is not a tuple, make it a tuple
        outputs = model(*data)
        if not isinstance(outputs, (tuple, list)):
            outputs = (outputs,)

        # create inputs to the contrastive loss
        # datatype of target should be tuple
        loss_inputs = tuple(outputs)
        if target is not None:
            target = (target,)
            loss_inputs += target

        # calculate the loss using criterion
        loss = criterion(*loss_inputs)

        # append the loss to losses list and update the total_loss variable
        # (.item() stores a plain float instead of a tensor attached to the graph)
        losses.append(loss.item())
        total_loss += loss.item()

        # do a backward pass
        loss.backward()

        # update the weights
        optimizer.step()


        #### YOUR CODE ENDS HERE ####

        if batch_idx % 20 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data[0]), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), np.mean(losses)))  
    total_loss /= (batch_idx + 1)
    print('Average loss on training set: {:.6f}'.format(total_loss))

def test(model, test_loader, device, criterion):
    model.eval()
    test_loss = 0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(test_loader):
          target = target if len(target) > 0 else None
          if not type(data) in (tuple, list):
              data = (data,)
          #### YOUR CODE STARTS HERE ####
          # send the image, target to the device
          # data is not a single value here,
          # ensure datatype of variable `data` is tuple
          data = tuple(d.to(device) for d in data)
          if target is not None:
              target = target.to(device)
          # pass the image to the model and assign the output to variable named outputs
          # python star operator will be useful here
          # if the datatype of outputs is not a tuple, make it a tuple
          outputs = model(*data)
          if not isinstance(outputs, (tuple, list)):
              outputs = (outputs,)

          # create inputs to the contrastive loss
          # datatype of target should be tuple
          loss_inputs = tuple(outputs)
          if target is not None:
              loss_inputs += (target,)

          # calculate the loss
          loss = criterion(*loss_inputs)

          # update the test_loss variable
          test_loss += loss.item()
          
          #### YOUR CODE ENDS HERE ####

    test_loss /= len(test_loader)
    print('Average loss on test set: {:.6f}'.format(test_loss))
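
The comments in `train` and `test` mention the Python star operator. The sketch below shows the unpacking pattern on plain numbers, with hypothetical stand-ins `forward` and `criterion` in place of the real model and loss; it only illustrates how the tuples flow, not the actual computation.

```python
def forward(x1, x2):
    """Stand-in for SiameseNetwork.forward: takes two separate inputs."""
    return x1 * 2, x2 * 2


def criterion(out1, out2, target):
    """Stand-in for the loss: takes both outputs plus the target."""
    return abs(out1 - out2) * target


data = [1.0, 3.0]  # a batch arrives as a pair (first_image, second_image)
target = 1

outputs = forward(*data)                  # same as forward(data[0], data[1])
loss_inputs = tuple(outputs) + (target,)  # (out1, out2, target)
loss = criterion(*loss_inputs)            # same as criterion(out1, out2, target)
```

Calling `forward(data)` without the star would pass the whole pair as a single argument and raise a `TypeError` about a missing positional argument.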


In [39]:
# define the training and test sets
# use SiameseDataset
mean, std = 0.1307, 0.3081

train_dataset = SiameseDataset(train=True)
test_dataset = SiameseDataset(train=False)

# create dataloaders for training and test datasets
# use a batch size of 128 and set shuffle=True for the training set, set num_workers to 2 and pin_memory to True
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size = 128, shuffle=True, num_workers=2, pin_memory=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size = 128, shuffle=False, num_workers=2, pin_memory=True)

margin = 1.
# create an instance of the embedding network and pass it as input to the Siamese network
embedding_net = EmbeddingNet()
model = SiameseNetwork(embedding_net)
model.to(device)
# define the contrastive loss with the specified margin
criterion = ContrastiveLossSiamese(margin)
optimizer = optim.Adam(model.parameters())



In [41]:
start = timeit.default_timer()
for epoch in range(1, 5):
  train(model, train_dataloader, device, optimizer, criterion, epoch)
  test(model, test_dataloader, device, criterion)

stop = timeit.default_timer()
print('Total time taken: {} seconds'.format(int(stop - start)) )


### Question 1

Run the code cell above and report the average loss on the test set. (If you do not get the exact number shown in the options, report the closest one.)
1. 0.03
2. 0.3
3. 0.001
4. 1

# Part-2

For Part-2, go through the [Torchvision Object Detection Tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) and ensure you understand the tutorial completely!

After you have gone through the tutorial, answer the following questions.

### Question 2

Consider the metrics `AP@IoU=0.5` and `AP@IoU=0.75` used in the tutorial. Which of the following statements is true?  

1. `AP@IoU=0.75` will always be less than `AP@IoU=0.5`
2. `AP@IoU=0.75` will always be greater than `AP@IoU=0.5`
3. `AP@IoU=0.75` need not always be less than `AP@IoU=0.5`
4. `AP@IoU=0.75` need not always be greater than `AP@IoU=0.5`
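
As a reminder of what the IoU threshold in these metrics measures, here is a minimal sketch of intersection over union for two axis-aligned boxes. It assumes boxes in `[x1, y1, x2, y2]` format and uses a hypothetical helper name; it is not the evaluation code the tutorial runs.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # width and height of the overlap, clamped at zero when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


half_shifted = iou_xyxy((0, 0, 2, 2), (1, 0, 3, 2))  # overlap 2, union 6
identical = iou_xyxy((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = iou_xyxy((0, 0, 1, 1), (2, 2, 3, 3))
```

A predicted box counts as a match at a given threshold only if its IoU with a ground-truth box reaches that threshold.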

### Question 3

The tutorial uses a network that is pre-trained on the COCO dataset. Will training this model from scratch improve the performance? (Hint: You don't really have to re-train the model for this.)

1. Yes
2. No

In [None]:
print(torch.version.__version__)