# Assignment 3

# Instructions

1. You have to use only this notebook for all your code.
2. All the results and plots should be mentioned in this notebook.
3. For final submission, submit this notebook along with the report ( usual 2-4 pages, latex typeset, which includes the challenges faces and details of additional steps, if any)
4. Marking scheme
    -  **60%**: Your code should be able to detect bounding boxes using resnet 18, correct data loading and preprocessing. Plot any 5 correct and 5 incorrect sample detections from the test set in this notebook for both the approached (1 layer and 2 layer detection), so total of 20 plots.
    -  **20%**: Use two layers (multi-scale feature maps) to detect objects independently as in SSD (https://arxiv.org/abs/1512.02325).  In this method, 1st detection will be through the last layer of Resnet18 and the 2nd detection could be through any layer before the last layer. SSD uses lower resolution layers to detect larger scale objects. 
    -  **20%**: Implement Non-maximum suppression (NMS) (should not be imported from any library) on the candidate bounding boxes.
    
5. Report AP for each of the three class and mAP score for the complete test set.

In [281]:
from __future__ import division, print_function, unicode_literals
import numpy as np
import torch
from torch import optim
from torch.optim import lr_scheduler
import torch.utils.data
import torch.nn as nn
from torchvision import transforms, datasets
import torchvision.models as models
from torch.autograd import Variable
from torchsummary import summary
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline
plt.ion()
# Import other modules if required
import os
import sys
import xml.etree.ElementTree as ET
from PIL import Image
import imageio
from scipy.misc import imresize
from pathlib import Path
import copy
import time
from IPython.display import clear_output as clr
# Can use other libraries as well

resnet_input = 224#size of resnet18 input images

In [312]:
# Choose your hyper-parameters using validation data
batch_size = 5
num_epochs = 5
learning_rate =  0.001
hyp_momentum = 0.9

## Build the data
Use the following links to locally download the data:
<br/>Training and validation:
<br/>http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
<br/>Testing data:
<br/>http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
<br/>The dataset consists of images from 20 classes, with detection annotations included. The JPEGImages folder houses the images, and the Annotations folder has the object-wise labels for the objects in one xml file per image. You have to extract the object information, i.e. the [xmin, ymin] (the top left x,y co-ordinates) and the [xmax, ymax] (the bottom right x,y co-ordinates) of only the objects belonging to the three classes(aeroplane, bottle, chair). For parsing the xml file, you can import xml.etree.ElementTree for you. <br/>
<br/> Organize the data as follows:
<br/> For every image in the dataset, extract/crop the object patch from the image one by one using their respective co-ordinates:[xmin, ymin, xmax, ymax], resize the image to resnet_input, and store it with its class label information. Do the same for training/validation and test datasets. <br/>
##### Important
You also have to collect data for an extra background class which stands for the class of an object which is not a part of any of the 20 classes. For this, you can crop and resize any random patches from an image. A good idea is to extract patches that have low "intersection over union" with any object present in the image frame from the 20 Pascal VOC classes. The number of background images should be roughly around those of other class objects' images. Hence the total classes turn out to be four. This is important for applying the sliding window method later.


In [313]:
trainval_path = "VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007"
test_path = "VOCtest_06-Nov-2007/VOCdevkit/VOC2007"
anno = '/Annotations'
imgs = '/ImageSets'
jpeg = '/JPEGImages'
segc = '/SegmentationClass'
sego = '/SegmentationObject'
save = "/classes"
try:
    os.stat(trainval_path + save)
except:
    os.mkdir(trainval_path + save)
    
try:
    os.stat(test_path + save)
except:
    os.mkdir(test_path + save)

In [314]:
classes = ('__background__',
           'aeroplane',
           'bottle',
           'chair'
           )

class_labels = {
    '__background__' : 0,
    'aeroplane' : 1,
    'bottle' : 2,
    'chair' : 3,
}

In [315]:
def build_dataset(path):
    # Begin
    count = np.zeros((len(classes)))
    
    ## part 1 Image class based cropping and resizing
    for xml_path in os.listdir(path + anno):
        tree = ET.parse(path + anno + "/" + xml_path)
        image = Image.open(path + jpeg + "/" + tree.getroot()[1].text)
        image_arr = np.array(image)
        y0 = 0
        x0 = 0
        y1 = image_arr.shape[0]
        x1 = image_arr.shape[1]
        
        xout = 0
        xin  = 10000
        yout = 0
        yin  = 10000
        img_n_obj = 0
        
        for obj in tree.getroot().findall('object'):
            class_name = obj[0].text
            if(class_name in classes):
                img_n_obj+=1
                xmin = int(obj[4][0].text)
                ymin = int(obj[4][1].text)
                xmax = int(obj[4][2].text)
                ymax = int(obj[4][3].text)
                
                xout = max(xout, xmax)
                xin  = min(xin,  xmin)
                yout = max(yout, ymax)
                yin  = min(yin,  ymin)
                
                subimg = np.array(
                    Image.fromarray(
                        np.array(image)[ymin:ymax,xmin:xmax]
                    ).resize((resnet_input,resnet_input)))
                label = class_labels[class_name]
                try:
                    os.stat(path + save+"/"+str(label))
                except:
                    os.mkdir(path + save+"/"+str(label))
                subpath = path+save+"/"+str(label)+'/'+str(int(count[label]))+'.jpg'
                imageio.imwrite(subpath, subimg)
                count[label] += 1
           
        
         ## part 2 Image hard negative mining for negative class (Background)
        r1 = image_arr[yout:y1, :]
        r2 = image_arr[y0:yin, :]
        r3 = image_arr[: , x0:xin]
        r4 = image_arr[: , xout:x1]
        
        p1 = y1-yout
        p2 = yin-y0
        p3 = xin-x0
        p4 = x1-xout
        
        pvec = np.array([p1,p2,p3,p4])
        pvec = pvec/np.sum(pvec)
        
        bcount = 0
        infcount = 0
        while (bcount <= (img_n_obj//4) and infcount < 10):
            infcount += 1
            try:
                bcount += 1
                img = np.random.choice([r1,r2,r3,r4], p = pvec)

                a0 = np.random.randint(0,img.shape[1])
                a1 = min(np.random.randint(a0,img.shape[1])+96,img.shape[1])
                b0 = np.random.randint(0,img.shape[0])
                b1 = min(np.random.randint(b0,img.shape[1])+96,img.shape[0])
#                 b1 = np.random.randint(b0,img.shape[0])

                back_img = np.array(
                        Image.fromarray(
                            img[b0:b1, a0:a1]
                        ).resize((resnet_input,resnet_input)))

                label = class_labels["__background__"]
                try:
                    os.stat(path + save+"/"+str(label))
                except:
                    os.mkdir(path + save+"/"+str(label))
                back_path = path+save+"/"+str(label)+'/'+str(int(count[label]))+'.jpg'
                imageio.imwrite(back_path, back_img)
                count[label] += 1
            except:
                pass

In [316]:
# build_dataset(trainval_path)
# build_dataset(test_path)

In [326]:
class voc_dataset(torch.utils.data.Dataset): # Extend PyTorch's Dataset class
    def __init__(self, root_dir, train, transform=None):
        # Begin
        self.root = os.path.expanduser(root_dir)
        self.transform = transform
        self.train = train
        
        self.images = []
        self.labels = []
        
        folder_class = self.root + save
        for lbl in os.listdir(folder_class):
            folder_img = folder_class + "/" + lbl
            for img_name in os.listdir(folder_img):
                img_path = folder_img + "/" + img_name
                self.images.append(img_path)
                self.labels.append(int(lbl))
        
        
    def __len__(self):
        # Begin
        return len(self.labels)
        
    def __getitem__(self, idx):
       # Begin
        img = Image.open(self.images[idx]).convert('RGB')
        target = self.labels[idx]
        y_onehot = torch.tensor(target)
#         y_onehot = (np.arange(4) == y_onehot).astype(np.float32)
#         y_onehot = torch.from_numpy(y_onehot)
        
        if self.transform is not None:
            img = self.transform(img)
            
            
        return img,y_onehot
        pass

## Train the netwok
<br/>You can train the network on the created dataset. This will yield a classification network on the 4 classes of the VOC dataset. 

In [327]:
composed_transform = transforms.Compose([transforms.Resize((resnet_input,resnet_input)),
                                         transforms.RandomRotation(45),
                                         transforms.RandomHorizontalFlip(),
                                         transforms.ToTensor(),
                                        ])
train_dataset = voc_dataset(root_dir=trainval_path, train=True, transform=composed_transform) # Supply proper root_dir
test_dataset = voc_dataset(root_dir=test_path, train=False, transform=composed_transform) # Supply proper root_dir

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)


####

dataloaders = {
    'train' : train_loader,
    'test' : test_loader
}

In [329]:
from sklearn.utils import class_weight as cw

In [334]:
class_weights = cw.compute_class_weight('balanced',np.unique(train_dataset.labels), train_dataset.labels)

In [343]:
c_weights = torch.tensor(class_weights).float()

### Fine-tuning
Use the pre-trained network to fine-tune the network in the following section:

In [319]:
resnet18 = models.resnet18(pretrained=True)
resnet18.fc = nn.Linear(resnet18.fc.in_features, 4)
model = nn.Sequential(resnet18, nn.Softmax(1))
# Add code for using CUDA here
device = 'cpu'

In [344]:
criterion = nn.CrossEntropyLoss(weight=c_weights)
# Update if any errors occur
optimizer = optim.SGD(resnet18.parameters(), learning_rate, hyp_momentum)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

In [345]:
summary(model, (3,224,224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,

In [346]:
#One Layer Detection
def train(model, criterion, optimizer, scheduler, num_epochs = 100):
    # Begin
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    
    for epoch in range(num_epochs):
        start = time.time()
#         print('Current Epoch :', epoch, 'remaining :', num_epochs - epoch)
              
        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0
            
            count = 0
            
            lnh = dataloaders[phase].__len__()
            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                
                print('Current Epoch :', epoch, 'remaining :', num_epochs - epoch)
                print(phase + "ing on batch no :", count, "out of", lnh)
                count+=1
                clr(wait=True)
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - start
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

In [347]:
%time train(model, criterion, optimizer, scheduler)

KeyboardInterrupt: 

In [268]:
#Two Layer Detection (SSD)
def train():
    # Begin
    pass

In [269]:
%time train()

CPU times: user 8 µs, sys: 1 µs, total: 9 µs
Wall time: 15.3 µs


# Testing and Accuracy Calculation
For applying detection, use a slding window method to test the above trained trained network on the detection task:<br/>
Take some windows of varying size and aspect ratios and slide it through the test image (considering some stride of pixels) from left to right, and top to bottom, detect the class scores for each of the window, and keep only those which are above a certain threshold value. There is a similar approach used in the paper -Faster RCNN by Ross Girshick, where he uses three diferent scales/sizes and three different aspect ratios, making a total of nine windows per pixel to slide. You need to write the code and use it in testing code to find the predicted boxes and their classes.

In [None]:
def sliding_window():
    # Begin

Apply non_maximum_supression to reduce the number of boxes. You are free to choose the threshold value for non maximum supression, but choose wisely [0,1].

In [None]:
def non_maximum_supression(boxes,threshold = 0.3):
    # 

Test the trained model on the test dataset.

In [None]:
#One Layer Detection
def test(resnet18):
    # Write loops for testing the model on the test set
    # Also print out the accuracy of the model

In [339]:
%time test(resnet18)

NameError: name 'test' is not defined

In [None]:
#Two Layer Detection
def test(resnet18):
    # Write loops for testing the model on the test set
    # Also print out the accuracy of the model

In [None]:
%time test(resnet18)