### TRANSFER LEARNING IN DeepDriving DATASET

Throughout this notebook, we use different backbones and pretrained models (FasterRCNN and RetinaNet) to do transfer learning on our dataset. In particular, we follow these steps:

1. Download pretrained model and backbone
2. Train this model again using DeepDriving data so the models learn the new labels and specific features of our dataset
3. Do inference with these models
4. Compare results


First of all, we download the data and import the packages needed to run the code.

In [1]:
!rm -rf DeepDriving
!rm -rf predictions

In [2]:
!wget https://github.com/hemahecodes/AIDL_SelfDrivingProject/raw/dev/transfer_learning/data/deepdriving.zip
!unzip deepdriving.zip > /dev/null

--2022-03-15 18:45:39--  https://github.com/hemahecodes/AIDL_SelfDrivingProject/raw/dev/transfer_learning/data/deepdriving.zip
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/hemahecodes/AIDL_SelfDrivingProject/dev/transfer_learning/data/deepdriving.zip [following]
--2022-03-15 18:45:39--  https://raw.githubusercontent.com/hemahecodes/AIDL_SelfDrivingProject/dev/transfer_learning/data/deepdriving.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8344272 (8.0M) [application/zip]
Saving to: ‘deepdriving.zip.2’


2022-03-15 18:45:39 (58.7 MB/s) - ‘deepdriving.zip.2’ saved [8344272/8344272]



In [3]:
#Importing needed packages
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from torchvision.transforms.functional import to_pil_image
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision import transforms as T
import os
import json
from PIL import Image, ImageDraw
from torchvision import transforms
from os import listdir
import random
import numpy as np


#Use the GPU if one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

As a first step, we define the DeepDrivingDataset class, which reads the data in the format that the pretrained models need for training.

In [4]:
#We define a class for Berkeley Deep Driving dataset. This class will be specific for training the model because it is in the format that the pretrained model needs.
class DeepDrivingDataset(object):
    label2idx = {"other vehicle": 0,"person": 1,"traffic light": 2,"traffic sign": 3,"truck": 4,"train": 5,"other person": 6,"bus": 7,"car": 8,"rider": 9, "motor": 10, "bike": 11, "trailer": 12}
    def __init__(self, train = True):
        # select the split folder; the annotation file in it lists
        # every image together with its labels
        self.train = train
        if self.train:
          self.img_dir = os.path.join("DeepDriving","train")
        else:
          self.img_dir = os.path.join("DeepDriving","val")
        json_file = os.path.join(self.img_dir, "labels_TL.json")
        with open(json_file) as f:
          imgs_anns = json.load(f)

        self.imgs = []
        self.annotations = []
        for idx, v in enumerate(imgs_anns.values()):
          filename = os.path.join(self.img_dir, v["name"])
          self.imgs.append(filename)
          self.annotations.append(v["labels"])

    def __getitem__(self, idx):
        # load images
        img_path = self.imgs[idx]
        img = Image.open(img_path).convert("RGB")

        # get bounding box coordinates for each object detected
        boxes = []
        categories = []
        for labels in self.annotations[idx]:
          if 'box2d' in labels:
            annotation = labels['box2d']
            lab = labels['category']
            categories.append(self.label2idx[lab])
            #select the corners of the boxes for each axis. it should be a list with 4 values: 2 coordinates.
            boxes.append([annotation["x1"],annotation["y1"],annotation["x2"],annotation["y2"]]) 
          else:
            continue
          
        # convert everything into a torch.Tensor on the target device
        # (reshape keeps a valid (0, 4) shape when an image has no boxes)
        boxes = torch.as_tensor(boxes, dtype=torch.float32, device=device).reshape(-1, 4)
        labels = torch.tensor(categories, dtype=torch.int64, device=device)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels

        img = to_tensor(img).to(device)
        
        return img, target

    def __len__(self):
        return len(self.imgs)


# Keep images and targets as lists: each image has a different number of boxes
def collate_fn(batch):
    images = []
    targets = []
    for i, t in batch:
        images.append(i)
        targets.append(t)
    return images, targets
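
To make the expected annotation layout concrete, here is a minimal sketch in plain Python (no PyTorch) of how one record becomes boxes and labels, and of what `collate_fn` returns. The two records are made-up examples; their structure is only inferred from the keys that `DeepDrivingDataset` reads (`name`, `labels`, `box2d`, `category`), and `parse_record` is a hypothetical helper, not part of the notebook.

```python
# Subset of the label mapping used by DeepDrivingDataset
label2idx = {"person": 1, "traffic light": 2, "car": 8}

# Hypothetical records, shaped like the entries the dataset class reads
records = {
    "0": {"name": "a.jpg", "labels": [
        {"category": "car", "box2d": {"x1": 10.0, "y1": 20.0, "x2": 110.0, "y2": 80.0}},
        {"category": "person", "box2d": {"x1": 5.0, "y1": 5.0, "x2": 25.0, "y2": 60.0}},
        {"category": "car"},  # no box2d: skipped, as in __getitem__
    ]},
    "1": {"name": "b.jpg", "labels": [
        {"category": "traffic light", "box2d": {"x1": 1.0, "y1": 2.0, "x2": 3.0, "y2": 4.0}},
    ]},
}

def parse_record(record):
    """Return (image name, target) with boxes as [x1, y1, x2, y2] lists."""
    boxes, labels = [], []
    for ann in record["labels"]:
        if "box2d" not in ann:
            continue
        b = ann["box2d"]
        boxes.append([b["x1"], b["y1"], b["x2"], b["y2"]])
        labels.append(label2idx[ann["category"]])
    return record["name"], {"boxes": boxes, "labels": labels}

def collate_fn(batch):
    # Same logic as the notebook: split (image, target) pairs into two lists
    images, targets = [], []
    for i, t in batch:
        images.append(i)
        targets.append(t)
    return images, targets

batch = [parse_record(r) for r in records.values()]
images, targets = collate_fn(batch)
print(images)
print([len(t["boxes"]) for t in targets])
```

Since the two images carry a different number of boxes, the targets cannot be stacked into a single tensor, which is why the batch is kept as two lists.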



For Faster RCNN, there are 3 different backbones available:
1. MobileNetV3
2. ResNet50
3. MobileNetV3-320

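Once the backbone is chosen, we can define our **model**. It is important to specify the number of classes (13) so that the prediction head is adapted to our data. Note that torchvision's detection models reserve label 0 for the background class, so a mapping that starts at 0 makes "other vehicle" share its index with the background.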
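
As a quick sanity check, the value 13 can be derived from the label mapping instead of being typed by hand. The sketch below is plain Python and only reuses the `label2idx` dictionary from the dataset class; the contiguity check is an extra assumption about what the prediction head needs, not something the notebook does.

```python
label2idx = {"other vehicle": 0, "person": 1, "traffic light": 2, "traffic sign": 3,
             "truck": 4, "train": 5, "other person": 6, "bus": 7, "car": 8,
             "rider": 9, "motor": 10, "bike": 11, "trailer": 12}

# One output per label index, so the count comes straight from the mapping
num_classes = len(label2idx)

# The indices should be contiguous (0..num_classes-1) to line up with the head outputs
assert sorted(label2idx.values()) == list(range(num_classes))

# Inverse mapping, useful to turn predicted indices back into names
idx2label = {v: k for k, v in label2idx.items()}
print(num_classes, idx2label[8])
```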

In [16]:
#backb selects the backbone: 1 = MobileNetV3, 2 = ResNet50, 3 = MobileNetV3-320 (the one used in this run)
#num_epochs will be the number of epochs that we want to use to train the model with DeepDriving data

backb = 3
num_classes = 13
num_epochs = 20
if backb == 1:
  backbone = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
  backb_used = "MobileNetV3"
elif backb == 2:
  backbone = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
  backb_used = "ResNet 50"
elif backb == 3:
  backbone = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
  backb_used = "MobileNetV3-320"

# Now, we can define our model
# Function that will give us the model
def get_model_object_detection(num_classes):
    # load an object detection model pre-trained on COCO
    model = backbone
    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    return model 

# Get the model using our helper function
model = get_model_object_detection(num_classes)
model = model.to(device) # move model to the right device


We have now finished the first step (defining the pretrained model adapted to our dataset), so next we train it on the DeepDriving dataset.

To do that, we define the **training** loop, which is straightforward:
1. Reset the optimizer gradients
2. Compute the losses with the model
3. Perform backpropagation
4. Take an optimizer step

In [17]:
def train(data_train):
  for batch_idx, (img_data, target_data) in enumerate(data_train):
      optimizer.zero_grad()
      loss_dict = model(img_data, target_data)
      loss = sum(loss for loss in loss_dict.values())
      loss.backward()
      optimizer.step()
      if batch_idx%2 == 0:
          loss_dict_printable = {k: f"{v.item():.2f}" for k, v in loss_dict.items()}
          print(f"[{batch_idx}/{len(data_train)}] loss: {loss_dict_printable}")
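
In training mode the model returns a dictionary of partial losses, and the loop backpropagates their sum. As a small worked example, the sketch below adds up the four values printed for the first batch (`[0/25]`) in the training log at the end of this notebook; the values are the rounded ones from that log line, so the total is approximate.

```python
# Rounded values copied from the first logged batch ([0/25])
loss_dict = {
    "loss_classifier": 2.65,
    "loss_box_reg": 0.37,
    "loss_objectness": 0.14,
    "loss_rpn_box_reg": 0.20,
}

# Same reduction as in train(): a plain sum over the dictionary values
total_loss = sum(loss_dict.values())
print(f"total loss: {total_loss:.2f}")  # total loss: 3.36
```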


The **evaluation** loop will be a little bit more complicated.

In this one, we compute the average precision at every epoch. Keep in mind that the average precision is defined as the area under the precision-recall curve. The steps are:

1. Take the predicted boxes and apply Non-Maximum Suppression (NMS) with a chosen IoU threshold (0.2, for example). The idea of this step is to remove predicted bounding boxes that overlap with other, higher-scoring predicted bounding boxes.
2. After that, we loop through every category, and for each category we:

  2.1. Compare the predicted bounding boxes of that category with the ground-truth (GT) ones. If the IoU is higher than our threshold, we have a TP; otherwise, we have a FP

  2.2. Once a GT box is used (a GT box is used when it has the maximum IoU with a predicted bbox), it cannot be used again, so it is discarded in the following comparisons

  2.3. Finally, we accumulate the TPs and the FPs separately and then compute precision and recall
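
The steps above can be sketched in plain Python for a single category. This is only an illustration under simplifying assumptions: boxes are `[x1, y1, x2, y2]` lists, predictions are already sorted by descending score, and `iou`, `nms` and `match_predictions` are hypothetical helper names (the notebook itself relies on `torchvision.ops.nms` and tensors).

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # 0 when boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-6)

def nms(boxes, scores, thr):
    """Greedy NMS: keep a box unless it overlaps a kept, higher-scoring box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep

def match_predictions(pred_boxes, gt_boxes, thr):
    """Flag each prediction as TP (1) or FP (0); each GT box is used once."""
    used = [False] * len(gt_boxes)
    flags = []
    for p in pred_boxes:
        ious = [iou(p, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda k: ious[k], default=None)
        if best is not None and ious[best] > thr and not used[best]:
            used[best] = True
            flags.append(1)
        else:
            flags.append(0)
    return flags

# Toy data: four predictions (sorted by score) and two ground-truth boxes
pred = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [100, 100, 110, 110]]
scores = [0.9, 0.8, 0.7, 0.6]
gt = [[0, 0, 10, 10], [52, 50, 62, 60]]

keep = nms(pred, scores, thr=0.2)           # the second box overlaps the first
tp_flags = match_predictions([pred[i] for i in keep], gt, thr=0.2)
print(keep, tp_flags)
```

In this toy case the second prediction is suppressed because its IoU with the first one is about 0.68, and the last prediction becomes a false positive because it overlaps no ground-truth box.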

In [18]:
@torch.no_grad() #Gradients are not needed during evaluation
def evaluate(data_test):
  category_list = ["other vehicle", "person", "traffic light", "traffic sign","truck", "train", "other person", "bus", "car", "rider", "motor","bike", "trailer"]
  # Defining hyperparameters:
  hparams = {
      'num_epochs': 10,
      'batch_size': 5,
      'channels': 3,
      'learning_rate': 0.0001,
      'classes': len(category_list),
      'nsamples': 25000,
      'grid_size': 14
  }
  label2idx = {"other vehicle": 0,"person": 1,"traffic light": 2,"traffic sign": 3,"truck": 4,"train": 5,"other person": 6,"bus": 7,"car": 8,"rider": 9, "motor": 10, "bike": 11, "trailer": 12}
  idx2label = {v: k for k, v in label2idx.items()}
  iou_threshold = 0.2
  score_threshold = 0.4
  total_AP = []
  print("DATA IS BEING VALIDATED FOR A NEW EPOCH")
  print("")
  img_number = 0
  for batch_idx, (img_data, target_data) in enumerate(data_test):
      # img_data = img_data.to(device) #Image loaded, converted as a tensor and resized to 448x448
      # target_data = target_data.to(device) #Labels
      prediction = model(img_data)
      epsilon = 1e-6
      
      for i in range(len(img_data)): #The last batch may contain fewer images than batch_size
          precisions = [0]*len(category_list)
          recalls = [0]*len(category_list)
          im_AP = []
          im = to_pil_image(img_data[i])
          draw = ImageDraw.Draw(im)
          classes_target = target_data[i]["labels"]
          boxes_target = target_data[i]["boxes"]
          total_boxes_target = len(boxes_target) #Total quantity of bboxes on the GT
          true_boxes_used = torch.zeros(total_boxes_target) #We will be checking each bbox used (used means compared with a bbox detected)
          true_boxes_counted = torch.zeros(total_boxes_target) #Needed for defining the total number of bbox of a specific class in GT
          keep_idx = torchvision.ops.nms(prediction[i]['boxes'], prediction[i]['scores'], iou_threshold) #Performs non-maximum suppression (NMS) on the boxes according to their IoU
          #We keep only the predicted bboxes, scores and labels that we obtain after NMS
          boxes = prediction[i]["boxes"][keep_idx]
          scores = prediction[i]["scores"][keep_idx]
          labels = prediction[i]["labels"][keep_idx]
          #Loop by classes in order to compute TP, FP, recall, precision per class
          for c in range(len(category_list)):
              boxes_pred = []
              scores_pred = []
              for l in range(len(boxes)):
                  if labels[l] == c and scores[l] > score_threshold:
                      #Keep the coordinates of the predictions of class c whose score is above the threshold
                      x1_pred = boxes[l][0]
                      x2_pred = boxes[l][2]
                      y1_pred = boxes[l][1]
                      y2_pred = boxes[l][3]
                      box_pred = torch.Tensor([x1_pred,y1_pred, x2_pred, y2_pred])
                      boxes_pred.append(box_pred)
                      scores_pred.append(scores[l])
              
              #Each prediction will be a True Positive or a False Positive
              TP = torch.zeros((len(boxes_pred)))
              FP = torch.zeros((len(boxes_pred)))
              total_boxes_target_class = 0
              
              #We loop over the boxes predicted
              for det_idx, p in enumerate(boxes_pred):
                  iou_max = 0
                  #For each box predicted, we will look for the best (highest IoU) GT box and then GT box will be checked as used.
                  for idx, t in enumerate(boxes_target):
                      if classes_target[idx] == c:
                          if true_boxes_counted[idx] == 0:
                              total_boxes_target_class = total_boxes_target_class + 1
                              true_boxes_counted[idx] = 1
                          x1 = torch.max(t[0], p[0])
                          y1 = torch.max(t[1], p[1])
                          x2 = torch.min(t[2], p[2])
                          y2 = torch.min(t[3], p[3])
                          # .clamp(0) is for the case when they do not intersect
                          intersection = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

                          box1_area = abs((t[2] - t[0]) * (t[3] - t[1]))
                          box2_area = abs((p[2] - p[0]) * (p[3] - p[1]))

                          iou = intersection / (box1_area + box2_area - intersection + 1e-6)

                          if iou >= iou_max:
                              iou_max = iou
                              true_index = idx
                  #If the maximum IoU is greater than the threshold and the best-matching GT bbox is not used yet, we have a TP
                  if iou_max > iou_threshold:
                      if true_boxes_used[true_index] == 0:
                          TP[det_idx] = 1
                          true_boxes_used[true_index] = 1
                          coords = p.cpu().tolist()
                          draw.rectangle(coords, width = 3, outline = "blue") 
                          text = f"{idx2label[c]} {scores_pred[det_idx]*100:.2f}%"
                          draw.text([coords[0], coords[1]-15], text)
                      else:
                          FP[det_idx] = 1
                          coords = p.cpu().tolist()
                          draw.rectangle(coords, width = 3, outline = "blue") 
                          text = f"{idx2label[c]} {scores_pred[det_idx]*100:.2f}%"
                          draw.text([coords[0], coords[1]-15], text)
                  else:
                      FP[det_idx] = 1
                      coords = p.cpu().tolist()
                      draw.rectangle(coords, width = 3, outline = "blue") 
                      text = f"{idx2label[c]} {scores_pred[det_idx]*100:.2f}%"
                      draw.text([coords[0], coords[1]-15], text)
              if total_boxes_target_class == 0:
                  continue
              else:
                  #Sum of all TP and FP to compute recall and precision for each class
                  TP_cumsum = torch.cumsum(TP, dim = 0)
                  FP_cumsum = torch.cumsum(FP, dim = 0)
                  recalls = TP_cumsum / (total_boxes_target_class + epsilon)
                  precisions = torch.divide(TP_cumsum, (TP_cumsum + FP_cumsum + epsilon))
                  precisions = torch.cat((torch.tensor([1]), precisions))
                  recalls = torch.cat((torch.tensor([0]), recalls))
                  #Average precision is the area under the precision-recall curve (we approximate it with the trapezoidal rule)
                  im_AP.append(torch.trapz(precisions, recalls))
          if len(im_AP) == 0:
            continue
          for GTbox in boxes_target:
              coords = GTbox.cpu().tolist()
              draw.rectangle(coords, width = 3, outline = "green") 
              # text = f"{idx2label[c]} {scores_pred[det_idx]*100:.2f}%"
              # draw.text([coords[0], coords[1]-15], text)

          print("Average precision of this image: ", sum(im_AP) / len(im_AP))
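          os.makedirs("predictions", exist_ok=True) #The folder is removed in the first cell, so we make sure it exists
          img_name = "predictions/prediction_" + str(img_number) + ".png"
          im.save(img_name)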
          img_number = img_number + 1
          total_AP.append(sum(im_AP) / len(im_AP))

  
  print("Mean Average precision of this epoch: ", np.mean(total_AP) )                        
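
To see how the precision-recall bookkeeping turns into a number, here is a small worked sketch in plain Python. It follows the same recipe as `evaluate` (cumulative TP/FP, a starting point at precision 1 and recall 0, trapezoidal rule) but leaves out the `epsilon` terms for readability; `average_precision` is a hypothetical helper and the TP/FP flags are made up.

```python
def average_precision(tp_flags, num_gt):
    """AP for one class from TP flags (1 = TP, 0 = FP) sorted by descending score."""
    precisions, recalls = [1.0], [0.0]  # starting point of the curve
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Trapezoidal rule over the (recall, precision) points
    ap = 0.0
    for k in range(1, len(recalls)):
        ap += (recalls[k] - recalls[k - 1]) * (precisions[k] + precisions[k - 1]) / 2
    return ap

# Three detections (TP, FP, TP) against two ground-truth boxes
ap = average_precision([1, 0, 1], num_gt=2)
print(f"AP = {ap:.4f}")  # AP = 0.7917
```

The curve passes through (0, 1), (0.5, 1), (0.5, 0.5) and (1, 2/3), so the area is 0.5 + 0 + 7/24 = 19/24.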


Now that we have defined the training and evaluation loops, we can use them in the main loop to track the loss and the mAP epoch by epoch.

In [19]:
# Optimizer used: SGD
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# LR scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

def train_eval(json_train, img_train, jsons_p_val, imgs_p_val):
    category_list = ["other vehicle", "person", "traffic light", "traffic sign","truck", "train", "other person", "bus", "car", "rider", "motor","bike", "trailer"]
    # Defining hyperparameters:
    hparams = {
        'num_epochs': 10,
        'batch_size': 5,
        'channels': 3,
        'learning_rate': 0.0001,
        'classes': len(category_list),
        'nsamples': 25000,
        'grid_size': 14
    }
    use_gpu = True
    data_train = DeepDrivingDataset(train=True)
    training_dataloader = torch.utils.data.DataLoader(data_train, batch_size=hparams['batch_size'], num_workers=0, collate_fn=collate_fn)

    data_test = DeepDrivingDataset(train=False)
    testing_dataloader = torch.utils.data.DataLoader(data_test, batch_size=hparams['batch_size'], num_workers=0, collate_fn=collate_fn)
    
    for epoch in range(hparams['num_epochs']):


        total_AP_test = []
        print("")
        # data_train.LoadFiles()  # Resets the Training DataLoader for a new epoch
        # data_test.LoadFiles()  # Resets the Validation DataLoader for a new epoch

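        model.train()
        train(training_dataloader)
        lr_scheduler.step()  # apply the StepLR decay once per epoch
        os.makedirs("models", exist_ok=True)  # make sure the output folder exists
        model_name = f"models/Pretrained FasterRCNN with {backb_used}.pth"
        torch.save({'model_state_dict': model.state_dict()}, model_name)
        model.eval()
        evaluate(testing_dataloader)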

train_eval("DeepDriving/train/labels_TL.json", "DeepDriving/train/", "DeepDriving/val/labels_TL.json", "DeepDriving/val/")



[0/25] loss: {'loss_classifier': '2.65', 'loss_box_reg': '0.37', 'loss_objectness': '0.14', 'loss_rpn_box_reg': '0.20'}
[2/25] loss: {'loss_classifier': '1.00', 'loss_box_reg': '0.25', 'loss_objectness': '0.08', 'loss_rpn_box_reg': '0.09'}
[4/25] loss: {'loss_classifier': '0.87', 'loss_box_reg': '0.29', 'loss_objectness': '0.12', 'loss_rpn_box_reg': '0.12'}
[6/25] loss: {'loss_classifier': '0.78', 'loss_box_reg': '0.27', 'loss_objectness': '0.10', 'loss_rpn_box_reg': '0.16'}
[8/25] loss: {'loss_classifier': '0.60', 'loss_box_reg': '0.37', 'loss_objectness': '0.12', 'loss_rpn_box_reg': '0.29'}
[10/25] loss: {'loss_classifier': '0.51', 'loss_box_reg': '0.30', 'loss_objectness': '0.10', 'loss_rpn_box_reg': '0.13'}
[12/25] loss: {'loss_classifier': '0.55', 'loss_box_reg': '0.42', 'loss_objectness': '0.14', 'loss_rpn_box_reg': '0.19'}
[14/25] loss: {'loss_classifier': '0.49', 'loss_box_reg': '0.33', 'loss_objectness': '0.13', 'loss_rpn_box_reg': '0.16'}
[16/25] loss: {'loss_classifier': '0

KeyboardInterrupt: ignored
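
For reference, the learning rates that the `StepLR` settings above would produce can be tabulated without running any training. The sketch below uses the closed form of a step decay and assumes that `lr_scheduler.step()` is called once per epoch for the 10 epochs set in `hparams`.

```python
# Values taken from the optimizer and scheduler definitions in the notebook
base_lr, step_size, gamma = 0.005, 3, 0.1
num_epochs = 10

# Step decay: the rate is multiplied by gamma every step_size epochs
schedule = [base_lr * gamma ** (epoch // step_size) for epoch in range(num_epochs)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.1e}")
```

Under that assumption the learning rate stays at 0.005 for the first three epochs and is divided by 10 every three epochs after that.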