<a href="https://colab.research.google.com/github/sayan0506/SSD-Custom-Object-Detection-Using-Pytorch/blob/main/SSD_Custom_Object_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Implement SSD for Object Detection using PyTorch**

In this notebook we implement the SSD (Single Shot MultiBox Detector) algorithm for object detection. The code is inspired by the book "[Modern Computer Vision with PyTorch](https://www.packtpub.com/product/modern-computer-vision-with-pytorch/9781839213472)". Input images are resized to 300x300, so we use the SSD300 variant of the SSD family.

## **Install Dependencies**

* [torch_snippets](https://github.com/sizhky/torch_snippets) - utility functions that support common PyTorch tasks
* [How to run wget quietly](http://oliviertech.com/linux/how-to-run-wget-quietly/#:~:text=The%20wget%20command%20is%20used,are%20writtent%20in%20the%20output.) - wget downloads files from a given URL; in "wget -q" the -q (--quiet) parameter suppresses the download status, so nothing is printed in the console/terminal

In [1]:
# installs torch_snippets and utils quietly
!pip install -q torch_snippets

## **Import Libraries**

In [2]:
import os, collections
from torch_snippets import *
import torch
from torchvision import transforms
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
import glob

from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

#### **Environment setup**

Select the GPU if PyTorch can see one, otherwise fall back to the CPU, then print the properties of the current device.

In [3]:
# assign device: use the GPU when available, otherwise the CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# torch.cuda.current_device() returns the index of the currently selected GPU
# torch.cuda.get_device_properties(index) returns that GPU's name, compute capability and memory
if device.type == 'cuda':
  print(f'Current device - {torch.cuda.get_device_properties(torch.cuda.current_device())}')
else:
  print('Current device - cpu')

Current device - _CudaDeviceProperties(name='Tesla K80', major=3, minor=7, total_memory=11441MB, multi_processor_count=13)


## **Data Download**

* Download the bus-truck images dataset for object detection from dropbox link.
* Clone ssd utils - [SSD-utils](https://github.com/sizhky/ssd-utils/)

In [4]:
# project folder
project_path = 'open-images-bus-trucks'

# checks whether the data is downloaded or not 
if not os.path.exists(project_path):
  # download the tar.xz archive from Dropbox
  !wget --quiet https://www.dropbox.com/s/agmzwk95v96ihic/\
open-images-bus-trucks.tar.xz
  # extract the archive
  !tar -xf open-images-bus-trucks.tar.xz
  # remove the archive to save space
  !rm open-images-bus-trucks.tar.xz
  # clone the SSD utils repo
  !git clone https://github.com/sizhky/ssd-utils/

Cloning into 'ssd-utils'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 9 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (9/9), done.


## **Data Preparation**

Data investigation is done here: we load the annotation CSV into dataframes and check the available images and annotations.

In [5]:
# define data folder paths
data_root = os.path.join('/content/', project_path)
image_root = os.path.join(data_root, 'images')

# load the csv file containing data info in a pandas dataframe
# raw dataframe
df_raw = pd.read_csv(os.path.join(data_root, 'df.csv'))
# copy the dataset
df = df_raw.copy()

**Checking duplicate samples**


In [6]:
# rows whose ImageID already appeared in an earlier row, i.e. additional objects in the same image
df_d = df[df.duplicated('ImageID')]
df_d

Unnamed: 0,ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside,XClick1X,XClick2X,XClick3X,XClick4X,XClick1Y,XClick2Y,XClick3Y,XClick4Y
2,00006bdb1eb5cd74,xclick,Truck,1,0.702500,0.999167,0.204261,0.409774,1,1,0,0,0,0.849167,0.702500,0.906667,0.999167,0.204261,0.398496,0.409774,0.295739
8,0004d5a9dd44ab6a,xclick,Truck,1,0.094375,0.897500,0.147014,0.934150,0,0,0,0,0,0.365000,0.094375,0.333750,0.897500,0.147014,0.609495,0.934150,0.822358
9,0004d5a9dd44ab6a,xclick,Truck,1,0.860625,0.999375,0.249617,0.390505,1,0,0,0,0,0.992500,0.999375,0.860625,0.921250,0.249617,0.294028,0.325421,0.390505
15,0007eeeabf3c5e5c,xclick,Bus,1,0.620625,0.999375,0.406667,0.647778,1,1,0,0,0,0.920000,0.999375,0.871875,0.620625,0.406667,0.458889,0.647778,0.432222
19,000924a411c24d25,xclick,Bus,1,0.000000,0.086875,0.429644,0.702627,1,1,0,0,0,0.033125,0.086875,0.008750,0.000000,0.429644,0.632270,0.702627,0.636023
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24043,ffc0475ac3f403de,xclick,Truck,1,0.626250,0.999375,0.245779,0.999062,1,0,0,0,0,0.826250,0.626250,0.999375,0.999375,0.245779,0.814259,0.999062,0.999062
24045,ffc67982d4790275,xclick,Bus,1,0.456277,0.999134,0.392991,0.735920,0,1,0,0,0,0.568831,0.456277,0.669264,0.999134,0.392991,0.563204,0.735920,0.486859
24050,ffd1093b9f7d3e13,xclick,Bus,1,0.942500,0.999375,0.468105,0.768293,0,1,0,0,0,0.995625,0.942500,0.978750,0.999375,0.468105,0.608818,0.768293,0.583490
24059,fff376d20410e4c9,xclick,Bus,1,0.348125,0.701250,0.423333,0.744167,0,0,0,1,0,0.478750,0.493125,0.348125,0.701250,0.423333,0.744167,0.537500,0.523333
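
A repeated ImageID does not mean a corrupted sample here: each row is one annotated object, so an image with several objects appears several times. The behaviour of duplicated() can be checked on a toy dataframe. This is only a sketch; the `toy` frame and its values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# hypothetical annotations: one row per object, so ImageIDs repeat
toy = pd.DataFrame({
    "ImageID": ["a", "a", "b", "c", "c", "c"],
    "LabelName": ["Bus", "Truck", "Bus", "Truck", "Truck", "Bus"],
})

# duplicated() flags every occurrence after the first one (keep="first" is the default)
repeat_rows = toy[toy.duplicated("ImageID")]

# number of annotated objects per image
boxes_per_image = toy.groupby("ImageID").size().to_dict()

print(repeat_rows.index.tolist())
print(boxes_per_image)
```

The first row of each image is never flagged, so the filtered frame lists only the second and later objects of multi-object images.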


**Define targets**

In [7]:
# note: this filter keeps every row, because each ImageID is trivially contained in
# its own list of unique ids; images with several objects keep all of their rows
df = df[df['ImageID'].isin(df['ImageID'].unique().tolist())]
# create target dictionary from unique labels
label2target = {l:t+1 for t,l in enumerate(df_raw['LabelName'].unique())}
# define the additional target 'background' with id = 0, used by SSD as the negative class
label2target['background'] = 0
# define target2label dict
target2label = {t:l for l,t in label2target.items()}

num_classes = len(label2target)

print(f'Dataset contains total {num_classes} classes, having below categories {target2label}')
print(label2target)

Dataset contains total 3 classes, having below categories {1: 'Bus', 2: 'Truck', 0: 'background'}
{'Bus': 1, 'Truck': 2, 'background': 0}


## **Data Pre-processing**

Define the functions for data pre-processing:
* norm normalizes each channel with its own mean and standard deviation
* the means are 0.485, 0.456, 0.406 and the standard deviations are 0.229, 0.224, 0.225 for the R, G, B channels respectively
* normalize formula: value_norm = (value - mean)/std
* denormalize (return to the original raw value) by applying Normalize with mean' = -mean/std and std' = 1/std:
val_dnorm = (value_norm - (-mean/std))/(1/std) = value_norm*std + mean
* An image loaded through PIL/NumPy has shape (H, W, C), whereas torchvision transforms and the network expect (C, H, W)
* permute(2, 0, 1) on the tensor performs this reordering

In [8]:
# normalize 
norm = transforms.Normalize(
    mean = [0.485, 0.456, 0.406],
    std = [0.229, 0.224, 0.225]
)

# de-normalization function: inverts norm, so it uses mean' = -mean/std and std' = 1/std
de_norm = transforms.Normalize(
    mean = [-0.485/0.229, -0.456/0.224, -0.406/0.225],
    std = [1/0.229, 1/0.224, 1/0.225]
)

# define data pre-processing function
def preprocess_img(img):
  # reorder the (H, W, C) image array to a (C, H, W) tensor
  img = torch.tensor(img).permute(2,0,1)
  # apply the norm transform defined above
  img = norm(img)
  return img.to(device)

**Define torch dataset**

In [110]:
class OpenDataset(torch.utils.data.Dataset):
  # set the resolutions of the dataset to 300
  w,h = 300, 300
  def __init__(self, df, img_dir = image_root):
    self.img_dir = img_dir
    self.files = glob.glob(self.img_dir+'/*')
    self.df = df
    self.img_infos = df.ImageID.unique()
    # helps to log the info to the terminal or console
    logger.info(f'{len(self.img_infos)} items loaded!')
  def __getitem__(self, ix):
    # load the image, its boxes and its labels
    # look up the image id at this index
    img_id = self.img_infos[ix]
    # find the file path matching the image id among the globbed paths
    img_path = find(img_id, self.files)
    img = Image.open(img_path).convert("RGB")
    # resize with bicubic interpolation and scale pixel values to [0, 1]
    img = np.array(img.resize((self.w, self.h),
                   resample = Image.BICUBIC))/255.
    # fetch the rows corresponding to the image id
    data = self.df[self.df['ImageID'] == img_id]
    # an image can contain several objects, so collect a list of labels
    labels = data['LabelName'].values.tolist()
    # fetch the normalized bounding-box corners
    data = data[['XMin', 'YMin', 'XMax', 'YMax']].values
    # scale normalized xmin and xmax by the width
    data[:, [0,2]] *= self.w
    # scale normalized ymin and ymax by the height
    data[:, [1,3]] *= self.h
    # boxes in absolute (pixel) coordinates
    boxes = data.astype(np.uint32).tolist()
    # cast the image to float32
    img = img.astype(np.float32)
    return img, boxes, labels
  # collate_fn assembles individual samples into a batch
  def collate_fn(self, batch):
    images, boxes, labels = [], [], []
    # batch is a list of samples (items)
    # each item holds the image array, its boxes and its labels
    for item in batch:
      img, img_boxes, img_labels = item
      # indexing with None adds a leading batch dimension:
      # (3, 300, 300) becomes (1, 3, 300, 300)
      # otherwise torch.cat would give (batch*3, 300, 300)
      img = preprocess_img(img)[None]
      images.append(img)
      # normalize the pixel box coordinates back to [0, 1]
      # by dividing by the resized image size (300)
      boxes.append(torch.tensor(img_boxes).float().to(device)/300.)
      # map each label name to its integer target via label2target
      labels.append(torch.tensor([label2target[c] for c in img_labels]).long().to(device))
    # concatenate the image list into one tensor on the device
    images = torch.cat(images).to(device)
    return images, boxes, labels
  # Dataset subclasses define __len__ and __getitem__, so len(ds) and ds[ix]
  # work directly on the dataset object and the DataLoader can call them
  def __len__(self):
    return len(self.img_infos)
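
The box handling above goes from normalized coordinates to integer pixels in `__getitem__` and back to normalized coordinates in `collate_fn`. A small sketch of that round trip, using the XMin, YMin, XMax, YMax values of the first truck row shown earlier, makes the quantization visible. The variable names are illustrative.

```python
import numpy as np

w = h = 300
# XMin, YMin, XMax, YMax of one annotation, normalized to [0, 1]
norm_boxes = np.array([[0.702500, 0.204261, 0.999167, 0.409774]])

pixel = norm_boxes.copy()
pixel[:, [0, 2]] *= w                 # x coordinates to pixels
pixel[:, [1, 3]] *= h                 # y coordinates to pixels
pixel = pixel.astype(np.uint32)       # truncates the fractional part

back = pixel / 300.0                  # what collate_fn hands to the loss
err = np.abs(back - norm_boxes).max()

print(pixel.tolist())
print(f"largest deviation after the round trip: {err:.4f}")
```

Because the cast truncates, each coordinate can move by up to one pixel, i.e. by less than 1/300 in normalized units.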


#### **Define train, validation, test split**

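* The unique image IDs are first split 80/20 into train and hold-out sets (test_size = 0.2)
* The second call splits all image IDs 50/50 again instead of splitting the 20% hold-out, so the validation and test sets reported below each contain about half of the images and overlap with the training set; passing the hold-out IDs from the first call to the second call would give disjoint 80/10/10 sets
* The rows for each split are then fetched from the dataframe by image ID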

In [103]:
# fetch train and valid ids
train_ids, valid_ids = train_test_split(df.ImageID.unique(), test_size = 0.2, random_state = 99)
# fetch valid and test
valid_ids, test_ids = train_test_split(df.ImageID.unique(), test_size = 0.5, random_state = 48)

# fetch info from df
train_df, valid_df, test_df = df[df['ImageID'].isin(train_ids)],df[df['ImageID'].isin(valid_ids)],df[df['ImageID'].isin(test_ids)]

print('Train, valid, Test info-')
print(f'Train- {train_df.shape}')
print(f'Valid- {valid_df.shape}')
print(f'Test- {test_df.shape}')

Train, valid, Test info-
Train- (19207, 21)
Valid- (12008, 21)
Test- (12054, 21)
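
A three-way split without shared images can be obtained by passing the hold-out IDs from the first call into the second one. The following is a minimal sketch on made-up image IDs; the `ids` list and the name `holdout_ids` are illustrative and not taken from the notebook.

```python
from sklearn.model_selection import train_test_split

# hypothetical image ids standing in for df.ImageID.unique()
ids = [f"img_{i:03d}" for i in range(100)]

# 80% train, 20% hold-out
train_ids, holdout_ids = train_test_split(ids, test_size=0.2, random_state=99)
# split only the hold-out part in half: 10% validation, 10% test
valid_ids, test_ids = train_test_split(holdout_ids, test_size=0.5, random_state=48)

# no image may appear in more than one split
overlap = (set(train_ids) & set(valid_ids)) | (set(train_ids) & set(test_ids)) \
          | (set(valid_ids) & set(test_ids))
print(len(train_ids), len(valid_ids), len(test_ids), len(overlap))
```

Checking the overlap explicitly is cheap and catches leakage between the training and evaluation sets before any model is trained.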


#### **Obtain the Dataset and Dataloader**

In [116]:
batch_size = 32

In [117]:
# obtain train, valid, test dataset
train_ds = OpenDataset(train_df)
valid_ds = OpenDataset(valid_df)
test_ds = OpenDataset(test_df)

# the DataLoader draws samples from the dataset and groups them into batches
train_dl = DataLoader(train_ds, batch_size=batch_size,
                      # pass the method itself (no parentheses); the loader calls it
                      # on each list of samples to build the batch
                      collate_fn = train_ds.collate_fn,
                      drop_last = True # drops the last incomplete batch if set to True
                      )

valid_dl = DataLoader(valid_ds, batch_size=batch_size,
                      collate_fn = valid_ds.collate_fn, # builds each batch from its list of samples
                      drop_last = True # drops the last incomplete batch if set to True
                      )

test_dl = DataLoader(test_ds, batch_size=batch_size,
                      collate_fn = test_ds.collate_fn, # builds each batch from its list of samples
                      drop_last = True # drops the last incomplete batch if set to True
                      )

2021-11-02 21:17:12.638 | INFO     | __main__:__init__:10 - 12180 items loaded!
2021-11-02 21:17:12.685 | INFO     | __main__:__init__:10 - 7612 items loaded!
2021-11-02 21:17:12.731 | INFO     | __main__:__init__:10 - 7613 items loaded!
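
A custom collate_fn is needed because every image has a different number of boxes, so the default collation cannot stack them into one tensor. The toy example below is a sketch of that idea only: `ToyDetection` and its random boxes are hypothetical and unrelated to the bus-truck data. It shows images stacked into one tensor while boxes and labels stay as per-image lists.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDetection(Dataset):
    """Hypothetical dataset: sample ix carries ix + 1 boxes."""
    def __len__(self):
        return 5

    def __getitem__(self, ix):
        img = torch.zeros(3, 8, 8)
        n_boxes = ix + 1
        boxes = torch.rand(n_boxes, 4)
        labels = torch.ones(n_boxes, dtype=torch.long)
        return img, boxes, labels

def collate_fn(batch):
    # images share one shape, so they can be stacked into (B, C, H, W)
    images = torch.stack([item[0] for item in batch])
    # boxes and labels differ in length per image, so they stay as lists
    boxes = [item[1] for item in batch]
    labels = [item[2] for item in batch]
    return images, boxes, labels

loader = DataLoader(ToyDetection(), batch_size=2, collate_fn=collate_fn, drop_last=True)
batches = list(loader)
images, boxes, labels = batches[0]
print(len(batches), tuple(images.shape), [b.shape[0] for b in boxes])
```

With five samples, a batch size of 2 and drop_last = True, the loader yields two full batches and discards the remaining sample.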


## **Define SSD300 Model Requisites**

Import the SSD300 model, the MultiBox loss and the detection script

In [118]:
# navigate to ssdutils path
%cd /content/ssd-utils

# import ssd dependencies
from model import SSD300, MultiBoxLoss
from detect import *

/content/ssd-utils


**Initialize Model and Loss functions objects**

In [119]:
# model object
model = SSD300(num_classes, device)

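# model loss criterion
# model.priors_cxcy holds the model's prior (default) boxes in center-size form (cx, cy, w, h);
# the loss matches these priors against the ground-truth boxes
criterion = MultiBoxLoss(priors_cxcy= model.priors_cxcy, device = device)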


Loaded base model.
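
The MultiBox loss needs to decide which prior boxes belong to which ground-truth object, and this is commonly done through the Jaccard overlap (IoU). The sketch below is not the ssd-utils implementation; it assumes, as the name priors_cxcy suggests, that priors are stored as fractional (cx, cy, w, h), and the helpers `cxcy_to_xy` and `iou` are written here purely for illustration.

```python
import torch

def cxcy_to_xy(cxcy):
    # (cx, cy, w, h) -> (x_min, y_min, x_max, y_max)
    return torch.cat([cxcy[:, :2] - cxcy[:, 2:] / 2,
                      cxcy[:, :2] + cxcy[:, 2:] / 2], dim=1)

def iou(a, b):
    # pairwise IoU between (n, 4) and (m, 4) corner-form boxes, result is (n, m)
    lower = torch.max(a[:, None, :2], b[None, :, :2])
    upper = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (upper - lower).clamp(min=0).prod(dim=2)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

# two hypothetical priors in center-size form and one ground-truth box in corner form
priors = torch.tensor([[0.25, 0.25, 0.5, 0.5],
                       [0.50, 0.50, 0.5, 0.5]])
truth = torch.tensor([[0.0, 0.0, 0.5, 0.5]])

overlaps = iou(cxcy_to_xy(priors), truth)
print(overlaps)
```

The first prior coincides with the ground-truth box and gets an overlap of 1, while the second shares only a quarter of its area with it and gets 1/7.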





## **Model Training**

Define model training functions for custom data

**Define Training Metadata**

* weight_decay is the L2 regularization coefficient (the lambda constant of the L2 penalty)

In [120]:
# epochs
n_epochs = 10

# define optimizer
# model.parameters() returns an iterator over the trainable model parameters
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-04, weight_decay= 1e-05)

# define training log object
log = Report(n_epochs=n_epochs)
logs_to_print = 5
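
The effect of weight_decay can be verified by hand on a tiny example. In torch.optim.Adam (unlike AdamW) the term weight_decay * parameter is added to the gradient before the update, which is the L2-penalty form. The sketch below uses plain SGD instead of Adam, only because its update is simple enough to check with a calculator; the values are illustrative.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.01)

loss = (w ** 2).sum()      # gradient of the loss alone is 2*w
loss.backward()
opt.step()

# update: w - lr * (2*w + weight_decay*w) = w * (1 - 0.1 * 2.01) = 0.799 * w
expected = torch.tensor([0.799, -1.598])
print(w.detach(), expected)
```

The extra 0.01 * w in the gradient is exactly what an explicit penalty of (0.01 / 2) * ||w||^2 in the loss would contribute.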

#### **Define Training methods**

In [121]:
# define train methods
def train_batch(inputs, model, criterion, optimizer):
  # put the model in train mode so that dropout and other training-only behavior is active
  model.train()
  images, boxes, labels = inputs
  # predicted box regression parameters and class scores
  _regr, _clss = model(images)
  # evaluate the loss for both boxes and labels
  loss = criterion(_regr, _clss, boxes, labels)
  # reset the gradients to zero so that they reflect only the current loss
  optimizer.zero_grad()
  # backprop to obtain the gradients
  loss.backward()
  # update the parameters using the optimizer
  optimizer.step()
  return loss

## validation method
# the decorator disables gradient tracking while the validation loss is computed
@torch.no_grad()
def val_batch(inputs, model, criterion, optimizer):
  # evaluation mode: validation must not update the model parameters,
  # so optimizer is accepted only to keep the call signature and is unused
  model.eval()
  images, boxes, labels = inputs
  _regr, _label = model(images)
  loss = criterion(_regr, _label, boxes, labels)
  return loss
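
A validation step is normally a pure measurement: it computes the loss without tracking gradients and without calling the optimizer, so the weights stay untouched. The sketch below illustrates this on a toy linear model. It is not the SSD model, and `train_step`, `eval_step` and the random data are assumptions made for the example.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

def train_step(x, y):
    model.train()
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()            # weights change here
    return loss.item()

@torch.no_grad()                # no gradient tracking inside
def eval_step(x, y):
    model.eval()
    return criterion(model(x), y).item()

before = [p.clone() for p in model.parameters()]
val_loss = eval_step(x, y)
unchanged = all(torch.equal(a, b) for a, b in zip(before, model.parameters()))
train_loss = train_step(x, y)
changed = any(not torch.equal(a, b) for a, b in zip(before, model.parameters()))
print(unchanged, changed)
```

Only the training step moves the parameters, which keeps the validation loss an unbiased check on data the model has not been fitted to.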

**Train the model**

In [None]:
# train and validation loop
for epoch in range(n_epochs):
  # iterating over the train loader fetches batches from the train dataset
  _n = len(train_dl)
  for ix, inputs in enumerate(train_dl):
    loss = train_batch(inputs, model, criterion, optimizer)
    # fractional epoch position of this training step
    pos = (epoch + (ix+1)/_n)
    log.record(pos, trn_loss = loss.item(), end = '\r')

  # validation step
  _n = len(valid_dl)
  for ix, inputs in enumerate(valid_dl):
    val_loss = val_batch(inputs, model, criterion, optimizer)
    pos = (epoch + (ix+1)/_n)
    log.record(pos, val_loss = val_loss.item(), end = '\r')

EPOCH: 0.053	trn_loss: 5.478	(232.95s - 44027.65s remaining)
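
The EPOCH value in the log is the fractional position epoch + (ix+1)/n, where n is the number of batches in the loader being iterated. As a rough sketch, assuming the 12180 training images reported earlier, batch_size = 32 and drop_last = True, the logged value 0.053 is consistent with the 20th batch of the first epoch. The helper name `position` is illustrative.

```python
def position(epoch, ix, n_batches):
    # fractional epoch reached after finishing batch ix (0-based) of this epoch
    return epoch + (ix + 1) / n_batches

n_images, batch_size = 12180, 32
n_batches = n_images // batch_size      # drop_last=True discards the incomplete batch

pos = position(epoch=0, ix=19, n_batches=n_batches)
print(n_batches, round(pos, 3))
```

Dividing by the length of the loader that is actually being iterated keeps the position inside the range [epoch, epoch + 1] for both the training and the validation pass.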