<a href="https://colab.research.google.com/github/rahiakela/pytorch-computer-vision-cookbook/blob/main/5-multi-object-detection/multi_object_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-Object Detection

Object detection is the process of locating and classifying existing objects in an image. Identified objects are shown with bounding boxes in the image. There are two methods for general object detection: region proposal-based and regression/classification-based. 

In this notebook, we will use a regression/classification-based method called YOLO. We will learn how to implement the YOLO-v3 algorithm, then train and deploy it for object detection using PyTorch.


## Setup

In [1]:
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms.functional as TF
from torchvision.transforms.functional import to_pil_image
from torch import optim
from torch.optim.lr_scheduler import ReduceLROnPlateau


import torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(torch.__version__)

from PIL import Image, ImageDraw, ImageFont

import copy
import os
import random
import numpy as np
import matplotlib.pylab as plt

%matplotlib inline

1.7.0+cu101


## Creating datasets

We will need to download the COCO dataset.

In [None]:
%%shell

# Download the following GitHub repository
git clone https://github.com/pjreddie/darknet

# Create a folder named data
mkdir data

# copy the get_coco_dataset.sh file
cp darknet/scripts/get_coco_dataset.sh data

# execute the get_coco_dataset.sh file
chmod 755 data/get_coco_dataset.sh
./data/get_coco_dataset.sh

# Create a folder named config
mkdir data/config
# copy the yolov3.cfg file
cp darknet/cfg/yolov3.cfg data/config/

# Finally, download the raw coco.names file and put it in the data folder
# (the github.com/.../blob/... URL returns an HTML page, not the plain-text file)
wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names
cp coco.names data/

### Creating a custom COCO dataset

Now that we've downloaded the COCO dataset, we will create training and validation datasets and dataloaders using PyTorch's `Dataset` and `DataLoader` classes.

We will define the `CocoDataset` class and show some sample images from the training and validation datasets.

In [2]:
class CocoDataset(Dataset):

  def __init__(self, files_path, transform=None, trans_params=None):
    # get list of images
    with open(files_path, "r") as file:
      self.img_path = file.readlines()
    # get list of labels
    self.label_path = [path.replace("images", "labels").replace(".png", ".txt").replace(".jpg", ".txt") for path in self.img_path]
    self.trans_params = trans_params 
    self.transform = transform 

  def __len__(self):
    return len(self.img_path)

  def __getitem__(self, index):
    img_path = self.img_path[index % len(self.img_path)].rstrip()
    img = Image.open(img_path).convert("RGB")
    label_path = self.label_path[index % len(self.img_path)].rstrip()

    labels = None
    if os.path.exists(label_path):
      # reshape so that a file with a single object still yields an (n, 5) array
      labels = np.loadtxt(label_path).reshape(-1, 5)
    if self.transform:
      img, labels = self.transform(img, labels, self.trans_params)

    return img, labels, img_path
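
The label handling above can be checked in isolation. The sketch below assumes each label file holds one row per object with five values (class id, then the normalized `x_center, y_center, width, height`), which matches the `(n, 5)` arrays used throughout this notebook. The label contents are made up for illustration. It shows why reshaping to `(-1, 5)` is useful: `np.loadtxt` returns a 1-D array when the file contains a single row.

```python
import io
import numpy as np

# Hypothetical label contents: class_id, x_center, y_center, width, height
one_object = io.StringIO("45 0.48 0.63 0.69 0.71\n")
two_objects = io.StringIO("45 0.48 0.63 0.69 0.71\n50 0.74 0.52 0.31 0.21\n")

raw = np.loadtxt(one_object)          # a single row is loaded as a 1-D array
single = raw.reshape(-1, 5)           # force the (n, 5) layout
multi = np.loadtxt(two_objects).reshape(-1, 5)

print(raw.shape)     # (5,)
print(single.shape)  # (1, 5)
print(multi.shape)   # (2, 5)
```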

Next, we will create an object of the CocoDataset class for the training data:

In [None]:
root_data = "./data/coco"
train_file_path = os.path.join(root_data, "trainvalno5k.txt")
coco_train = CocoDataset(train_file_path)
print(len(coco_train))

In [None]:
# Get a sample item from coco_train:
img, labels, img_path = coco_train[1] 
print("image size:", img.size, type(img))
print("labels shape:", labels.shape, type(labels))
print("labels \n", labels)

Similarly, we will create an object of the CocoDataset class for the validation data:

In [None]:
val_file_path = os.path.join(root_data, "5k.txt")
coco_val = CocoDataset(val_file_path, transform=None, trans_params=None)
print(len(coco_val))

In [None]:
# Get a sample item from coco_val:
img, labels, img_path = coco_val[7] 
print("image size:", img.size, type(img))
print("labels shape:", labels.shape, type(labels))
print("labels \n", labels)

Let's display a sample image from the coco_train and coco_val datasets.

In [None]:
# Get a list of COCO object names
coco_names_path = "./data/coco.names"
with open(coco_names_path, "r") as fp:
  coco_names = fp.read().split("\n")[:-1]
print("number of classes:", len(coco_names))
print(coco_names)

In [None]:
# Define a rescale_bbox helper function to rescale normalized bounding boxes to the original image size
def rescale_bbox(bb, W, H):
  x, y, w, h = bb
  return [x * W, y * H, w * W, h * H]
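
As a quick check of the helper, the sketch below rescales one normalized box to pixel units and converts it from center format to corner format, which is what the drawing code needs. The box values, the image size, and the `to_corners` helper are hypothetical and only serve the illustration.

```python
def rescale_bbox(bb, W, H):
    # same helper as above: scale normalized [xc, yc, w, h] to pixels
    x, y, w, h = bb
    return [x * W, y * H, w * W, h * H]

def to_corners(xc, yc, w, h):
    # hypothetical helper: center format -> (x_min, y_min, x_max, y_max)
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

box = rescale_bbox([0.5, 0.5, 0.25, 0.5], W=640, H=480)
corners = to_corners(*box)

print(box)      # [320.0, 240.0, 160.0, 240.0]
print(corners)  # (240.0, 120.0, 400.0, 360.0)
```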

In [None]:
# Define the show_img_bbox helper function to show an image with object bounding boxes
COLORS = np.random.randint(0, 255, size=(80, 3),dtype="uint8")
# If the font that's passed to ImageFont.truetype is not available,
# you may use a more common font instead:
# fnt = ImageFont.truetype('arial.ttf', 16)
fnt = ImageFont.truetype('Pillow/Tests/fonts/FreeMono.ttf', 16)

def show_img_bbox(img, targets):
  if torch.is_tensor(img):
      img=to_pil_image(img)
  if torch.is_tensor(targets):
      targets=targets.numpy()[:,1:]
      
  W, H=img.size
  draw = ImageDraw.Draw(img)
  
  for target in targets:
      id_=int(target[0])
      bbox=target[1:]
      bbox=rescale_bbox(bbox,W,H)
      xc, yc, w, h=bbox
      
      color = [int(c) for c in COLORS[id_]]
      name=coco_names[id_]
      
      draw.rectangle(((xc-w/2, yc-h/2), (xc+w/2, yc+h/2)), outline=tuple(color), width=3)
      draw.text((xc-w/2, yc-h/2), name, font=fnt, fill=(255, 255, 255, 0))
  plt.imshow(np.array(img))

In [None]:
# Call the show_img_bbox helper function to show a sample image from coco_train
np.random.seed(2)
rnd_ind=np.random.randint(len(coco_train))
img, labels, img_path = coco_train[rnd_ind] 
print(img.size, labels.shape)

plt.rcParams['figure.figsize'] = (20, 10)
show_img_bbox(img, labels)

In [None]:
# Call the show_img_bbox helper function to show a sample image from coco_val
np.random.seed(0)
rnd_ind=np.random.randint(len(coco_val))
img, labels, img_path = coco_val[rnd_ind] 
print(img.size, labels.shape)

plt.rcParams['figure.figsize'] = (20, 10)
show_img_bbox(img, labels)

### Transforming data

In this section, we will define a transform function and the parameters to be passed to the CocoDataset class.

These transformations are required to resize images, augment the data, and convert the data into PyTorch tensors.

In [3]:
# First, we will define a pad_to_square helper function
def pad_to_square(img, boxes, pad_value=0, normalized_labels=True):
  """
  img: A PIL image
  boxes: A numpy array with a shape of (n, 5) that contains n bounding boxes
  pad_value: The pixel fill value, which defaults to zero
  normalized_labels: A flag to show whether the bounding boxes were normalized to the range [0, 1]
  """
  w, h = img.size
  w_factor, h_factor = (w, h) if normalized_labels else (1, 1)

  # calculate the padding size and divide it into two values: pad1 and pad2
  dim_diff = np.abs(h - w)
  pad1 = dim_diff // 2
  pad2 = dim_diff - pad1

  if h <= w:
    left, top, right, bottom = 0, pad1, 0, pad2
  else:
    left, top, right, bottom = pad1, 0, pad2, 0
  padding = (left, top, right, bottom)

  # pad the image on each side using the padding sizes computed above
  img_padded = TF.pad(img, padding=padding, fill=pad_value)
  w_padded, h_padded = img_padded.size

  # adjust the bounding box coordinates based on the padding size.
  x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
  y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
  x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
  y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)

  # shift the corners by the padding added before them:
  # padding on the left shifts x coordinates, padding on the top shifts y coordinates
  x1 += padding[0]   # left
  y1 += padding[1]   # top
  x2 += padding[0]   # left
  y2 += padding[1]   # top

  # calculate the bounding boxes using the adjusted values of x1, y1, x2, y2
  # and normalize the labels again to the range [0, 1]
  boxes[:, 1] = ((x1 + x2) / 2) / w_padded
  boxes[:, 2] = ((y1 + y2) / 2) / h_padded
  boxes[:, 3] *= w_factor / w_padded
  boxes[:, 4] *= h_factor / h_padded

  return img_padded, boxes
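
To make the box arithmetic concrete, here is a simplified single-box sketch that needs no image. It assumes a box given as normalized `(xc, yc, w, h)`, moves the box center by the left/top padding, and renormalizes by the padded side length. The function name and the sample values are hypothetical.

```python
def pad_box_to_square(box, w, h):
    """box = (xc, yc, bw, bh), normalized to the original w x h image."""
    dim_diff = abs(h - w)
    pad1 = dim_diff // 2
    # landscape images are padded on top/bottom, portrait images on left/right
    left, top = (0, pad1) if h <= w else (pad1, 0)
    side = max(w, h)  # side length of the padded square image
    xc, yc, bw, bh = box
    return (
        (xc * w + left) / side,  # shifted and renormalized center x
        (yc * h + top) / side,   # shifted and renormalized center y
        bw * w / side,           # width relative to the padded image
        bh * h / side,           # height relative to the padded image
    )

padded_box = pad_box_to_square((0.5, 0.25, 0.5, 0.5), w=640, h=480)
print(padded_box)  # (0.5, 0.3125, 0.5, 0.375)
```

For a 640*480 image, 80 pixels are added to the top and bottom, so the center moves down from 120 to 200 pixels and the normalized height shrinks from 0.5 to 0.375.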

In [5]:
# Define the hflip helper function to horizontally flip images
def hflip(image, labels):
  image = TF.hflip(image)
  labels[:, 1] = 1.0 - labels[:, 1]

  return image, labels
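
The label update in `hflip` only touches the normalized x-center, because mirroring an image leaves box widths, heights, and y-centers unchanged. A minimal check on a made-up label row:

```python
import numpy as np

# Hypothetical label row: class_id, x_center, y_center, width, height
labels = np.array([[0, 0.25, 0.5, 0.2, 0.4]])

flipped = labels.copy()
flipped[:, 1] = 1.0 - flipped[:, 1]  # mirror the x-center around the vertical axis

print(flipped[0, 1])   # 0.75
print(flipped[0, 2:])  # y-center, width, and height are unchanged
```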

In [4]:
# Define the transformer function
def transformer(image, labels, params):
  """
  image: A PIL image
  labels: Bounding boxes as a numpy array that's (n, 5) in size
  params: A Python dictionary containing the transformation parameters
  """
  if params["pad2square"] is True:
    image, labels = pad_to_square(image, labels)
  image = TF.resize(image, params["target_size"])

  if random.random() < params["p_hflip"]:
    image, labels = hflip(image, labels)   # randomly flip the image for data augmentation

  image = TF.to_tensor(image)              # convert the PIL image into a PyTorch tensor
  # also convert the labels into a PyTorch tensor of size n*6;
  # the extra first column will be used to index images in a mini-batch
  targets = torch.zeros((len(labels), 6))
  targets[:, 1:] = torch.from_numpy(labels)

  return image, targets

Next, we redefine `coco_train`, this time passing `transformer` and `trans_params_train` to the CocoDataset class. To force the horizontal flip, we set the `p_hflip` probability to 1.0; in practice, we usually set it to 0.5. The sample image shows the effect of the transformations: it has been zero-padded from the top and bottom, resized to 416*416, and horizontally flipped.

In [None]:
# Now, let's create an object of CocoDataset for training data by passing the transformer
trans_params_train = {
    "target_size": (416, 416),
    "pad2square": True,
    "p_hflip": 1.0,
    "normalized_labels": True
}

coco_train = CocoDataset(train_file_path, transform=transformer, trans_params=trans_params_train)

In [None]:
np.random.seed(2)
rnd_ind = np.random.randint(len(coco_train))
img, targets, img_path = coco_train[rnd_ind]
print("image shape:", img.shape)
print("labels shape:", targets.shape)

plt.rcParams['figure.figsize'] = (20, 10)
COLORS = np.random.randint(0, 255, size=(80, 3),dtype="uint8")
show_img_bbox(img,targets)

Similarly, we redefine coco_val. We do not need data augmentation for the validation data, so we set the `p_hflip` probability to 0.0. Check out the transformed sample: it has been zero-padded from the top and bottom and resized to 416*416, but not flipped.

In [None]:
# Similarly, we will define an object of CocoDataset for the validation data by passing the transformer
trans_params_val = {
    "target_size": (416, 416),
    "pad2square": True,
    "p_hflip": 0.0,
    "normalized_labels": True
}

coco_val = CocoDataset(val_file_path, transform=transformer, trans_params=trans_params_val)

In [None]:
np.random.seed(2)
rnd_ind = np.random.randint(len(coco_val))
img, targets, img_path = coco_val[rnd_ind]
print("image shape:", img.shape)
print("labels shape:", targets.shape)

plt.rcParams['figure.figsize'] = (20, 10)
COLORS = np.random.randint(0, 255, size=(80, 3),dtype="uint8")
show_img_bbox(img,targets)

### Defining the Dataloaders

We will define two dataloaders, one for the training dataset and one for the validation dataset, so we can get mini-batches of data from coco_train and coco_val.

We also define the collate_fn function to process a mini-batch and return PyTorch tensors. The function is passed as an argument to the DataLoader class so that the processing happens on the fly. In the function, we group the images, targets, and paths in the mini-batch using zip(*iterable). Then, we remove any empty bounding boxes in the targets. Next, we set the sample index in the mini-batch. Finally, we stack the images and concatenate the targets as PyTorch tensors. To see how this works, we extract a mini-batch from train_dataloader and val_dataloader and print the shape of the returned tensors.

Define an object of the DataLoader class for the training data.

In [6]:
batch_size = 8

def collate_fn(batch):
  imgs, targets, paths = list(zip(*batch))

  # Remove empty boxes
  targets = [boxes for boxes in targets if boxes is not None]

  # set the sample index
  for b_i, boxes in enumerate(targets):
    boxes[:, 0] = b_i
  targets = torch.cat(targets, 0)
  imgs = torch.stack([img for img in imgs])

  return imgs, targets, paths
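
To see what `collate_fn` returns without loading COCO, the sketch below feeds it two dummy samples. The zero-filled tensors and the file names are placeholders, not real data. The first column of the concatenated targets ends up holding the index of the image each box belongs to.

```python
import torch

def collate_fn(batch):
    # same function as above, repeated so the example is self-contained
    imgs, targets, paths = list(zip(*batch))
    targets = [boxes for boxes in targets if boxes is not None]
    for b_i, boxes in enumerate(targets):
        boxes[:, 0] = b_i
    targets = torch.cat(targets, 0)
    imgs = torch.stack([img for img in imgs])
    return imgs, targets, paths

# Dummy samples: (image tensor, (n, 6) targets tensor, path)
batch = [
    (torch.zeros(3, 416, 416), torch.zeros(2, 6), "a.jpg"),  # 2 boxes
    (torch.zeros(3, 416, 416), torch.zeros(3, 6), "b.jpg"),  # 3 boxes
]
imgs, targets, paths = collate_fn(batch)

print(imgs.shape)     # torch.Size([2, 3, 416, 416])
print(targets.shape)  # torch.Size([5, 6])
print(targets[:, 0])  # tensor([0., 0., 1., 1., 1.])
```

Because the number of boxes differs per image, the targets cannot be stacked into a regular batch dimension; concatenating them and keeping the sample index in column 0 is one way around that.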

In [None]:
train_dataloader = DataLoader(coco_train, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True, collate_fn=collate_fn)

Let's extract a mini-batch from train_dataloader.

In [None]:
torch.manual_seed(0)

for imgs_batch, target_batch, path_batch in train_dataloader:
  break

print(imgs_batch.shape)
print(target_batch.shape, target_batch.dtype)

Define an object of the DataLoader class for the validation data.

In [None]:
val_dataloader = DataLoader(coco_val, batch_size=batch_size, shuffle=False, num_workers=0, pin_memory=True, collate_fn=collate_fn)

Let's extract a mini-batch from val_dataloader:

In [None]:
torch.manual_seed(0)

for imgs_batch, target_batch, path_batch in val_dataloader:
  break

print(imgs_batch.shape)
print(target_batch.shape, target_batch.dtype)

## Creating a YOLO-v3 model

The YOLO-v3 network is built of convolutional layers with stride 2, skip connections, and up-sampling layers. There are no pooling layers. The network receives an image whose size is 416*416 as input and provides three YOLO outputs.

<img src='https://github.com/rahiakela/img-repo/blob/master/object-detection-images/yolo-v3.png?raw=1' width='800'/>

The network down-samples the input image by a factor of 32 to a feature map of
size `13*13`, where `yolo-out1` is provided. To improve the detection performance, the `13*13` feature map is up-sampled to `26*26` and `52*52`, where we have `yolo-out2` and `yolo-out3`, respectively. A cell in a feature map predicts three bounding boxes that correspond to three predefined anchors. As a result, the network predicts `13*13*3+26*26*3+52*52*3=10647` bounding boxes in total.
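
The box count can be reproduced with a few lines of arithmetic. The strides 16 and 8 used below are not stated in the text; they are derived from the `26*26` and `52*52` feature map sizes for a 416*416 input.

```python
input_size = 416
strides = (32, 16, 8)   # down-sampling factor at yolo-out1, yolo-out2, yolo-out3
anchors_per_cell = 3

grid_sizes = [input_size // s for s in strides]
boxes_per_scale = [g * g * anchors_per_cell for g in grid_sizes]
total_boxes = sum(boxes_per_scale)

print(grid_sizes)       # [13, 26, 52]
print(boxes_per_scale)  # [507, 2028, 8112]
print(total_boxes)      # 10647
```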

A bounding box is defined using 85 numbers:
- Four coordinates, `[x, y, w, h]`
- An objectness score
- `C=80` class predictions corresponding to 80 object categories in the COCO dataset
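
As an illustration of this layout, the sketch below slices a dummy 85-element prediction vector into its three parts. The ordering (coordinates, then objectness, then class scores) follows the list above; the vector contents and variable names are placeholders.

```python
import numpy as np

num_classes = 80
box_len = 4 + 1 + num_classes       # coordinates + objectness + class scores

pred = np.arange(box_len, dtype=np.float32)  # dummy prediction for one box

xywh = pred[:4]           # [x, y, w, h]
objectness = pred[4]      # objectness score
class_scores = pred[5:]   # one score per COCO category

# with three anchors per cell, each cell of a YOLO output carries 3 * 85 values
channels_per_cell = 3 * box_len

print(box_len, xywh.shape, class_scores.shape, channels_per_cell)
```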




### Parsing the configuration file