<img src="https://i.imgur.com/waxJLOR.png"/>

In this notebook, we will learn about different data augmentation techniques for object detection, i.e., images with bounding boxes.

In [None]:
import os
import re
import cv2
import ast
import random
import pandas as pd
import numpy as np

from PIL import Image
import albumentations as A
from collections import namedtuple
from albumentations.pytorch.transforms import ToTensorV2

import matplotlib.pyplot as plt
import matplotlib.patches as patches

from pathlib import Path
base_dir = Path('/kaggle/input/global-wheat-detection')
train_dir = base_dir/'train'

We will start by loading our data and processing it.

In [None]:
def load_data():
    train_df = pd.read_csv(base_dir/'train.csv')
    bboxes = np.stack(train_df['bbox'].apply(lambda x: ast.literal_eval(x)))
    for i, col in enumerate(['x_min', 'y_min', 'w', 'h']):
        train_df[col] = bboxes[:, i]

    # Vectorized column arithmetic gives the same result as a row-wise apply, but faster
    train_df["x_max"] = train_df["x_min"] + train_df["w"]
    train_df["y_max"] = train_df["y_min"] + train_df["h"]
    train_df.drop(['bbox', 'w', 'h'], axis=1, inplace=True)
    print('DataFrame size: ',train_df.shape)
    return train_df

In [None]:
df = load_data()
df.head()

### Helper functions

In [None]:
def draw_rect(img, bboxes, color=(255, 0, 0)):
    img = img.copy()
    for bbox in bboxes:
        bbox = np.array(bbox).astype(int)
        pt1, pt2 = (bbox[0], bbox[1]), (bbox[2], bbox[3])
        img = cv2.rectangle(img, pt1, pt2, color, int(max(img.shape[:2]) / 200))
    return img

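def read_img(img_id):
    img_path = train_dir/f'{img_id}.jpg'
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
    img = cv2.cvtColor(cv2.imread(str(img_path)), cv2.COLOR_BGR2RGB)
    return img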

def read_bboxes(img_id):
    return df.loc[df.image_id == img_id, 'x_min y_min x_max y_max'.split()].values

def plot_img(img_id, bbox=False):
    img    = read_img(img_id)
    if bbox:
        bboxes = read_bboxes(img_id)
        img    = draw_rect(img, bboxes)
    plt.imshow(img);
    
def plot_multiple_img(img_matrix_list, title_list, ncols, nrows=3, main_title=""):
    fig, myaxes = plt.subplots(figsize=(20, 15), nrows=nrows, ncols=ncols, squeeze=False)
    fig.suptitle(main_title, fontsize = 30)
    fig.subplots_adjust(wspace=0.3)
    fig.subplots_adjust(hspace=0.3)
    for i, (img, title) in enumerate(zip(img_matrix_list, title_list)):
        
        myaxes[i // ncols][i % ncols].imshow(img)
        myaxes[i // ncols][i % ncols].set_title(title, fontsize=15)
        myaxes[i // ncols][i % ncols].grid(False)
        myaxes[i // ncols][i % ncols].set_xticks([])
        myaxes[i // ncols][i % ncols].set_yticks([])

    plt.show()

Let's select an `image_id` for our experiments.

In [None]:
img_id = '0b5b60131'

The [Albumentations](https://albumentations.ai/docs/) library makes it super easy to apply transforms to your images. It provides a wide range of transformations that can be customized very easily. I highly recommend checking out their [github repository](https://github.com/albumentations-team/albumentations).

Let's have a look at some of these transforms:

In [None]:
chosen_img = read_img(img_id)

albumentation_list = [A.RandomFog(p=1), 
                      A.RandomBrightness(p=1), 
                      A.RandomCrop(p=1,height = 512, width = 512), 
                      A.Rotate(p=1, limit=90),
                      A.RGBShift(p=1), 
                      A.RandomSnow(p=1), 
                      A.VerticalFlip(p=1), 
                      A.RandomContrast(limit = 0.5,p = 1)]

titles_list = ["Original", 
               "RandomFog",
               "RandomBrightness", 
               "RandomCrop",
               "Rotate", 
               "RGBShift", 
               "RandomSnow", 
               "VerticalFlip", 
               "RandomContrast"]

img_matrix_list = [chosen_img]
for aug_type in albumentation_list:
    img = aug_type(image = chosen_img)['image']
    img_matrix_list.append(img)

plot_multiple_img(img_matrix_list, 
                  titles_list, 
                  ncols = 3, 
                  main_title="Different Types of Augmentations")

This is pretty cool. Applying a transformation is just a function call. Albumentations has many more augmentations, so if you are seeing it for the first time, you should definitely learn it. It covers most image augmentation needs, so once you know this library you will rarely need another one.

They also have a [website](https://albumentations-demo.herokuapp.com/) that can help you visualize your transformations.

### Bounding Boxes

Augmenting images is easy and straightforward, but things get complex when you start working with bounding boxes. When you transform an image, the coordinates of its bounding boxes change too. Applying the same transformation to both the image and its bounding boxes by hand is tedious and error-prone, and it gets even harder when you start chaining transformations.
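
To make the problem concrete, here is a minimal sketch of doing a single vertical flip by hand for `pascal_voc` boxes. The helper name `vflip_with_bboxes` and the toy image are made up for illustration; even this simple case needs care, because `y_min` and `y_max` swap roles after the flip.

```python
import numpy as np

def vflip_with_bboxes(image, bboxes):
    """Flip an image vertically and update pascal_voc boxes by hand (illustrative helper)."""
    height = image.shape[0]
    flipped = image[::-1].copy()  # reverse the row order
    bboxes = np.asarray(bboxes, dtype=float).reshape(-1, 4)
    out = bboxes.copy()
    # x coordinates are unchanged; y coordinates are mirrored,
    # so the old y_max becomes the new y_min and vice versa
    out[:, 1] = height - bboxes[:, 3]
    out[:, 3] = height - bboxes[:, 1]
    return flipped, out

# Toy example: a white "object" near the top of a 100 x 200 image
image = np.zeros((100, 200, 3), dtype=np.uint8)
image[10:30, 50:90] = 255
flipped, new_boxes = vflip_with_bboxes(image, [[50, 10, 90, 30]])
print(new_boxes)  # the box now sits near the bottom of the image
```

Every spatial transform (crop, rotate, scale, ...) needs its own version of this bookkeeping.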

Luckily for us, the albumentations library supports bounding boxes. Refer to [this](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/) doc page for more details.

The most important thing when working with bounding boxes is the **annotation format**. There are several annotation formats for bounding boxes, and each one represents the box coordinates differently. Albumentations supports four formats: `pascal_voc`, `albumentations`, `coco`, and `yolo`. The image below will help you understand the different formats better.

![](https://albumentations.ai/docs/images/getting_started/augmenting_bboxes/bbox_formats.jpg)

**Note:** Different models (like yolo, faster-rcnn, etc.) expect bboxes in different formats. So, before using any model, make sure you know which annotation format it uses, and then format your bboxes accordingly.
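
If your boxes are in one format and your model wants another, the conversion is simple arithmetic. Below is a minimal sketch, starting from `pascal_voc` (`[x_min, y_min, x_max, y_max]` in pixels). The function names are hypothetical helpers, not part of albumentations; the example numbers assume a 640 x 480 image.

```python
def pascal_voc_to_coco(bbox):
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height] (pixels)."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min, y_min, x_max - x_min, y_max - y_min]

def pascal_voc_to_albumentations(bbox, img_width, img_height):
    """Same corners as pascal_voc, but normalized to the [0, 1] range."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min / img_width, y_min / img_height,
            x_max / img_width, y_max / img_height]

def pascal_voc_to_yolo(bbox, img_width, img_height):
    """[x_center, y_center, width, height], all normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = bbox
    return [(x_min + x_max) / 2 / img_width,
            (y_min + y_max) / 2 / img_height,
            (x_max - x_min) / img_width,
            (y_max - y_min) / img_height]

# Example box on a 640 x 480 image
bbox = [98, 345, 420, 462]
print(pascal_voc_to_coco(bbox))
print(pascal_voc_to_albumentations(bbox, 640, 480))
print(pascal_voc_to_yolo(bbox, 640, 480))
```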

Here, our bounding boxes are formatted in `pascal_voc` because that is what our Faster R-CNN model expects. Let's plot all the above transformations again, but this time with bounding boxes.

In [None]:
chosen_img = read_img(img_id)
bboxes = read_bboxes(img_id)
bbox_params = {'format': 'pascal_voc', 'label_fields': ['labels']}

albumentation_list = [A.Compose([A.RandomFog(p=1)], bbox_params=bbox_params),
                      A.Compose([A.RandomBrightness(p=1)], bbox_params=bbox_params),
                      A.Compose([A.RandomCrop(p=1, height=512, width=512)], bbox_params=bbox_params), 
                      A.Compose([A.Rotate(p=1, limit=90)], bbox_params=bbox_params),
                      A.Compose([A.RGBShift(p=1)], bbox_params=bbox_params), 
                      A.Compose([A.RandomSnow(p=1)], bbox_params=bbox_params),
                      A.Compose([A.VerticalFlip(p=1)], bbox_params=bbox_params), 
                      A.Compose([A.RandomContrast(limit=0.5, p = 1)], bbox_params=bbox_params)
                     ]

titles_list = ["Original", 
               "RandomFog",
               "RandomBrightness", 
               "RandomCrop",
               "Rotate", 
               "RGBShift", 
               "RandomSnow", 
               "VerticalFlip", 
               "RandomContrast"]

img_matrix_list = [draw_rect(chosen_img, bboxes)]

for aug_type in albumentation_list:
    anno = aug_type(image=chosen_img, bboxes=bboxes, labels=np.ones(len(bboxes)))
    img  = draw_rect(anno['image'], anno['bboxes'])
    img_matrix_list.append(img)

plot_multiple_img(img_matrix_list, 
                  titles_list, 
                  ncols = 3, 
                  main_title="Different Types of Augmentations")

The albumentations library takes care of the bboxes automatically. Amazing, right? Doing it manually would be a nightmare. Besides bounding boxes, albumentations also supports mask augmentation for segmentation and keypoint augmentation, and you can apply all of these simultaneously. You can check the complete list of transforms and their supported targets [here](https://albumentations.ai/docs/getting_started/transforms_and_targets/#spatial-level-transforms).

## Advanced Augmentations

In recent times, many new augmentation techniques have been developed, and some of them are extremely powerful. When applied, they can help your model generalize better and make it more robust. We will look at four of them:
- CutOut
- Mixup
- CutMix
- Mosaic

Let's get started.

## Cutout Augmentation

**CutOut** is a technique for regularizing CNN models. It works by masking square regions of the image. By randomly masking regions, we keep the model from latching onto noise in the data and overfitting.
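
The idea fits in a few lines of NumPy. The sketch below is only meant to show the mechanics; `simple_cutout` and its parameters are made-up names, and it ignores bounding boxes entirely.

```python
import numpy as np

def simple_cutout(image, num_holes=8, max_size=80, fill_value=0, rng=None):
    """Mask `num_holes` random square regions of the image (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()  # leave the input untouched
    height, width = out.shape[:2]
    for _ in range(num_holes):
        size = int(rng.integers(1, max_size + 1))
        # Top-left corner, chosen so the square stays inside the image when possible
        y = int(rng.integers(0, max(1, height - size + 1)))
        x = int(rng.integers(0, max(1, width - size + 1)))
        out[y:y + size, x:x + size] = fill_value
    return out

# Toy usage on a uniform gray image
gray = np.full((256, 256, 3), 200, dtype=np.uint8)
masked = simple_cutout(gray, rng=np.random.default_rng(0))
print((masked == 0).mean())  # fraction of values that were masked
```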

**Note:** Reading research papers is one of the most underrated skills in the deep learning community, but to become a good practitioner you should regularly spend time on it. This [cutout paper](https://arxiv.org/pdf/1708.04552) is an easy read, so give it a try.

Albumentations already has a working implementation of CutOut. So, we will use it directly. 

In [None]:
from albumentations.augmentations.transforms import Cutout

In [None]:
img_id = '3a4672486'
chosen_img = read_img(img_id)
bboxes = read_bboxes(img_id)

bbox_params = {'format': 'pascal_voc', 'label_fields': ['labels']}
augmentation = A.Compose([Cutout(num_holes=8, 
                                 max_h_size=80, 
                                 max_w_size=80, 
                                 fill_value=0, 
                                 p=1),
                         ], 
                         bbox_params=bbox_params)

img_matrix_list = [draw_rect(chosen_img, bboxes)]

anno = augmentation(image=chosen_img, bboxes=bboxes, labels=np.ones(len(bboxes)))
img  = draw_rect(anno['image'], anno['bboxes'])
img_matrix_list.append(img)

titles_list = ["Original", "CutOut"]

plot_multiple_img(img_matrix_list, titles_list, ncols = 2, nrows= 1, main_title="")

There is one problem with this: if you make `max_h_size` and `max_w_size` very large, say 250 px, some bounding boxes will end up entirely inside a patch. This is what I am talking about:

<img src="https://i.imgur.com/dcq9oQB.jpg"/>

This makes it really difficult for the model to learn from the images. Ideally, we should also remove the bounding boxes that lie inside the mask. So, with the built-in transform, I would recommend using small values of `max_h_size` and `max_w_size`.
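
One way to decide which boxes to drop is to measure how much of each box is covered by the patch. The sketch below does this geometrically from the coordinates; the helper names and the `0.5` threshold are assumptions for illustration, and the custom transform later in this notebook takes a different route (it counts filled pixels instead).

```python
def covered_fraction(bbox, patch):
    """Fraction of the bbox area covered by the patch.

    Both arguments are (x_min, y_min, x_max, y_max) in pixels.
    """
    bx1, by1, bx2, by2 = bbox
    px1, py1, px2, py2 = patch
    inter_w = max(0, min(bx2, px2) - max(bx1, px1))
    inter_h = max(0, min(by2, py2) - max(by1, py1))
    bbox_area = (bx2 - bx1) * (by2 - by1)
    if bbox_area <= 0:
        return 0.0  # degenerate box, nothing to cover
    return inter_w * inter_h / bbox_area

def drop_covered_bboxes(bboxes, patch, threshold=0.5):
    """Keep only the boxes whose covered fraction is at most `threshold`."""
    return [b for b in bboxes if covered_fraction(b, patch) <= threshold]

patch = (100, 100, 350, 350)      # a 250 px cutout
boxes = [(120, 120, 200, 200),    # fully inside the patch
         (300, 300, 400, 400),    # one corner inside the patch
         (500, 500, 600, 600)]    # untouched
print(drop_covered_bboxes(boxes, patch))
```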

**Note:** All the data augmentation techniques that we have used so far can be applied to any type of computer vision problem. From here on, the code can only be used for object detection.

To use bigger cutouts, we need to write a custom transform that also removes the covered bounding boxes. The code is taken from [this kernel](http://www.kaggle.com/kaushal2896/data-augmentation-tutorial-basic-cutout-mixup). I have removed some less important parts and made some changes to make it easier to use.

In [None]:
from albumentations.core.transforms_interface import DualTransform
from albumentations.augmentations.bbox_utils import denormalize_bbox, normalize_bbox

class CustomCutout(DualTransform):
    """
    Custom Cutout augmentation with handling of bounding boxes 
    Note: (only supports square cutout regions)
    
    Author: Kaushal28
    Reference: https://arxiv.org/pdf/1708.04552.pdf
    """
    
    def __init__(
        self,
        fill_value=0,
        bbox_removal_threshold=0.50,
        min_cutout_size=192,
        max_cutout_size=512,
        always_apply=False,
        p=0.5
    ):
        """
        Class constructor
        
        :param fill_value: Value to be filled in cutout (default is 0 or black color)
        :param bbox_removal_threshold: Bboxes with more than this fraction of their content covered by the cutout patch will be removed
        :param min_cutout_size: minimum size of cutout (192 x 192)
        :param max_cutout_size: maximum size of cutout (512 x 512)
        """
        super(CustomCutout, self).__init__(always_apply, p)  # Initialize parent class
        self.fill_value = fill_value
        self.bbox_removal_threshold = bbox_removal_threshold
        self.min_cutout_size = min_cutout_size
        self.max_cutout_size = max_cutout_size
        
    def _get_cutout_position(self, img_height, img_width, cutout_size):
        """
        Randomly generates cutout position as a named tuple
        
        :param img_height: height of the original image
        :param img_width: width of the original image
        :param cutout_size: size of the cutout patch (square)
        :returns position of cutout patch as a named tuple
        """
        position = namedtuple('Point', 'x y')
        return position(
            np.random.randint(0, img_width - cutout_size + 1),
            np.random.randint(0, img_height - cutout_size + 1)
        )
        
    def _get_cutout(self, img_height, img_width):
        """
        Creates a cutout patch with given fill value and determines the position in the original image
        
        :param img_height: height of the original image
        :param img_width: width of the original image
        :returns (cutout patch, cutout size, cutout position)
        """
        cutout_size = np.random.randint(self.min_cutout_size, self.max_cutout_size + 1)
        cutout_position = self._get_cutout_position(img_height, img_width, cutout_size)
        return np.full((cutout_size, cutout_size, 3), self.fill_value), cutout_size, cutout_position
        
    def apply(self, image, **params):
        """
        Applies the cutout augmentation on the given image
        
        :param image: The image to be augmented
        :returns augmented image
        """
        image = image.copy()  # Don't change the original image
        self.img_height, self.img_width, _ = image.shape
        cutout_arr, cutout_size, cutout_pos = self._get_cutout(self.img_height, self.img_width)
        
        # Set to instance variables to use this later
        self.image = image
        self.cutout_pos = cutout_pos
        self.cutout_size = cutout_size
        
        image[cutout_pos.y:cutout_pos.y+cutout_size, cutout_pos.x:cutout_size+cutout_pos.x, :] = cutout_arr
        return image
    
    def apply_to_bbox(self, bbox, **params):
        """
        Removes the bounding boxes which are covered by the applied cutout
        
        :param bbox: A single bounding box in normalized coordinates (denormalized below)
        :returns transformed bbox's coordinates
        """

        # Denormalize the bbox coordinates
        bbox = denormalize_bbox(bbox, self.img_height, self.img_width)
        x_min, y_min, x_max, y_max = tuple(map(int, bbox))

        bbox_size = (x_max - x_min) * (y_max - y_min)  # width * height
        overlapping_size = np.sum(
            (self.image[y_min:y_max, x_min:x_max, 0] == self.fill_value) &
            (self.image[y_min:y_max, x_min:x_max, 1] == self.fill_value) &
            (self.image[y_min:y_max, x_min:x_max, 2] == self.fill_value)
        )

        # Remove the bbox if more than the threshold fraction of its content is inside the cutout patch
        if overlapping_size / bbox_size > self.bbox_removal_threshold:
            return normalize_bbox((0, 0, 0, 0), self.img_height, self.img_width)

        return normalize_bbox(bbox, self.img_height, self.img_width)

    def get_transform_init_args_names(self):
        """
        Fetches the parameter(s) of __init__ method
        :returns: tuple of parameter(s) of __init__ method
        """
        return ('fill_value', 'bbox_removal_threshold', 'min_cutout_size', 'max_cutout_size', 'always_apply', 'p')

In [None]:
img_id = '1b399c9a7'
chosen_img = read_img(img_id)
bboxes = read_bboxes(img_id)

bbox_params = {'format': 'pascal_voc', 'label_fields': ['labels']}
augmentation = A.Compose([CustomCutout(p=1),], bbox_params = bbox_params)

img_matrix_list = [draw_rect(chosen_img, bboxes)]

anno = augmentation(image=chosen_img, bboxes=bboxes, labels=np.ones(len(bboxes)))
img  = draw_rect(anno['image'], anno['bboxes'])
img_matrix_list.append(img)

titles_list = ["Original", 
               f'CutOut Image: Removed bboxes: {len(bboxes)-len(anno["bboxes"])}']

plot_multiple_img(img_matrix_list, titles_list, ncols = 2, nrows= 1, main_title="")

Here, we also remove the bboxes that are mostly hidden by the patch (more than `bbox_removal_threshold`, 50% by default). You can see the number of removed boxes in the title of the CutOut image. Amazing! Let's write our second custom augmentation, MixUp.

## Mixup Augmentation
In mixup, two images are blended with weights λ and 1−λ, where λ is drawn from a symmetric beta distribution with parameter alpha. This creates new virtual training samples.

In image classification, images and labels can be mixed up as follows:

[Image source](https://www.kaggle.com/kaushal2896/data-augmentation-tutorial-basic-cutout-mixup)
![](https://hoya012.github.io/assets/img/bag_of_trick/9.PNG)
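
In code, the classification case looks roughly like the sketch below. The function name `mixup_classification` and the toy inputs are assumptions for illustration; the same weight `lam` is applied to both the images and their one-hot labels.

```python
import numpy as np

def mixup_classification(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Blend two images and their one-hot labels with the same weight (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # lam is in [0, 1]
    x = lam * x1 + (1 - lam) * x2     # mixed image
    y = lam * y1 + (1 - lam) * y2     # mixed (soft) label
    return x, y, lam

# Toy inputs: an all-black "class 0" image and an all-white "class 1" image
x1, y1 = np.zeros((4, 4, 3)), np.array([1.0, 0.0])
x2, y2 = np.ones((4, 4, 3)), np.array([0.0, 1.0])
x, y, lam = mixup_classification(x1, y1, x2, y2, rng=np.random.default_rng(0))
print(lam, y)  # the soft label is [lam, 1 - lam]
```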

You can read the MixUp research paper [here](https://arxiv.org/pdf/1710.09412.pdf). Again, this is an easy read.

But in object detection tasks, the labels are not one-hot encoded classes. After mixing two images, the label of the resulting image is simply the union of the bounding boxes of both images, which makes the implementation simpler.

Now let's implement it.

In [None]:
def mixup(images, bboxes, areas=None, alpha=1.0):
    """
    Mixes the two given images with each other using a random weight
    
    :param images: The two images to be mixed up (same shape)
    :param bboxes: The bounding boxes (labels) of each image
    :param areas: Optional bbox areas of each image (some models expect them)
    :param alpha: Required to generate image weights (lambda) using beta distribution. In this case we'll use alpha=1, which is same as uniform distribution
    """
    # Generate image weight (minimum 0.4 and maximum 0.6)
    lam = np.clip(np.random.beta(alpha, alpha), 0.4, 0.6)
    print(f'lambda: {lam}')
    
    # Weighted Mixup
    mixedup_images = (lam*images[0] + (1 - lam)*images[1]).astype(np.uint8)
    mixedup_bboxes = np.vstack(bboxes)
    if areas is not None:  # `if areas:` fails for NumPy arrays
        # Concatenate (not add) so this works for both lists and arrays
        mixedup_areas = np.concatenate([areas[0], areas[1]])
        return mixedup_images, mixedup_bboxes, mixedup_areas
    
    return mixedup_images, mixedup_bboxes

As you can see, it's pretty straightforward: just combine the two images and their bboxes. We also handle bbox areas, because some models take area as an input. Let's test it.

In [None]:
image_ids = ['00e903abe', '0bb1adbd8']
images = [read_img(img_id)    for img_id in image_ids]
bboxes = [read_bboxes(img_id) for img_id in image_ids]

aug_image, aug_bbox = mixup(images, bboxes)

images += [aug_image]
bboxes += [aug_bbox]

img_matrix_list = []
for img, bbox in zip(images, bboxes): 
    img_matrix_list.append(draw_rect(img, bbox))
    
titles_list     = ['Image 1', 'Image 2', 'Augmented Image']
plot_multiple_img(img_matrix_list, titles_list, ncols = 3, nrows= 1, main_title="")

You can see the augmented image has many more bounding boxes. And if you observe carefully, you will also see some overlapping components.

## CutMix Augmentation

CutMix involves cutting a rectangular patch from a random image and pasting it onto the current image at the same location the patch was cut from. Here is the link to the paper for details: https://arxiv.org/abs/1905.04899.

A small part of the code is inspired by this [notebook](https://www.kaggle.com/debanga/cutmix-in-python). I would highly recommend checking it out for more details.
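
Before the full implementation, it helps to see how the mixing weight λ controls the patch size. The patch sides are scaled by `sqrt(1 - lam)`, so the patch covers roughly a `1 - lam` fraction of the image. The helper `cut_box_size` below is a made-up name for this illustration, and the 1024 x 1024 size is an assumption.

```python
import numpy as np

def cut_box_size(width, height, lam):
    """Width and height of the CutMix patch for a given lambda (sketch)."""
    cut_ratio = np.sqrt(1.0 - lam)  # side scale, so area scales with (1 - lam)
    return int(width * cut_ratio), int(height * cut_ratio)

for lam in (0.25, 0.5, 0.75):
    cut_w, cut_h = cut_box_size(1024, 1024, lam)
    area_fraction = cut_w * cut_h / (1024 * 1024)
    print(f"lam={lam:.2f} -> patch {cut_w}x{cut_h}, area fraction ~ {area_fraction:.3f}")
```

Note that the patch is centered at a random point and clipped to the image border, so the pasted area can end up smaller than this.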

In [None]:
def rand_bbox(size, lamb):
    """ Generate random bounding box 
    Args:
        - size: (width, height) of the image
        - lamb: (lambda) cut ratio parameter
    Returns:
        - Bounding box as (x1, y1, x2, y2)
    """
    W = size[0]
    H = size[1]
    cut_rat = np.sqrt(1. - lamb)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24; use the builtin int
    cut_h = int(H * cut_rat)

    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2


def generate_cutmix_image(images, bboxes, beta=1.0, th=0.25):
    """ Generate a CutMix augmented image from a pair of images
    Args:
        - images: list of two images of the same shape
        - bboxes: list of two arrays of pascal_voc bounding boxes, one per image
        - beta: a parameter of Beta distribution.
        - th: a bbox is kept only if less than this fraction of it is hidden
    Returns:
        - CutMix image, list of kept bounding boxes
    """
    # generate mixed sample
    fill_value = 255
    lam = np.random.beta(beta, beta)
    h, w, c = images[0].shape  # NumPy image shape is (height, width, channels)
    bbx1, bby1, bbx2, bby2 = rand_bbox((w, h), lam)
    cutmix_image = images[0].copy()
    cutmix_image[bby1:bby2, bbx1:bbx2, :] = fill_value
    
    # bboxes
    new_bboxes = []
    for bbox in bboxes[0]:
        x_min, y_min, x_max, y_max = map(int, bbox)
    
        bbox_size = (x_max - x_min) * (y_max - y_min)  # width * height
        overlapping_size = np.sum(
            (cutmix_image[y_min:y_max, x_min:x_max, 0] == fill_value) &
            (cutmix_image[y_min:y_max, x_min:x_max, 1] == fill_value) &
            (cutmix_image[y_min:y_max, x_min:x_max, 2] == fill_value)
        )

        # Keep the bbox if less than the threshold fraction of it lies inside the cut-out patch
        if overlapping_size / bbox_size < th:
            new_bboxes.append(bbox)
            
    mask = np.zeros(images[1].shape)
    mask[bby1:bby2, bbx1:bbx2, :] = 1
    image2 = images[1]*mask
    
    for bbox in bboxes[1]:
        x_min, y_min, x_max, y_max = map(int, bbox)
    
        bbox_size = (x_max - x_min) * (y_max - y_min)  # width * height
        overlapping_size = np.sum(
            (image2[y_min:y_max, x_min:x_max, 0] == 0) &
            (image2[y_min:y_max, x_min:x_max, 1] == 0) &
            (image2[y_min:y_max, x_min:x_max, 2] == 0)
        )

        # Keep the bbox if less than the threshold fraction of it lies outside the pasted patch
        if overlapping_size / bbox_size < th:
            new_bboxes.append(bbox)
    
    cutmix_image[bby1:bby2, bbx1:bbx2, :] = image2[bby1:bby2, bbx1:bbx2, :]
        
    return cutmix_image, new_bboxes

In [None]:
image_ids = ['00e903abe', '0bb1adbd8']
images = [read_img(img_id)    for img_id in image_ids]
bboxes = [read_bboxes(img_id) for img_id in image_ids]

aug_image, aug_bbox = generate_cutmix_image(images, bboxes)

images += [aug_image]
bboxes += [aug_bbox]

img_matrix_list = []
for img, bbox in zip(images, bboxes): 
    img_matrix_list.append(draw_rect(img, bbox))
    
titles_list     = ['Image 1', 'Image 2', 'Augmented Image']
plot_multiple_img(img_matrix_list, titles_list, ncols = 3, nrows= 1, main_title="")

## Mosaic Augmentation

In mosaic, instead of using 2 images we use 4 images. We stitch them together to make one big image and then randomly select a portion of that image. The image below will help you to understand the process better.

<img src="https://i.imgur.com/uP4tD0v.png" />

Here are some sample images generated by mosaic augmentation. 

<div class="row" style="display:flex" >
  <div class="column" style="padding:15px">
    <img src="https://i.imgur.com/KOHpvKm.jpg" style="width:70%"/>
  </div>
  <div class="column" style="padding:15px">
    <img src="https://i.imgur.com/4EqDXgY.jpg" style="width:70%"/>
  </div>
</div>
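
The stitch-and-crop idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation referenced below: it assumes four images of equal size, crops a window the size of one input image, and uses a made-up `min_visible` threshold to drop boxes that are mostly cut away. The name `simple_mosaic` is hypothetical.

```python
import numpy as np

def simple_mosaic(images, bboxes, min_visible=0.25, rng=None):
    """Stitch four equally sized images into a 2x2 grid and take a random crop.

    images: four HxWx3 arrays; bboxes: four (N_i, 4) pascal_voc arrays.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = images[0].shape[:2]
    # 1. Stitch the four images into one big 2H x 2W canvas
    canvas = np.vstack([np.hstack(images[:2]), np.hstack(images[2:])])
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]  # (x, y) shift of each tile
    boxes = np.vstack([
        np.asarray(b, dtype=float).reshape(-1, 4) + [ox, oy, ox, oy]
        for b, (ox, oy) in zip(bboxes, offsets)
    ])
    # 2. Pick a random H x W window on the canvas
    x0 = int(rng.integers(0, w + 1))
    y0 = int(rng.integers(0, h + 1))
    crop = canvas[y0:y0 + h, x0:x0 + w].copy()
    # 3. Move boxes into crop coordinates and clip them to the window
    area_before = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    boxes = boxes - [x0, y0, x0, y0]
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, w)
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, h)
    area_after = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    # 4. Keep boxes that retain enough of their original area
    keep = area_after > min_visible * area_before
    return crop, boxes[keep]

# Toy usage: four flat 64 x 64 images, each with one centered box
toy_images = [np.full((64, 64, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
toy_bboxes = [np.array([[16, 16, 48, 48]]) for _ in range(4)]
mosaic_img, mosaic_boxes = simple_mosaic(toy_images, toy_bboxes,
                                         rng=np.random.default_rng(0))
print(mosaic_img.shape, len(mosaic_boxes))
```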

For the [Global Wheat Detection](https://www.kaggle.com/c/global-wheat-detection) competition, I used the implementation from [Alex Shonenkov's notebook](https://www.kaggle.com/shonenkov/training-efficientdet).

If you find more augmentation techniques or a faster/better implementation of the above techniques, please let me know in the comments.

### I hope this notebook was of some value to you. Don't forget to upvote!