### Basics of Detectron2 - Balloon detection

by @vbookshelf<br>
10 June 2021

## Introduction

Recently I've started exploring Detectron2. In this notebook I will apply what I've learned to create a model that detects balloons in images.

These are some of the questions that this notebook will answer:

- What does the step-by-step workflow look like?
- How does one feed data into the model?
- What format does the input data need to have?
- What is the format of the model's output?

Let's get started.


In [None]:
import pandas as pd
import numpy as np
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

import ast
import cv2

from sklearn import model_selection
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

import matplotlib.pyplot as plt
%matplotlib inline

# Don't Show Warning Messages
import warnings
warnings.filterwarnings('ignore')

# Note: PyTorch uses a channels-first format:
# [batch_size, num_channels, height, width]

print(torch.__version__)
print(torchvision.__version__)

In [None]:
# Set the seed values

import random

seed_val = 101

os.environ['PYTHONHASHSEED'] = str(seed_val)
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
torch.backends.cudnn.deterministic = True

In [None]:
os.listdir('../input/v2-balloon-detection-dataset')

In [None]:
base_path = '../input/v2-balloon-detection-dataset/'

In [None]:
NUM_CORES = os.cpu_count()
NUM_CORES

## Define the device

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

print(device)

if torch.cuda.is_available():
    print('Num GPUs:', torch.cuda.device_count())
    print('GPU Type:', torch.cuda.get_device_name(0))

## Load the balloon data

In [None]:
path = base_path + 'balloon-data.csv'

df_data = pd.read_csv(path)

# Convert bbox column entries from strings to lists
# "[........]" to [......]
df_data['bbox'] = df_data['bbox'].apply(ast.literal_eval)

print(df_data.shape)

df_data.head()
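
The `bbox` column is stored in the CSV as text, which is why the cell above passes it through `ast.literal_eval`. Here is a minimal sketch of what that conversion does. The cell value is made up for illustration; only the key names (`xmin`, `ymin`, `xmax`, `ymax`) are taken from the code used later in this notebook.

```python
import ast

# Hypothetical CSV cell: a list of bbox dicts stored as a string.
raw = "[{'xmin': 10, 'ymin': 20, 'xmax': 110, 'ymax': 220}]"

# literal_eval parses Python literals (lists, dicts, numbers, strings)
# without executing arbitrary code, unlike eval().
bbox_list = ast.literal_eval(raw)

print(type(raw).__name__)        # type of the raw cell value
print(type(bbox_list).__name__)  # type after parsing
print(bbox_list[0]['xmax'] - bbox_list[0]['xmin'])  # width of the first box
```

After the conversion each row holds a real list of dicts, so the boxes can be looped over directly.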

## Helper functions

In [None]:
# https://pythonprogramming.net/drawing-writing-python-opencv-tutorial/
# https://codeyarns.com/tech/2015-03-11-fonts-in-opencv.html
# https://stackoverflow.com/questions/60674501/how-to-make-black-background-in-cv2-puttext-with-python-opencv
# https://www.geeksforgeeks.org/python-opencv-cv2-puttext-method/
# https://pysource.com/2018/01/22/drawing-and-writing-on-images-opencv-3-4-with-python-3-tutorial-3/


def draw_bbox(image, xmin, ymin, xmax, ymax, text=None):
    
    """
    This function draws one bounding box on an image.
    
    Input: Image (numpy array)
    Output: Image with the bounding box drawn in. (numpy array)
    
    If there are multiple bounding boxes to draw then simply
    run this function multiple times on the same image.
    
    Set text=None to only draw a bbox without
    any text or text background.
    E.g. set text='Balloon' to write a 
    title above the bbox.
    
    xmin, ymin --> coords of the top left corner.
    xmax, ymax --> coords of the bottom right corner.
    
    """


    w = xmax - xmin
    h = ymax - ymin

    # Draw the bounding box
    # ......................
    
    start_point = (xmin, ymin) 
    end_point = (xmax, ymax) 
    bbox_color = (255, 0, 0) 
    bbox_thickness = 15

    image = cv2.rectangle(image, start_point, end_point, bbox_color, bbox_thickness) 
    
    
    
    # Draw the background behind the text and the text
    # .................................................
    
    # Only do this if text is not None.
    if text:
        
        # Draw the background behind the text
        text_bground_color = (0,0,0) # black
        cv2.rectangle(image, (xmin, ymin-150), (xmin+w, ymin), text_bground_color, -1)

        # Draw the text
        text_color = (255, 255, 255) # white
        font = cv2.FONT_HERSHEY_DUPLEX
        origin = (xmin, ymin-30)
        fontScale = 3
        thickness = 10

        image = cv2.putText(image, text, origin, font, 
                           fontScale, text_color, thickness, cv2.LINE_AA)



    return image

In [None]:
def display_images(df):

    # set up the canvas for the subplots
    plt.figure(figsize=(20,70))


    for i in range(1,13):

        index = i

        # Load an image
        path = base_path + 'images/' + df.loc[index, 'fname']
        image = plt.imread(path)
        #image = cv2.resize(image, (IMAGE_SIZE, IMAGE_SIZE))

        plt.subplot(10,3,i)

        plt.imshow(image)
        plt.axis('off')

## Display a few images

In [None]:
display_images(df_data)

## Display one image with bounding boxes

In [None]:
# set the figsize so the image is larger
plt.figure(figsize=(8,8))

# Choose an index.
# Change this number to see different images.
i = 4   

# Load an image
fname = df_data.loc[i, 'fname']

path = base_path + 'images/' + fname
image = plt.imread(path)

bbox_list = df_data.loc[i, 'bbox']

# Draw the bboxes on the image
for coord_dict in bbox_list:
    
    xmin = int(coord_dict['xmin'])
    ymin = int(coord_dict['ymin'])
    xmax = int(coord_dict['xmax'])
    ymax = int(coord_dict['ymax'])
    
    image = draw_bbox(image, xmin, ymin, xmax, ymax, text=None)

print(image.dtype)
print(image.min())
print(image.max())
print(image.shape)

plt.imshow(image)
plt.axis('off')
plt.show()


## Create train and val data

In [None]:
df_train, df_val = train_test_split(df_data, test_size=0.2, random_state=101)

print(df_train.shape)
print(df_val.shape)

## Install Detectron2
In order to install Detectron2 we need to know:

- the torch version that's installed and,
- the CUDA version that's installed.

In [None]:
# Get the torch version
import torch
print(torch.__version__)

In [None]:
# Get the CUDA version
# The GPU needs to be enabled for this to work.

!nvcc --version

In [None]:
# Get the CUDA version
# The GPU needs to be enabled for this to work.
# The CUDA version is in the top right corner.

! nvidia-smi

In [None]:
# Install Detectron2

# Examples:
# If the CUDA version is 10.2 then we enter cu102 below.
# If the CUDA version is 10.0 we enter cu100 below.

# You can add the -q flag and the output will not be displayed.
# !pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html -q

# Using torch version 1.7 (torch1.7) and cuda version 11.0 (cu110)
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html
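
The wheel URL above follows a simple pattern: a CUDA tag (`cu110`) and a torch tag (`torch1.7`). The sketch below builds that URL from the two version strings. `detectron2_wheel_index` is a hypothetical helper I made up for this notebook, not part of Detectron2, and it does not check whether a prebuilt wheel actually exists for a given combination.

```python
def detectron2_wheel_index(torch_version: str, cuda_version: str) -> str:
    """Build the wheel index URL from version strings (hypothetical helper).

    torch_version: e.g. '1.7.1' or '1.7.1+cu110', as printed by torch.__version__
    cuda_version:  e.g. '11.0', as reported by nvcc or nvidia-smi
    """
    # Keep only major.minor of the torch version: '1.7.1+cu110' -> '1', '7'
    major, minor = torch_version.split('+')[0].split('.')[:2]

    # Keep only major.minor of the CUDA version and drop the dot: '11.0' -> 'cu110'
    cuda_tag = 'cu' + ''.join(cuda_version.split('.')[:2])

    return (
        'https://dl.fbaipublicfiles.com/detectron2/wheels/'
        f'{cuda_tag}/torch{major}.{minor}/index.html'
    )


print(detectron2_wheel_index('1.7.1', '11.0'))
```

If pip cannot find a matching wheel at the resulting URL, that version combination is probably not published and a different torch or CUDA version is needed.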

In [None]:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

from detectron2.engine import DefaultTrainer
from detectron2.data import DatasetCatalog
from detectron2.data import MetadataCatalog

from detectron2.utils.visualizer import Visualizer
from detectron2.structures import BoxMode
from detectron2.utils.visualizer import ColorMode

import matplotlib.pyplot as plt
import cv2
import random

## Register the image data
Ref: https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html

**Quote from the docs:**<br>
If "annotations" is an empty list, it means the image is labeled to have no objects. Such images will by default be removed from training, but can be included using DATALOADER.FILTER_EMPTY_ANNOTATIONS.
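
To see how many images this default filter would drop, one can count the records that have an empty "annotations" list. This is a minimal sketch using the record format that is described in the next cell. `split_by_annotations` is a hypothetical helper and the two records are made up.

```python
def split_by_annotations(records):
    """Separate dataset records into those with and without bboxes."""
    with_boxes = [r for r in records if r.get('annotations')]
    without_boxes = [r for r in records if not r.get('annotations')]
    return with_boxes, without_boxes


# Illustrative records only. Real records also carry file_name, height and width.
records = [
    {'image_id': 0, 'annotations': [{'bbox': [10, 20, 110, 220], 'category_id': 0}]},
    {'image_id': 1, 'annotations': []},
]

kept, dropped = split_by_annotations(records)
print(len(kept), 'images with boxes,', len(dropped), 'without')
```

Once the dataset functions further down are defined, the same helper can be run on their output, e.g. `split_by_annotations(train_dataset_function())`.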

In [None]:
# Step 1 - Convert train and val data into a list of dictionaries
# ................................................................

# In order to register the data we will need to write 
# functions that return a list of dictionaries:

#  def train_dataset_function():
#    ...
#
#    return [image_dict_1, image_dict2, ...]


#  def val_dataset_function():
#    ...
#
#    return [image_dict_1, image_dict2, ...]



# What is the format of image_dict and what does it contain?
# ...........................................................

# Each image_dict in the above list corresponds to one image in the train or val dataset.


# This is the format for each image_dict:

# image_dict_1 = {
#     'file_name': 'The full path to the image file. ',
#     'image_id': 'A unique id that identifies this image. Can be the index. (str or int)',
#     'height': 'The height of the image - on which the bbox coords are based (int)',
#     'width': 'The width of the image - on which the bbox coords are based (int)',
#    'annotations': [bbox_dict_1, bbox_dict2, bbox_dict3] # for 3 bboxes on the image
# }

# An image can have multiple bboxes and these bboxes can have different classes.
# Each bbox dict corresponds to one bounding box.

# Each bbox_dict is a dictionary. The bbox dict format is as follows:

# bbox_dict_1 = {
#     'bbox': [xmin, ymin, xmax, ymax],
#     'bbox_mode': BoxMode.XYXY_ABS,
#     'category_id': 'The label (class) of the bbox. (int)'
# }

# BoxMode.XYXY_ABS means that the bbox format is [xmin, ymin, xmax, ymax] and
# the list items are absolute values.
# It's also possible to use BoxMode.XYWH_ABS. Then the
# format is [xmin, ymin, width, height].




# Step 2 - Register the train and val data
# .........................................


# The functions, defined at the top, will then be used to register the data.
# This is how the train and val data are registered:


# thing_classes = ['dog', 'cat']
# thing_colors = [(255, 0, 0), (0, 0, 255)]

# rgb colours:
# red is (255, 0, 0)
# blue is (0, 0, 255)

# from detectron2.data import DatasetCatalog
# DatasetCatalog.register("train_dataset", train_dataset_function)
# DatasetCatalog.register("val_dataset", val_dataset_function)

# from detectron2.data import MetadataCatalog
# MetadataCatalog.get("train_dataset").thing_classes=thing_classes
# MetadataCatalog.get("val_dataset").thing_classes=thing_classes

# If we don't assign a color to each class then the Visualizer will
# pick random colors when displaying a prediction. With thing_colors set,
# for example, all dogs can appear red and all cats can appear blue.

# MetadataCatalog.get("train_dataset").thing_colors=thing_colors
# MetadataCatalog.get("val_dataset").thing_colors=thing_colors
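
The two box formats mentioned above differ only in how the last two numbers are stored. Here is a small sketch of the arithmetic in plain Python, with made-up coordinates. The function names are my own; Detectron2 itself provides `BoxMode.convert` for this, so the code below is only meant to show what the conversion does.

```python
def xyxy_to_xywh(box):
    """[xmin, ymin, xmax, ymax] -> [xmin, ymin, width, height]"""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def xywh_to_xyxy(box):
    """[xmin, ymin, width, height] -> [xmin, ymin, xmax, ymax]"""
    xmin, ymin, width, height = box
    return [xmin, ymin, xmin + width, ymin + height]


box_xyxy = [10, 20, 110, 220]       # made-up absolute pixel coords
box_xywh = xyxy_to_xywh(box_xyxy)

print(box_xywh)                     # same box in XYWH_ABS layout
print(xywh_to_xyxy(box_xywh))       # converted back to XYXY_ABS layout
```

Whichever format the data uses, the important thing is that `bbox_mode` in each bbox dict matches it.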

In [None]:
def train_dataset_function():
    
    df = df_train
    
    image_dict_list = []
    
    for i in range(0, len(df)):
    
        # Load an image
        fname = df.loc[i, 'fname']
        
        bbox_list =  df.loc[i, 'bbox']
        height = df.loc[i, 'height']
        width = df.loc[i, 'width']
        

        path = base_path + 'images/' + fname
        image_id = i
        
        target = 0
        
        annotations_list = []
        
        for box in bbox_list:
            
            
            xmin = int(round(box['xmin']))
            ymin = int(round(box['ymin']))
            xmax = int(round(box['xmax']))
            ymax = int(round(box['ymax']))
            
            target = 0 # there's only one class.
            
            bbox_dict = {
            'bbox': [xmin, ymin, xmax, ymax],
            'bbox_mode': BoxMode.XYXY_ABS,
            'category_id': target
            }
            
            annotations_list.append(bbox_dict)
            
            
        image_dict = {
            'file_name': path,
            'image_id': image_id,
            'height': height,
            'width': width,
            'annotations': annotations_list
        }
        
        image_dict_list.append(image_dict)
        
        
    return image_dict_list
        
        
def val_dataset_function():
    
    df = df_val
    
    image_dict_list = []
    
    for i in range(0, len(df)):
    
        # Load an image
        fname = df.loc[i, 'fname']
        
        bbox_list =  df.loc[i, 'bbox']
        height = df.loc[i, 'height']
        width = df.loc[i, 'width']
        

        path = base_path + 'images/' + fname
        image_id = i
        
        
        annotations_list = []
        
        for box in bbox_list:
            
            
            xmin = int(round(box['xmin']))
            ymin = int(round(box['ymin']))
            xmax = int(round(box['xmax']))
            ymax = int(round(box['ymax']))
            
            target = 0 # there's only one class.
            
            bbox_dict = {
            'bbox': [xmin, ymin, xmax, ymax],
            'bbox_mode': BoxMode.XYXY_ABS,
            'category_id': target
            }
            
            annotations_list.append(bbox_dict)
            
            
        image_dict = {
            'file_name': path,
            'image_id': image_id,
            'height': height,
            'width': width,
            'annotations': annotations_list
        }
        
        image_dict_list.append(image_dict)
        
        
    return image_dict_list

In [None]:
# Test the functions

df_train = df_train.reset_index(drop=True)
df_val = df_val.reset_index(drop=True)

train_image_dict_list = train_dataset_function()
val_image_dict_list = val_dataset_function()

print(len(train_image_dict_list))
print(len(val_image_dict_list))
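
Checking the list lengths does not tell us whether the records themselves are sensible. Below is a sketch of a simple sanity check that could be run on each image_dict before training. `find_record_problems` and `REQUIRED_KEYS` are hypothetical names, the check assumes the XYXY_ABS box format used in this notebook, and the two example records are made up.

```python
REQUIRED_KEYS = {'file_name', 'image_id', 'height', 'width', 'annotations'}


def find_record_problems(record):
    """Return a list of human-readable problems found in one image_dict."""
    problems = []

    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f'missing keys: {sorted(missing)}')
        return problems  # the remaining checks need these keys

    for n, ann in enumerate(record['annotations']):
        xmin, ymin, xmax, ymax = ann['bbox']  # assumes XYXY_ABS
        if xmin >= xmax or ymin >= ymax:
            problems.append(f'box {n} has zero or negative size')
        if xmin < 0 or ymin < 0 or xmax > record['width'] or ymax > record['height']:
            problems.append(f'box {n} extends outside the image')

    return problems


# Made-up records for illustration.
good = {'file_name': 'a.jpg', 'image_id': 0, 'height': 300, 'width': 400,
        'annotations': [{'bbox': [10, 20, 110, 220], 'category_id': 0}]}
bad = {'file_name': 'b.jpg', 'image_id': 1, 'height': 300, 'width': 400,
       'annotations': [{'bbox': [350, 20, 450, 220], 'category_id': 0}]}

print(find_record_problems(good))
print(find_record_problems(bad))
```

In this notebook the check could be applied with a loop such as `for r in train_image_dict_list: print(r['image_id'], find_record_problems(r))`.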

In [None]:
# Register the train and val data

# add all the classes to a list
thing_classes = ['Balloon']

DatasetCatalog.register("train_dataset", train_dataset_function)
DatasetCatalog.register("val_dataset", val_dataset_function)

MetadataCatalog.get("train_dataset").thing_classes=thing_classes
MetadataCatalog.get("val_dataset").thing_classes=thing_classes

In [None]:
# Check that the data registration process worked

train_dataset_dicts = train_dataset_function()
train_metadata = MetadataCatalog.get("train_dataset")

i = 4
d = train_dataset_dicts[i]

# Take the last path component, i.e. the image file name.
fname = os.path.basename(d['file_name'])
print(fname)

image = cv2.imread(d['file_name'])

visualizer = Visualizer(image[:, :, ::-1], metadata=train_metadata, scale=0.5)
out = visualizer.draw_dataset_dict(d)
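# The Visualizer was given an RGB image, so get_image() returns RGB,
# which is the channel order that plt.imshow expects.
plt.imshow(out.get_image())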

## Choose a model

For example, there are several Faster R-CNN models available in Detectron2, with different backbones and learning schedules.<br>
https://paperswithcode.com/lib/detectron2/faster-r-cnn

On that page, use the selector at the top left to choose a Faster R-CNN variant and see its details. Also click the "SHOW MORE" link.

- Model Zoo<br>
https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md
- Model selection page (click the folder for a specific model):<br>
https://github.com/facebookresearch/detectron2/tree/master/configs
- For example, these are object detection models trained on COCO (fast_rcnn, retinanet, etc.):<br>
https://github.com/facebookresearch/detectron2/tree/master/configs/COCO-Detection

In [None]:
# How to choose a model
# ----------------------

# 1- Go to the Detectron2 "configs" page on Github.
# https://github.com/facebookresearch/detectron2/tree/master/configs

# 2- Click on the task folder e.g. if you want to do object detection then
# click on the "COCO-Detection" folder. You will see a list of all models
# that have been trained on the COCO dataset.

# 3- Choose a model and take note of its yaml file name.


# This is how to load a model.
# .............................
# Say you chose faster_rcnn_R_101_FPN_3x.yaml:
# Add the name of the model to the config
# (COCO-Detection is the Github folder where the model is stored.)

# from detectron2.config import get_cfg
# cfg = get_cfg()

# cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"))
# cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")


# This is another way to load a model but I don't know
# in what context this method would be used.
# Shown here: https://paperswithcode.com/lib/detectron2/faster-r-cnn

# from detectron2 import model_zoo
# model = model_zoo.get("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml", trained=True)

## Train the model

In [None]:
# Create a config object.
cfg = get_cfg()

# Change the config
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("train_dataset",)
cfg.DATASETS.TEST = ()   # Not using the validation data during training.
cfg.DATALOADER.NUM_WORKERS = NUM_CORES
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 500
cfg.SOLVER.STEPS = []        
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon)

# Uncomment this line to see all fields in the config.
# print(cfg)
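
Detectron2 counts training in iterations, not epochs. Each iteration processes one batch of `IMS_PER_BATCH` images, so the settings above can be translated into an approximate number of passes over the training set. This is a small sketch of that arithmetic. `iterations_to_epochs` is a hypothetical helper, and the dataset size of 60 is a placeholder, not the real size of this dataset.

```python
def iterations_to_epochs(max_iter, ims_per_batch, num_train_images):
    """Approximate number of passes over the training set."""
    images_seen = max_iter * ims_per_batch
    return images_seen / num_train_images


# 500 and 2 come from the config above; 60 is a placeholder dataset size.
epochs = iterations_to_epochs(max_iter=500, ims_per_batch=2, num_train_images=60)
print(round(epochs, 1), 'epochs (approx.)')
```

In this notebook the real value can be obtained by passing `len(df_train)` as `num_train_images`.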

In [None]:
# Train the model
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

## Make a prediction

In [None]:
# Change the config
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
cfg.DATASETS.TEST = ("val_dataset", )

# Create a predictor
predictor = DefaultPredictor(cfg)

In [None]:
# Get the metadata
val_dataset_dicts = val_dataset_function()
val_metadata = MetadataCatalog.get("val_dataset")

# Get an image to predict on.
# Change this number to select a different image from the val set.
i = 0

d = val_dataset_dicts[i] 
im = cv2.imread(d["file_name"])

# make a prediction
outputs = predictor(im)

outputs
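
The printed `outputs` is a dict with a single key, "instances". For a detection model the instances hold the fields `pred_boxes` (boxes as [xmin, ymin, xmax, ymax] in absolute pixels), `scores` and `pred_classes`. The sketch below turns them into a list of plain Python dicts, which I find easier to read. `instances_to_records` is a hypothetical helper, not a Detectron2 function.

```python
def instances_to_records(instances, class_names):
    """Convert predicted instances into a list of plain dicts.

    instances:   outputs["instances"], moved to the CPU
    class_names: list of class names, e.g. ['Balloon']
    """
    boxes = instances.pred_boxes.tensor.tolist()   # [[xmin, ymin, xmax, ymax], ...]
    scores = instances.scores.tolist()             # one confidence per box
    classes = instances.pred_classes.tolist()      # one class index per box

    return [
        {'label': class_names[c], 'score': s, 'bbox': b}
        for b, s, c in zip(boxes, scores, classes)
    ]
```

In this notebook it would be called as `instances_to_records(outputs["instances"].to("cpu"), val_metadata.thing_classes)`. Only boxes with a score above `SCORE_THRESH_TEST` are returned by the predictor, so the list can be empty.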

In [None]:
# Visualize the predicted bounding boxes

# Keep scale=1 in order to associate the predicted coords with the image below.
# If the scale is different the predicted coords won't match the bbox shown on
# the image below.
v = Visualizer(im[:, :, ::-1], metadata=val_metadata, scale=1) 
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))


plt.figure(figsize = (14, 10))
# get_image() already returns RGB, which is what plt.imshow expects.
plt.imshow(v.get_image())
plt.show()
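
Looking at the picture is a good first check. To put a number on how well a predicted box matches a ground truth box, a common measure is the intersection over union (IoU). This is a minimal sketch in plain Python, assuming both boxes are in [xmin, ymin, xmax, ymax] format. `box_iou` is my own helper and the example boxes are made up.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes don't overlap.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


ground_truth = [0, 0, 10, 10]   # made-up box
prediction = [5, 0, 15, 10]     # made-up box, shifted to the right

print(box_iou(ground_truth, prediction))
```

The ground truth boxes for the val image are available in `d["annotations"]`, so each predicted box could be compared against them in this way.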

## Conclusion

On the plus side, Detectron2 is easy to use and makes it simple to try different models. However, I don't feel comfortable using it as a black box. I prefer a more flexible and transparent training process where I understand what's going on, as in PyTorch. For example, how does one get the model to print out a validation loss during training?

According to the Detectron2 docs it is in fact possible to control the entire training loop, but it's not yet clear to me how to set this up. If you find an example, please link to it in the comments below.

## References and Resources

- Balloon training code by Gilbert Tanner<br>
(Includes code to parse the original Balloon data json files.)<br>
https://github.com/TannerGilbert/Object-Detection-and-Image-Segmentation-with-Detectron2/blob/master/Detectron2_train_on_a_custom_dataset.ipynb

- VinBigData detectron2 train notebook by corochann<br>
(Includes info on data augmentation and on evaluation during training.)<br>
https://www.kaggle.com/corochann/vinbigdata-detectron2-train

- Video: Detectron2 - Object Detection with PyTorch by Gilbert Tanner<br>
https://www.youtube.com/watch?v=CrEW8AfVlKQ

- Video: Using Machine Learning with Detectron2 by Facebook Open Source<br>
(Building an ‘I spy’ colour app.)<br>
https://www.youtube.com/watch?v=eUSgtfK4ivk

- Detectron2 GitHub repo<br>
https://github.com/facebookresearch/detectron2

- Detectron2 docs<br>
https://detectron2.readthedocs.io/en/latest/tutorials/getting_started.html





Thank you for reading.