# ENN583 Week 10 - Object Detection

This week's practical explores how to run three different object detectors from PyTorch on an image, and compares how they perform in challenging conditions.

In particular, we'll compare:
* [Faster R-CNN](https://arxiv.org/pdf/1506.01497.pdf) -- one of the classic object detectors, a two-stage anchor-box architecture
* [RetinaNet](https://arxiv.org/pdf/1708.02002.pdf) -- a high performing one-stage anchor-box architecture
* [FCOS](https://arxiv.org/pdf/1904.01355.pdf) -- a popular one-stage anchor-free architecture 


## Step 1: Import necessary libraries

In [None]:
from torchvision.io.image import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights,
    fcos_resnet50_fpn, FCOS_ResNet50_FPN_Weights,
    retinanet_resnet50_fpn_v2, RetinaNet_ResNet50_FPN_V2_Weights,
)
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image

import torch

from PIL import Image

from colormap import sample_colors

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

import numpy as np

import glob

## Step 2: Load the detectors from PyTorch

PyTorch has a number of pre-trained models available, including object detectors trained on COCO! 

You can see these [here](https://pytorch.org/vision/main/models.html#object-detection-instance-segmentation-and-person-keypoint-detection).

As mentioned, we will compare Faster R-CNN (v2), RetinaNet (v2), and FCOS. Compare the 'Box MAP' of each of these models as listed on the linked PyTorch page -- which model do you expect to perform best? Also consider each model's number of parameters and GFLOPS -- is there a big difference?

Below, we have built on the PyTorch example code to show how to use an object detector. Note the key steps:
1. Initialise the model, load the weights and set it to 'eval' mode
    * Note how we set a box_score_thresh -- this is the minimum confidence for a detection to be returned. Let's start with a low value, as we can always filter detections later
2. Load the data transforms that come with the weights -- these typically ensure the image is the correct size and is normalized (for the detection weights used here, the transform converts the image to a float tensor scaled to [0, 1], and resizing and normalization happen inside the model)
3. Preprocess the image using the data transform
4. Run the model on the preprocessed image

We're printing the output from the object detector -- how is it describing detections as an output? Can you recognise the different elements of the detection? How is the bounding box parameterised?
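
To make the printed output easier to read, the sketch below summarises a prediction dictionary. It uses a hand-made `mock_prediction` (an assumption for illustration, not real model output) with the same keys that torchvision detectors return: `boxes` as `[x1, y1, x2, y2]` pixel coordinates, integer `labels`, and `scores` between 0 and 1. The helper name `summarise_prediction` is hypothetical.

```python
import torch


def summarise_prediction(prediction):
    """Return one (label, score, width, height) tuple per detection."""
    rows = []
    for box, label, score in zip(
        prediction["boxes"], prediction["labels"], prediction["scores"]
    ):
        # Each box stores two corners: top-left (x1, y1) and bottom-right (x2, y2).
        x1, y1, x2, y2 = box.tolist()
        rows.append((int(label), round(float(score), 2), x2 - x1, y2 - y1))
    return rows


# Hand-made stand-in for a detector output (not produced by a real model).
mock_prediction = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 100.0]]),
    "labels": torch.tensor([18, 1]),
    "scores": torch.tensor([0.97, 0.31]),
}

for row in summarise_prediction(mock_prediction):
    print(row)
```

The same helper should work on the real `prediction` printed in the next cell, since it only relies on the three dictionary keys.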

In [None]:
img = read_image("images/coco/000000017029.jpg")

# Step 1: Initialize model with the best available weights
weights_frcnn = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
frcnn = fasterrcnn_resnet50_fpn_v2(weights=weights_frcnn, box_score_thresh=0.2)
frcnn.eval()

# Step 2: Initialize the inference transforms
preprocess = weights_frcnn.transforms()

# Step 3: Apply inference preprocessing transforms
batch = [preprocess(img)]

# Step 4: Run the model (no gradients are needed for inference)
with torch.no_grad():
    prediction = frcnn(batch)[0]

print(prediction)

## Step 3: Visualise the predictions

Below, I've created a **draw_detections** function that you can use to visualise results from the PyTorch detectors. You need to provide the image, the model prediction, and the name of the detector. Read through the function to understand how it works, and then visualise the results from the last test.

In [None]:
coco_categories = weights_frcnn.meta["categories"]

def draw_detections(img, prediction, detectorName = 'Faster R-CNN'):

    #extract the labels and scores -- combine these into text that can be printed on the image
    labels = [coco_categories[i] for i in prediction["labels"]]
    scores = [f'{round(100 * s.item())}%' for s in prediction['scores']]
    print_txt = [f'{labels[i]}: {scores[i]}' for i in range(len(prediction['scores']))]

    # picks colors that are visually distinct to draw on the image
    color_list = sample_colors(len(scores), rgb = True)
    color_list_int = [(int(c[0]), int(c[1]), int(c[2])) for c in color_list]

    #creates a tensor image with the bboxes drawn on the image
    box = draw_bounding_boxes(img, boxes=prediction["boxes"],
                              labels=print_txt,
                              colors=color_list_int,
                              width=6, font_size=50)
    
    #put the image in format compatible with matplotlib
    box = box.permute(1, 2, 0).numpy()  # (C, H, W) -> (H, W, C)
    
    #draw with matplotlib
    fig, ax = plt.subplots(layout='constrained')  # constrained layout makes room for the 'outside' legend
    ax.imshow(box)
    ax.set_title(detectorName)
    
    #text is hard to read -- create a custom legend
    custom_lines = []
    for c in color_list_int:
        c_plt = (c[0]/255, c[1]/255, c[2]/255)
        custom_lines += [Line2D([0], [0], color = c_plt, lw = 4)]
    fig.legend(custom_lines, print_txt, loc = 'outside right center')
                            
    plt.show()
    
draw_detections(img, prediction)

## Step 4: Compare to RetinaNet and FCOS

In the cell below, I've loaded the 3 models we want to compare. I've also created a loop that tests each of these models and then displays the results. 

Looking at the results -- what differences can you see between how the models behave for this image?

In [None]:
# Step 1: Initialize model with the best available weights
weights_frcnn = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
frcnn = fasterrcnn_resnet50_fpn_v2(weights=weights_frcnn, box_score_thresh=0.2)
frcnn.eval()

weights_fcos = FCOS_ResNet50_FPN_Weights.DEFAULT
fcos = fcos_resnet50_fpn(weights=weights_fcos, box_score_thresh=0.2)
fcos.eval()

weights_retinanet = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
retinanet = retinanet_resnet50_fpn_v2(weights=weights_retinanet, box_score_thresh=0.2)
retinanet.eval()

models = {'F-RCNN': frcnn, 'RetinaNet': retinanet, 'FCOS': fcos}
weights = {'F-RCNN': weights_frcnn, 'RetinaNet': weights_retinanet, 'FCOS': weights_fcos}


img = read_image("images/coco/000000017029.jpg")

for modelName in models.keys():
    model = models[modelName]

    with torch.no_grad():
        preprocess = weights[modelName].transforms()

        batch = [preprocess(img)]

        prediction = model(batch)[0]

    draw_detections(img, prediction, modelName)
   

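A couple of trends should be evident from the above:
1. We asked for a minimum confidence threshold of 0.2 for all detectors, but RetinaNet still produces a huge number of detections with a confidence below that. What's going on? This is not a bug in PyTorch: `box_score_thresh` is the Faster R-CNN keyword, whereas RetinaNet and FCOS call this parameter `score_thresh`. The unrecognised keyword is silently ignored, so RetinaNet falls back to its default threshold of 0.05 (FCOS's default happens to be 0.2). Passing `score_thresh=0.2` applies the intended threshold.
2. FCOS produces **way** more detections with a confidence above 0.2 than Faster R-CNN does.
3. Even though Faster R-CNN produces fewer detections than RetinaNet and FCOS, it still produces some low-confidence detections.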

## Step 5: Adding reasonable confidence thresholds

**Your turn:** Looking at the results above, we need to pick a minimum confidence threshold that keeps the correct detections and throws away the incorrect ones. The best value seems to be different for every detector. For each detector, choose a reasonable minimum confidence threshold, then go to the **draw_detections** function and add an optional argument that filters the detections so that only those above this threshold are drawn. Re-run the cell above, adjusting the thresholds, until the results look as reasonable as possible for all detectors.
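
As a starting point, the sketch below shows one possible way to filter a prediction dictionary by score before drawing it. The helper name `filter_prediction`, the mock data and the threshold of 0.5 are all assumptions for illustration; choosing good thresholds for each detector is still up to you.

```python
import torch


def filter_prediction(prediction, min_score):
    """Return a copy of `prediction` keeping only detections scoring >= min_score."""
    keep = prediction["scores"] >= min_score  # boolean mask, one entry per detection
    return {key: value[keep] for key, value in prediction.items()}


# Hand-made stand-in for a detector output (not produced by a real model).
mock_prediction = {
    "boxes": torch.tensor(
        [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 100.0], [0.0, 0.0, 5.0, 5.0]]
    ),
    "labels": torch.tensor([18, 1, 3]),
    "scores": torch.tensor([0.9, 0.45, 0.1]),
}

filtered = filter_prediction(mock_prediction, min_score=0.5)  # arbitrary threshold
print(filtered["labels"].tolist(), filtered["scores"].tolist())
```

The same masking could equally be done inside **draw_detections** behind an optional argument, which is what the exercise asks for.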

## Step 6: Checking performance for multiple images

**Your turn:** Adapt the above code to test all images in the 'images/coco' folder. You can use the **glob.glob** function to list them.
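
A minimal sketch of listing the image files with **glob.glob** is shown below. The helper name `list_images` and the set of file extensions are assumptions; adjust them to match the files in your folder.

```python
import glob
import os


def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return the sorted paths of all files in `folder` with an image extension."""
    pattern = os.path.join(folder, "*")
    return sorted(p for p in glob.glob(pattern) if p.lower().endswith(extensions))


for path in list_images("images/coco"):
    print(path)
```

Each returned path can then be passed to `read_image` inside a loop over the detectors.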

Once again, compare the detectors. You may need to adjust the minimum reasonable confidence threshold for each detector. Consider:
* How do the confidence thresholds for detections differ between each detector?
* How is the localisation of each detection?
* How is the recall of each detector (i.e. does it detect all the relevant objects)?
* How is the precision of each detector (i.e. does it avoid incorrectly detecting objects)?
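
To make the last two questions concrete, the sketch below computes precision and recall from counts that you tally by eye for an image: correct detections (true positives), incorrect detections (false positives) and missed objects (false negatives). The function name and the example counts are made up for illustration.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute (precision, recall) from detection counts, returning 0.0 when undefined."""
    detected = true_positives + false_positives  # everything the detector reported
    relevant = true_positives + false_negatives  # everything actually in the image
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Made-up tally: 8 correct detections, 2 incorrect detections, 4 missed objects.
precision, recall = precision_recall(8, 2, 4)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Raising the confidence threshold usually trades recall for precision, which is why the threshold needs tuning for each detector.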

## Step 7: Object Detection in the Wild!

**Your turn:** Pick an object category (e.g. dog is an easy one), and search the internet to find ***challenging*** examples of that object, e.g. weird viewpoints, unusual-looking examples, poor illumination, occlusions, cluttered scenes, small objects. Place these in the 'challenging' image folder, then test the detectors on these images and compare their performance. 

What happens? Do the detectors still detect the object?