# ENN583 Week 9 - Object Detection

This week's practical explores how to run three pre-trained object detectors from PyTorch on images, comparing how these detectors perform in different challenging conditions.

In particular, we'll compare:
* [Faster R-CNN](https://arxiv.org/pdf/1506.01497.pdf) -- one of the classic object detectors, a two-stage anchor-box architecture
* [RetinaNet](https://arxiv.org/pdf/1708.02002.pdf) -- a high performing one-stage anchor-box architecture
* [FCOS](https://arxiv.org/pdf/1904.01355.pdf) -- a popular one-stage point-based architecture 


## Step 1: Import necessary libraries

In [None]:
from torchvision.io.image import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights,
    fcos_resnet50_fpn, FCOS_ResNet50_FPN_Weights,
    retinanet_resnet50_fpn_v2, RetinaNet_ResNet50_FPN_V2_Weights,
)
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image
import torchvision

import torch

from PIL import Image

from colormap import sample_colors

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

import numpy as np

import glob

## Step 2: Load the detectors from PyTorch

PyTorch has a number of pre-trained models available, including object detectors trained on COCO! 

You can see these [here](https://pytorch.org/vision/main/models.html#object-detection-instance-segmentation-and-person-keypoint-detection).

As mentioned, we will compare Faster R-CNN (v2), RetinaNet (v2), and FCOS. Compare the 'Box MAP' of each of these models as listed on the linked PyTorch page -- which model do you expect to perform best? Also consider each model's number of parameters and GFLOPS -- is there a big difference?

Below, we have built on the PyTorch example code to show how to use an object detector. Note the key steps:
1. Initialise the model, load the weights and put the model into 'eval' mode
    * Note how we need to set a box_score_thresh -- this is the minimum confidence threshold for a detection to be valid. Let's start with this low, and we can always filter detections later
2. Load the data transforms necessary -- these typically ensure the image is the correct size and is normalized
3. Preprocess the data using the data transform -- pretty self-explanatory
4. Run the model on the preprocessed data

We're printing the output from the object detector -- how is it describing detections as an output? Can you recognise the different elements of the detection? How is the bounding box parameterised?

In [None]:
img = read_image("images/coco/000000017029.jpg")

# Step 1: Initialize model with the best available weights
weights_frcnn = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
frcnn = fasterrcnn_resnet50_fpn_v2(weights=weights_frcnn, box_score_thresh=0.2)
frcnn.eval()

# Step 2: Initialize the inference transforms
preprocess = weights_frcnn.transforms()

# Step 3: Apply inference preprocessing transforms
batch = [preprocess(img)]

# Step 4: Run the model (gradients are not needed for inference) and print the prediction
with torch.no_grad():
    prediction = frcnn(batch)[0]

print(prediction)

## Step 3: Visualise the predictions

Our prediction is a dictionary with 'boxes', 'labels' and 'scores' fields. Each detection has one entry in each field. The boxes are described as [xmin, ymin, xmax, ymax], and the labels are class indices. Below I'm printing the COCO class list in its entirety, as well as the labels predicted in this image.

In [None]:
coco_categories = weights_frcnn.meta["categories"]
print(coco_categories)

#for every detection, print the coco category that was predicted
print('Detected categories:')
for lbl in prediction['labels']:
    print(coco_categories[lbl])
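
To make the [xmin, ymin, xmax, ymax] box format concrete, here is a minimal sketch that reads the width, height and area out of each box. It uses plain Python lists instead of tensors so it runs without the detector, and the `toy_prediction` values and the `box_size` helper are made up for illustration (they are not output from Faster R-CNN).

```python
# Toy stand-in for a detector output: same three fields as the real prediction,
# but with plain lists and invented values.
toy_prediction = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 80.0, 100.0]],
    "labels": [18, 1],
    "scores": [0.97, 0.35],
}


def box_size(box):
    """Return (width, height, area) of an [xmin, ymin, xmax, ymax] box."""
    xmin, ymin, xmax, ymax = box
    width = xmax - xmin
    height = ymax - ymin
    return width, height, width * height


# Each detection is described by the i-th entry of every field.
for box, label, score in zip(
    toy_prediction["boxes"], toy_prediction["labels"], toy_prediction["scores"]
):
    w, h, area = box_size(box)
    print(f"label={label} score={score:.2f} width={w} height={h} area={area}")
```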

Below, I've created a **draw_detections** function that you can use to visualise results from the PyTorch detectors. You need to provide the image, the model prediction, and the name of the detector. Read through the function to understand how it works, and then visualise the results from the last test.

In [None]:

def draw_detections(img, prediction=None, detectorName='F-RCNN'):
    if prediction is not None:
        #extract the labels and scores -- combine these into text that can be printed on the image
        labels = [coco_categories[i] for i in prediction["labels"]]
        #round to the nearest whole percent (int() alone truncates, so floating-point error could drop a percent)
        scores = [f'{round(100. * float(s))}%' for s in prediction['scores']]
        print_txt = [f'{labels[i]}: {scores[i]}' for i in range(len(prediction['scores']))]
    
        # picks colors that are visually distinct to draw on the image
        color_list = sample_colors(len(scores), rgb = True)
        color_list_int = [(int(c[0]), int(c[1]), int(c[2])) for c in color_list]
    
        #creates a tensor image with the bboxes drawn on the image
        box = draw_bounding_boxes(img, boxes=prediction["boxes"],
                                  labels=print_txt,
                                  colors=color_list_int,
                                  width=6, font_size=50)
    else:
        #draw the raw image only
        box = img
    
    #put the image in format compatible with matplotlib
    box = box.numpy()
    box = np.swapaxes(box, 0, 1)
    box = np.swapaxes(box, 1, 2)
    
    #draw with matplotlib
    fig, ax = plt.subplots()
    ax.imshow(box)
    ax.set_title(detectorName)

    if prediction is not None:
        #text is hard to read -- create a custom legend
        custom_lines = []
        for c in color_list_int:
            c_plt = (c[0]/255, c[1]/255, c[2]/255)
            custom_lines += [Line2D([0], [0], color = c_plt, lw = 4)]
        fig.legend(custom_lines, print_txt, loc = 'outside right center')
                            
    plt.show()
    
draw_detections(img, prediction, 'Faster R-CNN')

## Step 3: Testing over a set of images and extracting performance.

### Load in the ground-truth dog labels
To test performance, we need some ground-truth labels. Below I'm loading in a file that contains an annotated bounding box for the dog we are trying to detect in each image.

Examine the ground-truth file -- what is its format?


In [None]:
import json

with open('dog_object_labels.json', 'r') as f:
    gt_labels = json.load(f)

print(gt_labels)

### Test over the images 
The cell below tests Faster R-CNN on all the images in the 'images/coco' folder. This is a selection of images of dogs that also have other objects present! Two visualisations will be drawn for each image -- first the ground-truth label for the dog, and then the predictions from Faster R-CNN.

Look at the results, then adapt the code progressively to do the following: 

### **Your turn: Find all the TP and FN for class dog:** 
1. Calculate the IoU between all predicted bounding boxes and the ground-truth bounding box.
2. Check if any predicted boxes overlap the dog with IoU greater than 0.5 (you may find the PyTorch [box_iou function](https://pytorch.org/vision/main/generated/torchvision.ops.box_iou.html) useful).
3. If no predicted box does, this is a FN (undetected object).
4. If at least one does, check if any of the overlapping boxes have a class label that is 'dog'. If so, this is a TP (classified and localised dog).
5. If none of the overlapping boxes is labelled 'dog', it is also a FN (the dog was localised but not classified as a dog).

**How many TPs and how many FNs were there? Try setting the IoU threshold higher, to 0.9 -- does this change the TPs and FNs?**

_Note: we cannot identify FPs (spurious detections of objects that are not present) as we do not have labels for all the COCO class objects in the image. Does this sound familiar (hint: Project 2)?_ 
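
The TP/FN rule in the steps above can be sketched on toy data as follows. This is a plain-Python illustration, not the notebook's required solution: the `iou` and `classify_ground_truth` helpers and all box values are assumptions made up for the example. For real detector outputs, the linked `box_iou` function computes the same quantity on tensors.

```python
DOG_LABEL = 18  # COCO index used for 'dog' in this notebook's ground truth


def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_ground_truth(gt_box, pred_boxes, pred_labels, iou_thresh=0.5):
    """Return 'TP' if a prediction labelled dog overlaps gt_box above the threshold, else 'FN'."""
    for box, label in zip(pred_boxes, pred_labels):
        if iou(gt_box, box) > iou_thresh and label == DOG_LABEL:
            return "TP"
    return "FN"


# Toy example: the first prediction covers 80% of the ground-truth box.
gt_box = [0.0, 0.0, 100.0, 100.0]
pred_boxes = [[0.0, 0.0, 100.0, 80.0], [200.0, 200.0, 300.0, 300.0]]
pred_labels = [DOG_LABEL, DOG_LABEL]

print(classify_ground_truth(gt_box, pred_boxes, pred_labels, iou_thresh=0.5))
print(classify_ground_truth(gt_box, pred_boxes, pred_labels, iou_thresh=0.9))
```

In this toy case the best overlap has IoU 0.8, so the ground-truth dog counts as a TP at a threshold of 0.5 but becomes a FN at 0.9.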

### **Your turn: Find the IoU of all TPs for class dog:** 
Change the IoU threshold for TPs back to a reasonable value, like 0.5. 

Some TPs are more useful than others! Can you collect the IoU of all TPs? You'll need to handle the case where multiple predictions qualify as a TP for the same dog (this is rare, but can happen):
1. Of all the predictions that meet the score and IoU thresholds, find the one with the maximum IoU.
2. Save this IoU in a list.
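
One way to pick a single TP per image is sketched below. It assumes you already have the IoU, label and score of every prediction for one image as plain Python lists. The `best_true_positive` helper, its default thresholds and the toy numbers are all hypothetical.

```python
def best_true_positive(ious, labels, scores, dog_label=18,
                       iou_thresh=0.5, score_thresh=0.2):
    """Return (iou, score) of the highest-IoU dog prediction, or None if there is no TP."""
    candidates = [
        (i, s)
        for i, l, s in zip(ious, labels, scores)
        if l == dog_label and i > iou_thresh and s >= score_thresh
    ]
    if not candidates:
        return None
    # Rank candidates by IoU only; the score is carried along for later use.
    return max(candidates, key=lambda c: c[0])


tp_ious = []  # one entry per image that contains a TP

# Toy values for a single image: two dog predictions and one non-dog prediction.
ious = [0.62, 0.91, 0.95]
labels = [18, 18, 1]
scores = [0.99, 0.85, 0.90]

best = best_true_positive(ious, labels, scores)
if best is not None:
    tp_ious.append(best[0])

print(tp_ious)  # [0.91]
```

Note that the prediction with the highest IoU is not necessarily the one with the highest score, which is why the two are kept together.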

### **Your turn: Adding a confidence threshold** 

We now need to pick a reasonable minimum confidence threshold that keeps the TP detections for dog, and throws away other potential false predictions. 
1. Collect the confidence scores of all TPs and select a threshold that will allow you to keep all TPs. It would probably be best to choose a somewhat conservative threshold (a little below the lowest TP score you have observed, rather than exactly that score). For each dog, make sure to collect the score of the prediction with the highest IoU.
2. Threshold predictions using this new confidence score.

**How does this change the predictions that are drawn?**
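
A minimal sketch of the two steps above, under the assumption that "conservative" means leaving a small margin below the lowest TP score. The `choose_conf_thresh` and `filter_by_score` helpers, the margin of 0.05 and all scores are made up for illustration, and plain lists stand in for the detector's tensors.

```python
def choose_conf_thresh(tp_scores, margin=0.05):
    """Lowest observed TP score minus a safety margin (never below zero)."""
    if not tp_scores:
        raise ValueError("need at least one TP score to choose a threshold")
    return max(0.0, min(tp_scores) - margin)


def filter_by_score(prediction, conf_thresh):
    """Keep only the detections whose score is at least conf_thresh."""
    keep = [i for i, s in enumerate(prediction["scores"]) if s >= conf_thresh]
    # Apply the same selection to every field so the entries stay aligned.
    return {key: [values[i] for i in keep] for key, values in prediction.items()}


# Toy TP scores collected over several images (invented values).
tp_scores = [0.98, 0.85, 0.90]
conf_thresh = choose_conf_thresh(tp_scores)

# Toy prediction for one image.
toy_prediction = {
    "boxes": [[0, 0, 10, 10], [5, 5, 20, 20], [30, 30, 60, 60]],
    "labels": [18, 1, 18],
    "scores": [0.95, 0.40, 0.81],
}
filtered = filter_by_score(toy_prediction, conf_thresh)
print(conf_thresh, filtered["scores"])
```

With the tensor outputs from the detectors, the same filter can be written with a boolean mask such as `prediction['scores'] >= conf_thresh` applied to each field.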


In [None]:
#for every image in the coco folder
for idx, file_name in enumerate(glob.glob('images/coco/*.jpg')):
    #read in the image
    img = read_image(file_name)
    
    #make a prediction
    with torch.no_grad():
        
        batch = [preprocess(img)]

        prediction = frcnn(batch)[0]

    #draw the ground truth
    gt_bbox = gt_labels[file_name]
    #coco category 18 = dog
    ground_truth = {'scores': torch.Tensor([1]), 'labels': torch.Tensor([18]).int(), 'boxes': torch.Tensor([gt_bbox])}
    draw_detections(img, ground_truth, 'Ground-truth')

    #draw the prediction
    draw_detections(img, prediction, 'Faster R-CNN')


## Step 4: Compare to RetinaNet and FCOS

In the cell below, I've loaded the 2 other models we want to compare to. 

### Your turn: Test RetinaNet and FCOS

Follow the same process as above, but test with RetinaNet and FCOS.
**Note: you will need to find the correct conf_thresh for each detector again.**

Do you observe: 
- different numbers of TPs?
- different localisation qualities (IoU)?
- any differences in detector behaviour?

**Consider: How would you use this information to decide which detector to use?**
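
To compare detectors side by side, it can help to reduce each one to a few summary numbers. The sketch below assumes you have collected a list of TP IoUs per detector. The `summarise` helper is hypothetical, and the detector names and values are placeholders, not measured results.

```python
def summarise(tp_ious, num_ground_truth):
    """Summarise one detector from its list of TP IoUs and the number of labelled dogs."""
    n_tp = len(tp_ious)
    return {
        "TP": n_tp,
        "FN": num_ground_truth - n_tp,
        # Fraction of labelled dogs that were detected.
        "recall": n_tp / num_ground_truth if num_ground_truth else 0.0,
        # Average localisation quality over the detected dogs.
        "mean_iou": sum(tp_ious) / n_tp if n_tp else 0.0,
    }


# Placeholder inputs only -- replace with the lists you collect for each detector.
results = {
    "detector_a": [0.90, 0.80, 0.70],
    "detector_b": [0.95, 0.85],
}
num_images = 4  # one labelled dog per image in this toy setup

for name, tp_ious in results.items():
    print(name, summarise(tp_ious, num_images))
```

A detector with more TPs but a lower mean IoU finds more dogs while localising them less tightly, so which one is preferable depends on what the application needs.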

In [None]:
weights_fcos = FCOS_ResNet50_FPN_Weights.DEFAULT
fcos = fcos_resnet50_fpn(weights=weights_fcos, box_score_thresh=0.2)
fcos.eval()
fcos_preprocess = weights_fcos.transforms()

weights_retinanet = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
retinanet = retinanet_resnet50_fpn_v2(weights=weights_retinanet, box_score_thresh=0.2)
retinanet.eval()
retinanet_preprocess = weights_retinanet.transforms()

In [None]:
#Test RetinaNet in this cell

In [None]:
#Test FCOS in this cell



## Step 5: Object Detection in the Wild!

**Your turn:** Test the detectors over all images in the images/challenging folder. This contains images of dogs in weird viewpoints, unusual-looking examples, poor illumination, occlusions, cluttered scenes, and small objects. Note that there are no ground-truth labels for these images, but you can still draw the predictions and identify TPs by eye.

**Consider:**
- What happens? Does the detector still detect the object?
- Has the confidence threshold required changed?
- How might this be relevant for Project 2?

In [None]:
for idx, file_name in enumerate(glob.glob('images/challenging/*')):
    #read in the image
    img = read_image(file_name)

    #test one of the models below
