<a href="https://colab.research.google.com/github/nyp-sit/sdaai-iti107/blob/main/session-5/od_using_tfod_api/object_detection_using_tfod_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" align="left"/></a>

# Object Detection using TensorFlow Object Detection API (aka TFOD API)

Welcome to the programming exercise of 'Object Detection using TFOD API'.  This notebook will walk you through, step by step, the process of using the TFOD API for object detection.

Before you run the code in this notebook, make sure the TFOD API is installed. If you are using the lab machine or the provided cloud VM, the TFOD API has already been installed. If you are using your own machine, follow the [TFOD API installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2.md) before you start.

Make sure you are working in an environment with TensorFlow > 2.2 (activate `tf2env` if you are using the cloud VM).
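
One quick way to confirm the setup before running the rest of the notebook is to ask Python whether the packages can be located. This is only a sketch: the helper name `is_importable` is made up for illustration, and it assumes the TFOD API is installed under the package name `object_detection`, which is the name used by the imports later in this notebook.

```python
import importlib.util


def is_importable(package_name):
    """Return True if a top-level package can be located (it is not imported here)."""
    return importlib.util.find_spec(package_name) is not None


# `object_detection` is the package name used by the TFOD API imports below.
for package in ("tensorflow", "object_detection"):
    status = "found" if is_importable(package) else "MISSING"
    print(f"{package}: {status}")
```

If either package is reported as missing, revisit the installation instructions linked above before continuing.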

*Credit: This notebook is adapted from the Object Detection Tutorial in the TFOD API.*

## 1. Imports

In [None]:
import numpy as np
import os
import sys
import tensorflow as tf
from matplotlib import pyplot as plt
from PIL import Image
from tqdm import tqdm
import time
import cv2

In [None]:
# Run the following if you encounter an error message saying that cuDNN failed to initialize.
# It must run immediately after importing the TensorFlow library.
from utils import fix_cudnn_bug

fix_cudnn_bug()

## 2. Environment setup

### TFOD API imports
Here are the imports of the required object detection modules from the TFOD API:

In [None]:
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder

## 3. Model preparation 

### Choose the detection model

Any model exported using the `exporter_main_v2.py` tool of the TFOD API can be loaded here. We will cover the exporting tool in the next exercise, when we do our own custom training.

By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies. Note that the downloaded file is named \<model name\>.tar.gz, e.g. *ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz*. To use a different model, change the variables `model_name` and `model_url` below to match: `model_name` is the \<model name\> part, e.g. *ssd_mobilenet_v2_320x320_coco17_tpu-8* for the file above.
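
The naming convention can be sketched in a few lines of plain Python. The helper names `model_name_from_archive` and `model_paths` are hypothetical; the base URL and the `pipeline.config` and `checkpoint/ckpt-0` paths are copied from the download and restore cells of this notebook. Other models may be published under a different date folder, so treat the generated URL as a guess to be checked against the model zoo page.

```python
import os

# Base URL taken from the download cell in this notebook; other models may be
# published under a different date folder, so confirm the link in the model zoo.
BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711/"


def model_name_from_archive(archive_filename):
    """Strip the .tar.gz suffix to get the <model name>."""
    suffix = ".tar.gz"
    if not archive_filename.endswith(suffix):
        raise ValueError(f"expected a {suffix} archive, got {archive_filename!r}")
    return archive_filename[: -len(suffix)]


def model_paths(model_name):
    """Paths this notebook reads from the extracted model folder."""
    return {
        "url": BASE_URL + model_name + ".tar.gz",
        "pipeline_config": os.path.join(model_name, "pipeline.config"),
        "checkpoint_prefix": os.path.join(model_name, "checkpoint", "ckpt-0"),
    }


name = model_name_from_archive("ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz")
for key, value in model_paths(name).items():
    print(f"{key}: {value}")
```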



Now we download the pre-trained model from the model zoo with `tf.keras.utils.get_file`, build the model from its `pipeline.config`, and restore the weights from the checkpoint.

In [None]:
model_name = 'ssd_mobilenet_v2_320x320_coco17_tpu-8'
model_url = "http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz"
model_dir = tf.keras.utils.get_file(
    fname=model_name, 
    origin=model_url,
    untar=True,
    cache_dir='.',
    cache_subdir='.')

In [None]:
pipeline_config = os.path.join(model_name, 'pipeline.config')
model_dir = os.path.join(model_name, 'checkpoint')
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
detection_model = model_builder.build(
      model_config=model_config, is_training=False)

# Restore checkpoint
ckpt = tf.compat.v2.train.Checkpoint(
      model=detection_model)
ckpt.restore(os.path.join(model_dir, 'ckpt-0')).expect_partial()

def get_model_detection_function(model):
    """Get a tf.function for detection."""

    @tf.function
    def detect_fn(image):
        """Detect objects in image."""

        image, shapes = model.preprocess(image)
        prediction_dict = model.predict(image, shapes)
        detections = model.postprocess(prediction_dict, shapes)

        return detections, prediction_dict, tf.reshape(shapes, [-1])

    return detect_fn

detect_fn = get_model_detection_function(detection_model)

You will also need to provide the path to the appropriate label map file (explained in 'Loading label map' below). A set of label map files (with the file suffix .pbtxt) is provided in the `data` subfolder of the TFOD API object detection folder. Depending on the model you chose, copy the matching label map file (.pbtxt) to an appropriate working directory (e.g. the current directory of this notebook). In this lab, since we chose the model 'ssd_mobilenet_v2_320x320_coco17_tpu-8', which is trained on the MS COCO dataset, we will use the file 'mscoco_label_map.pbtxt'. This file has already been copied to the current directory for your convenience. If you train your own custom detection model, you will need to provide your own label map file.

We will use the TFOD `label_map_util` to build a dictionary that maps integer class ids to the appropriate string labels.

In [None]:
PATH_TO_LABELS = 'mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

# Print a few entries to see some of the object classes we have
sample_ids = [1, 50, 70]
for class_id in sample_ids:  # avoid shadowing the built-in `id`
    print(category_index[class_id])

### Loading label map
A label map maps indices to category names, so that when our convolutional network predicts `5`, we know that this corresponds to `airplane`. Here we use a utility function from the TFOD API, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.

In [None]:
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
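
To make the "any dictionary would be fine" point concrete, here is a hand-built index in the same shape. It is a sketch that assumes each entry has the layout `{'id': ..., 'name': ...}`, which is what printing an entry of `category_index` shows; the function name `build_category_index` is hypothetical, and the five names are the first five COCO classes.

```python
def build_category_index(names, first_id=1):
    """Map consecutive integer ids (starting at first_id) to {'id', 'name'} entries."""
    return {
        class_id: {"id": class_id, "name": name}
        for class_id, name in enumerate(names, start=first_id)
    }


# First five COCO classes, in label-map order.
demo_index = build_category_index(
    ["person", "bicycle", "car", "motorcycle", "airplane"])

print(demo_index[5]["name"])  # airplane
```

A dictionary like this is enough for a small custom model, although it only works when the class ids are consecutive; a label map file handles gaps in the id sequence.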

## 4. Object Detection on Image

In [None]:
# This is needed to display the images.
%matplotlib inline

### Helper code

The image is read using Pillow as an `Image` object. `Image.size` gives the dimensions of the image in (width, height) order. `Image.getdata()` gives a flattened sequence of pixel values, so it needs to be reshaped to `(height, width, channels)`. In the code below, `np.array(Image.open(...))` performs this conversion for us.
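
The width/height ordering is easy to get backwards, so here is a pure-Python sketch of the regrouping on a made-up image of 3 x 2 pixels. The helper name `rows_from_flat` is hypothetical, and the flat list simply stands in for what `Image.getdata()` would return, with the pixels of one row following the pixels of the previous row.

```python
def rows_from_flat(pixels, size):
    """Regroup a row-major flat pixel sequence into a list of rows.

    `size` follows Pillow's (width, height) ordering.
    """
    width, height = size
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match size")
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]


# A made-up image that is 3 pixels wide and 2 pixels high, one RGB tuple per pixel.
size = (3, 2)
flat = [(10 * i, 0, 0) for i in range(6)]

rows = rows_from_flat(flat, size)
shape = (len(rows), len(rows[0]), len(rows[0][0]))
print(shape)  # (2, 3, 3), i.e. (height, width, channels)
```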

For models trained with the TFOD API, some standard tensor names are used, e.g. `num_detections`, `detection_boxes`, `detection_scores`, `detection_classes`, etc.

The following code assumes the presence of these tensors:

- detection_boxes: coordinates of the detection boxes in the image.
- detection_scores: detection scores for the detection boxes in the image.
- detection_classes: detection-level class labels.
- num_detections: number of detections in the batch.

In our case, the model configuration specifies a maximum of 100 total detections (`max_total_detections`) and a maximum of 100 detections per class (`max_detections_per_class`), so the output tensors for `detection_scores` and `detection_classes` have the shape (?, 100), and `detection_boxes` has the shape (?, 100, 4), where the 4 values are the coordinates of two diagonally opposite corners of the bounding box, in the order [ymin, xmin, ymax, xmax].
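
As an illustration of the box format, the sketch below converts one box to pixel coordinates. It assumes the TFOD convention of `[ymin, xmin, ymax, xmax]` with values normalized to the range 0 to 1, which is what `use_normalized_coordinates=True` indicates in the visualization call. The function name `to_pixel_box` and the sample numbers are made up.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to (left, top, right, bottom) pixels."""
    ymin, xmin, ymax, xmax = box
    return (
        round(xmin * image_width),
        round(ymin * image_height),
        round(xmax * image_width),
        round(ymax * image_height),
    )


# Made-up detection box on an image 200 pixels wide and 100 pixels high.
pixel_box = to_pixel_box([0.1, 0.2, 0.5, 0.6], image_width=200, image_height=100)
print(pixel_box)  # (40, 10, 120, 50)
```

Note that the y values scale with the image height and the x values with the image width, which is the opposite of the order in which they appear in the box.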

Here, we read the image file using the Pillow `Image` class. Because the network always expects its input tensors in batches, we need to add an extra dimension as the first axis by calling `np.expand_dims(x, axis=0)`.

We then call the detection function (`detect_fn`) obtained above to predict bounding boxes and classes.  We use the utility function provided by TFOD API: `visualization_utils.visualize_boxes_and_labels_on_image_array()` to draw the boxes on the image. We can control the score threshold for a box to be visualized by changing the `min_score_thresh` parameter value. 
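
The effect of `min_score_thresh`, and of the `+ 1` applied to the class ids in the `predict` cell, can be sketched with plain lists. This is an illustration rather than the library's implementation: the function name `summarize_detections` and all sample values are made up, and it assumes the model reports zero-based class indices while the label map ids start at 1.

```python
def summarize_detections(classes, scores, category_index,
                         min_score_thresh=0.5, label_id_offset=1):
    """Return (name, score) pairs for detections scoring above the threshold."""
    results = []
    for class_index, score in zip(classes, scores):
        if score <= min_score_thresh:
            continue
        # Shift the model's class index to the label map's id.
        label_id = int(class_index) + label_id_offset
        name = category_index.get(label_id, {}).get("name", "N/A")
        results.append((name, score))
    return results


# Made-up values standing in for detections['detection_classes'][0] and
# detections['detection_scores'][0].
sample_index = {1: {"id": 1, "name": "person"}, 18: {"id": 18, "name": "dog"}}
sample_classes = [17.0, 0.0, 17.0]
sample_scores = [0.92, 0.61, 0.30]

kept = summarize_detections(sample_classes, sample_scores, sample_index)
print(kept)  # [('dog', 0.92), ('person', 0.61)]
```

Raising the threshold hides low-confidence boxes at the cost of missing some true objects; lowering it does the reverse.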

If the label text is unclear or illegible, you may want to change the font used by `visualize_boxes_and_labels_on_image_array()`. By default, it tries to load the font arial.ttf; if that fails, it falls back to `ImageFont.load_default()`, and this default font may not be legible on certain platforms (e.g. macOS). For more information on ImageFont, refer to the [PIL documentation](https://pillow.readthedocs.io/en/stable/reference/ImageFont.html).


In [None]:
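def predict(image_np):
    """Run detection on an image array and display it with the boxes drawn in place."""
    # Add the batch dimension and convert to a float32 tensor
    input_tensor = tf.convert_to_tensor(
        np.expand_dims(image_np, 0), dtype=tf.float32)
    detections, predictions_dict, shapes = detect_fn(input_tensor)
    # Keep the results for the first (and only) image in the batch
    boxes = detections['detection_boxes'][0].numpy()
    classes = detections['detection_classes'][0].numpy()
    scores = detections['detection_scores'][0].numpy()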
    viz_utils.visualize_boxes_and_labels_on_image_array(
          image_np,
          boxes,
          (classes + 1).astype(int),
          scores,
          category_index,
          use_normalized_coordinates=True,
          max_boxes_to_draw=200,
          min_score_thresh=.50,
          agnostic_mode=False)

    display(Image.fromarray(image_np))

In [None]:
image = 'data/dog.jpg'
image_np = np.array(Image.open(image))
predict(image_np)

## 5. Object Detection on Video (Optional) 

The following code performs detection on a video in real time. It reads the video frame by frame, performs detection, draws the bounding boxes on each frame (image), and then displays the frame directly using `cv2.imshow()`.

Only run this on a local computer: the cv2 video player opens in a separate window on the local computer, not within the notebook.

In [None]:
def run_inference_for_video(video_filepath, detection_model):
    video_player = cv2.VideoCapture(video_filepath)
    while video_player.isOpened():
        ret, image_np = video_player.read()
        if ret:
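            # OpenCV reads frames in BGR order; convert to RGB, the order Pillow uses in predict()
            image_rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
            input_tensor = tf.convert_to_tensor(
                np.expand_dims(image_rgb, 0), dtype=tf.float32)
            detections, predictions_dict, shapes = detect_fn(input_tensor)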
            viz_utils.visualize_boxes_and_labels_on_image_array(
                  image_np,
                  detections['detection_boxes'][0].numpy(),
                  (detections['detection_classes'][0].numpy()+1).astype(int),
                  detections['detection_scores'][0].numpy(),
                  category_index,
                  use_normalized_coordinates=True,
                  max_boxes_to_draw=200,
                  min_score_thresh=.50,
                  agnostic_mode=False)

            cv2.imshow('Object Detection', image_np)
            if cv2.waitKey(1) == 13: #13 is the Enter Key
                break
        else:
            break

    # Release camera and close windows
    video_player.release()
    cv2.destroyAllWindows() 
    cv2.waitKey(1)

In [None]:
video_in_file = 'data/tube.mp4'
run_inference_for_video(video_in_file, detection_model)

The following code is a slight modification: it reads the video file frame by frame, performs detection on each frame, and writes the annotated frame to a video file using the `VideoWriter` class provided by OpenCV.

In [None]:
def write_video(video_in_filepath, video_out_filepath, detection_model):
    if not os.path.exists(video_in_filepath):
        print('video filepath not valid')
        return
    
    video_reader = cv2.VideoCapture(video_in_filepath)
    
    nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_h = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
    frame_w = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))

    video_writer = cv2.VideoWriter(video_out_filepath,
                               cv2.VideoWriter_fourcc(*'XVID'), 
                               30.0, 
                               (frame_w, frame_h))

    for i in tqdm(range(nb_frames)):
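        ret, image_np = video_reader.read()
        if not ret:
            # Stop early if a frame cannot be read (the reported frame count can be inexact)
            break
        # OpenCV reads frames in BGR order; convert to RGB, the order Pillow uses in predict()
        image_rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
        input_tensor = tf.convert_to_tensor(np.expand_dims(image_rgb, 0), dtype=tf.float32)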
        detections, predictions_dict, shapes = detect_fn(input_tensor)
        viz_utils.visualize_boxes_and_labels_on_image_array(
                  image_np,
                  detections['detection_boxes'][0].numpy(),
                  (detections['detection_classes'][0].numpy()+1).astype(int),
                  detections['detection_scores'][0].numpy(),
                  category_index,
                  use_normalized_coordinates=True,
                  max_boxes_to_draw=200,
                  min_score_thresh=.50,
                  agnostic_mode=False)

        video_writer.write(np.uint8(image_np))
                
    # Release the reader and writer, and close any OpenCV windows
    video_reader.release()
    video_writer.release() 
    cv2.destroyAllWindows() 
    cv2.waitKey(1)

Run this code to create a video file.

In [None]:
video_in_file = 'data/london.mp4'
video_out_file = 'data/london_detected.mp4'

write_video(video_in_file, video_out_file, detection_model)

The following code shows how to perform detection using the webcam. You can only run this from your own laptop/computer, not from the cloud VM.

In [None]:
cap = cv2.VideoCapture(0)

while True:
    # Read frame from camera
    ret, image_np = cap.read()
    if not ret:
        # Stop if the camera returns no frame
        break
    # OpenCV reads frames in BGR order; convert to RGB, the order Pillow uses in predict()
    image_rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
    input_tensor = tf.convert_to_tensor(np.expand_dims(image_rgb, 0), dtype=tf.float32)
    detections, predictions_dict, shapes = detect_fn(input_tensor)

    # The model's class indices start at 0, while the label map ids start at 1
    label_id_offset = 1

    viz_utils.visualize_boxes_and_labels_on_image_array(
          image_np,
          detections['detection_boxes'][0].numpy(),
          (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
          detections['detection_scores'][0].numpy(),
          category_index,
          use_normalized_coordinates=True,
          max_boxes_to_draw=200,
          min_score_thresh=.5,
          agnostic_mode=False)

    # Display output
    cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))

    # Check if 'q' key is hit, if yes, stop the video capture
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()