# Object Detection 

Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. Humans can recognize and locate objects of interest in an image within moments; the goal of object detection is to replicate this ability using a computer.

## Imports and Setup

Let's start with our basic imports.

## Visualization tools

To visualize the images with the detected boxes, keypoints, and segmentation masks, we will use the TensorFlow Object Detection API. To install it, we will clone the repository.

Now we can import the dependencies we will need later.

In [1]:
import os
import pathlib

import matplotlib
import matplotlib.pyplot as plt

import io
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
from six.moves.urllib.request import urlopen

import tensorflow as tf
import tensorflow_hub as hub

In [2]:
# Clone the tensorflow models repository
!git clone --depth 1 https://github.com/tensorflow/models

Cloning into 'models'...
remote: Enumerating objects: 3251, done.
remote: Counting objects: 100% (3251/3251), done.
remote: Compressing objects: 100% (2553/2553), done.
remote: Total 3251 (delta 886), reused 1551 (delta 650), pack-reused 0
Receiving objects: 100% (3251/3251), 33.44 MiB | 18.68 MiB/s, done.
Resolving deltas: 100% (886/886), done.


Installing the Object Detection API

In [3]:
%%bash
sudo apt install -y protobuf-compiler
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .


Reading package lists...
Building dependency tree...
Reading state information...
protobuf-compiler is already the newest version (3.0.0-9.1ubuntu1).
The following packages were automatically installed and are no longer required:
  cuda-command-line-tools-10-0 cuda-command-line-tools-10-1
  cuda-command-line-tools-11-0 cuda-compiler-10-0 cuda-compiler-10-1
  cuda-compiler-11-0 cuda-cuobjdump-10-0 cuda-cuobjdump-10-1
  cuda-cuobjdump-11-0 cuda-cupti-10-0 cuda-cupti-10-1 cuda-cupti-11-0
  cuda-cupti-dev-11-0 cuda-documentation-10-0 cuda-documentation-10-1
  cuda-documentation-11-0 cuda-documentation-11-1 cuda-gdb-10-0 cuda-gdb-10-1
  cuda-gdb-11-0 cuda-gpu-library-advisor-10-0 cuda-gpu-library-advisor-10-1
  cuda-libraries-10-0 cuda-libraries-10-1 cuda-libraries-11-0
  cuda-memcheck-10-0 cuda-memcheck-10-1 cuda-memcheck-11-0 cuda-nsight-10-0
  cuda-nsight-10-1 cuda-nsight-11-0 cuda-nsight-11-1 cuda-nsight-compute-10-0
  cuda-nsight-compute-10-1 cuda-nsight-compute-11-0 cuda-nsight-comput



  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.5 which is incompatible.
multiprocess 0.70.12.2 requires dill>=0.3.4, but you have dill 0.3.1.1 which is incompatible.
gym 0.17.3 requires cloudpickle<1.7.0,>=1.2.0, but you have cloudpickle 2.0.0 which is incompatible.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.
datascience 0.10.6 requires foliu

In [4]:
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import ops as utils_ops

%matplotlib inline

## Utilities

Run the cell below to create some utilities that will be needed later:

- Helper function to load an image
- Map of model name to TF Hub handle
- Map of test image name to local path or URL

In [5]:
def load_image_into_numpy_array(path):
    """Load an image from a file or URL into a numpy array.

    Puts the image into a numpy array to feed into the TensorFlow graph.
    Note that by convention we put it into a numpy array with shape
    (batch, height, width, channels), where batch=1 and channels=3 for RGB.

    Args:
      path: the file path or http(s) URL of the image

    Returns:
      uint8 numpy array with shape (1, img_height, img_width, 3)
    """
    image = None
    if path.startswith("http"):
        response = urlopen(path)
        image_data = response.read()
        image_data = BytesIO(image_data)
        image = Image.open(image_data)
    else:
        image_data = tf.io.gfile.GFile(path, "rb").read()
        image = Image.open(BytesIO(image_data))

    (im_width, im_height) = image.size
    return (
        np.array(image.getdata()).reshape((1, im_height, im_width, 3)).astype(np.uint8)
    )


ALL_MODELS = {
    "CenterNet HourGlass104 512x512": "https://tfhub.dev/tensorflow/centernet/hourglass_512x512/1",
    "CenterNet HourGlass104 Keypoints 512x512": "https://tfhub.dev/tensorflow/centernet/hourglass_512x512_kpts/1",
    "CenterNet HourGlass104 1024x1024": "https://tfhub.dev/tensorflow/centernet/hourglass_1024x1024/1",
    "CenterNet HourGlass104 Keypoints 1024x1024": "https://tfhub.dev/tensorflow/centernet/hourglass_1024x1024_kpts/1",
    "CenterNet Resnet50 V1 FPN 512x512": "https://tfhub.dev/tensorflow/centernet/resnet50v1_fpn_512x512/1",
    "CenterNet Resnet50 V1 FPN Keypoints 512x512": "https://tfhub.dev/tensorflow/centernet/resnet50v1_fpn_512x512_kpts/1",
    "CenterNet Resnet101 V1 FPN 512x512": "https://tfhub.dev/tensorflow/centernet/resnet101v1_fpn_512x512/1",
    "CenterNet Resnet50 V2 512x512": "https://tfhub.dev/tensorflow/centernet/resnet50v2_512x512/1",
    "CenterNet Resnet50 V2 Keypoints 512x512": "https://tfhub.dev/tensorflow/centernet/resnet50v2_512x512_kpts/1",
    "EfficientDet D0 512x512": "https://tfhub.dev/tensorflow/efficientdet/d0/1",
    "EfficientDet D1 640x640": "https://tfhub.dev/tensorflow/efficientdet/d1/1",
    "EfficientDet D2 768x768": "https://tfhub.dev/tensorflow/efficientdet/d2/1",
    "EfficientDet D3 896x896": "https://tfhub.dev/tensorflow/efficientdet/d3/1",
    "EfficientDet D4 1024x1024": "https://tfhub.dev/tensorflow/efficientdet/d4/1",
    "EfficientDet D5 1280x1280": "https://tfhub.dev/tensorflow/efficientdet/d5/1",
    "EfficientDet D6 1280x1280": "https://tfhub.dev/tensorflow/efficientdet/d6/1",
    "EfficientDet D7 1536x1536": "https://tfhub.dev/tensorflow/efficientdet/d7/1",
    "SSD MobileNet v2 320x320": "https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2",
    "SSD MobileNet V1 FPN 640x640": "https://tfhub.dev/tensorflow/ssd_mobilenet_v1/fpn_640x640/1",
    "SSD MobileNet V2 FPNLite 320x320": "https://tfhub.dev/tensorflow/ssd_mobilenet_v2/fpnlite_320x320/1",
    "SSD MobileNet V2 FPNLite 640x640": "https://tfhub.dev/tensorflow/ssd_mobilenet_v2/fpnlite_640x640/1",
    "SSD ResNet50 V1 FPN 640x640 (RetinaNet50)": "https://tfhub.dev/tensorflow/retinanet/resnet50_v1_fpn_640x640/1",
    "SSD ResNet50 V1 FPN 1024x1024 (RetinaNet50)": "https://tfhub.dev/tensorflow/retinanet/resnet50_v1_fpn_1024x1024/1",
    "SSD ResNet101 V1 FPN 640x640 (RetinaNet101)": "https://tfhub.dev/tensorflow/retinanet/resnet101_v1_fpn_640x640/1",
    "SSD ResNet101 V1 FPN 1024x1024 (RetinaNet101)": "https://tfhub.dev/tensorflow/retinanet/resnet101_v1_fpn_1024x1024/1",
    "SSD ResNet152 V1 FPN 640x640 (RetinaNet152)": "https://tfhub.dev/tensorflow/retinanet/resnet152_v1_fpn_640x640/1",
    "SSD ResNet152 V1 FPN 1024x1024 (RetinaNet152)": "https://tfhub.dev/tensorflow/retinanet/resnet152_v1_fpn_1024x1024/1",
    "Faster R-CNN ResNet50 V1 640x640": "https://tfhub.dev/tensorflow/faster_rcnn/resnet50_v1_640x640/1",
    "Faster R-CNN ResNet50 V1 1024x1024": "https://tfhub.dev/tensorflow/faster_rcnn/resnet50_v1_1024x1024/1",
    "Faster R-CNN ResNet50 V1 800x1333": "https://tfhub.dev/tensorflow/faster_rcnn/resnet50_v1_800x1333/1",
    "Faster R-CNN ResNet101 V1 640x640": "https://tfhub.dev/tensorflow/faster_rcnn/resnet101_v1_640x640/1",
    "Faster R-CNN ResNet101 V1 1024x1024": "https://tfhub.dev/tensorflow/faster_rcnn/resnet101_v1_1024x1024/1",
    "Faster R-CNN ResNet101 V1 800x1333": "https://tfhub.dev/tensorflow/faster_rcnn/resnet101_v1_800x1333/1",
    "Faster R-CNN ResNet152 V1 640x640": "https://tfhub.dev/tensorflow/faster_rcnn/resnet152_v1_640x640/1",
    "Faster R-CNN ResNet152 V1 1024x1024": "https://tfhub.dev/tensorflow/faster_rcnn/resnet152_v1_1024x1024/1",
    "Faster R-CNN ResNet152 V1 800x1333": "https://tfhub.dev/tensorflow/faster_rcnn/resnet152_v1_800x1333/1",
    "Faster R-CNN Inception ResNet V2 640x640": "https://tfhub.dev/tensorflow/faster_rcnn/inception_resnet_v2_640x640/1",
    "Faster R-CNN Inception ResNet V2 1024x1024": "https://tfhub.dev/tensorflow/faster_rcnn/inception_resnet_v2_1024x1024/1",
    "Mask R-CNN Inception ResNet V2 1024x1024": "https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1",
    "Yolo": "https://tfhub.dev/rishit-dagli/yolo-cppe5/1",
}

IMAGES_FOR_TEST = {
    "Beach": "models/research/object_detection/test_images/image2.jpg",
    "Dogs": "models/research/object_detection/test_images/image1.jpg",
    # By Heiko Gorski, Source: https://commons.wikimedia.org/wiki/File:Naxos_Taverna.jpg
    "Naxos Taverna": "https://upload.wikimedia.org/wikipedia/commons/6/60/Naxos_Taverna.jpg",
    # Source: https://commons.wikimedia.org/wiki/File:The_Coleoptera_of_the_British_islands_(Plate_125)_(8592917784).jpg
    "Beatles": "https://upload.wikimedia.org/wikipedia/commons/1/1b/The_Coleoptera_of_the_British_islands_%28Plate_125%29_%288592917784%29.jpg",
    # By Américo Toledano, Source: https://commons.wikimedia.org/wiki/File:Biblioteca_Maim%C3%B3nides,_Campus_Universitario_de_Rabanales_007.jpg
    "Phones": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg/1024px-Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg",
    # Source: https://commons.wikimedia.org/wiki/File:The_smaller_British_birds_(8053836633).jpg
    "Birds": "https://upload.wikimedia.org/wikipedia/commons/0/09/The_smaller_British_birds_%288053836633%29.jpg",
}

### Load label map data (for plotting).

Label maps map index numbers to category names, so that when our convolutional network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.

For simplicity, we load the label map from the same repository that we cloned for the Object Detection API code.

In [6]:
PATH_TO_LABELS = "./models/research/object_detection/data/mscoco_label_map.pbtxt"
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True
)
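Any dictionary with the same layout can stand in for the loaded label map. As a minimal sketch, assuming the visualization code only needs an `id` and a `name` per category, a hand-written index could look like this. The helper `build_category_index` and the three-entry list are illustrative and not part of the Object Detection API:

```python
# Hypothetical stand-in for label_map_util.create_category_index_from_labelmap.
def build_category_index(categories):
    """Map each integer class id to its category dictionary."""
    return {category["id"]: category for category in categories}


# A few COCO entries written out by hand for illustration.
example_categories = [
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"},
    {"id": 5, "name": "airplane"},
]
example_category_index = build_category_index(example_categories)

print(example_category_index[5]["name"])  # airplane
```

A class id predicted by the model is then looked up directly as a dictionary key.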

## Build a detection model and load pre-trained model weights

Here we choose which object detection model to use.
Select the architecture and it will be loaded automatically.
To try another architecture later, change the selection in the cell below and re-run the cells that follow it.

**Tip:** for more details about the selected model, follow the link (model handle) and read the additional documentation on TF Hub. After you select a model, we print the handle to make this easier.

In [12]:
model_display_name = [
    "CenterNet HourGlass104 512x512",
    "CenterNet HourGlass104 Keypoints 512x512",
    "CenterNet HourGlass104 1024x1024",
    "CenterNet HourGlass104 Keypoints 1024x1024",
    "CenterNet Resnet50 V1 FPN 512x512",
    "CenterNet Resnet50 V1 FPN Keypoints 512x512",
    "CenterNet Resnet101 V1 FPN 512x512",
    "CenterNet Resnet50 V2 512x512",
    "CenterNet Resnet50 V2 Keypoints 512x512",
    "EfficientDet D0 512x512",
    "EfficientDet D1 640x640",
    "EfficientDet D2 768x768",
    "EfficientDet D3 896x896",
    "EfficientDet D4 1024x1024",
    "EfficientDet D5 1280x1280",
    "EfficientDet D6 1280x1280",
    "EfficientDet D7 1536x1536",
    "SSD MobileNet v2 320x320",
    "SSD MobileNet V1 FPN 640x640",
    "SSD MobileNet V2 FPNLite 320x320",
    "SSD MobileNet V2 FPNLite 640x640",
    "SSD ResNet50 V1 FPN 640x640 (RetinaNet50)",
    "SSD ResNet50 V1 FPN 1024x1024 (RetinaNet50)",
    "SSD ResNet101 V1 FPN 640x640 (RetinaNet101)",
    "SSD ResNet101 V1 FPN 1024x1024 (RetinaNet101)",
    "SSD ResNet152 V1 FPN 640x640 (RetinaNet152)",
    "SSD ResNet152 V1 FPN 1024x1024 (RetinaNet152)",
    "Faster R-CNN ResNet50 V1 640x640",
    "Faster R-CNN ResNet50 V1 1024x1024",
    "Faster R-CNN ResNet50 V1 800x1333",
    "Faster R-CNN ResNet101 V1 640x640",
    "Faster R-CNN ResNet101 V1 1024x1024",
    "Faster R-CNN ResNet101 V1 800x1333",
    "Faster R-CNN ResNet152 V1 640x640",
    "Faster R-CNN ResNet152 V1 1024x1024",
    "Faster R-CNN ResNet152 V1 800x1333",
    "Faster R-CNN Inception ResNet V2 640x640",
    "Faster R-CNN Inception ResNet V2 1024x1024",
    "Mask R-CNN Inception ResNet V2 1024x1024",
]
# Pick one of the names listed in model_display_name
selected_model_name = "Faster R-CNN Inception ResNet V2 1024x1024"
model_handle = ALL_MODELS[selected_model_name]

print("Selected model: " + selected_model_name)
print("Model Handle at TensorFlow Hub: {}".format(model_handle))

Selected model: Faster R-CNN Inception ResNet V2 1024x1024
Model Handle at TensorFlow Hub: https://tfhub.dev/tensorflow/faster_rcnn/inception_resnet_v2_1024x1024/1


## Loading the selected model from TensorFlow Hub

Here we just need the model handle that was selected; we use the TensorFlow Hub library to load the model into memory.


In [13]:
print("loading model...")
hub_model = hub.load(model_handle)
print("model loaded!")

loading model...
model loaded!


## Loading an image

Let's try the model on a simple image. To help with this, we provide a list of test images.

Here are some simple things to try out if you are curious:
* Try running inference on your own images: upload them to Colab and load them the same way as in the cell below.
* Modify some of the input images and see if detection still works. Simple things to try include flipping the image horizontally or converting it to grayscale (note that the input image must still have 3 channels).

**Be careful:** the model expects 3-channel images. In an image with an alpha channel, the alpha counts as a fourth channel, which the reshape in `load_image_into_numpy_array` does not handle.
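One way to guard against the channel mismatch is to normalize the array before it reaches the model. The sketch below is an assumption about how you might do that, not part of the notebook's utilities; the function name `ensure_three_channels` is made up, and it simply discards transparency rather than blending it:

```python
import numpy as np


def ensure_three_channels(image):
    """Return a (height, width, 3) uint8 array from a grayscale, RGB, or RGBA array."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale: repeat the single channel three times.
        image = np.stack([image] * 3, axis=-1)
    elif image.ndim == 3 and image.shape[-1] == 4:
        # RGBA: keep the RGB channels and drop alpha.
        image = image[..., :3]
    elif image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("Unsupported image shape: {}".format(image.shape))
    return image.astype(np.uint8)


rgba = np.zeros((4, 6, 4), dtype=np.uint8)
gray = np.zeros((4, 6), dtype=np.uint8)
print(ensure_three_channels(rgba).shape)  # (4, 6, 3)
print(ensure_three_channels(gray).shape)  # (4, 6, 3)
```

With PIL, calling `image.convert("RGB")` before building the array has a similar effect.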



In [14]:
selected_image = ["Beach", "Dogs", "Naxos Taverna", "Beatles", "Phones", "Birds"]
flip_image_horizontally = False
convert_image_to_grayscale = False

image_path = IMAGES_FOR_TEST[selected_image[0]]
image_np = load_image_into_numpy_array(image_path)

# Flip horizontally
if flip_image_horizontally:
    image_np[0] = np.fliplr(image_np[0]).copy()

# Convert image to grayscale
if convert_image_to_grayscale:
    image_np[0] = np.tile(np.mean(image_np[0], 2, keepdims=True), (1, 1, 3)).astype(
        np.uint8
    )

plt.figure(figsize=(24, 32))
plt.imshow(image_np[0])
plt.show()


## Implementing the inference

To run inference, we just need to call the model we loaded from TF Hub.

Things you can try:
* Print out `result['detection_boxes']` and try to match the box locations to the boxes in the image. Notice that coordinates are given in normalized form (i.e., in the interval [0, 1]).
* Inspect the other output keys present in the result. Full documentation is available on the model's documentation page (open the model handle printed earlier in your browser).
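To match normalized boxes to the image, it helps to convert them to pixels. This is a minimal sketch, assuming each box uses the Object Detection API's `[ymin, xmin, ymax, xmax]` ordering; the helper `box_to_pixels` and the sample values are made up for illustration:

```python
def box_to_pixels(box, image_height, image_width):
    """Convert one normalized [ymin, xmin, ymax, xmax] box to integer pixel coordinates."""
    ymin, xmin, ymax, xmax = box
    return (
        int(round(ymin * image_height)),
        int(round(xmin * image_width)),
        int(round(ymax * image_height)),
        int(round(xmax * image_width)),
    )


# Made-up box on a 480x640 image.
pixel_box = box_to_pixels([0.25, 0.5, 0.75, 1.0], image_height=480, image_width=640)
print(pixel_box)  # (120, 320, 360, 640)
```

In this notebook the image height and width would be `image_np.shape[1]` and `image_np.shape[2]`, and the boxes for the single image are in `result["detection_boxes"][0]`.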

In [15]:
# Run inference on the batched image array
results = hub_model(image_np)

# Different object detection models return additional outputs;
# all of them are explained in the model documentation
result = {key: value.numpy() for key, value in results.items()}
print(result)

{'detection_classes': array([[ 1.,  1., 38., 38., 38., 38., 38.,  1.,  1., 38.,  1.,  1.,  1.,
         1.,  1.,  1.,  1., 38.,  1.,  1.,  1., 16.,  1., 16.,  1., 38.,
         1., 38.,  1.,  1., 16.,  1.,  1., 16.,  1.,  1.,  1., 42.,  1.,
        38.,  1., 16., 38., 42., 16.,  1.,  1.,  1.,  1.,  1.,  1., 16.,
        16., 42.,  1., 42.,  1.,  1.,  1.,  1.,  1.,  1.,  1., 38.,  1.,
        42.,  9., 42., 38., 20.,  1., 16., 18.,  1.,  1., 42., 42., 38.,
         1.,  1., 16., 16., 16., 16.,  1., 37., 38., 42., 21.,  1., 41.,
        16.,  1., 38., 16.,  1., 42., 10., 38., 27.]], dtype=float32), 'raw_detection_scores': array([[[8.1177830e-04, 1.2964295e-05, 3.9066081e-06, ...,
         2.6630867e-06, 8.1472774e-07, 9.4400173e-07],
        [6.2566715e-01, 3.7213042e-01, 1.5195325e-04, ...,
         5.2963505e-06, 1.1403456e-06, 2.4184349e-06],
        [3.7567262e-02, 9.6172601e-01, 1.3892446e-05, ...,
         1.4677748e-06, 3.9479991e-07, 3.2718205e-07],
        ...,
        [9.942417

## Visualizing the results

Here is where we need the TensorFlow Object Detection API to draw the boxes from the inference step (and the keypoints when available).

The full documentation of this method can be seen [here](https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py).

For example, you can set `min_score_thresh` to other values (between 0 and 1) to let more detections in or to filter more of them out.
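The effect of the threshold can be previewed without drawing anything. The sketch below approximates it with a boolean mask. It is not the API's internal code, and `filter_detections` and the sample arrays are made up for illustration:

```python
import numpy as np


def filter_detections(boxes, classes, scores, min_score_thresh=0.30):
    """Keep only the detections whose score exceeds the threshold."""
    keep = scores > min_score_thresh
    return boxes[keep], classes[keep], scores[keep]


# Three made-up detections with decreasing confidence.
scores = np.array([0.95, 0.40, 0.10], dtype=np.float32)
classes = np.array([1, 38, 16])
boxes = np.array(
    [[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6], [0.3, 0.3, 0.7, 0.7]],
    dtype=np.float32,
)

kept_boxes, kept_classes, kept_scores = filter_detections(boxes, classes, scores)
print(kept_classes.tolist())  # [1, 38]
```

Raising `min_score_thresh` to 0.5 in this example would leave only the first detection.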

In [16]:
label_id_offset = 0
image_np_with_detections = image_np.copy()


viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_detections[0],
    result["detection_boxes"][0],
    (result["detection_classes"][0] + label_id_offset).astype(int),
    result["detection_scores"][0],
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=200,
    min_score_thresh=0.30,
    agnostic_mode=False,
)

plt.figure(figsize=(24, 32))
plt.imshow(image_np_with_detections[0])
plt.show()
