# <center>Object Detection</center>


# <center>What is Object Detection?</center>
Object detection is the process of finding and delineating specific objects within an image: for example, finding which part of a medical image is a tumor, or recognizing the letters a person is signing in American Sign Language.

* Example of brain tumor detection. The red boxes are the labeled "ground truth" regions; the green boxes are the predicted regions.
![title](./images/Brain_Tumor_Bounding_Boxes.JPG)
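A common way to score how well a green (predicted) box matches its red (ground truth) box is intersection over union (IoU): the overlap area divided by the combined area. A prediction is often counted as correct when IoU is above 0.5 (the PASCAL VOC convention). The sketch below assumes boxes are given as `(x_min, y_min, x_max, y_max)` pixel tuples; the coordinates in the usage lines are made up, not read from the figure.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give an empty intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical coordinates, not taken from the figure above
ground_truth = (50, 50, 150, 150)   # "red" box
prediction = (60, 60, 160, 160)     # "green" box
print(round(iou(ground_truth, prediction), 3))  # 0.681
```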

# <i>The biggest pain about object detection is labeling the images with bounding boxes!!!!</i>

I use something called "labelImg".  There are other programs that can be used, but I like this one.
https://github.com/tzutalin/labelImg
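labelImg can save annotations as Pascal VOC XML files (YOLO text files are another option it supports). Assuming the Pascal VOC format, one XML file per image can be read back into plain Python tuples with the standard library alone. The sample XML and the helper name `read_voc_boxes` below are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC annotation, shaped like the files labelImg writes
SAMPLE_XML = """
<annotation>
  <filename>scan_001.jpg</filename>
  <size><width>512</width><height>512</height><depth>3</depth></size>
  <object>
    <name>tumor</name>
    <bndbox><xmin>120</xmin><ymin>85</ymin><xmax>210</xmax><ymax>190</ymax></bndbox>
  </object>
  <object>
    <name>tumor</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>340</xmax><ymax>95</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from Pascal VOC XML text."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        # int(float(...)) tolerates coordinates written as "120.0"
        coords = [int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return filename, boxes


print(read_voc_boxes(SAMPLE_XML))
```

To read a file saved by labelImg, pass its contents, e.g. `read_voc_boxes(open(path).read())`.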



```python
from IPython.display import Video
Video("./images/ASL_Compressed.mp4")
```

![title](./images/Red_Divider.JPG)

# <center>How does Object Detection Work?</center>

Image classification involves predicting the class of one object in an image. Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent. Object detection combines these two tasks and localizes and classifies one or more objects in an image.

From: https://machinelearningmastery.com/object-recognition-with-deep-learning/
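One way to see the difference between the three tasks is in the shape of their outputs. The sketch below is a hypothetical set of Python types (the class names, labels, scores and coordinates are all invented for illustration): classification returns one label for the whole image, localization returns boxes, and detection returns a list with a label, a score and a box for every object found.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Classification:
    label: str    # one class for the whole image
    score: float  # confidence in that class


@dataclass
class Detection:
    label: str    # class of this particular object
    score: float  # confidence for this object
    box: Box      # where this object is


# Image classification: one answer for the whole image
cls_result = Classification(label="tumor", score=0.97)

# Object localization: box(es) around the object extent
loc_result: List[Box] = [(120, 85, 210, 190)]

# Object detection: one (label, score, box) entry per object found
det_result: List[Detection] = [
    Detection("tumor", 0.91, (120, 85, 210, 190)),
    Detection("tumor", 0.64, (300, 40, 340, 95)),
]
```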

## Faster-RCNN
* Very popular
* Very accurate
* Computationally expensive to train
* Inference is slow, especially as the model gets bigger; it typically needs a GPU

How it works - the simple version.  
![title](./images/Faster_RCNN.JPG)
Image from https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
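In Faster R-CNN, the region proposal network (RPN) slides over the shared CNN feature map and, at every location, scores a fixed set of reference boxes ("anchors") as object or background, regressing offsets to refine them into proposals. The original paper uses 9 anchors per location (3 scales × 3 aspect ratios) on a stride-16 feature map. The simplified sketch below only generates those anchors; it assumes the paper's scales (128, 256, 512) and ratios (0.5, 1, 2), treats a ratio as height / width, and does not clip boxes to the image, so it is not any library's exact anchor code.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x_min, y_min, x_max, y_max) in image pixels,
    len(scales) * len(ratios) per feature-map location."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell, projected back onto the input image
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area at s * s while making height / width equal r
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# A 38 x 50 feature map is roughly what a 600 x 800 image gives at stride 16
anchors = generate_anchors(feat_h=38, feat_w=50)
print(len(anchors))  # 17100 (= 38 * 50 * 9)
```

The RPN then keeps only the highest-scoring refined anchors as region proposals for the second stage to classify.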

This tutorial makes it easy to set up an object detection pipeline using TensorFlow with Faster-RCNN:
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10


## YOLO (You Only Look Once)
* Very popular
* Not as accurate as Faster-RCNN
* Not as computationally expensive to train
* Inference is very, very fast
* Struggles with small objects

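How it works - A single CNN predicts all the bounding boxes and their class probabilities in one pass over the image (hence the name), instead of first proposing regions and then classifying each one as R-CNN-style detectors do.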
![title](./images/YOLO_Image.JPG)
Image from https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
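In the original YOLO (v1) paper the image is divided into an S × S grid. Each cell predicts B boxes, each as (x, y, w, h, confidence), plus C class probabilities, so the network's single output is an S × S × (5B + C) tensor (7 × 7 × 30 with S = 7, B = 2, C = 20 on 448 × 448 inputs). The sketch below decodes such a grid into boxes. It assumes a hypothetical per-cell layout (B box vectors followed by the class probabilities), treats w and h as fractions of the image size (the paper actually regresses their square roots), and keeps only the top class per box; the helper name `decode_yolo_grid` is made up.

```python
def decode_yolo_grid(grid, img_w, img_h, num_boxes=2, score_thresh=0.25):
    """Turn a YOLOv1-style S x S grid of predictions into (class_id, score, box) tuples.

    Assumed per-cell layout (hypothetical, for illustration):
        [x, y, w, h, conf] * num_boxes, followed by C class probabilities
    x, y: box centre as offsets inside the cell (0..1)
    w, h: box size as fractions of the whole image (0..1)
    """
    S = len(grid)
    detections = []
    for row in range(S):
        for col in range(S):
            cell = grid[row][col]
            class_probs = cell[5 * num_boxes:]
            # One set of class probabilities per cell: take the most likely class
            class_id = max(range(len(class_probs)), key=class_probs.__getitem__)
            for b in range(num_boxes):
                x, y, w, h, conf = cell[5 * b: 5 * b + 5]
                score = conf * class_probs[class_id]  # class-specific confidence
                if score < score_thresh:
                    continue
                # Convert cell-relative centre and image-relative size to pixels
                cx = (col + x) / S * img_w
                cy = (row + y) / S * img_h
                bw, bh = w * img_w, h * img_h
                detections.append((class_id, score,
                                   (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)))
    return detections


S, B, C = 7, 2, 20
grid = [[[0.0] * (5 * B + C) for _ in range(S)] for _ in range(S)]
# Hypothetical prediction: cell (row 3, col 3) is confident it holds a class-7 object
grid[3][3][0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]  # first box, centred in the cell
grid[3][3][5 * B + 7] = 0.8
print(decode_yolo_grid(grid, img_w=448, img_h=448))
```

In practice, overlapping boxes that survive the threshold are then pruned with non-maximum suppression.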


```python
from IPython.display import Video
Video("./images/YOLO_Compressed.mp4")
```




Video from https://www.youtube.com/watch?time_continue=94&v=eeIEH2wjvhg&feature=emb_logo

## MobileNetV2
* Probably the most common CNN used for inference on mobile devices
* Not as accurate as Faster-RCNN
* Not as computationally expensive to train
* Inference is very, very fast and lightweight

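How it works - MobileNetV2 is a feature extractor; for object detection it is paired with a detection head such as SSD (Single Shot MultiBox Detector), giving the "SSD MobileNet" model in the comparison below. Check out Google's blog for details - https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html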
![title](./images/SSD_MobileNet_Image.png)

Image from https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html
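Much of MobileNetV2's speed comes from building its blocks out of depthwise separable convolutions: a k × k depthwise convolution that filters each input channel on its own, followed by a 1 × 1 pointwise convolution that mixes channels. The sketch below counts weights (biases ignored) for a standard convolution versus a depthwise separable one; the channel count of 144 is an arbitrary example. The multiply-add count per output pixel scales the same way, so the savings carry over to inference compute.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 conv mixes information across channels
    return depthwise + pointwise


k, c_in, c_out = 3, 144, 144
std = conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 1))  # 186624 22032 8.5
```

The cost ratio works out to 1 / c_out + 1 / k², so with 3 × 3 kernels a depthwise separable layer needs roughly 8 to 9 times fewer weights and operations than a standard one.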


```python
from IPython.display import Video
Video("./images/SSD_Short_Video_Compressed.mp4")
```

# Comparison of YOLOv2, SSD MobileNet, and Faster-RCNN

```python
from IPython.display import Video
Video("./images/Comparison_Short_Video_Compressed.mp4")
```

![title](./images/Red_Divider.JPG)

# <center>Transfer Learning</center>
A <i>pre-trained model</i> is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. You either use the pretrained model as is or use transfer learning to customize this model to a given task.

The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, this model will effectively serve as a generic model of the visual world. You can then take advantage of these learned feature maps without having to start from scratch by training a large model on a large dataset.

Many pretrained models can be downloaded from the TensorFlow Object Detection API model zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
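When fine-tuning one of these pretrained detectors on your own classes with the TensorFlow Object Detection API (as in the Faster-RCNN tutorial linked above), the classes are listed in a label map file in protobuf text format (`.pbtxt`). Ids start at 1 because 0 is reserved for the background class, the names should match the labels used when annotating, and the `num_classes` value in the model's pipeline config should equal the number of entries. The helper `make_label_map` below is a hypothetical convenience for writing that file, not part of the API.

```python
def make_label_map(class_names):
    """Build label map text (.pbtxt) for the TensorFlow Object Detection API."""
    entries = []
    # Ids start at 1: id 0 is reserved for the background class
    for idx, name in enumerate(class_names, start=1):
        entries.append(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
    return "\n".join(entries)


print(make_label_map(["tumor"]))
```

The resulting text can be saved as, for example, `labelmap.pbtxt` and referenced from the training config.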



Examples of datasets these models are pretrained on:
* COCO - Common Objects in Context (http://cocodataset.org/#home)
330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints

* Open Images v4 (https://storage.googleapis.com/openimages/web/index.html)
Open Images is a dataset of ~9M images that have been annotated with image-level labels and object bounding boxes.  The training set of V4 contains 14.6M bounding boxes for 600 object classes on 1.74M images, making it the largest existing dataset with object location annotations. The boxes have been largely manually drawn by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8.4 per image on average). Moreover, the dataset is annotated with image-level labels spanning thousands of classes.


Transfer learning overview from https://www.tensorflow.org/tutorials/images/transfer_learning