# Deep learning-based object detection


- Faster R-CNNs (Ren et al., 2015)
- You Only Look Once (YOLO) (Redmon et al., 2015)
- Single Shot Detectors (SSDs) (Liu et al., 2015)


# Popular network architectures compatible with OpenCV 3.3 include:

- GoogLeNet
- AlexNet
- SqueezeNet
- VGGNet
- ResNet


# OpenCV 3.3 supports the Caffe, TensorFlow, and Torch/PyTorch frameworks.

Faster R-CNNs are likely the most “heard of” method for object detection using deep learning; however, the technique can be difficult to understand (especially for beginners in deep learning), hard to implement, and challenging to train.

Furthermore, even with the “faster” implementation of R-CNNs (where the “R” stands for “Region”), the algorithm can be quite slow, on the order of 7 FPS.

If we are looking for pure speed, we tend to use YOLO, as this algorithm is much faster, capable of processing 40-90 FPS on a Titan X GPU. The fastest variant of YOLO (Fast YOLO) can even reach 155 FPS.

The problem with YOLO is that its accuracy leaves much to be desired.

SSDs, originally developed by Google, are a balance between the two. The algorithm is more straightforward (and I would argue better explained in the original seminal paper) than Faster R-CNNs.

When building object detection networks, we normally take an existing network architecture, such as VGG or ResNet, and use it inside the object detection pipeline. The problem is that these network architectures can be very large, on the order of 200-500 MB.

# MobileNets

Network architectures such as these are unsuitable for resource-constrained devices due to their sheer size and the resulting number of computations.

Instead, we can use MobileNets (Howard et al., 2017), another paper by Google researchers. We call these networks “MobileNets” because they are designed for resource-constrained devices such as your smartphone. MobileNets differ from traditional CNNs through their use of depthwise separable convolutions.

MobileNet is lightweight in its architecture. Its depthwise separable convolutions apply a single filter to each input channel (in the first layer, each colour channel) instead of convolving all channels together and summing them into one output, so the input channels are filtered separately before they are combined. As the authors of the paper explain clearly:

> “For MobileNets the depthwise convolution applies a single filter to each input channel. The pointwise convolution then applies a 1×1 convolution to combine the outputs the depthwise convolution. A standard convolution both filters and combines inputs into a new set of outputs in one step. The depthwise separable convolution splits this into two layers, a separate layer for filtering and a separate layer for combining. This factorization has the effect of drastically reducing computation and model size.”
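To make the filter-then-combine split concrete, here is a minimal NumPy sketch, not MobileNet's actual implementation. It assumes stride 1 and no padding, checks that a depthwise step followed by a 1×1 pointwise step equals a standard convolution whose kernel is the product of the two, and counts multiply-adds. With kernel size $D_K$, $M$ input channels, $N$ output channels and a $D_F \times D_F$ feature map, the MobileNets paper gives a cost of $D_K^2 M N D_F^2$ for a standard convolution and $D_K^2 M D_F^2 + M N D_F^2$ for the separable one, a ratio of $\frac{1}{N} + \frac{1}{D_K^2}$. The function names and the 14×14×512 example layer are illustrative choices.

```python
import numpy as np


def depthwise_conv(x, dw_kernels):
    """Depthwise step: filter each input channel with its own k x k kernel.

    x has shape (H, W, M); dw_kernels has shape (k, k, M), one kernel per
    channel. Stride 1 and no padding ("valid") keep the sketch short.
    """
    k = dw_kernels.shape[0]
    h, w, m = x.shape
    out = np.zeros((h - k + 1, w - k + 1, m))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[i:i + k, j:j + k, :]  # (k, k, M)
            # Sum over the spatial window only; channels are never mixed.
            out[i, j, :] = np.sum(patch * dw_kernels, axis=(0, 1))
    return out


def pointwise_conv(x, pw_kernels):
    """Pointwise step: a 1x1 convolution mixing M channels into N.

    pw_kernels has shape (M, N), so this is a matrix product per pixel.
    """
    return x @ pw_kernels


def standard_conv(x, kernels):
    """Standard convolution: filter and combine channels in one step.

    kernels has shape (k, k, M, N).
    """
    k = kernels.shape[0]
    h, w, _ = x.shape
    n = kernels.shape[3]
    out = np.zeros((h - k + 1, w - k + 1, n))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[i:i + k, j:j + k, :]
            # Contract over the window and all input channels at once.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out


def mult_adds_standard(k, m, n, f):
    """Multiply-adds of a standard k x k conv producing an f x f x n map."""
    return k * k * m * n * f * f


def mult_adds_separable(k, m, n, f):
    """Depthwise (k*k*m*f*f) plus pointwise (m*n*f*f) multiply-adds."""
    return k * k * m * f * f + m * n * f * f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8, 3))
    dw = rng.standard_normal((3, 3, 3))
    pw = rng.standard_normal((3, 16))

    y_separable = pointwise_conv(depthwise_conv(x, dw), pw)
    # The factorization equals a standard conv with K[a, b, m, n] = dw[a, b, m] * pw[m, n].
    combined = dw[:, :, :, None] * pw[None, None, :, :]
    y_standard = standard_conv(x, combined)
    print(np.allclose(y_separable, y_standard))  # True

    # A 3x3 layer with 512 -> 512 channels on a 14x14 map (illustrative sizes).
    ratio = mult_adds_separable(3, 512, 512, 14) / mult_adds_standard(3, 512, 512, 14)
    print(f"{ratio:.3f}")  # 0.113, i.e. 1/512 + 1/9
```

For that 3×3, 512-to-512-channel layer the separable version needs roughly 8 to 9 times fewer multiply-adds, which is the "drastically reducing computation" the authors describe.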

![image.png](attachment:image.png)

# SSD (Single Shot Detector)

Single Shot object detection, or SSD, takes a single shot to detect multiple objects within an image. As you can see in the image below, we are detecting coffee, an iPhone, a notebook, a laptop and glasses at the same time.

![image.png](attachment:image.png)

It is composed of two parts:

- Extracting feature maps, and
- Applying convolution filters to detect objects.
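The two parts can be sketched in plain Python. The sketch follows the default-box scheme described in the SSD paper: scales spaced linearly between $s_{min} = 0.2$ and $s_{max} = 0.9$ across the feature maps, aspect ratios {1, 2, 3, 1/2, 1/3} plus one extra square box per location. The Caffe MobileNet-SSD model may use different per-layer settings, the feature map sizes in the example are assumed, and extrapolating one extra scale for the last map is an assumption of this sketch. It also shows why detection is "just" a convolution: with $B$ boxes per location and $C$ classes, a 3×3 convolution with $B(C+4)$ filters predicts $C$ class scores and 4 box offsets for every default box.

```python
import itertools
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scales s_1..s_m spaced linearly from s_min to s_max, plus one extra.

    The extra scale s_{m+1} (extrapolated with the same step, an assumption)
    is only used for the additional square box on the last feature map.
    """
    step = (s_max - s_min) / max(num_maps - 1, 1)
    return [s_min + step * k for k in range(num_maps + 1)]


def default_boxes(feature_map_sizes, aspect_ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """Return (cx, cy, w, h) default boxes in image-relative [0, 1] units."""
    scales = default_box_scales(len(feature_map_sizes))
    boxes = []
    for k, f in enumerate(feature_map_sizes):
        s_k, s_next = scales[k], scales[k + 1]
        # One set of boxes centred on every cell of the f x f feature map.
        for i, j in itertools.product(range(f), repeat=2):
            cx, cy = (j + 0.5) / f, (i + 0.5) / f
            for ar in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(ar), s_k / math.sqrt(ar)))
            # Extra square box of scale sqrt(s_k * s_{k+1}) for aspect ratio 1.
            s_extra = math.sqrt(s_k * s_next)
            boxes.append((cx, cy, s_extra, s_extra))
    return boxes


def head_output_channels(boxes_per_location, num_classes):
    """Filters in the 3x3 conv head: class scores + 4 offsets per default box."""
    return boxes_per_location * (num_classes + 4)


if __name__ == "__main__":
    # Assumed feature map sizes for a 300x300 input (illustrative only).
    sizes = [19, 10, 5, 3, 2, 1]
    boxes = default_boxes(sizes)
    print(len(boxes))  # 3000 = 6 boxes * (361 + 100 + 25 + 9 + 4 + 1) cells
    # 21 classes = 20 PASCAL VOC classes + background.
    print(head_output_channels(6, 21))  # 150
```

Every one of those default boxes is scored in the same forward pass, which is what makes the detector "single shot".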

SSD is faster than R-CNN because R-CNN-style detectors need two shots, one to generate region proposals and one to detect objects in each proposal, whereas SSD does both in a single shot.

We are using MobileNet-SSD, a Caffe implementation of the MobileNet-SSD detection network, which pairs a MobileNet base network with the SSD detection framework. The MobileNet-SSD model was first trained on the COCO (Common Objects in Context) dataset and then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision).

We can therefore detect 20 object classes in images (+1 for the background class): airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and TV monitors.
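In code, the integer class ids the network outputs are usually mapped back to names with a lookup table. The sketch below assumes the common label order for this model: index 0 is the background and indices 1 to 20 are the PASCAL VOC class names (e.g. `aeroplane`, `diningtable`, `tvmonitor`) in alphabetical order. It is worth checking this against the label map shipped with the model you download; `class_name` is a hypothetical helper.

```python
# Assumed label order: index 0 is background, indices 1-20 are the
# PASCAL VOC classes in alphabetical order.
CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor",
]


def class_name(class_id):
    """Map a predicted class id (often returned as a float) to its label."""
    return CLASSES[int(class_id)]


if __name__ == "__main__":
    print(len(CLASSES) - 1)  # 20 object classes, excluding background
    print(class_name(15.0))  # person
```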

# MobileNet-SSD

We detect objects using pre-trained models with the SSD method: a pre-trained MobileNet serves as the base network of the SSD model.
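A minimal end-to-end sketch with OpenCV's `dnn` module follows. It assumes the Caffe deploy files are saved locally as `MobileNetSSD_deploy.prototxt` and `MobileNetSSD_deploy.caffemodel` (hypothetical paths), and that the model expects a 300×300 BGR input from which a mean of 127.5 is subtracted before scaling by 0.007843 (about 1/127.5). These preprocessing values are assumptions to verify against the model you use. `cv2.dnn.readNetFromCaffe`, `cv2.dnn.blobFromImage`, `setInput` and `forward` are OpenCV calls; `parse_detections` and `detect` are hypothetical helpers, and the class list is included so the block runs on its own. The final detection layer returns an array of shape (1, 1, N, 7), one row per candidate detection.

```python
import sys

import numpy as np

# Hypothetical local paths to the Caffe deploy files.
PROTOTXT = "MobileNetSSD_deploy.prototxt"
WEIGHTS = "MobileNetSSD_deploy.caffemodel"

# Assumed label order: background followed by the 20 PASCAL VOC classes.
CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor",
]


def parse_detections(detections, width, height, conf_threshold=0.5):
    """Turn the raw (1, 1, N, 7) output into (label, confidence, box) tuples.

    Each row is [image_id, class_id, confidence, x1, y1, x2, y2], with box
    corners normalised to [0, 1]; they are clipped and scaled to pixels.
    """
    results = []
    scale = np.array([width, height, width, height])
    for row in detections[0, 0]:
        confidence = float(row[2])
        if confidence < conf_threshold:
            continue
        box = np.clip(row[3:7], 0.0, 1.0) * scale
        x1, y1, x2, y2 = (int(v) for v in box)
        results.append((CLASSES[int(row[1])], confidence, (x1, y1, x2, y2)))
    return results


def detect(image_path, conf_threshold=0.5):
    """Run MobileNet-SSD on one image and return the parsed detections."""
    import cv2  # imported lazily so parse_detections works without OpenCV

    net = cv2.dnn.readNetFromCaffe(PROTOTXT, WEIGHTS)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    h, w = image.shape[:2]
    # Assumed preprocessing: 300x300 BGR input, (pixel - 127.5) * 0.007843.
    blob = cv2.dnn.blobFromImage(
        cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5, swapRB=False
    )
    net.setInput(blob)
    detections = net.forward()
    return parse_detections(detections, w, h, conf_threshold)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python detect.py IMAGE")
    else:
        for label, confidence, box in detect(sys.argv[1]):
            print(f"{label}: {confidence:.2f} at {box}")
```

Saved as `detect.py` (an arbitrary name), it would be run as `python detect.py path/to/image.jpg`. Keeping the parsing separate from the OpenCV call makes the threshold and box-scaling logic easy to check without the model files.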