# YOLO Object Detection using OpenCV

[Prashant Brahmbhatt](https://www.github.com/hashbanger)

____

## Why CNNs aren't good enough!

First of all, why do we need other image detection algorithms if we already have **Convolutional Neural Networks**?  
As you can guess, it's to overcome the disadvantages of traditional CNNs, some of which are:  
- High computational cost.
- Without a good GPU, they are quite slow to train (for complex tasks).
- They usually need a lot of training data.  
- They depend on good initial parameter tuning (a good starting point) to avoid getting stuck in local optima.

But we do have R-CNNs and Faster R-CNNs as well, don't we?  
Although they improve on vanilla CNNs by using a region proposal algorithm to localize candidate objects and then classifying each region with a CNN, they are sadly still quite slow!

CNNs are good at image classification, where a single class is associated with the whole image. In real-life scenarios, however, that's not good enough! We need to detect multiple objects in an image and also find where they are located, which is termed **Object Detection** and **Object Localization**.

If you're confused about the difference between image classification, object detection and segmentation, have a look at this image.  
![img1](img1.png)


## The YOLO Approach (You Only Look Once)

As the original paper states, the object detection problem is reframed as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. YOLO trains on full images and directly optimizes detection performance, so it doesn't require a complex pipeline.  
Unlike the sliding window technique, it looks at the image only once, hence the name. It implicitly encodes contextual information about the classes as well as their appearance.  
Because YOLO sees the entire image at once, it gets the full context of the image and makes fewer background errors (mistaking background patches for objects).  
YOLO is also highly generalizable, so it is less likely to break down on unexpected inputs or new domains.  

### Working
- YOLO divides the image into an $S \times S$ grid. If the center of an object falls into a grid cell, that cell becomes responsible for detecting that object. **(Image 1)**
- Each grid cell predicts $B$ bounding boxes and a confidence score for each of those boxes, showing how sure the model is that the box contains an object. Formally, the paper defines this confidence as $\Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$, so it also reflects how well the predicted box overlaps the ground truth. The score doesn't indicate what kind of object it is, only whether the box contains some object. If there is no object, the confidence should be zero (duh!).
- Each bounding box consists of 5 predictions: $x, y, w, h$ and the confidence. $(x, y)$ are the coordinates of the center of the box relative to the bounds of the grid cell, while $w$ and $h$ are the width and height, predicted relative to the whole image.  
- When we visualise all of the predictions, we get a bunch of bounding boxes around each object, and the thickness of each box reflects its confidence score. **(Image 2)**
- Each grid cell also predicts $C$ conditional class probabilities, $\Pr(\text{Class}_i \mid \text{Object})$: given that the cell contains an object, the probability that the object belongs to each class.
- It predicts only one set of class probabilities per grid cell, regardless of $B$. So if a cell predicts a *Dog*, that doesn't mean it contains a dog; rather, if that cell contains an object, then it is most probably a dog. **(Image 3)** At test time, the conditional class probabilities are multiplied by the individual box confidence predictions, giving a class-specific confidence score for each box.
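The grid bookkeeping described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the Darknet or OpenCV implementation: it assumes the settings from the original paper for PASCAL VOC ($S = 7$, $B = 2$, $C = 20$ classes, 448×448 input images), and the helper names `output_size`, `responsible_cell` and `decode_box` are hypothetical.

```python
# Minimal sketch of YOLO(v1)-style grid bookkeeping.
# Assumption: S=7, B=2, C=20 as in the original paper's PASCAL VOC setup.
# All helper names here are hypothetical, not part of any library.

S, B, C = 7, 2, 20


def output_size(S: int, B: int, C: int) -> tuple:
    """Each cell predicts B boxes of (x, y, w, h, confidence)
    plus ONE set of C class probabilities, regardless of B."""
    return (S, S, B * 5 + C)


def responsible_cell(cx: float, cy: float, img_w: int, img_h: int, S: int) -> tuple:
    """Return (row, col) of the grid cell containing the object's center (cx, cy) in pixels."""
    col = min(int(cx / img_w * S), S - 1)  # clamp so a center on the right edge stays in range
    row = min(int(cy / img_h * S), S - 1)  # clamp so a center on the bottom edge stays in range
    return row, col


def decode_box(x, y, w, h, row, col, img_w, img_h, S):
    """Turn a cell-relative prediction into absolute (x_min, y_min, x_max, y_max) pixels.

    x, y: box center as an offset inside the cell, in [0, 1]
    w, h: box size as a fraction of the whole image, in [0, 1]
    """
    cx = (col + x) / S * img_w  # center x in pixels
    cy = (row + y) / S * img_h  # center y in pixels
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


print(output_size(S, B, C))                                  # (7, 7, 30)
print(responsible_cell(224, 224, 448, 448, S))               # (3, 3)
print(decode_box(0.5, 0.5, 0.25, 0.5, 3, 3, 448, 448, S))    # (168.0, 112.0, 280.0, 336.0)
```

With these settings every image maps to a fixed $7 \times 7 \times 30$ output tensor, which is what lets YOLO treat detection as a single regression problem.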

![img3](img3.png)
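Where $\text{IOU}$ is the ***Intersection over Union*** between the predicted box and the ground-truth box: the area of their overlap divided by the area of their union.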

These scores encode both the probability of the class appearing in the box and how well the predicted box fits the object.
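The test-time score can be sketched as follows. In the original paper it is written as $\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$. The helpers `iou` and `class_specific_scores` are hypothetical names, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixels.

```python
# Sketch of IOU and the class-specific confidence score.
# Assumption: boxes are (x_min, y_min, x_max, y_max) in pixels.
# Helper names are hypothetical, not a library API.


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when the boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_specific_scores(box_confidence, class_probs):
    """Multiply one box's predicted confidence (trained to approximate
    Pr(Object) * IOU) by its cell's conditional probabilities Pr(Class_i | Object)."""
    return [box_confidence * p for p in class_probs]


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ≈ 0.143 (overlap 25 / union 175)

scores = class_specific_scores(0.8, [0.1, 0.7, 0.2])
best_class = max(range(len(scores)), key=scores.__getitem__)
print(best_class)  # 1
```

Because every box in a cell shares the same class probabilities, two boxes from the same cell can only differ in their class-specific scores through their confidence.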

- We then end up with a lot of predictions, which can include multiple detections of the same object by different grid cells with different confidence scores, so we use ***Non-Max Suppression (NMS)***. In a nutshell, NMS first discards bounding boxes whose confidence score is below a chosen threshold; then, among the remaining boxes that overlap heavily (high IOU), it keeps only the one with the maximum score and suppresses the rest, hence the name. **(Image 4)**
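A minimal greedy version of Non-Max Suppression is sketched below in plain Python. The function name and the default thresholds (`score_threshold=0.5`, `iou_threshold=0.4`) are illustrative assumptions, not values from the original text, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples.

```python
# Minimal greedy Non-Max Suppression sketch (pure Python, no OpenCV needed).
# Assumption: boxes are (x_min, y_min, x_max, y_max); thresholds are illustrative.


def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of the boxes to keep, highest score first."""
    # Step 1: discard low-confidence boxes.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit the survivors from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)  # current maximum
        keep.append(best)
        # Suppress every remaining box that overlaps the maximum too much.
        candidates = [i for i in candidates if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(100, 100, 210, 210), (105, 105, 215, 215), (300, 300, 400, 400), (98, 102, 205, 208)]
scores = [0.9, 0.75, 0.8, 0.3]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

Here box 3 is dropped by the score threshold and box 1 is suppressed because it overlaps box 0 heavily, leaving one box per object. In an OpenCV pipeline this step is usually delegated to `cv2.dnn.NMSBoxes(bboxes, scores, score_threshold, nms_threshold)`, which expects boxes as `(x, y, w, h)` rather than corner coordinates and returns the indices of the kept boxes.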


![img2](img2.png)

References:  
- Redmon et al., *You Only Look Once: Unified, Real-Time Object Detection* (the original paper): https://arxiv.org/pdf/1506.02640v5.pdf
- https://www.pjreddie.com
- https://www.stackoverflow.com
- https://www.medium.com
- https://www.pyimagesearch.com


### You're welcome!