## Table of Contents
* [General information](#section_1_1)
* [Online presence](#section_2_1)
* [Review](#section_3_1)
* [Reproduction](#section_4_1)
* [Tests and results](#section_5_1)

### General information  <a class="anchor" id="section_1_1"></a>

<b>Article name</b>: You Only Look Once: Unified, Real-Time Object Detection </br>
<b>Subjects</b>: Computer Vision and Pattern Recognition </br>
<b>Authors</b>: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi </br>

### Online presence  <a class="anchor" id="section_2_1"></a>

__[arxiv.org](https://arxiv.org/abs/1506.02640)__ </br>
__[springer.com](https://www.springer.com/journal/11263)__ </br>
__[ieeexplore.ieee.org](https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings)__

### Review  <a class="anchor" id="section_3_1"></a>

<b>Abstract</b>:</br>
The paper addresses real-time object detection and proposes a model named YOLO (You Only Look Once). Unlike earlier detectors that rely on a multi-stage pipeline of region proposals followed by classification, YOLO frames object detection as a single regression problem, mapping image pixels directly to spatially separated bounding boxes and associated class probabilities. </br></br>
<b>Key Points</b>:
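 - <i>Unified Detection</i>: YOLO introduces a single network that predicts bounding box coordinates and class probabilities for all objects in one forward pass. This makes it much faster than multi-stage detectors, although the paper reports that it makes more localization errors than Fast R-CNN while producing fewer false positives on background.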
 - <i>Real-Time Performance</i>: The new technique allows YOLO to achieve real-time performance, making it suitable for applications requiring fast and efficient object detection.
 - <i>End-to-End Model</i>: The whole detection pipeline is a single network, so it can be optimized end-to-end directly on detection performance. The convolutional layers are first pretrained on ImageNet classification and then trained for detection.
 - <i>Grid-based Prediction</i>: The image is divided into an S × S grid, and the cell that contains an object's center is responsible for detecting it. Each cell predicts B bounding boxes with confidence scores and one set of C class probabilities (S = 7, B = 2, C = 20 on PASCAL VOC).
 - <i>Bounding Box Encoding</i>: Each box is predicted directly as a center offset relative to its grid cell and a width and height relative to the whole image, all bounded between 0 and 1. This direct encoding contributes to the simplicity and efficiency of the architecture.
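
The grid assignment and box encoding described in the key points can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the paper's PASCAL VOC settings (S = 7, B = 2, C = 20), and the function name `encode_box` and its pixel-coordinate input format are hypothetical.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (PASCAL VOC settings)


def encode_box(cx, cy, w, h, img_w, img_h, s=S):
    """Encode a box given by its center and size in pixels.

    Returns (row, col, x, y, w_rel, h_rel): the responsible grid cell,
    the center offset inside that cell, and the width and height as
    fractions of the whole image.
    """
    # Express the center in grid units.
    gx = cx / img_w * s
    gy = cy / img_h * s
    # The cell that contains the center is responsible for the object.
    col = min(int(gx), s - 1)
    row = min(int(gy), s - 1)
    return row, col, gx - col, gy - row, w / img_w, h / img_h


# Each cell predicts B boxes (x, y, w, h, confidence) plus C class scores.
output_shape = (S, S, B * 5 + C)

row, col, x, y, w_rel, h_rel = encode_box(224, 160, 112, 96, img_w=448, img_h=448)
print(output_shape, (row, col), (x, y), (w_rel, h_rel))
```

With these settings the output tensor is 7 × 7 × 30, and the example box is assigned to the cell in row 2, column 3.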
 
<b>Methodology</b>:</br>
 - <i>Network Architecture</i>: The YOLO architecture consists of 24 convolutional layers followed by 2 fully connected layers. The final layer predicts bounding box coordinates and class probabilities.
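 - <i>Loss Function</i>: YOLO uses a sum-squared-error loss over box coordinates, confidence scores, and class probabilities. Localization errors are weighted more heavily and confidence errors from cells without objects are down-weighted (λ_coord = 5, λ_noobj = 0.5 in the paper), and the square roots of width and height are regressed so that errors on small boxes count more. The whole model is trained end-to-end on this loss.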
 - <i>Non-Maximum Suppression</i>: Post-processing involves non-maximum suppression to eliminate duplicate detections and improve the final output.
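
The non-maximum suppression step can be sketched as a greedy filter over boxes sorted by score. This is a generic illustration rather than the exact post-processing used in the paper or in darknet: the corner-format boxes `(x1, y1, x2, y2)`, the function names, and the IoU threshold of 0.5 are assumptions, and real detectors usually apply the filter per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed as a duplicate, while the third box does not overlap and is kept.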
 
<b>Evaluation</b>:</br>
 - <i>Datasets</i>: The authors evaluate YOLO on PASCAL VOC 2007 and 2012 and test how well it generalizes to artwork on the Picasso and People-Art datasets. Its accuracy is competitive with, though below, the most accurate detectors of the time.
 - <i>Real-Time Performance</i>: YOLO processes images at 45 frames per second on a Titan X GPU. A smaller variant, Fast YOLO, reaches 155 frames per second at lower accuracy.
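
To put the 45 frames per second figure in context, the short calculation below converts throughput into a per-frame time budget. The 30 fps video stream used for comparison is an assumption for illustration, not a number from the paper.

```python
def frame_time_ms(fps):
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


yolo_ms = frame_time_ms(45)   # throughput reported for YOLO
video_ms = frame_time_ms(30)  # assumed 30 fps video stream
print(f"{yolo_ms:.1f} ms per frame, budget {video_ms:.1f} ms")
```

At 45 frames per second each image takes about 22 ms, which fits within the roughly 33 ms available per frame of a 30 fps stream.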
 
<b>Pros and Cons</b>:</br>
 - Pros:
    - Unified architecture simplifies the object detection pipeline.
    - Real-time performance suitable for applications like video analysis.
    - End-to-end training facilitates optimization for detection tasks.
 - Cons:
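    - YOLO struggles with small objects that appear in groups, because each grid cell predicts only a limited number of boxes and a single class.
    - Accuracy is traded for speed: localization is less precise than that of the most accurate two-stage detectors.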




### Reproduction  <a class="anchor" id="section_4_1"></a>

Notes: I reproduced the results on an Ubuntu machine with the following kernel details: </br>
<i>uname -a</i></br>
Linux leonidg 5.4.0-150-generic #167~18.04.1-Ubuntu

The reproduction requires downloading a lot of data (nearly 300 MB), so it is not included in this notebook. Given these limits, I can provide only the results of my tests and the console output.

<b>Setup process</b>:</br>
<i>Step 1)</i> - update the package index and install the required packages:
 - sudo apt-get update
 - sudo apt-get install build-essential cmake git libopencv-dev libgtk-3-dev

<i>Step 2)</i> - clone the repo that contains a YOLO implementation:
 - git clone https://github.com/AlexeyAB/darknet.git

<i>Step 3)</i> - build the executable:
 - cd darknet
 - make

<i>Step 4)</i> - download the pre-trained YOLOv3 weights file:
 - wget https://pjreddie.com/media/files/yolov3.weights

<i>Step 5)</i> - use the sample images included in the repo or download others from the Net

<i>Step 6)</i> - run YOLO on the images:
 - ./darknet detect cfg/yolov3.cfg yolov3.weights image_file_name.jpg
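
When several images need to be processed, the command from Step 6 can be wrapped in a small Python helper. This is a sketch under stated assumptions: the helper names `build_command` and `run_batch` are hypothetical, `data/dog.jpg` is only an example path, and the script assumes that darknet writes its result to `predictions.jpg` in the working directory and overwrites it on every run.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(image, cfg="cfg/yolov3.cfg", weights="yolov3.weights"):
    """Build the same detect command as in Step 6 for one image."""
    return ["./darknet", "detect", cfg, weights, str(image)]


def run_batch(images, darknet_dir="darknet"):
    """Run detection on each image and keep a copy of every result."""
    darknet_dir = Path(darknet_dir)
    for image in images:
        subprocess.run(build_command(image), cwd=darknet_dir, check=True)
        # Assumption: the output image is predictions.jpg in the working
        # directory, so copy it before the next run overwrites it.
        result = darknet_dir / "predictions.jpg"
        if result.exists():
            shutil.copy(result, darknet_dir / f"predictions_{Path(image).stem}.jpg")


# Show the command that would be executed for one example image.
print(" ".join(build_command("data/dog.jpg")))
```

Calling `run_batch(["data/dog.jpg"])` from the directory that contains the cloned repo would then leave a `predictions_dog.jpg` file next to the darknet binary, with image paths given relative to the `darknet` directory.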

<b>Example run</b>:</br>

### Tests and results <a class="anchor" id="section_5_1"></a>
During the tests, I used both the images included in the git repo and others downloaded from the Net:</br></br>
<i>Large images with easily detectable objects </i>:</br>

<table><tr>
<td> <img src="predictions_dog.jpg" style="width: 300px;"/> </td>
<td> <img src="predictions_random.jpg" style="width: 300px;"/> </td>
<td> <img src="predictions_bird.jpg" style="width: 300px;"/> </td>
</tr></table>


<i>Small images in which the objects may overlap </i>:</br>

<table><tr>
<td> <img src="predictions_3dogs.jpg" style="width: 350px;"/> </td>
<td> <img src="predictions_small.jpg" style="width: 350px;"/> </td>
</tr></table>