## YOLOv5 Model
Of the many deep-learning-based object detection models, I chose to focus primarily on YOLO because of its fast inference, which comes at the cost of some accuracy. Given the nature of my application and the context of its deployment, it seemed necessary to prioritize speed and a small model footprint over maximum accuracy.

Moreover, YOLO is open-source. Since its [first introduction in 2015](https://arxiv.org/abs/1506.02640), the architecture has gone through many iterations. The latest version (YOLOv8, released by Ultralytics in 2023) is currently considered a state-of-the-art model that improves on the previous version in both speed and accuracy.

## You Only Look Once
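YOLO is an acronym for "You Only Look Once", which refers to the model's single-stage design: it predicts bounding boxes and class probabilities in a single forward pass over the whole image. Earlier CNN-based detectors instead relied on sliding windows or region proposals, running a classifier many times over each image. Because a single pass still produces many overlapping candidate boxes, YOLO applies Non-Maximum Suppression (NMS) as a post-processing step to keep only the best box for each object. It is the single-pass design, not NMS itself, that makes YOLO much faster than multi-stage detectors.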
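
Non-Maximum Suppression can be sketched in a few lines of plain Python. This is a minimal illustration of the greedy algorithm, not the YOLOv5 implementation; the function names, the `(x1, y1, x2, y2)` box layout, and the `0.45` threshold are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.45
) -> List[int]:
    """Return indices of the boxes to keep, highest score first."""
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))
```

In practice NMS is usually run separately for each class, so that overlapping objects of different classes do not suppress one another.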


## Model Architecture
YOLO is a single-stage object detector with the usual three parts in its architecture: the backbone, the neck, and the head. The backbone extracts important features from the input image, the neck builds feature pyramids that help the model recognize the same object at different scales, and the head performs the final detection.
![image](assets/YOLOv5_architecture.png)


## Model Outputs
For each image, YOLO outputs a set of detections, each containing a bounding box, a class ID, and a confidence score. For object tracking, a state-estimation algorithm such as the Kalman filter can be layered on top of the detector to identify the same object over multiple frames.
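
As a rough illustration of how such an output might be consumed, the sketch below turns raw rows into typed records and drops low-confidence detections. The row layout `[x1, y1, x2, y2, confidence, class_id]`, the `Detection` class, the `parse_detections` helper, and the `0.25` threshold are all assumptions made for the example, so check the layout your own model returns before relying on it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    """One detected object (hypothetical container for this example)."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int


def parse_detections(
    rows: Sequence[Sequence[float]], min_confidence: float = 0.25
) -> List[Detection]:
    """Convert raw rows into Detection objects, skipping weak predictions."""
    detections: List[Detection] = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf >= min_confidence:
            detections.append(
                Detection(float(x1), float(y1), float(x2), float(y2), float(conf), int(cls))
            )
    return detections


if __name__ == "__main__":
    raw = [
        [10, 20, 110, 220, 0.91, 0],  # confident detection of class 0
        [5, 5, 50, 50, 0.10, 2],      # below the threshold, discarded
    ]
    for det in parse_detections(raw):
        print(det)
```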

## Loss Function
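YOLOv5 uses Binary Cross-Entropy with logits loss ([BCEWithLogitsLoss](https://pytorch.org/docs/master/generated/torch.nn.BCEWithLogitsLoss.html)) for its class and objectness predictions. Bounding-box regression uses a separate IoU-based loss (CIoU).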
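
To make the idea concrete, here is a minimal sketch of binary cross-entropy computed directly on logits, written in plain Python rather than PyTorch. The `bce_with_logits` function is a hypothetical helper for illustration; it uses the numerically stable form `max(x, 0) - x*y + log(1 + exp(-|x|))` and averages over the inputs, which matches the default `mean` reduction of the PyTorch loss.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy between raw logits and targets in [0, 1]."""
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable rewrite of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


if __name__ == "__main__":
    # A logit of 0 means a predicted probability of 0.5.
    print(bce_with_logits([0.0], [1.0]))
```

Folding the sigmoid into the loss avoids computing `log(0)` when a logit is very large or very small, which is the main reason to prefer this form over applying a sigmoid and a plain BCE separately.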

## Other Parameters
The authors of YOLOv5 originally used Leaky ReLU as the activation function for most of the model (more recent releases use SiLU instead), with the classic Sigmoid function in the final detection layer. The default optimizer is SGD, although the user can switch to Adam if desired.
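
For reference, the two activation functions mentioned above can be sketched as follows. This is a plain-Python illustration, not code from the YOLOv5 repository, and the negative slope of `0.1` is an assumption chosen for the example.

```python
import math


def leaky_relu(x: float, negative_slope: float = 0.1) -> float:
    """Pass positive inputs through; scale negative inputs by a small slope."""
    return x if x >= 0 else negative_slope * x


def sigmoid(x: float) -> float:
    """Squash any real number into (0, 1), avoiding overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(value, leaky_relu(value), sigmoid(value))
```

Unlike a plain ReLU, Leaky ReLU keeps a small gradient for negative inputs, while the sigmoid on the detection layer maps raw scores to values that can be read as confidences.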

## Citations
- [Ultralytics YOLOv5 - Github Page](https://github.com/ultralytics/yolov5)
- [PyTorch YOLOv5 - Docs](https://pytorch.org/hub/ultralytics_yolov5/)
- [J. Terven, D. Cordova-Esparza, "*A Comprehensive Review of YOLO: From YOLOv1 and Beyond*", 2023](https://arxiv.org/pdf/2304.00501.pdf)