# Object Detection

Object detection is an important computer vision task used to detect instances of visual objects of certain classes (for example, humans, animals, cars, or buildings) in digital images such as photos or video frames. The goal of object detection is to develop computational models that provide the most fundamental information needed by computer vision applications: "What objects are where?"

Object detection is one of the fundamental problems of computer vision. It forms the basis of many downstream computer vision tasks, such as instance segmentation, image captioning, and object tracking. Specific object detection applications include pedestrian detection, people counting, face detection, text detection, pose detection, and number-plate recognition.

### Milestones in state-of-the-art Object Detection

The field of object detection is not as new as it may seem: it has evolved over the past 20 years. Its progress is usually divided into two historical periods, before and after the introduction of deep learning:

Before 2014 – Traditional Object Detection period

+ Viola-Jones Detector (2001), the pioneering work that started the development of traditional object detection methods
+ HOG Detector (2005), based on the histogram of oriented gradients, a popular feature descriptor for object detection in computer vision and image processing
+ DPM (2008) with the first introduction of bounding box regression

After 2014 – Deep Learning Detection period

Most important two-stage object detection algorithms

+ R-CNN and SPPNet (2014)
+ Fast R-CNN and Faster R-CNN (2015)
+ Mask R-CNN (2017)
+ Feature Pyramid Networks/FPN (2017)
+ G-RCNN (2021)

Most important one-stage object detection algorithms

+ YOLO (2016)
+ SSD (2016)
+ RetinaNet (2017)
+ YOLOv3 (2018)
+ YOLOv4 (2020)
+ YOLOR (2021)

To decide which algorithm is best for a given use case, it is important to understand the main characteristics of each. First, we will look at the key differences between the two families of object detection algorithms before discussing the individual algorithms.

### One-stage vs. two-stage deep learning object detectors

As you can see in the list above, state-of-the-art object detection methods can be categorized into two main types: one-stage and two-stage object detectors.

In general, deep learning-based object detectors extract features from the input image or video frame. An object detector solves two successive tasks:

+ Task #1: Find an arbitrary number of objects (possibly even zero), and
+ Task #2: Classify every single object and estimate its size with a bounding box.
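
The two tasks above imply a simple output format: a variable-length list in which every entry carries a class label, a confidence score, and a bounding box. The following is a minimal sketch of such a result type. The names `Detection` and `count_by_label`, the corner-based box layout, and the sample values are illustrative assumptions, not the API of any particular library.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object: what it is, how confident, and where (hypothetical type)."""
    label: str                               # predicted class name
    score: float                             # confidence in [0, 1]
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels

    @property
    def area(self) -> float:
        """Box area in square pixels; degenerate boxes count as zero."""
        x_min, y_min, x_max, y_max = self.box
        return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def count_by_label(detections: List[Detection]) -> Dict[str, int]:
    """Count how many objects of each class were found."""
    counts: Dict[str, int] = {}
    for det in detections:
        counts[det.label] = counts.get(det.label, 0) + 1
    return counts


# Task #1: an image may yield any number of detections, including none.
empty_frame: List[Detection] = []

# Task #2: every detection has a class and a bounding box (made-up values).
street_frame = [
    Detection("person", 0.91, (12.0, 30.0, 58.0, 140.0)),
    Detection("person", 0.78, (70.0, 35.0, 110.0, 150.0)),
    Detection("car", 0.88, (120.0, 60.0, 300.0, 160.0)),
]

print(count_by_label(empty_frame))
print(count_by_label(street_frame))
```

Applications such as people counting then reduce to simple queries over this list, for example filtering by `label` and `score`.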

To simplify the process, you can separate those tasks into two stages. Other methods combine both tasks into one step (single-stage detectors) to achieve higher inference speed, usually at some cost in accuracy.

**Two-stage detectors:** In two-stage object detectors, the approximate object regions are proposed using deep features before these features are used for the classification as well as bounding box regression for the object candidate.

+ The two-stage architecture involves (1) object region proposal with conventional computer vision methods or deep networks, followed by (2) object classification and bounding-box regression based on features extracted from the proposed region.
+ Two-stage methods achieve the highest detection accuracy but are typically slower. Because of the many inference steps per image, their throughput (frames per second) is lower than that of one-stage detectors.
+ Two-stage detectors include the region-based convolutional neural network (R-CNN) and its evolutions Fast R-CNN, Faster R-CNN, and Mask R-CNN. The latest evolution is the granulated R-CNN (G-RCNN).
+ Two-stage object detectors first find a region of interest and use this cropped region for classification. However, such multi-stage detectors are often not fully end-to-end trainable because cropping a region is not differentiable with respect to the box coordinates.
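
The two-stage flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a real detector: `toy_propose` and `toy_head` are hypothetical stand-ins for a region proposal step and a classification head, and `apply_box_deltas` uses the center/log-size offset parameterization commonly associated with the R-CNN family.

```python
import math
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]      # (x_min, y_min, x_max, y_max)
Deltas = Tuple[float, float, float, float]   # (tx, ty, tw, th)


def apply_box_deltas(proposal: Box, deltas: Deltas) -> Box:
    """Refine a proposal with regression offsets (bounding-box regression)."""
    x_min, y_min, x_max, y_max = proposal
    w, h = x_max - x_min, y_max - y_min
    cx, cy = x_min + 0.5 * w, y_min + 0.5 * h
    tx, ty, tw, th = deltas
    # Shift the center proportionally to the proposal size; scale size in log space.
    new_cx, new_cy = cx + tx * w, cy + ty * h
    new_w, new_h = w * math.exp(tw), h * math.exp(th)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def two_stage_detect(
    image: object,
    propose: Callable[[object], List[Box]],
    classify_and_regress: Callable[[object, Box], Tuple[str, float, Deltas]],
    min_score: float = 0.5,
) -> List[Tuple[str, float, Box]]:
    """Stage 1 proposes regions; stage 2 classifies and refines each one."""
    results = []
    for proposal in propose(image):                                   # stage 1
        label, score, deltas = classify_and_regress(image, proposal)  # stage 2
        if label != "background" and score >= min_score:
            results.append((label, score, apply_box_deltas(proposal, deltas)))
    return results


# Hypothetical stand-ins so the sketch runs without a trained model.
def toy_propose(image: object) -> List[Box]:
    return [(10.0, 10.0, 30.0, 50.0), (0.0, 0.0, 5.0, 5.0)]


def toy_head(image: object, proposal: Box) -> Tuple[str, float, Deltas]:
    if proposal[2] - proposal[0] >= 10.0:
        return "person", 0.9, (0.25, 0.0, 0.0, 0.0)   # nudge the box to the right
    return "background", 0.8, (0.0, 0.0, 0.0, 0.0)


detections = two_stage_detect(None, toy_propose, toy_head)
print(detections)
```

The loop makes the speed trade-off visible: the second stage runs once per proposal, so the cost per image grows with the number of proposed regions.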

**One-stage detectors:** One-stage detectors predict bounding boxes over the images without the region proposal step. This process consumes less time and can therefore be used in real-time applications.

One-stage object detectors prioritize inference speed and are very fast, but they are not as good at recognizing irregularly shaped objects or groups of small objects.
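
To make the one-stage idea concrete, here is a heavily simplified sketch of decoding a dense grid of predictions in a single pass, loosely inspired by grid-based detectors such as YOLO. The prediction layout (one box per cell, offsets relative to the cell, sizes as fractions of the image), the function name `decode_grid`, and all numbers are assumptions made for illustration. Real detectors use multiple boxes or anchors per location and apply non-maximum suppression afterwards.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
# Per-cell prediction: (objectness, cx_offset, cy_offset, w_frac, h_frac, class_probs)
Cell = Tuple[float, float, float, float, float, Sequence[float]]


def decode_grid(
    predictions: Sequence[Sequence[Cell]],
    class_names: Sequence[str],
    image_size: Tuple[float, float],
    min_score: float = 0.5,
) -> List[Tuple[str, float, Box]]:
    """Turn a rows x cols grid of raw predictions into boxes, with no proposal step."""
    rows = len(predictions)
    cols = len(predictions[0]) if rows else 0
    width, height = image_size
    detections = []
    for r in range(rows):
        for c in range(cols):
            objectness, dx, dy, w_frac, h_frac, class_probs = predictions[r][c]
            best = max(range(len(class_probs)), key=lambda i: class_probs[i])
            score = objectness * class_probs[best]
            if score < min_score:
                continue  # this cell does not contain a confident object
            # Center is an offset inside the cell; size is a fraction of the image.
            cx = (c + dx) / cols * width
            cy = (r + dy) / rows * height
            w, h = w_frac * width, h_frac * height
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            detections.append((class_names[best], score, box))
    return detections


# A made-up 2x2 grid over a 100x100 image; only the top-right cell fires.
background: Cell = (0.05, 0.0, 0.0, 0.0, 0.0, [0.5, 0.5])
grid = [
    [background, (0.9, 0.5, 0.5, 0.2, 0.4, [0.1, 0.9])],
    [background, background],
]
grid_detections = decode_grid(grid, ["car", "person"], (100.0, 100.0))
print(grid_detections)
```

In this sketch each cell can emit at most one box, which hints at why dense groups of small objects are harder for grid-based designs: two objects whose centers fall in the same cell compete for a single prediction.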

The most popular one-stage detectors include YOLO, SSD, and RetinaNet. The latest real-time detectors are Scaled-YOLOv4 (2020) and YOLOR (2021). The main advantage of single-stage detectors is that they are generally faster and structurally simpler than multi-stage detectors.
