![](https://www.kios.ucy.ac.cy/evai/wp-content/uploads/2022/10/MicrosoftTeams-image-8-e1665993611643.png)  ![](https://www.kios.ucy.ac.cy/evai/wp-content/uploads/2022/10/MicrosoftTeams-image-9-e1665993618291.png)

# YOLOv7 - Object Detection Pipeline Tutorial


***IEEE Drone A.I. School 2022***

***Author: Rafael Makrigiorgis***

***Date: October 2022***



---



This tutorial is based on the [YOLOv7 repository](https://github.com/WongKinYiu/yolov7) by WongKinYiu. This notebook shows how to train the detector on your own custom objects. Many thanks to WongKinYiu and AlexeyAB for putting this repository together.

### **Steps Covered in this Tutorial**
To train our detector we take the following steps:

1.  Learn about YOLO
2.  Download and Install YOLOv7 dependencies
3.  Prepare the custom dataset
4.  Run YOLOv7 training
5.  Evaluate YOLOv7 performance
6.  Run YOLOv7 inference on test images / sample video

# YOLO

## What is YOLO?

**YOLO** is an abbreviation for the term **‘You Only Look Once’**. This is an algorithm that **localizes** and **classifies** various **objects in a picture** (in real-time). 

The original YOLO model was introduced in the paper “You Only Look Once: Unified, Real-Time Object Detection” in 2015 [1]. At the time, R-CNN models [2,3,4] were the best way to perform object detection, but their time-consuming, multi-step training process made them cumbersome to use in practice.
YOLO was created to do away with as much of that hassle as possible. By offering single-stage object detection, it reduced training and inference times and greatly reduced the cost of running object detection.
As the name suggests, the algorithm requires only a single forward propagation through a neural network to detect objects.
![](https://opencv-tutorial.readthedocs.io/en/latest/_images/yolo1_net.png)

[Image Source](https://arxiv.org/pdf/1506.02640.pdf)

The YOLO algorithm aims to predict the class of an object and the bounding box that defines the object's location on the **input image**. It recognizes objects and **outputs their bounding boxes** using four numbers:

*   Center of the bounding box - *Bx, By*
*   Width of the box - *W*
*   Height of the box - *H*

In addition to that, YOLO predicts the corresponding number c for the predicted **class** as well as the **probability** of the prediction *P(c)*
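
As a small illustration of the outputs listed above, the sketch below stores one prediction and converts its center/size box into corner coordinates. The `Detection` class and its field names are hypothetical, chosen only to mirror the symbols *Bx, By, W, H, c* and *P(c)*; they are not part of the YOLOv7 code.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted object: box center, box size, class id and class probability."""
    bx: float  # x coordinate of the box center
    by: float  # y coordinate of the box center
    w: float   # box width
    h: float   # box height
    c: int     # predicted class id
    p: float   # probability P(c) of the predicted class

    def corners(self):
        """Return the box as (x_min, y_min, x_max, y_max)."""
        return (self.bx - self.w / 2, self.by - self.h / 2,
                self.bx + self.w / 2, self.by + self.h / 2)


det = Detection(bx=100.0, by=50.0, w=40.0, h=20.0, c=0, p=0.9)
print(det.corners())  # (80.0, 40.0, 120.0, 60.0)
```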

To obtain these outputs, training produces a **weights** file that contains the final learned values for each of the network's filters. It can be used for transfer learning (as the initial weights for training on a new dataset) or to detect the objects it was trained on.

## Why the YOLO algorithm is important

The YOLO algorithm is important for the following reasons:


*   **Speed**: Because it needs only a single forward pass through the network, the algorithm can detect objects in real time.

*   **High accuracy**: YOLO provides accurate results with minimal background errors.

*   **Learning capabilities**: The algorithm has excellent learning capabilities that enable it to learn the representations of objects and apply them in object detection.





## How does YOLO work?

YOLO performs object detection in a single stage by first dividing the image into an S×S grid of equally sized cells. Each cell is used to detect and localize the objects whose center falls inside it. For each cell, B bounding boxes are predicted for the potential object(s), together with an object label and a probability score for the predicted object's presence.
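
A minimal sketch of the grid assignment, assuming the convention of the original YOLO paper [1], where the image is divided into S×S cells and the cell that contains an object's center is responsible for it. The function name and the default `s=7` (the value used in that paper) are illustrative and are not taken from the YOLOv7 code.

```python
def responsible_cell(center_x, center_y, image_w, image_h, s=7):
    """Return (row, col) of the grid cell that contains the box center."""
    # Scale the center to grid units and clamp so a center on the far edge stays inside
    col = min(int(center_x / image_w * s), s - 1)
    row = min(int(center_y / image_h * s), s - 1)
    return row, col


print(responsible_cell(320, 240, 640, 480))  # (3, 3)
```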

As you may have guessed, this leads to a significant overlap of predicted objects from the cumulative predictions of the grids. To handle this redundancy and reduce the predicted objects down to those of interest, YOLO uses [Non-Maximum Suppression](https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c) to suppress all the bounding boxes with comparatively lower probability scores.

To achieve this, YOLO first compares the probability scores of all predicted boxes and selects the one with the largest score. It then removes the bounding boxes whose Intersection over Union (IoU) with the selected box is above a chosen threshold, since they most likely describe the same object. This step is repeated on the remaining boxes until only the desired final bounding boxes remain.

![](https://i.ibb.co/VCCfGTZ/grid-ssd.jpg)

![](https://i.ibb.co/qRY29DX/yolossd.png)

(a) Image Divided into Grids   (b) Before Non-Maximum Suppression   (c) After Non-Maximum Suppression (Final Output)
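
The suppression loop described above can be sketched in plain Python. This is a simplified illustration, not the implementation used inside YOLOv7: it assumes the common greedy variant, in which a box is removed when its IoU with the selected box exceeds a threshold (`iou_threshold`, an illustrative value), and it assumes boxes are given as corner coordinates.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy Non-Maximum Suppression."""
    # Visit boxes from the highest score to the lowest
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the selected one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In this example the second box overlaps the first almost completely, so only the first and the third box survive.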







---


# Download and Install YOLOv7 

---



### Clone the repository of YOLOv7

In [None]:
!git clone https://github.com/WongKinYiu/yolov7.git 

Change directory to the cloned folder.

In [None]:
%cd yolov7

### Install Dependencies
*Remember to choose GPU in Runtime if not already selected. Runtime --> Change Runtime Type --> Hardware accelerator --> GPU*



In [34]:
!pip install -r requirements.txt

# Prepare Custom Dataset


## Annotation - LabelImg

Download the sample images to your computer from [here](https://www.kios.ucy.ac.cy/evai/wp-content/uploads/2022/10/labeling_sample.zip).

It contains aerial images of vehicles that need to be labeled using the LabelImg tool described below.

LabelImg is a graphical image annotation tool.

It is written in Python and uses Qt for its graphical interface.

Annotations are saved as XML files in PASCAL VOC format, the format used by ImageNet. It also supports the YOLO and CreateML formats.

![](https://raw.githubusercontent.com/tzutalin/labelImg/master/demo/demo3.jpg)


### Installation


*   Follow the instructions from the [GitHub repository](https://github.com/heartexlabs/labelImg#installation), or
*   Download [this](https://github.com/tzutalin/labelImg/files/2638199/windows_v1.8.1.zip) (Windows only)



### Instructions
1.  Create pre-defined classes. You can edit data/predefined_classes.txt to load pre-defined classes.
2.  If label files already exist, copy them to the same folder as the images. Each label file must have the same name as its image file.
3.  Click File, choose 'Open Dir', then open the image folder.
4.  Select an image in the File List; the bounding boxes and labels for all objects in that image will appear.
(Choose Display Labels mode in View to show/hide labels)
5.  Save the annotations in YOLO format, the only format we need.

**The output of LabelImg for each image is a txt file that contains one line per bounding box you created, in the format "class_id center_x center_y width height", with the coordinates and sizes normalized by the image width and height.**
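
To make the label format concrete, the sketch below converts a box given in pixel corner coordinates into one YOLO label line. The function name is hypothetical, and the 1280×720 image size in the example is only an assumption for illustration.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, image_w, image_h):
    """Convert a pixel-space corner box into a normalized YOLO label line."""
    center_x = (x_min + x_max) / 2 / image_w
    center_y = (y_min + y_max) / 2 / image_h
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


# A 200x200 pixel box in an assumed 1280x720 image
print(to_yolo_line(0, 100, 200, 300, 400, 1280, 720))
# 0 0.156250 0.416667 0.156250 0.277778
```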




### Hotkeys
*  Ctrl + u: Load all of the images from a directory
*  Ctrl + r: Change the default annotation target dir
*  Ctrl + s: Save
*  Ctrl + d: Copy the current label and rect box
*  Ctrl + Shift + d: Delete the current image
*  Space: Flag the current image as verified
*  w: Create a rect box
*  d: Next image
*  a: Previous image
*  del: Delete the selected rect box
*  Ctrl++: Zoom in
*  Ctrl--: Zoom out
*  ↑→↓←: Keyboard arrows to move selected rect box

## Prepare custom data

After labeling your custom dataset, split it into training, validation and test samples. A common split is 80% of the data for training and the remaining 20% for validation and testing.



Create a folder for your training data

In [4]:
!mkdir vehicles

Change directory to the new folder, then download and unzip the custom dataset.

In [None]:
%cd vehicles
!wget https://www.kios.ucy.ac.cy/evai/wp-content/uploads/2022/10/DJI-405-720p.zip

In [None]:
!unzip "*****.zip"

Split your custom data into train/valid/test text files, each of which contains the file paths of the images in that set.

In [30]:
import os
import random

# Folder that holds the labeled images; replace the placeholder with your own path
directory = os.getcwd() + "****/folderpath"
# Percentage of images held out from training (shared between test and valid)
percentage_test = 20
# Every index_test-th image is held out
index_test = round(100 / percentage_test)

# Collect the paths of all image files under the directory
image_extensions = ('.jpg', '.png')
filenames = []
for path, subdirs, files in os.walk(directory):
    for filename in files:
        if filename.lower().endswith(image_extensions):  # keep image files only
            filenames.append(os.path.join(path, filename))

# Shuffle once so that the split is random
random.shuffle(filenames)

# Create and/or truncate train.txt, valid.txt and test.txt, then populate them
with open('train.txt', 'w') as file_train, \
     open('valid.txt', 'w') as file_valid, \
     open('test.txt', 'w') as file_test:
    counter = 1
    to_test = True  # held-out images alternate between test.txt and valid.txt
    for filename in filenames:
        if counter == index_test:
            counter = 1
            held_out = file_test if to_test else file_valid
            held_out.write(filename + "\n")
            to_test = not to_test
        else:
            file_train.write(filename + "\n")
            counter += 1

Navigate back to the yolov7 folder.



In [None]:
%cd /content/yolov7

### Prepare training files

You will need the following files:
*   yolov7.yaml => contains the model definition (layers and filters) of the neural network. It can be found in the "cfg/training" folder
*  vehicles_data.yaml => contains information about our data; create it by modifying a copy of "data/coco.yaml"
*  yolov7_training.pt => pre-trained weights of yolov7 trained on the [COCO dataset](https://cocodataset.org/)

1.  Copy the model yaml file into our training folder.

In [19]:
!cp cfg/training/yolov7.yaml vehicles/yolov7.yaml

2.  Modify the 'nc' parameter in the copied vehicles/yolov7.yaml to the number of your classes.
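
If you prefer to make this change from a notebook cell instead of a text editor, a sketch along the following lines may help. It assumes the model file contains a top-level line of the form `nc: <number>`; the helper name `set_num_classes` and the sample text are illustrative only.

```python
import re


def set_num_classes(yaml_text, num_classes):
    """Return yaml_text with the first top-level 'nc:' value replaced."""
    new_text, count = re.subn(r"^nc:\s*\d+", f"nc: {num_classes}", yaml_text,
                              count=1, flags=re.MULTILINE)
    if count == 0:
        raise ValueError("no 'nc:' line found")
    return new_text


# Sample text standing in for the beginning of a model yaml file (assumed layout)
sample = "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0\n"
print(set_num_classes(sample, 2))
```

To apply it to the real file, read the text of the copied yaml file, pass it through `set_num_classes`, and write the result back. Replacing only the matched line leaves the comments and the rest of the file untouched.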


3.  Download the yolov7 pre-trained model to be used as initial weights during training

In [None]:
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt

4.  Copy the data yaml file into our training folder.


In [9]:
!cp data/coco.yaml vehicles/vehicles_data.yaml

5.  Edit the file so that its content corresponds to your custom dataset

This file should contain the paths of the text files that list the image paths of each data set used for training, validation and testing, together with the number of classes and the class names. An example is shown below.


```
# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: data/crowd_human/train.txt  # train images
val: data/crowd_human/valid.txt    # valid images
test: data/crowd_human/test.txt    # test images

# number of classes
nc: 2

# class names
names: [ 'person', 'head']
```
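
The same layout can also be generated from Python, which keeps `nc` and `names` consistent because the class count is derived from the list of names. This is only a sketch: the function name is hypothetical, and the paths and class names in the example are placeholders, not the actual classes of the sample dataset.

```python
def build_data_yaml(train_txt, val_txt, test_txt, class_names):
    """Return the text of a dataset yaml in the layout shown above."""
    names = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_txt}\n"
        f"val: {val_txt}\n"
        f"test: {test_txt}\n"
        f"\n"
        f"nc: {len(class_names)}\n"
        f"names: [{names}]\n"
    )


# Placeholder paths and class names, for illustration only
print(build_data_yaml("vehicles/train.txt", "vehicles/valid.txt",
                      "vehicles/test.txt", ["car", "truck"]))
```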



# Training on your custom dataset


In [None]:
!python train.py --workers 16 --device 0 --epochs 30 --batch-size 16 --data "data_folder/data.yaml" --img 512 512 --cfg "data_folder/yolov7.yaml" --weights "yolov7_training.pt" --name yolov7-custom --hyp data/hyp.scratch.custom.yaml


# Evaluating your trained model

In [None]:
!python test.py --data "data/data.yaml" --img 512 --batch 16 --iou 0.65 --device 0 --weights "runs/train/****/weights/best.pt" --name yolov7_640_val

You can find the output of the script under "runs/test/", in a folder named after the `--name` argument.


![](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/5_precision%20and%20recall.png)

![](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/6_precision%20and%20recall.png)

**Average Precision (AP):**
The Average Precision (AP) is generally defined as the area under the precision-recall curve shown above.

**mAP (mean average precision):**
The mAP is the mean of the AP values. Typically, we compute the AP for each class and average them.
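
The two definitions can be illustrated with a short sketch. It assumes that each detection of a class has already been marked as a true or false positive and that the number of ground-truth objects is greater than zero. The area is computed under the monotonically decreasing envelope of the precision-recall points, which is one common convention and is not necessarily the exact interpolation that test.py applies. All names here are illustrative.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class."""
    # Visit detections from the most to the least confident
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_ground_truth, tp / (tp + fp)))
    # Replace each precision by the best precision at equal or higher recall
    precisions = [p for _, p in points]
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles under the resulting envelope
    ap, prev_recall = 0.0, 0.0
    for (recall, _), precision in zip(points, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4)
print(ap)                                 # 0.625
print(mean_average_precision([ap, 1.0]))  # 0.8125
```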

# Extract detections

Download the sample video and run the detection script on it.

In [None]:
!wget https://www.kios.ucy.ac.cy/evai/wp-content/uploads/2022/10/DJI_0406_cut.mp4

In [None]:
!python detect.py --weights "runs/train/*****/weights/best.pt" --conf 0.25 --img-size 512 --source "videofile.mp4" --device 0

You can find the output video in folder 'runs/detect/exp#'.

You can also run the detection on a whole folder with images.

Download and unzip the following sample images.

In [None]:
!wget https://www.kios.ucy.ac.cy/evai/wp-content/uploads/2022/10/mtihani.zip

In [None]:
!unzip "filename.zip"

In [None]:
!python detect.py --weights "runs/train/******/weights/best.pt" --conf 0.25 --img-size 512 --source "image_folder/" --device 0

You can find the output images in folder 'runs/detect/exp#'.

# References


1.   *Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.*
2.   *Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.*
3.   *Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE international conference on computer vision. 2015.*
4.   *Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems 28 (2015).*

