# TUTORIAL: Train YOLOv7 for American Sign Language recognition

*A guide to using Transfer Learning on YOLOv7 to detect ASL letters in an AI Notebook.*

### **USE CASE:** Train a YOLOv7 model on a custom dataset and enjoy the power of Transfer Learning

<img src="attachment:4cc0dc99-09e7-4c45-90e6-c0c3e40208a3.png" >

## Introduction

This tutorial shows how to train YOLOv7 to recognise American Sign Language letters. YOLOv7 is an object detection algorithm. Object detection is closely related to image classification but works at a finer scale: it locates features in an image, typically with bounding boxes, and categorises them.

This tutorial is based on the YOLOv7 repository: <a href="https://github.com/WongKinYiu/yolov7">WongKinYiu/yolov7</a>.

## Code

The steps are as follows:

- Download the American Sign Language Letters Dataset
- Clone YOLOv7 repository
- Install YOLOv7 dependencies
- Import dependencies and check GPU availability
- Define the number of classes and YOLOv7 model architecture
- Recover YOLOv7 weights
- Run YOLOv7 training on ASL letters dataset
- Display results of YOLOv7 training on ASL letters dataset
- Test your YOLOv7 custom model on the ASL Letters test dataset
- Run YOLOv7 inference on new images
- Export trained weights for future inference

## Download the American Sign Language Letters Dataset

The ASL Letters Dataset is available on <a href="https://public.roboflow.com/object-detection/american-sign-language-letters">Roboflow</a>.

> *The American Sign Language Letters dataset is an object detection dataset of each ASL letter with a bounding box. David Lee, a data scientist focused on accessibility, curated and released the dataset for public use.*

To use this **Public Dataset** in the tutorial:

- create a Roboflow account
- click on `Download` to download the dataset
- select the `YOLO v7 PyTorch` format
- choose the `show download code` method

<img src="attachment:1d3c6db6-f9a4-4d33-868d-b9597c82173e.png">

You will get a URL (`<dataset_url>`) that will allow you to download your dataset directly inside the notebook.

Finally, replace `<dataset_url>` with yours in the following command:

In [None]:
# go to the folder corresponding to your object container
%cd /workspace/data/FinData
# replace the URL below with your own <dataset_url>, then unzip the archive and delete only the zip file
!curl -L "https://public.roboflow.com/ds/9FYyYNe2HP?key=X7KMp85nNf" > roboflow.zip; unzip roboflow.zip; rm roboflow.zip
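If `curl` or `unzip` is not available in your environment, the same download, extract and clean-up sequence can be sketched with the Python standard library. `download_and_extract` is a hypothetical helper written for this tutorial, and the sketch assumes the Roboflow server accepts Python's default HTTP client:

```python
import os
import urllib.request
import zipfile


def download_and_extract(url: str, dest_dir: str) -> list:
    """Download a zip archive from `url`, extract it into `dest_dir`, then delete the archive.

    Hypothetical helper (not part of Roboflow or YOLOv7). Returns the extracted entry names.
    """
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, "roboflow.zip")
    urllib.request.urlretrieve(url, archive_path)  # follows HTTP redirects, like curl -L
    try:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest_dir)
            names = archive.namelist()
    finally:
        os.remove(archive_path)  # remove the zip only, never the dataset folder
    return names


# Usage in the notebook:
# download_and_extract("<dataset_url>", "/workspace/data/FinData")
```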

⚠️ Before going further, you have to modify the *data.yaml* file.

Follow this path: `workspace` -> `data` -> `data.yaml`

Then **change the paths** to:

`train: /workspace/data/BasicData/train/images`

`val: /workspace/data/BasicData/valid/images`

`test: /workspace/data/BasicData/test/images`
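Instead of editing the file by hand, you can script the change. A minimal sketch, assuming each of the three keys sits on its own top-level line of *data.yaml*; `set_split_paths` is a hypothetical helper that uses regular expressions rather than a YAML parser, so the other lines are left untouched:

```python
import re

# Sub-folders of the exported dataset, matching the paths listed above.
SPLIT_SUBDIRS = {"train": "train/images", "val": "valid/images", "test": "test/images"}


def set_split_paths(yaml_text: str, dataset_root: str) -> str:
    """Return `yaml_text` with the train/val/test entries pointing inside `dataset_root`.

    Hypothetical helper: assumes each key is on its own top-level line,
    e.g. `train: ../train/images`, and leaves every other line unchanged.
    """
    root = dataset_root.rstrip("/")
    for key, subdir in SPLIT_SUBDIRS.items():
        new_line = f"{key}: {root}/{subdir}"
        # A function replacement avoids backslashes in the path being read as escapes.
        yaml_text = re.sub(rf"^{key}:.*$", lambda _match, line=new_line: line,
                           yaml_text, flags=re.MULTILINE)
    return yaml_text


# Usage in the notebook (adjust the path to where your data.yaml actually is):
# path = "/workspace/data/BasicData/data.yaml"
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(set_split_paths(text, "/workspace/data/BasicData"))
```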

> You are now ready to start coding!

## Clone YOLOv7 repository

For more information about YOLOv7, see <a href="https://github.com/WongKinYiu/yolov7">WongKinYiu/yolov7</a>.

Clone the repository directly in the notebook `workspace`.

In [1]:
!git clone https://github.com/WongKinYiu/yolov7.git /workspace/yolov7 # clone repo

Cloning into '/workspace/yolov7'...
remote: Enumerating objects: 1127, done.
remote: Total 1127 (delta 0), reused 0 (delta 0), pack-reused 1127
Receiving objects: 100% (1127/1127), 69.94 MiB | 18.93 MiB/s, done.
Resolving deltas: 100% (521/521), done.


## Install YOLOv7 dependencies

You can now install the required packages!

In [None]:
# install dependencies as necessary
!pip install -r /workspace/yolov7/requirements.txt

In [None]:
!pip install wandb

## Import dependencies and check GPU availability

In [4]:
import torch
import os
import torchvision
import random
import shutil
import yaml
import numpy as np
import wandb
import matplotlib.pyplot as plt
from IPython.display import Image, clear_output

In [None]:
wandb.login()

In [None]:
wandb.init(project="YOLOv7-ASL-detection", entity="asl-alphabet-data-augment-ovh")

In [7]:
print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

Setup complete. Using torch 1.13.1+cu117 _CudaDeviceProperties(name='Tesla V100S-PCIE-32GB', major=7, minor=0, total_memory=32510MB, multi_processor_count=80)


## Define the number of classes and YOLOv7 model architecture

In [9]:
# read the number of classes from data.yaml (here 26: A to Z)
with open("/workspace/data/BasicData/data.yaml", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['nc'])

In [10]:
# model configuration used for the tutorial: yolov7
%cat /workspace/yolov7/cfg/training/yolov7.yaml

# parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:
  - [12,16, 19,36, 40,28]  # P3/8
  - [36,75, 76,55, 72,146]  # P4/16
  - [142,110, 192,243, 459,401]  # P5/32

# yolov7 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [32, 3, 1]],  # 0
  
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      
   [-1, 1, Conv, [64, 3, 1]],
   
   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  
   [-1, 1, Conv, [64, 1, 1]],
   [-2, 1, Conv, [64, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],  # 11
         
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3], 1, Concat, [1]],  # 16-P3/8  
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
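The configuration above still declares `nc: 80`, the number of COCO classes. The YOLOv7 training script is expected to override this value with the `nc` read from *data.yaml*, but you may prefer a dedicated configuration file that states it explicitly. A minimal sketch, where `set_num_classes` and the output file name are hypothetical:

```python
import re


def set_num_classes(cfg_text: str, num_classes: int) -> str:
    """Return a copy of a YOLOv7 model config with the top-level `nc:` value replaced.

    Only the number changes, so a trailing comment such as `# number of classes` is kept.
    """
    new_text, count = re.subn(r"^nc:\s*\d+", f"nc: {num_classes}", cfg_text,
                              count=1, flags=re.MULTILINE)
    if count != 1:
        raise ValueError("no top-level 'nc:' line found in the model config")
    return new_text


# Usage in the notebook (num_classes was stored as a string above, hence int()):
# with open("/workspace/yolov7/cfg/training/yolov7.yaml") as f:
#     cfg = set_num_classes(f.read(), int(num_classes))
# with open("/workspace/yolov7/cfg/training/yolov7-asl.yaml", "w") as f:  # hypothetical name
#     f.write(cfg)
```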

## Recover YOLOv7 weights

In this tutorial, we will do **Transfer Learning** based on a YOLOv7 model pre-trained on the <a href="https://cocodataset.org/">COCO dataset</a>.

**How to define Transfer Learning?**

For both humans and machines, learning something new takes time and practice. However, tasks similar to those already learned are easier to perform. Like humans, an AI model can identify patterns in what it has already learned and apply them to a new task.

If a model has already been trained on a large dataset, there is no need to retrain it from scratch to fit a new set of similar data: its learned weights can serve as the starting point for training on the new data.

Main advantages of Transfer Learning:

- saving resources
- improving efficiency
- easier model training
- saving time

**What is the COCO dataset?**

COCO is a large-scale object detection, segmentation and captioning dataset. COCO has several features:

- Object segmentation
- Recognition in context
- Superpixel stuff segmentation
- 330K images
- 1.5 million object instances
- 80 object categories
- 91 stuff categories
- 5 captions per image
- 250,000 people with keypoints

In [11]:
# YOLOv7 path
%cd /workspace/yolov7
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt 

/workspace/yolov7
--2023-01-31 13:43:52--  https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/b0243edf-9fb0-4337-95e1-42555f1b37cf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230131T134352Z&X-Amz-Expires=300&X-Amz-Signature=d43a7683ebd7197cd6a64e2224c61d5fc5f7f2cb8b4cd5d8a09ed8dcad823c0f&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=511187726&response-content-disposition=attachment%3B%20filename%3Dyolov7.pt&response-content-type=application%2Foctet-stream [following]
--2023-01-31 13:43:52--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/b0243edf-9fb0-4337-95e1-42555f1b37cf?X-A

## Run YOLOv7 training on ASL Basic Letters Dataset

Parameter definitions:

- workers: maximum number of dataloader workers.
- device: CUDA device to use (e.g. `0`, `0,1,2,3` or `cpu`).
- batch-size: batch size, i.e. the number of training images processed in one iteration.
- data: path to the dataset's *data.yaml* file.
- img: input image sizes for training and testing (here `416 416`).
- cfg: path to the model configuration file.
- weights: path to the initial weights.
- project: folder in which training runs are saved.
- name: run name; results are saved to `project/name`.
- hyp: path to the hyperparameters file.
- epochs: number of training epochs. An epoch corresponds to one cycle through the full training dataset.
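To see how these parameters map onto the command in the next cell, here is a minimal sketch that assembles the `train.py` command from a dictionary and computes how many iterations an epoch takes. `build_train_command` and `iterations_per_epoch` are hypothetical helpers, not part of YOLOv7. With the 1512 images of the basic training set and `--batch-size 8`, one epoch is 189 iterations, so 100 epochs is 18,900 iterations:

```python
import math
import shlex


def build_train_command(params: dict, script: str = "/workspace/yolov7/train.py") -> str:
    """Turn a {flag: value} dict into a `python train.py ...` command line.

    List or tuple values (such as the two image sizes) become several arguments.
    """
    args = ["python", script]
    for flag, value in params.items():
        args.append(f"--{flag}")
        values = value if isinstance(value, (list, tuple)) else [value]
        args.extend(str(v) for v in values)
    return shlex.join(args)  # quotes any argument that needs it (Python 3.8+)


def iterations_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches needed for one pass over the training set (the last one may be partial)."""
    return math.ceil(num_images / batch_size)


params = {
    "workers": 8, "device": 0, "batch-size": 8,
    "data": "/workspace/data/BasicData/data.yaml", "img": [416, 416],
    "cfg": "/workspace/yolov7/cfg/training/yolov7.yaml",
    "weights": "/workspace/yolov7/yolov7.pt",
    "project": "YOLOv7-ASL-detection", "name": "YOLOv7-Basic-Data-ASL", "epochs": 100,
}
cmd = build_train_command(params)
print(cmd)
print(iterations_per_epoch(1512, 8) * params["epochs"], "iterations in total")  # 18900
# In a notebook cell, IPython expands {cmd} inside a shell command: !{cmd}
```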

In [None]:
%%time
# train yolov7 on custom data for 100 epochs and time its performance
# (%%time must be the first line of the cell to time the whole cell)
%cd /workspace/yolov7
!python /workspace/yolov7/train.py --workers 8 --device 0 --batch-size 8 --data '/workspace/data/BasicData/data.yaml' --img 416 416 --cfg '/workspace/yolov7/cfg/training/yolov7.yaml' --weights '/workspace/yolov7/yolov7.pt' --project YOLOv7-ASL-detection --name YOLOv7-Basic-Data-ASL --epochs 100

#### Graphs and functions explanation

**Loss functions:**

*For the training set:*

- Box: loss due to a box prediction not exactly covering an object.
- Objectness: loss due to a wrong box-object IoU **[1]** prediction.
- Classification: loss due to deviations from predicting ‘1’ for the correct classes and ‘0’ for all the other classes for the object in that box.

*For the valid set (the same loss functions as for the training data):*

- val Box
- val Objectness
- val Classification
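The classification loss described above is a binary cross-entropy over the per-class scores: each class is treated as an independent yes/no prediction, with target 1 for the correct letter and 0 for the others. The sketch below computes it for a single box from probabilities. It is an illustration only: YOLO-family implementations compute it from raw logits and weight each loss term with the `box`, `obj` and `cls` gains of the hyperparameter file. `binary_cross_entropy` is a hypothetical helper:

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between per-class probabilities and 0/1 targets."""
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# One box whose ground truth is the first of three classes (one-hot target).
target = [1, 0, 0]
confident = binary_cross_entropy([0.9, 0.05, 0.05], target)
confused = binary_cross_entropy([0.2, 0.7, 0.1], target)
print(f"confident: {confident:.4f}, confused: {confused:.4f}")  # the confused prediction costs more
```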

**Precision & Recall:**

- Precision: measures how accurate the predictions are, i.e. the percentage of predictions that are correct.
- Recall: measures how well the model finds all the positives, i.e. the percentage of ground-truth objects that are detected.

*How to calculate Precision and Recall?*

<img src="attachment:3a6a5c3e-0445-4c50-a272-e56d49b7a6ea.jpg" width="600"/>
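In terms of true positives (TP, detections that match an object of the same class, e.g. with an IoU above the chosen threshold), false positives (FP, detections that match no object) and false negatives (FN, objects the model missed), precision is TP / (TP + FP) and recall is TP / (TP + FN). A minimal sketch with made-up counts; `precision_recall` is a hypothetical helper:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall); each is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 10 boxes predicted for the letter "A", 8 of them correct,
# and 1 "A" in the images that the model did not detect.
p, r = precision_recall(tp=8, fp=2, fn=1)
print(f"precision = {p:.3f}, recall = {r:.3f}")  # precision = 0.800, recall = 0.889
```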

**Accuracy functions:**

mAP (mean Average Precision) compares the ground-truth bounding box to the detected box and returns a score. The higher the score, the more accurate the model is in its detections.

- mAP@0.5: with the IoU threshold set to 0.5, the AP **[2]** of each category is computed over all images, then averaged over all categories to give the mAP.
- mAP@0.5:0.95: the average of the mAP values computed at IoU thresholds from 0.5 to 0.95 in steps of 0.05.
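The two mAP variants differ only in which IoU thresholds they average over. The sketch below assumes you already have the AP of each class at each threshold (the values in the example are made up) and shows only the averaging; `iou_thresholds`, `map_at` and `map_50_95` are hypothetical helpers:

```python
def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """IoU thresholds 0.5, 0.55, ..., 0.95 (rounded to avoid floating-point drift)."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(count)]


def map_at(ap_table, threshold):
    """mAP at one IoU threshold: the mean of the per-class AP values."""
    per_class = ap_table[threshold]
    return sum(per_class) / len(per_class)


def map_50_95(ap_table):
    """mAP@0.5:0.95: the mean of the mAP values over the ten IoU thresholds."""
    thresholds = iou_thresholds()
    return sum(map_at(ap_table, t) for t in thresholds) / len(thresholds)


# Made-up AP values for two classes that get harder to localise as the threshold rises.
ap_table = {t: [1.0 - (t - 0.5), 0.9 - (t - 0.5)] for t in iou_thresholds()}
print("mAP@0.5:", round(map_at(ap_table, 0.5), 3))
print("mAP@0.5:0.95:", round(map_50_95(ap_table), 3))
```

Because AP usually drops as the IoU threshold becomes stricter, mAP@0.5:0.95 is normally lower than mAP@0.5.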

**[1] IoU (Intersection over Union)** = measures the overlap between two bounding boxes. It is used to measure how much the predicted box overlaps with the ground-truth box.

*How to calculate IoU?*

<img src="attachment:697927a7-b60a-4588-bdd6-9193796e0e9c.jpg" width="600"/>
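As a formula, IoU = area of intersection / area of union. The sketch below computes it for two boxes given as corner coordinates `(x1, y1, x2, y2)`, and includes a converter from the YOLO label format, which stores each box as a normalised centre, width and height. `iou` and `xywh_to_xyxy` are hypothetical helpers:

```python
def xywh_to_xyxy(cx, cy, w, h):
    """Convert a YOLO-style (centre x, centre y, width, height) box to corner coordinates."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; 0 when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping on a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.1429
```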

**[2] AP (Average Precision)** = a popular metric for measuring the accuracy of object detectors. It averages the precision over recall values from 0 to 1, i.e. it is the area under the precision-recall curve.
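To make the definition concrete: the detections of one class are sorted by confidence, precision and recall are computed after each detection, and AP is the area under the resulting precision-recall curve. The sketch below uses all-point interpolation (precision is first made non-increasing in recall). This is one common variant; the YOLOv7 repository's own metric code may interpolate differently, so its numbers can differ slightly. `average_precision` is a hypothetical helper and the detections in the example are made up:

```python
def average_precision(scores, is_tp, num_ground_truth):
    """AP for one class from detection confidences and their TP/FP flags.

    `is_tp[i]` says whether detection i matched a ground-truth box
    (e.g. IoU >= 0.5 with a not-yet-matched object of the same class).
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    if len(scores) != len(is_tp):
        raise ValueError("scores and is_tp must have the same length")

    # Walk through detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Sentinel points at recall 0 and 1, then the precision envelope:
    # each precision becomes the maximum precision at the same or a higher recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [1.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall increases.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


# Three detections of one class and 2 ground-truth objects in total.
ap = average_precision(scores=[0.9, 0.8, 0.7], is_tp=[True, False, True], num_ground_truth=2)
print(round(ap, 4))  # 0.8333
```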

## Check new dataset size

In [40]:
print("Basic train dataset size:", len(os.listdir('/workspace/data/BasicData/train/images')))
print("Basic validation dataset size:", len(os.listdir('/workspace/data/BasicData/valid/images')))
print("Basic test dataset size:", len(os.listdir('/workspace/data/BasicData/test/images')), "\n")

print("Augmented train dataset size:", len(os.listdir('/workspace/data/FinData/train/images')))
print("Augmented validation dataset size:", len(os.listdir('/workspace/data/FinData/valid/images')))
print("Augmented test dataset size:", len(os.listdir('/workspace/data/FinData/test/images')))

Basic train dataset size: 1512
Basic validation dataset size: 144
Basic test dataset size: 72 

Augmented train dataset size: 1975
Augmented validation dataset size: 267
Augmented test dataset size: 104
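`os.listdir` counts every file in a folder, not only images. The sketch below counts image files per split and turns the counts into proportions; the extension list and the function names `split_sizes` and `split_ratios` are assumptions. For the basic dataset above, 1512 / 144 / 72 images out of 1728 gives roughly an 87.5% / 8.3% / 4.2% train/valid/test split:

```python
import os

# Assumed image extensions; extend the set if your export uses other formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def split_sizes(dataset_root, splits=("train", "valid", "test")):
    """Count image files in <dataset_root>/<split>/images for each split."""
    sizes = {}
    for split in splits:
        folder = os.path.join(dataset_root, split, "images")
        sizes[split] = sum(
            1 for name in os.listdir(folder)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return sizes


def split_ratios(sizes):
    """Share of the whole dataset held by each split."""
    total = sum(sizes.values())
    return {split: count / total for split, count in sizes.items()}


# Usage in the notebook:
# for root in ("/workspace/data/BasicData", "/workspace/data/FinData"):
#     sizes = split_sizes(root)
#     print(root, sizes, {k: f"{v:.1%}" for k, v in split_ratios(sizes).items()})
```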


## Run YOLOv7 training on ASL Letters Augmented Dataset

In [None]:
%%time
# train yolov7 on custom data for 100 epochs and time its performance
# (%%time must be the first line of the cell to time the whole cell)
%cd /workspace/yolov7
!python /workspace/yolov7/train.py --workers 8 --device 0 --batch-size 8 --data '/workspace/data/FinData/data.yaml' --img 416 416 --cfg '/workspace/yolov7/cfg/training/yolov7.yaml' --weights '/workspace/yolov7/yolov7.pt' --hyp '/workspace/yolov7/data/hyp.scratch.custom.yaml' --project YOLOv7-ASL-detection --name YOLOv7-Augmented-Data-ASL --epochs 100
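If you launch the same training twice with the same `--name`, YOLOv7 is expected to save the new run in an incremented folder (for example `YOLOv7-Augmented-Data-ASL2`) rather than overwrite the previous one, so hard-coded run paths may point at an older run. A minimal sketch to locate the most recent run folder, assuming that numbering scheme; `latest_run_dir` is a hypothetical helper:

```python
import os
import re


def latest_run_dir(project_dir, name):
    """Return the most recent <project_dir>/<name>[N] folder, or None if there is none.

    Assumes the first run is called `name` and later runs `name2`, `name3`, ...
    """
    pattern = re.compile(rf"^{re.escape(name)}(\d*)$")
    best_dir, best_index = None, -1
    for entry in os.listdir(project_dir):
        match = pattern.match(entry)
        if match and os.path.isdir(os.path.join(project_dir, entry)):
            index = int(match.group(1)) if match.group(1) else 1
            if index > best_index:
                best_dir, best_index = os.path.join(project_dir, entry), index
    return best_dir


# Usage in the notebook:
# latest_run_dir("/workspace/yolov7/YOLOv7-ASL-detection", "YOLOv7-Augmented-Data-ASL")
```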

## Export trained weights for future inference

After training YOLOv7 for 100 epochs, the best weights are saved as `best.pt` in the run's `weights` folder:

In [None]:
# first, rename best.pt to the name you want
%cd /workspace/yolov7/YOLOv7-ASL-detection/YOLOv7-Augmented-Data-ASL/weights/
os.rename("best.pt","yolov7.pt")

In [None]:
# then, copy it into a folder where you can keep all the weights generated by your trainings
# (mkdir -p creates the folder if it does not exist yet)
!mkdir -p /workspace/asl-volov7-model
%cp /workspace/yolov7/YOLOv7-ASL-detection/YOLOv7-Augmented-Data-ASL/weights/yolov7.pt /workspace/asl-volov7-model/yolov7.pt
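The same export can be written in Python. Unlike renaming `best.pt` in place, this sketch copies it, so the run folder stays intact, and it creates the destination folder if needed. `export_weights` is a hypothetical helper; it assumes the run folder contains a `weights` sub-folder with `best.pt`, as YOLOv7 runs do:

```python
import os
import shutil


def export_weights(run_dir, dest_dir, which="best.pt", new_name="yolov7.pt"):
    """Copy <run_dir>/weights/<which> to <dest_dir>/<new_name> and return the new path."""
    src = os.path.join(run_dir, "weights", which)
    if not os.path.isfile(src):
        raise FileNotFoundError(f"no weights file at {src}")
    os.makedirs(dest_dir, exist_ok=True)  # like `mkdir -p`
    dest = os.path.join(dest_dir, new_name)
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest


# Usage in the notebook:
# export_weights("/workspace/yolov7/YOLOv7-ASL-detection/YOLOv7-Augmented-Data-ASL",
#                "/workspace/asl-volov7-model")
```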

## Conclusion

The **YOLOv7** model has been trained for 100 epochs.

After 100 epochs, the performance results on the **ASL Letters Dataset** are as follows:

**LOSS:**

- Box: 0.01557
- Objectness: 0.00454
- Classification: 0.01225
- val Box: 0.03068
- val Objectness: 0.0345
- val Classification: 0.003724

**PRECISION & RECALL:**

- Precision: 0.8735
- Recall: 0.922

**ACCURACY:**

- mAP @0.5: 0.9436
- mAP @0.5:0.95: 0.7496