# Linkit Challenge WS 2022/23 Hand Gesture Detection - Team X

## 1. Setup


First, create a virtual environment so that you can keep track of all package versions.
Create the environment with the following command in Terminal (macOS) or CMD (Windows):

`python3 -m venv venv`

Then activate it:
- in CMD: `venv\Scripts\activate`
- in Terminal: `. venv/bin/activate`

In [12]:
# only run once to set up your environment
first_run = True

if first_run:
    # install requirements.txt
    !pip3 install -r requirements.txt
    # clone the git repo of ultralytics/yolov5
    !git clone https://github.com/ultralytics/yolov5.git
    # install dependencies for yolo
    !pip3 install -r yolov5/requirements.txt
    # or from website
    #!pip3 install -qr https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt

Cloning into 'yolov5'...
remote: Enumerating objects: 15935, done.
remote: Counting objects: 100% (104/104), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 15935 (delta 65), reused 87 (delta 58), pack-reused 15831
Receiving objects: 100% (15935/15935), 14.53 MiB | 4.95 MiB/s, done.
Resolving deltas: 100% (10939/10939), done.


#### 1.1. Import Packages

In [3]:
import pytorch_lightning as pl # PyTorch Lightning for easier training and evaluation of models
import torch # PyTorch
import cv2  # OpenCV for image processing
import matplotlib.pyplot as plt # for plotting
import matplotlib.patches as patches  # for plotting bounding boxes
%matplotlib inline
import uuid   # Unique identifier
import os # File system operations
import time # Time operations




## 2. Annotation

#### 2.1. Capture Images

Choose how many images you want to take per class and the path where these images will be stored.

In [4]:
labels = ['rock', 'paper', 'scissor']
number_imgs = 1
IMAGES_PATH = os.path.join('datasets', 'gestures', 'images')

Create the directories (works on all operating systems):

In [5]:
# os.makedirs creates missing parent folders and works on all operating systems
for label in labels:
    os.makedirs(os.path.join(IMAGES_PATH, label), exist_ok=True)

Capture the images! You can adjust the pauses between gestures and between frames to give yourself more time to switch hand gestures.

In [33]:
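cap = cv2.VideoCapture(0)  # open the default webcam once for all labels
for label in labels:
    print('Collecting images for {}'.format(label))
    time.sleep(7)  # time to get ready for the next gesture
    for imgnum in range(number_imgs):
        print('Collecting image {}'.format(imgnum))
        ret, frame = cap.read()
        if not ret:  # the camera did not deliver a frame, so there is nothing to save
            print('Could not read a frame from the camera')
            continue
        imgname = os.path.join(IMAGES_PATH, label, '{}.{}.jpg'.format(label, uuid.uuid1()))
        cv2.imwrite(imgname, frame)
        cv2.imshow('frame', frame)
        time.sleep(2)  # time between frames

        # press 'q' in the preview window to stop collecting images for this label
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()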

Collecting images for rock
Collecting image 0
Collecting images for paper
Collecting image 0
Collecting images for scissor
Collecting image 0
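Before labelling, it can help to check that every class actually received images (a failed camera read, for example, leaves a folder short). The helper below is a hypothetical sketch, not part of any library; it only assumes the folder layout created above.

```python
import os


def count_images(images_path, labels):
    """Hypothetical helper: count the .jpg files in each label folder."""
    counts = {}
    for label in labels:
        folder = os.path.join(images_path, label)
        if os.path.isdir(folder):
            # count only captured images, ignore any other files
            counts[label] = sum(1 for f in os.listdir(folder) if f.lower().endswith('.jpg'))
        else:
            # the folder was never created, so no images were captured
            counts[label] = 0
    return counts


print(count_images(os.path.join('datasets', 'gestures', 'images'), ['rock', 'paper', 'scissor']))
```

After one capture run, each count should be at least `number_imgs`.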


Go to https://www.makesense.ai and label your data there. Create the labels in the order rock, paper, scissor so that they receive the class IDs 0, 1 and 2.

**Export the labels in YOLO format.**
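In the YOLO format, each image gets one `.txt` file with one line per bounding box: `class_id x_center y_center width height`, where the four coordinates are divided by the image width or height and therefore lie between 0 and 1. The sketch below (the function names are hypothetical) converts a pixel box into such a line and back, which can be handy for spot-checking an export.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into one YOLO label line."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Parse a YOLO label line back into (class_id, x1, y1, x2, y2) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# a "paper" box (class ID 1) covering x 480..800 and y 270..630 of a 1280x720 image
line = to_yolo_line(1, 480, 270, 800, 630, 1280, 720)
print(line)  # 1 0.500000 0.625000 0.250000 0.500000
```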

## 3. Training

#### 3.2. Load and Test Pretrained Model
In computer vision, we usually start from pretrained models to reduce the number of samples required for training. Here, we use a YOLOv5 model pretrained on the COCO dataset. COCO contains 80 classes, none of which correspond to our hand gestures. For now, let's just test the model on an example image.

#### Tasks:
- _**Task 3.1:**_ Can you find better models for our task?
- _**Task 3.2:**_ Can you find better pretrained weights for our task?

_Note:_ You can find possible models and weights on:
1. Hugging Face [Model Hub](https://huggingface.co/models?pipeline_tag=object-detection&sort=downloads).
2. PyTorch [Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html).
3. PyTorch [Model Hub](https://pytorch.org/hub/).


In [6]:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

Using cache found in /Users/justusthomsen/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2023-6-1 Python-3.10.10 torch-1.13.0 CPU



[Errno 2] No such file or directory: '/Users/justusthomsen/Documents/LinkIT/Challenges/linkit_hand_gesture_recognition/venv/lib/python3.10/site-packages/psutil-5.9.5.dist-info/METADATA'


Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.1M/14.1M [00:04<00:00, 3.46MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 


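We load an image and run the model on it. The model returns a `Detections` object; `results.pandas().xyxy[0]` holds the bounding boxes found in the first image as a table with the columns `xmin`, `ymin`, `xmax`, `ymax`, `confidence`, `class` and `name`.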

In [7]:
# Load image
img = cv2.imread('datasets/example/zidane.png')

In [8]:
results = model(img[:, :, ::-1])  # OpenCV loads images as BGR, the YOLOv5 hub model expects RGB

In [10]:
# Convert the BGR image to RGB for display and draw the detections with matplotlib
fig, ax = plt.subplots()
ax.imshow(img[:, :, ::-1])

# Each row of results.pandas().xyxy[0] is: xmin, ymin, xmax, ymax, confidence, class, name
for x1, y1, x2, y2, conf, cls, name in results.pandas().xyxy[0].values:
    box = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=1, edgecolor='r', facecolor='none')
    ax.add_patch(box)
    ax.text(x1, y1, f'{name} {conf:.2f}', fontsize=12, color='white')
plt.show()

#### 3.3. Adapt the Model for Our Task
We need to adapt the model to our task: the last layer, which predicts the 80 COCO classes, has to be replaced by a new layer that predicts only 3 classes (one per hand gesture).
Since we use the [yolov5 implementation of ultralytics](https://github.com/ultralytics/yolov5), we can use their provided training script, which builds this new layer from the number of classes defined in our dataset file.
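To see why the last layer cannot simply be reused: in the ultralytics implementation, each output scale of the `Detect` head is a 1x1 convolution with `anchors * (classes + 5)` output channels, i.e. 4 box coordinates, 1 objectness score and one score per class for each anchor (yolov5s uses 3 anchors per scale). A small sketch of that arithmetic, with a hypothetical function name:

```python
def detect_head_channels(num_classes, anchors_per_scale=3):
    """Output channels of one YOLOv5 Detect convolution."""
    # per anchor: 4 box coordinates + 1 objectness score + one score per class
    return anchors_per_scale * (num_classes + 5)


print(detect_head_channels(80))  # 255 for the 80 COCO classes
print(detect_head_channels(3))   # 24 for rock, paper and scissors
```

Because the shapes differ (255 vs. 24 channels), the weights of these convolutions cannot be copied from the COCO checkpoint and start from a fresh initialization, while the rest of the pretrained network is reused.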


First, we define our 3 label names and the path to the images and labels in the `dataset.yaml` file.
Make sure to use the correct mapping between class ID and label:
1. 0 -> "rock"
2. 1 -> "paper"
3. 2 -> "scissors"
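A minimal sketch of how `dataset.yaml` could look, written from Python so that the class mapping lives in one place. The layout is an assumption: the images are in `datasets/gestures/images/<label>/`, and the `.txt` files exported from makesense.ai are copied into matching `datasets/gestures/labels/<label>/` folders, because YOLOv5 finds each label file by replacing `images` with `labels` in the image path. In the YOLOv5 version used here, a relative `path` is resolved against the `yolov5/` folder, hence `../datasets/gestures`. `write_dataset_yaml` is a hypothetical helper, and reusing the training images for validation only keeps the example short.

```python
from pathlib import Path


def write_dataset_yaml(yaml_path, dataset_root, class_names):
    """Hypothetical helper: write a minimal YOLOv5 dataset config."""
    lines = [
        f'path: {dataset_root}  # dataset root (relative paths are resolved from yolov5/)',
        'train: images  # training images, relative to path',
        'val: images  # validation images (same folder, only to keep the example short)',
        'names:',
    ]
    # class ID -> label name; must match the order used when labelling on makesense.ai
    lines += [f'  {class_id}: {name}' for class_id, name in enumerate(class_names)]
    Path(yaml_path).write_text('\n'.join(lines) + '\n')


write_dataset_yaml('dataset.yaml', '../datasets/gestures', ['rock', 'paper', 'scissors'])
print(Path('dataset.yaml').read_text())
```

For real experiments, a separate validation folder gives a more honest estimate of the model's accuracy.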

Now we train our model.
The `dataset.yaml` file tells the training script where the images and labels are and how many classes there are.

For other parameter configurations, refer to:
https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data

### Monitoring
It is important to monitor the training process to ensure that the model is training properly.
To do so, we recommend [Weights and Biases](https://wandb.ai/) (or [TensorBoard](https://www.tensorflow.org/tensorboard)).
Both tools keep track of the training process and automatically log the results.
#### Weights and Biases
1. Create an account on [Weights and Biases](https://wandb.ai/)
2. Install the wandb package `pip install wandb`
3. Login to your account `wandb login`
4. Run the training script with the `--project` flag `python train.py --project <project_name>`
5. Go to your [Weights and Biases](https://wandb.ai/) dashboard to view the results

For more information on the YOLOv5 integration with Weights and Biases, refer to the [W&B documentation](https://docs.wandb.ai/guides/integrations/yolov5).


In [14]:
# Fine-tune the pretrained YOLOv5s on our gesture dataset for 3 epochs
!cd yolov5 && python train.py --img 640 --batch 16 --epochs 3 --data ../dataset.yaml --weights yolov5s.pt --cache --workers 0 --save-period 1

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
train: weights=yolov5s.pt, cfg=, data=../dataset.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=3, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=0, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-175-g5f11555 Python-3.10.10 torch-1.13.0 CPU

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0

## 4. Inference
Now that we have trained our model, we can use it to detect hand gestures in images.
We can use the `detect.py` script to detect hand gestures in images.

1. The `--source 0` argument specifies that we want to use the webcam as input.
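2. The `--weights path.pt` argument specifies the path to the model weights. After training, YOLOv5 stores them in `yolov5/runs/train/<name>/weights/` (`best.pt` and `last.pt`, where `<name>` is `exp`, `exp2`, ... by default).
3. The `--conf 0.X` argument specifies the confidence threshold (`--conf` is accepted as an abbreviation of `--conf-thres`).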

The confidence threshold is the minimum confidence score a bounding box needs to count as a detection; boxes scoring below it are discarded. Raising the threshold filters out more false positives, but may also drop correct detections the model is less sure about.
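The filtering rule itself is simple. The sketch below applies it to a few made-up detections (plain dictionaries standing in for real model output, not the format `detect.py` uses internally), keeping only boxes whose confidence is strictly above the threshold:

```python
def filter_by_confidence(detections, conf_thres=0.4):
    """Keep only detections whose confidence is above conf_thres."""
    return [d for d in detections if d['confidence'] > conf_thres]


# made-up detections for illustration, not real model output
detections = [
    {'name': 'rock', 'confidence': 0.91},
    {'name': 'paper', 'confidence': 0.42},
    {'name': 'scissors', 'confidence': 0.17},
]

kept = filter_by_confidence(detections, conf_thres=0.4)
print([d['name'] for d in kept])  # ['rock', 'paper']
```

With `conf_thres=0.5`, only `rock` would remain.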

In [15]:
!cd yolov5 && python3 detect.py --source 0 --weights ../yolov5s.pt --conf 0.4

detect: weights=['../yolov5s.pt'], source=0, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.4, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 v7.0-175-g5f11555 Python-3.10.10 torch-1.13.0 CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
1/1: 0...  Success (inf frames 1920x1080 at 30.00 FPS)

0: 384x640 (no detections), 56.6ms
0: 384x640 (no detections), 46.9ms
0: 384x640 (no detections), 46.2ms
0: 384x640 (no detections), 47.4ms
0: 384x640 (no detections), 46.9ms
0: 384x640 (no detections), 47.4ms
0: 384x640 (no detections), 48.2ms
0: 384x640 (no detections), 48.0ms
0: 384x640 (no detections), 46.9ms
0: 384x640 (no detections), 44.6ms
0: 