# Generating My Own Dataset For Training

## Intro and Reason for Not Using an Online Dataset

After parsing an online dataset, collecting its videos, frames, and data, and training on it, I found that my model was highly inaccurate and that its performance fluctuated during training.

There are a few possible reasons for this, including the dataset having few videos per word and the videos being very low quality.

I tried looking for other datasets to use, but many of them were just clones of the original with minor modifications to the directory structure.

So I decided that the best approach is to record my own videos, which will be higher quality and more numerous per word than those in the dataset. I currently plan to record about 100 videos per word and, since each sign takes only a few seconds to complete, each word should take only a few minutes.

I did not do this before because I thought the dataset would be enough, and I was planning to use over 2,000 words with a total of over 21,000 videos, which is not easy to record manually. However, since the low quality is making my training inaccurate, I will record as many words as I can manually.

## Generating the Dataset

First, I will generate 100 videos for each word and save them in mp4 format. Then, I will go through each video and convert it to individual frames. Finally, I will loop through each frame in each video and generate the landmark points using MediaPipe's Holistic model.
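The middle step, turning a video into a list of frames, can be sketched as follows. This is a minimal sketch rather than the code used later: `read_all_frames` is a hypothetical helper name, and the only assumption about the capture object is that it has a `read()` method returning `(ok, frame)`, which is how `cv2.VideoCapture` behaves.

```python
def read_all_frames(capture):
    """Read frames from a capture object until no more are available.

    `capture` is assumed to expose read() -> (ok, frame), as
    cv2.VideoCapture does. The helper name is hypothetical.
    """
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:
            # The capture reported that there are no more frames
            break
        frames.append(frame)
    return frames
```

With OpenCV, this would be called on a capture opened from one of the saved mp4 files, for example `cap = cv2.VideoCapture(path)`, then `frames = read_all_frames(cap)`, then `cap.release()`.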

### Collecting The Videos

The video generation will be done using ```cv2```. ```numpy``` will be used for handling array data. ```os``` will be used for system operations like creating directories and moving files.

In [11]:
import cv2
import numpy as np
import os
import time

First, I will try to make one video, and then use a function to generate videos for all the words.

In [None]:
# Read from the webcam
cap = cv2.VideoCapture(0)
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# For creating the video
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fps = 60

while True:
    # Capturing videos for 1 word
    for word in range(1):
        # Capturing 2 videos per word
        for vid_num in range(2):
            out_vid_file_path = f'{word}_{vid_num}.mp4'
            out_vid = cv2.VideoWriter(out_vid_file_path, fourcc, fps, (frame_width, frame_height))
            # Read int(1.25*fps + 1) frames per video; frame 0 is shown during a 2-second pause and not saved
            for num_frame in range(int(1.25*fps + 1)):
                ret,frame = cap.read()

                if not ret:
                    print('Could not capture frame')
                    break
                cv2.putText(frame, f'Frame {num_frame}, res {frame_width}, {frame_height}', (30,30), cv2.FONT_HERSHEY_SIMPLEX, 
                            1, (255,0,0), 2, cv2.LINE_AA)
                if num_frame == 0:
                    cv2.imshow('Test_Feed', frame)
                    cv2.waitKey(2000)
                else:
                    cv2.imshow('Test_Feed', frame)
                    out_vid.write(frame)

                # Break the loop on 'q' key press
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            out_vid.release()
    break

cap.release()
cv2.destroyAllWindows()

This successfully generated 2 videos in a row. Now I can put this in a function that takes start_word and end_word as arguments.

This function will first create the directories where the videos will be saved if they do not already exist. Then it will generate 100 videos for each word. Each recording starts with a 2-second pause to get into position, followed by 75 saved frames (1.25 seconds at the nominal 60 fps).
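The length of each saved clip follows from the loop bounds in the capture code. As a quick check of the arithmetic, assuming the nominal `fps = 60` that is passed to the `VideoWriter` (the webcam's real frame rate may differ):

```python
fps = 60  # nominal frame rate passed to cv2.VideoWriter in the notebook

frames_read = int(1.25 * fps + 1)   # iterations of the inner capture loop
frames_written = frames_read - 1    # frame 0 is only displayed, never written
playback_seconds = frames_written / fps

print(frames_read, frames_written, playback_seconds)  # prints: 76 75 1.25
```

So the 2-second pause on frame 0 is time to get into position, while the saved clip itself holds 75 frames.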

In [27]:
def generate_videos(start_word=0, end_word=1):
    videos_per_word = 100

    # Read from the webcam
    cap = cv2.VideoCapture(0)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # For creating the video
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    fps = 60

    break_videos = False
    while True:
        # Capturing videos for as many words as given in the parameters
        for word in range(start_word, end_word):

            # Capturing videos_per_word videos per word
            for vid_num in range(videos_per_word):
                # Create output directory if it does not exist
                out_vid_dir = os.path.join('video_data', f'{word}', f'{vid_num}')
                if not os.path.exists(out_vid_dir):
                    print(f'Creating {out_vid_dir}')
                    os.makedirs(out_vid_dir)
                else:
                    print(f'{out_vid_dir} already exists')

                # Creating VideoWriter
                out_vid_file_path = os.path.join(out_vid_dir,f'{word}_{vid_num}.mp4')
                out_vid = cv2.VideoWriter(out_vid_file_path, fourcc, fps, (frame_width, frame_height))
                
                # Read int(1.25*fps + 1) frames per video; frame 0 is shown during a 2-second pause and not saved
                for num_frame in range(int(1.25*fps + 1)):
                    ret,frame = cap.read()

                    if not ret:
                        print('Could not capture frame')
                        break
                    cv2.putText(frame, f'Frame {num_frame}, res {frame_width}, {frame_height}', (30,30), cv2.FONT_HERSHEY_SIMPLEX, 
                                0.8, (255,0,0), 1, cv2.LINE_AA)
                    # Wait to get into position to show the sign, then start saving frames
                    if num_frame == 0:
                        cv2.imshow('Test_Feed', frame)
                        cv2.waitKey(2000)
                    else:
                        cv2.imshow('Test_Feed', frame)
                        out_vid.write(frame)

                    # Break the whole video creation on 'q' key press
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break_videos = True
                        break
                # Finalize this video's file before starting the next one
                out_vid.release()
                if break_videos:
                    break
            if break_videos:
                break
        break

    cap.release()
    cv2.destroyAllWindows()
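The directory handling in the function can also be written more compactly with `os.makedirs(..., exist_ok=True)`, which does not raise if the directory is already there. The sketch below is an illustration, not the function used in this notebook: `video_path` is a hypothetical helper name, and the `labels` list contains only the three words named in this notebook, to show how the numeric word index maps to a word.

```python
import os

# Illustrative label list: only these three words are named in the notebook.
labels = ['book', 'drink', 'computer']

def video_path(word, vid_num, root='video_data'):
    """Create the per-video directory if needed and return the .mp4 path.

    Mirrors the layout used above: <root>/<word>/<vid_num>/<word>_<vid_num>.mp4
    """
    out_dir = os.path.join(root, str(word), str(vid_num))
    os.makedirs(out_dir, exist_ok=True)  # no error if it already exists
    return os.path.join(out_dir, f'{word}_{vid_num}.mp4')
```

For example, `video_path(1, 3)` creates `video_data/1/3` if it is missing and returns the path of `1_3.mp4` inside it, where word 1 is `labels[1]`, that is, drink.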

Now I will test this with one word: __book__

In [28]:
generate_videos()

Creating video_data/0/0
Creating video_data/0/1
Creating video_data/0/2
Creating video_data/0/3
Creating video_data/0/4
Creating video_data/0/5
Creating video_data/0/6
Creating video_data/0/7
Creating video_data/0/8
Creating video_data/0/9
Creating video_data/0/10
Creating video_data/0/11
Creating video_data/0/12
Creating video_data/0/13
Creating video_data/0/14
Creating video_data/0/15
Creating video_data/0/16
Creating video_data/0/17
Creating video_data/0/18
Creating video_data/0/19
Creating video_data/0/20
Creating video_data/0/21
Creating video_data/0/22
Creating video_data/0/23
Creating video_data/0/24
Creating video_data/0/25
Creating video_data/0/26
Creating video_data/0/27
Creating video_data/0/28
Creating video_data/0/29
Creating video_data/0/30
Creating video_data/0/31
Creating video_data/0/32
Creating video_data/0/33
Creating video_data/0/34
Creating video_data/0/35
Creating video_data/0/36
Creating video_data/0/37
Creating video_data/0/38
Creating video_data/0/39
Creating v

I watched some of the videos back, and there is one small issue: in every video after the first, the first 5 frames come from the previous video. This is not a big deal, as I can later make the training, testing, and validation data take only the last 70 frames of each video.
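Dropping the stale frames then comes down to a slice. A minimal sketch, assuming the frames of one video are held in a list or a NumPy array indexed by frame number (`keep_last_frames` is a hypothetical helper name):

```python
def keep_last_frames(frames, n=70):
    """Return the last n frames of a sequence (all of them if it is shorter)."""
    # frames[-0:] would return everything, so n <= 0 is handled separately
    return frames[-n:] if n > 0 else frames[:0]
```

With the 75 frames saved per video, this discards frames 0 to 4 and keeps frames 5 to 74.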

However, recording takes about 7.5 minutes per word, so I will not record all 10 words in one run. Instead, I will run the function 10 times, once per word.
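As a rough check on that figure, the measured time per video can be compared with the time implied by the loop settings. The nominal value below assumes the webcam really delivers 60 frames per second, which may not hold:

```python
videos_per_word = 100
measured_minutes_per_word = 7.5    # figure reported above
fps = 60                           # nominal rate from the capture code
pause_seconds = 2.0                # cv2.waitKey(2000) on frame 0
frames_read = int(1.25 * fps + 1)  # frames read per video

measured_seconds_per_video = measured_minutes_per_word * 60 / videos_per_word
nominal_seconds_per_video = pause_seconds + frames_read / fps

print(measured_seconds_per_video)           # prints: 4.5
print(round(nominal_seconds_per_video, 2))  # prints: 3.27
```

Each video takes about 4.5 seconds in practice against roughly 3.3 seconds on paper. The cause of the gap was not measured; a camera running below 60 fps would be one possible explanation, but that is an assumption.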

The next word after book is: __drink__

In [29]:
generate_videos(start_word=1, end_word=2) # Word 1 (second word) is drink

Creating video_data/1/0
Creating video_data/1/1
Creating video_data/1/2
Creating video_data/1/3
Creating video_data/1/4
Creating video_data/1/5
Creating video_data/1/6
Creating video_data/1/7
Creating video_data/1/8
Creating video_data/1/9
Creating video_data/1/10
Creating video_data/1/11
Creating video_data/1/12
Creating video_data/1/13
Creating video_data/1/14
Creating video_data/1/15
Creating video_data/1/16
Creating video_data/1/17
Creating video_data/1/18
Creating video_data/1/19
Creating video_data/1/20
Creating video_data/1/21
Creating video_data/1/22
Creating video_data/1/23
Creating video_data/1/24
Creating video_data/1/25
Creating video_data/1/26
Creating video_data/1/27
Creating video_data/1/28
Creating video_data/1/29
Creating video_data/1/30
Creating video_data/1/31
Creating video_data/1/32
Creating video_data/1/33
Creating video_data/1/34
Creating video_data/1/35
Creating video_data/1/36
Creating video_data/1/37
Creating video_data/1/38
Creating video_data/1/39
Creating v

Next is the word: __computer__