# WLASL (Word-Level American Sign Language)

I am using the WLASL dataset, which I downloaded from Kaggle. It covers 2,000 words, with over 20,000 videos of these words in total.

## JSON Parsing and Organizing Videos/Frames

Each video file is named after its video ID. To find out which word a video shows, I look up the video ID in the WLASL_v0.3.json file and read the word that ID belongs to. The JSON file is structured as follows:

```
[
    {
        "gloss": "<word>",
        "instances": [
            {
                ...,
                ...,
                ...,
                "video_id": "<video_id_num>"
            },
            {
                ...,
                ...,
                ...,
                "video_id": "<video_id_num>"
            },
            ...
        ]
    },
    {
        "gloss": "<word>",
        "instances": [
            {
                ...,
                ...,
                ...,
                "video_id": "<video_id_num>"
            },
            ...
        ]
    },
    ...
]
```
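The lookup described above can be sketched as a dictionary from video ID to word. This is a minimal sketch, not the code used in the rest of the notebook: `build_gloss_lookup` and `sample_json` are names made up for illustration, and the inline sample only mirrors the structure shown above, using the first two 'book' entries from WLASL_v0.3.json.

```python
import json

# Tiny inline sample with the same shape as WLASL_v0.3.json.
# Only the keys needed for the lookup are included here.
sample_json = '''
[
    {
        "gloss": "book",
        "instances": [
            {"video_id": "69241", "split": "train"},
            {"video_id": "65225", "split": "train"}
        ]
    }
]
'''


def build_gloss_lookup(entries):
    """Map each video_id to the gloss (word) it belongs to."""
    lookup = {}
    for entry in entries:
        for instance in entry["instances"]:
            lookup[instance["video_id"]] = entry["gloss"]
    return lookup


sample_data = json.loads(sample_json)
video_to_gloss = build_gloss_lookup(sample_data)
print(video_to_gloss["69241"])  # prints: book
```

Building the dictionary once avoids scanning the whole file for every video name.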

Python's standard library includes a json module for reading and parsing JSON files. I will use it to parse the JSON file and organize all the videos into their own directories under one of three splits: train, test, or validation. In addition, each video will be split into individual frames, and the frames will be saved as PNG images. The final directory structure will look like:
```
/Videos
|--<split_type>
|   |--<word_id>
|   |   |--<word_id>-<instance_id>
|   |   |   |--<word_id>-<instance_id>-<frame_num>.png
|   |   |   |--<word_id>-<instance_id>-<frame_num>.png
|   |   |   ...
|   |   |--<word_id>-<instance_id>
|   |   |   |--<word_id>-<instance_id>-<frame_num>.png
|   |   |   |--<word_id>-<instance_id>-<frame_num>.png
|   |   |   ...
|   |   ...
|   |--<word_id>
|   ...
|...
```

In [2]:
import json

Next, I will load the json file as an object. This will make it easy to work with in python.

In [3]:
with open('WLASL/WLASL_v0.3.json', 'r') as file:
    data = json.load(file)

The JSON file is now loaded as a Python object.

Now I will test out how this json module works and how to parse the different properties I need.

Square brackets [] denote a list, and curly braces {} denote an object made of name:value pairs, which Python loads as a dictionary.

The json file starts with [], which is a list of objects. Each object has a word and another list for all the instances of that word. Each instance is an object that has name:value pairs for that instance such as: video URL, video id, start time, end time, fps, and more.

The object for the first word is the first element of the data list:

In [4]:
data[0]

{'gloss': 'book',
 'instances': [{'bbox': [385, 37, 885, 720],
   'fps': 25,
   'frame_end': -1,
   'frame_start': 1,
   'instance_id': 0,
   'signer_id': 118,
   'source': 'aslbrick',
   'split': 'train',
   'url': 'http://aslbricks.org/New/ASL-Videos/book.mp4',
   'variation_id': 0,
   'video_id': '69241'},
  {'bbox': [190, 25, 489, 370],
   'fps': 25,
   'frame_end': -1,
   'frame_start': 1,
   'instance_id': 1,
   'signer_id': 90,
   'source': 'aslsignbank',
   'split': 'train',
   'url': 'https://aslsignbank.haskins.yale.edu/dictionary/protected_media/glossvideo/ASL/BO/BOOK-418.mp4',
   'variation_id': 0,
   'video_id': '65225'},
  {'bbox': [262, 1, 652, 480],
   'fps': 25,
   'frame_end': -1,
   'frame_start': 1,
   'instance_id': 2,
   'signer_id': 110,
   'source': 'valencia-asl',
   'split': 'train',
   'url': 'https://www.youtube.com/watch?v=0UsjUE-TXns',
   'variation_id': 0,
   'video_id': '68011'},
  {'bbox': [123, 19, 516, 358],
   'fps': 25,
   'frame_end': 60,
   'frame

The dataset should contain 2,000 words, so the length of the top-level list of dictionary objects should be 2000:

In [14]:
len(data)

2000

This means all the words are in the JSON object.

To access the word, I specify the key of the value I want. In this case, the word is the value for the key called 'gloss':

In [9]:
data[0]['gloss']

'book'

So the first word in the JSON file is 'book'.

The first instance of 'book' is then accessed by indexing its 'instances' list with [0]:

In [10]:
data[0]['instances'][0]

{'bbox': [385, 37, 885, 720],
 'fps': 25,
 'frame_end': -1,
 'frame_start': 1,
 'instance_id': 0,
 'signer_id': 118,
 'source': 'aslbrick',
 'split': 'train',
 'url': 'http://aslbricks.org/New/ASL-Videos/book.mp4',
 'variation_id': 0,
 'video_id': '69241'}

The len() function can be used again here to get how many instances the word 'book' has.

In [15]:
len(data[0]['instances'])

40

The word 'book' has 40 instances.

Finally, to get the video ID needed to open the correct video, I use the 'video_id' key:

In [11]:
data[0]['instances'][0]['video_id']

'69241'

This means that the video named 69241.mp4 is the video for the first instance of the word 'book'.

The same process can be used to extract the split (train, test, or validation) that this video belongs to, and the frame_start and frame_end values that bound the clip.
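As a rough sketch of that extraction, the helper below reads the split and the frame bounds from one instance dictionary. The name `describe_instance` is hypothetical, and the interpretation of the fields is an assumption on my part: that frame_start is 1-based and that a frame_end of -1 means "through the last frame of the file". The total frame count of 75 is a made-up number for the example.

```python
def describe_instance(instance, total_frames):
    """Return (split, first_frame, last_frame) for one instance dictionary.

    Assumption: frame_start is 1-based, and frame_end == -1 means the clip
    runs through the last frame of the video file.
    """
    split = instance["split"]
    first = instance["frame_start"]
    last = instance["frame_end"]
    if last == -1:
        last = total_frames
    return split, first, last


# Values copied from the first instance of 'book'; total_frames is hypothetical
instance = {"frame_start": 1, "frame_end": -1, "split": "train", "video_id": "69241"}
print(describe_instance(instance, total_frames=75))  # prints: ('train', 1, 75)
```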

Note: this dataset already includes the videos, so there is no need to download them again from the URLs. That would be an extra step if the videos were not included.

Now that I know how to access the different name-value pairs, I can begin organizing the videos into train, test, and validation splits, and then splitting the videos into frames.

### Saving Video Frames

To do this, I will walk through each instance in the JSON file and build the path where each frame will be saved. The path will be in the format: ```<split>/<word_id>/<word_id>-<instance_id>/<word_id>-<instance_id>-<frame_num>```
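One way to build that path is sketched below. The function name `build_frame_path` and the `root` parameter are illustrative choices, not part of the notebook; the default root of "Videos" and the .png extension are taken from the directory tree shown earlier.

```python
import os


def build_frame_path(split, word_id, instance_id, frame_num, root="Videos"):
    """Build <root>/<split>/<word_id>/<word_id>-<instance_id>/<word_id>-<instance_id>-<frame_num>.png"""
    instance_dir = f"{word_id}-{instance_id}"
    file_name = f"{word_id}-{instance_id}-{frame_num}.png"
    return os.path.join(root, split, str(word_id), instance_dir, file_name)


# Frame 12 of instance 3 of word 0 in the train split
print(build_frame_path("train", 0, 3, 12))
```

Using os.path.join keeps the separators correct on whichever operating system the notebook runs on.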

To split the video into frames, I will load the video using opencv, and save each frame as a png image:

In [31]:
import cv2
import os

In [32]:
def save_video_frames(video_name, output_path):
    """Save every frame of WLASL/videos/<video_name> as a numbered PNG in output_path."""
    video_path = os.path.join('WLASL/videos', video_name)
    cap = cv2.VideoCapture(video_path)

    # Stop early if OpenCV could not open the file (missing or unreadable video)
    if not cap.isOpened():
        print(f'Could not open {video_path}')
        return

    if not os.path.exists(output_path):
        print(f'No directory called {output_path}. Creating one now')
        os.makedirs(output_path)

    frame_num = 0

    while True:
        # Read next frame
        ret, frame = cap.read()

        # If no more frames, stop.
        if not ret:
            break

        # Increment frame number (frames are numbered from 1)
        frame_num += 1

        # Save frame
        frame_path = os.path.join(output_path, f'{frame_num}.png')
        cv2.imwrite(frame_path, frame)

        print(f'Frame {frame_num} has been saved to {frame_path}')

    # Release video capture
    cap.release()
    print(f'All frames for the video {video_name} have been successfully saved')


Testing this with one video:

In [23]:
save_video_frames('69241.mp4', './test/')

No directory called ./test/. Creating one now
Frame 1 has been saved to ./test/1.png
Frame 2 has been saved to ./test/2.png
Frame 3 has been saved to ./test/3.png
Frame 4 has been saved to ./test/4.png
Frame 5 has been saved to ./test/5.png
Frame 6 has been saved to ./test/6.png
Frame 7 has been saved to ./test/7.png
Frame 8 has been saved to ./test/8.png
Frame 9 has been saved to ./test/9.png
Frame 10 has been saved to ./test/10.png
Frame 11 has been saved to ./test/11.png
Frame 12 has been saved to ./test/12.png
Frame 13 has been saved to ./test/13.png
Frame 14 has been saved to ./test/14.png
Frame 15 has been saved to ./test/15.png
Frame 16 has been saved to ./test/16.png
Frame 17 has been saved to ./test/17.png
Frame 18 has been saved to ./test/18.png
Frame 19 has been saved to ./test/19.png
Frame 20 has been saved to ./test/20.png
Frame 21 has been saved to ./test/21.png
Frame 22 has been saved to ./test/22.png
Frame 23 has been saved to ./test/23.png
Frame 24 has been saved to ./

This works well, so I will now use it in a loop to process every video.
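Before touching the disk, it can help to list what such a loop would do. The sketch below only plans the work: for every instance it yields the video file name and the output directory in the `<split>/<word_id>/<word_id>-<instance_id>` format described earlier. The name `plan_frame_jobs` is hypothetical, and using the JSON's instance_id in the directory name is my assumption about how the format maps onto the data.

```python
import os


def plan_frame_jobs(entries):
    """Yield (video_name, output_dir) for every instance, without touching disk."""
    for word_id, entry in enumerate(entries):
        for instance in entry["instances"]:
            video_name = instance["video_id"] + ".mp4"
            output_dir = os.path.join(
                instance["split"],
                str(word_id),
                f"{word_id}-{instance['instance_id']}",
            )
            yield video_name, output_dir


# Two instances of 'book', reduced to the keys this sketch needs
sample = [{"gloss": "book", "instances": [
    {"video_id": "69241", "split": "train", "instance_id": 0},
    {"video_id": "65225", "split": "train", "instance_id": 1},
]}]

for video_name, output_dir in plan_frame_jobs(sample):
    print(video_name, "->", output_dir)
```

Each pair has the same shape as the arguments of save_video_frames, so the real loop could pass them straight through.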

In [41]:
if not os.path.exists('./train/1/1/'):
    print(f'No directory called ./train/1/1/. Creating one now')
    os.makedirs('./train/1/1/')

No directory called ./train/1/1/. Creating one now


In [54]:
import shutil


def copy_video(directory_path, video_id):
    """Copy WLASL/videos/<video_id>.mp4 into directory_path, creating it if needed."""
    if not os.path.exists(directory_path):
        print(f'No directory called {directory_path}. Creating one now')
        os.makedirs(directory_path)

    source = os.path.join('WLASL/videos', video_id + '.mp4')
    if not os.path.isfile(source):
        print(f'Video {source} not found. Skipping')
        return

    # shutil.copy is portable, unlike shelling out to `cp`
    shutil.copy(source, os.path.join(directory_path, video_id + '.mp4'))


def organize_words(end_word, start_word=0):
    """Copy the videos of words [start_word, end_word) into <split>/<word_id>/<n>/."""
    for word_id in range(start_word, end_word):
        # Per-split counters number the instance directories within each split
        counters = {'train': 0, 'test': 0, 'val': 0}
        for instance in data[word_id]['instances']:
            split_type = instance['split']
            if split_type not in counters:
                continue  # ignore any unexpected split label
            directory_path = os.path.join(split_type, str(word_id), str(counters[split_type]))
            copy_video(directory_path, instance['video_id'])
            counters[split_type] += 1

In [55]:
organize_words(1, start_word=0)

No directory called train/0/0. Creating one now
No directory called train/0/1. Creating one now
No directory called train/0/2. Creating one now
No directory called train/0/3. Creating one now
No directory called train/0/4. Creating one now
No directory called val/0/0. Creating one now
No directory called train/0/5. Creating one now
No directory called train/0/6. Creating one now
No directory called train/0/7. Creating one now
No directory called train/0/8. Creating one now
No directory called train/0/9. Creating one now
No directory called train/0/10. Creating one now
No directory called train/0/11. Creating one now
No directory called train/0/12. Creating one now
No directory called train/0/13. Creating one now
No directory called test/0/0. Creating one now
No directory called test/0/1. Creating one now
No directory called train/0/14. Creating one now
No directory called val/0/1. Creating one now
No directory called test/0/2. Creating one now
No directory called val/0/2. Creating one 