# Topic K – Aligning Text to Sign Language Video

Imane Elbacha & Rajae Sebai

## Topic description 

For Deaf communities, sign languages are an essential means of communication. Sign languages are natural languages much like spoken ones, except that they communicate primarily through the hands, body posture and facial expressions. The purpose of this project is to align the subtitles of continuous signing video with the hand signs that express them. Such a tool could have a wide range of uses, such as indexing sign language video corpora and automatically generating large sign language datasets.

Aligning subtitles to continuous signing is difficult for several reasons. First, the grammatical structures of sign languages differ significantly from those of spoken languages. Second, because of differences in speed and syntax, the time a subtitle occupies in speech can differ greatly from the time it takes to sign. Third, there is no one-to-one mapping between subtitle words and the signs produced by interpreters, and some subtitles may not be signed at all.

To sum up, the goal of this project is to develop an algorithm that accurately aligns subtitle text to signing video. To do so, we rely on a comprehensive bibliography, the BOBSL dataset and the methods presented in class.

## Imports 

In [None]:
#general
import os
import numpy as np
import matplotlib.pyplot as plt

import glob 
import scipy.io
import gzip, shutil

import nltk
nltk.download('stopwords')
#colab 
#from google.colab import drive
#drive.mount('/content/drive/', force_remount=True)
#os.chdir('/content/drive/MyDrive/MVA/Object recognition /Final project/code')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True
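The NLTK stop-word list downloaded above is not used in the cells shown here. Assuming it is meant for filtering function words out of subtitle text (for instance before matching subtitle words to signs), a minimal sketch could look like this. `filter_stopwords` is a hypothetical helper, and the toy stop-word set stands in for `nltk.corpus.stopwords.words('english')`.

```python
import re

def filter_stopwords(subtitle, stopwords):
    """Lowercase a subtitle, split it into word tokens and drop stop words.

    `stopwords` is any collection of lowercase words, e.g.
    set(nltk.corpus.stopwords.words('english')) after nltk.download('stopwords').
    """
    tokens = re.findall(r"[a-z']+", subtitle.lower())
    return [t for t in tokens if t not in stopwords]

# Tiny hand-written stop-word set for illustration (the NLTK list is much larger)
example_stopwords = {"the", "a", "is", "on", "of"}
print(filter_stopwords("The cat is on the mat.", example_stopwords))  # → ['cat', 'mat']
```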

# DATA: BOBSL BBC-Oxford British Sign Language Dataset

[BOBSL](https://www.robots.ox.ac.uk/~vgg/data/bobsl/) is a large-scale dataset of British Sign Language (BSL). It comprises 1,940 episodes (approximately 1,400 hours) of BSL-interpreted BBC broadcast footage accompanied by written English subtitles. BOBSL covers a wide range of topics: horror, period and medical dramas; history, nature and science documentaries; sitcoms; children’s shows; and programs about cooking, beauty, business and travel. BOBSLv1_2 also includes 272 episodes as part of a challenge partition for the ECCV SLRTP 2022 workshop challenge; this partition is not accompanied by subtitles or annotations. The dataset features a total of 37 signers (excluding signers in the challenge episodes). Distinct signers appear in the training, validation, test and challenge sets, which allows signer-independent evaluation.
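The split files passed to the training commands later in this notebook (`bobsl_train_1658.txt`, `bobsl_val_32.txt`, `bobsl_test_250.txt`) appear to encode their episode counts in their names, and 1,658 + 32 + 250 = 1,940 is consistent with the episode count above. The sketch below checks this, assuming each file lists one episode ID per line; `count_episodes` is a hypothetical helper.

```python
from pathlib import Path

def count_episodes(split_file):
    """Count non-empty lines, assuming one BOBSL episode ID per line."""
    lines = Path(split_file).read_text().splitlines()
    return sum(1 for line in lines if line.strip())

# Split files from the subtitle_align repository (cloned later in this notebook)
split_dir = Path("/content/subtitle_align/data")
splits = {
    "train": split_dir / "bobsl_train_1658.txt",
    "val": split_dir / "bobsl_val_32.txt",
    "test": split_dir / "bobsl_test_250.txt",
}

# Only count once the repository is available locally
if all(path.exists() for path in splits.values()):
    counts = {name: count_episodes(path) for name, path in splits.items()}
    print(counts, "total:", sum(counts.values()))

# What the file names alone imply
print(1658 + 32 + 250)  # → 1940
```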

In [None]:
#download features into Colab
# BOBSL credentials are read from the BOBSL_USER and BOBSL_PASSWORD environment variables
# (set them first, e.g. os.environ['BOBSL_USER'] = '...')
!wget --recursive --no-parent --continue --wait=1 \
    --no-host-directories --cut-dirs 2 \
    --user "$BOBSL_USER" --password "$BOBSL_PASSWORD" \
    https://thor.robots.ox.ac.uk/~vgg/data/bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/

Streaming output truncated to the last 5000 lines.
Length: 17094651 (16M) [application/octet-stream]
Saving to: ‘bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/6133065122597278010/features.mat.gz’


2023-01-01 21:38:08 (14.4 MB/s) - ‘bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/6133065122597278010/features.mat.gz’ saved [17094651/17094651]

--2023-01-01 21:38:09--  https://thor.robots.ox.ac.uk/~vgg/data/bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/6134436076158161360/features.mat.gz
Reusing existing connection to thor.robots.ox.ac.uk:443.
HTTP request sent, awaiting response... 200 OK
Length: 32250569 (31M) [application/octet-stream]
Saving to: ‘bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/6134436076158161360/features.mat.gz’


2023-01-01 21:38:11 (16.5 MB/s) - ‘bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/6134436076158161360/features.mat.gz’ saved [32250569/32250569]

--2023-01-01 21:38:12--  https://thor.robots.ox.ac.uk/~vgg/data/bobsl/features/i3d_c22

In [None]:
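#download videos into Colab
# BOBSL credentials are read from the BOBSL_USER and BOBSL_PASSWORD environment variables
# (set them first, e.g. os.environ['BOBSL_USER'] = '...')
!wget --recursive --no-parent --continue --wait=1 \
    --no-host-directories --cut-dirs 2 \
    --user "$BOBSL_USER" --password "$BOBSL_PASSWORD" \
    https://thor.robots.ox.ac.uk/~vgg/data/bobsl/videos/

Streaming output truncated to the last 5000 lines.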
Length: 186997544 (178M) [video/mp4]
Saving to: ‘bobsl/videos/6152598633859508329.mp4’


2023-01-02 21:41:19 (18.5 MB/s) - ‘bobsl/videos/6152598633859508329.mp4’ saved [186997544/186997544]

--2023-01-02 21:41:20--  https://thor.robots.ox.ac.uk/~vgg/data/bobsl/videos/6152614095741770702.mp4
Reusing existing connection to thor.robots.ox.ac.uk:443.
HTTP request sent, awaiting response... 200 OK
Length: 159161429 (152M) [video/mp4]
Saving to: ‘bobsl/videos/6152614095741770702.mp4’


2023-01-02 21:41:29 (18.1 MB/s) - ‘bobsl/videos/6152614095741770702.mp4’ saved [159161429/159161429]

--2023-01-02 21:41:30--  https://thor.robots.ox.ac.uk/~vgg/data/bobsl/videos/6153322765345610755.mp4
Reusing existing connection to thor.robots.ox.ac.uk:443.
HTTP request sent, awaiting response... 200 OK
Length: 170911508 (163M) [video/mp4]
Saving to: ‘bobsl/videos/6153322765345610755.mp4’


2023-01-02 21:41:39 (18.8 MB/s) - ‘bobsl/videos/615332

In [None]:
#download spottings into Colab
# BOBSL credentials are read from the BOBSL_USER and BOBSL_PASSWORD environment variables
# (set them first, e.g. os.environ['BOBSL_USER'] = '...')
!wget --recursive --no-parent --continue --wait=1 \
    --no-host-directories --cut-dirs 2 \
    --user "$BOBSL_USER" --password "$BOBSL_PASSWORD" \
    https://thor.robots.ox.ac.uk/~vgg/data/bobsl/spottings.tar.gz

#download subtitles
!wget --recursive --no-parent --continue --wait=1 \
    --no-host-directories --cut-dirs 2 \
    --user "$BOBSL_USER" --password "$BOBSL_PASSWORD" \
    https://thor.robots.ox.ac.uk/~vgg/data/bobsl/subtitles.tar.gz

--2023-01-02 15:14:34--  https://thor.robots.ox.ac.uk/~vgg/data/bobsl/spottings.tar.gz
Resolving thor.robots.ox.ac.uk (thor.robots.ox.ac.uk)... 129.67.95.98
Connecting to thor.robots.ox.ac.uk (thor.robots.ox.ac.uk)|129.67.95.98|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="Restricted Content"
Reusing existing connection to thor.robots.ox.ac.uk:443.
HTTP request sent, awaiting response... 200 OK
Length: 69767691 (67M) [application/octet-stream]
Saving to: ‘bobsl/spottings.tar.gz’


2023-01-02 15:14:41 (13.0 MB/s) - ‘bobsl/spottings.tar.gz’ saved [69767691/69767691]

FINISHED --2023-01-02 15:14:41--
Total wall clock time: 6.5s
Downloaded: 1 files, 67M in 5.1s (13.0 MB/s)
--2023-01-02 15:14:41--  https://thor.robots.ox.ac.uk/~vgg/data/bobsl/subtitles.tar.gz
Resolving thor.robots.ox.ac.uk (thor.robots.ox.ac.uk)... 129.67.95.98
Connecting to thor.robots.ox.ac.uk (thor.robots.ox.ac.uk)|129.67.95.98|:443... connected.
HTTP re

In [None]:
# extract the subtitles archive
!tar -xf /content/bobsl/subtitles.tar.gz --directory /content/bobsl/
!rm /content/bobsl/subtitles.tar.gz

# extract the spottings archive
!tar -xf /content/bobsl/spottings.tar.gz --directory /content/bobsl/
!rm /content/bobsl/spottings.tar.gz
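Once extracted, the subtitle files can be read into (start, end, text) triples. The sketch below assumes the files under `/content/bobsl/subtitles/` are in WebVTT format (`HH:MM:SS.mmm --> HH:MM:SS.mmm` timing lines followed by the cue text); `parse_vtt` is a hypothetical helper written with the standard library only, not part of the BOBSL tooling.

```python
import re

# WebVTT cue timing line, e.g. "00:01:02.500 --> 00:01:05.000"; the hours field is optional
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s*-->\s*(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)

def _to_seconds(h, m, s, ms):
    """Convert captured timestamp fields to seconds (missing hours count as 0)."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_vtt(text):
    """Return a list of (start_sec, end_sec, subtitle_text) cues."""
    cues = []
    # Cues are separated by blank lines; the "WEBVTT" header block has no timing line
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                start, end = _to_seconds(*g[:4]), _to_seconds(*g[4:])
                # Everything after the timing line is the cue text
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues

sample = "WEBVTT\n\n00:00:01.000 --> 00:00:03.500\nHello there.\n"
print(parse_vtt(sample))  # → [(1.0, 3.5, 'Hello there.')]
```

If the format assumption holds, calling `parse_vtt` on the text of one file from `/content/bobsl/subtitles/manually-aligned/` would give that episode's cues.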


In [None]:
#decompress the .gz feature files in place
def gz_extract(files):
    extension = ".gz"
    for item in files:  # loop through the listed paths
        if item.endswith(extension):  # check for ".gz" extension
            file_name = item.rsplit('.', 1)[0]  # name of the decompressed file
            with gzip.open(item, "rb") as f_in, open(file_name, "wb") as f_out:
                shutil.copyfileobj(f_in, f_out)
            os.remove(item)  # delete the compressed file

features_path = '/content/bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/'
files = glob.glob(features_path + "**/*.gz", recursive=True)
gz_extract(files)

In [None]:
#unzip annotations (a .zip archive, which GNU tar cannot read, so use unzip)
!unzip -q /content/bobsl/annotations/annotations.pkl.zip -d /content/bobsl/annotations/
!rm /content/bobsl/annotations/annotations.pkl.zip
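The internal structure of `annotations.pkl` is not described in this notebook, so before passing it to `--spottings_path` it can help to print an outline of it. The helpers below (`load_annotations`, `outline`) are hypothetical and make no assumption about the keys. Note that `pickle.load` should only be used on files from a trusted source such as the official BOBSL download.

```python
import os
import pickle

def load_annotations(path):
    """Load a pickle file (only use on trusted files)."""
    with open(path, "rb") as f:
        return pickle.load(f)

def outline(obj, depth=0, max_depth=2, max_items=5):
    """Return lines describing nested dicts/lists, showing at most max_items keys per dict."""
    indent = "  " * depth
    if isinstance(obj, dict):
        lines = [f"{indent}dict with {len(obj)} keys"]
        if depth < max_depth:
            for key in list(obj)[:max_items]:
                lines.append(f"{indent}- {key!r}:")
                lines.extend(outline(obj[key], depth + 1, max_depth, max_items))
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{indent}{type(obj).__name__} of length {len(obj)}"]
        if obj and depth < max_depth:
            lines.extend(outline(obj[0], depth + 1, max_depth, max_items))  # first element only
        return lines
    # Leaf values (numbers, strings, arrays): type name and a short preview
    return [f"{indent}{type(obj).__name__}: {str(obj)[:60]}"]

path = "/content/bobsl/annotations/annotations.pkl"
if os.path.exists(path):
    print("\n".join(outline(load_annotations(path))))
```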

# Baseline reproduction 

## Exploring the repository

In [None]:
#git clone 
!git clone https://github.com/hannahbull/subtitle_align.git

Cloning into 'subtitle_align'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 58 (delta 18), reused 52 (delta 14), pack-reused 0
Unpacking objects: 100% (58/58), done.


In [None]:
#environment
!pip install -r /content/subtitle_align/requirements.txt

Collecting beartype==0.5.1
  Downloading beartype-0.5.1-py3-none-any.whl (260 kB)

## Model checkpoints

In [None]:
# the download line below is commented out: /content/inference_output.zip must already exist before unzipping
#!wget "https://drive.google.com/u/0/uc?id=1GNIm1XXRDQNFNGZVbqFcyVTZo3dFzOlD&export=download&confirm=t&uuid=61dfecd8-9063-45be-b836-cfa0ec03779f&at=ACjLJWnJbQGtddv7aGPqM6KpYb36:1672251550922" -O "/content/inference_output.zip"
!unzip /content/inference_output.zip
!rm /content/inference_output.zip

Archive:  /content/inference_output.zip
   creating: checkpoints_subtitle_align/
   creating: checkpoints_subtitle_align/finetune_subtitles/
   creating: checkpoints_subtitle_align/finetune_subtitles/checkpoints/
  inflating: checkpoints_subtitle_align/finetune_subtitles/checkpoints/model_0000264041.pt  
   creating: checkpoints_subtitle_align/train_coarse_subtitles/
   creating: checkpoints_subtitle_align/train_coarse_subtitles/checkpoints/
  inflating: checkpoints_subtitle_align/train_coarse_subtitles/checkpoints/model_0000250341.pt  
   creating: checkpoints_subtitle_align/word_pretrain/
   creating: checkpoints_subtitle_align/word_pretrain/checkpoints/
  inflating: checkpoints_subtitle_align/word_pretrain/checkpoints/model_0000191709.pt  


## Word-level pretraining

In [None]:
#same settings as bash commands/word_pretrain.sh in the subtitle_align repository
!python subtitle_align/main.py \
--features_path '/content/bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22' \
--spottings_path '/content/bobsl/annotations/annotations.pkl' \
--gpu_id 0 \
--batch_size 64 \
--n_workers 32 \
--pr_subs_delta_bias 0 \
--fixed_feat_len 20 \
--jitter_location \
--jitter_abs \
--jitter_loc_quantity 10. \
--load_words True \
--load_subtitles False \
--lr 1e-5 \
--centre_window \
--save_path '/content/inference_output/word_pretrain' \
--train_videos_txt '/content/subtitle_align/data/bobsl_train_1658.txt' \
--val_videos_txt '/content/subtitle_align/data/bobsl_val_32.txt' \
--test_videos_txt '/content/subtitle_align/data/bobsl_test_250.txt' \
--pos_weight 19. \
--n_epochs 100 \
--shuffle_getitem True \
--concatenate_prior True
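The `--pos_weight 19.` flag suggests that positive frames are up-weighted in a binary cross-entropy loss, in the way PyTorch's `BCEWithLogitsLoss(pos_weight=...)` does; how `main.py` actually uses the flag is an assumption here. The NumPy sketch below reproduces that weighted loss formula (`weighted_bce_with_logits` is a hypothetical name) and shows the effect of the weight: with 19, a single positive frame among 20 contributes as much to the loss as the 19 negatives together.

```python
import numpy as np

def weighted_bce_with_logits(logits, targets, pos_weight=19.0):
    """Mean binary cross-entropy in which positive targets are weighted by pos_weight.

    Follows the formula documented for torch.nn.BCEWithLogitsLoss(pos_weight=...):
        loss = -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
    """
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    log_sig = -np.logaddexp(0.0, -x)           # log(sigmoid(x)), numerically stable
    log_one_minus_sig = -np.logaddexp(0.0, x)  # log(1 - sigmoid(x))
    loss = -(pos_weight * y * log_sig + (1.0 - y) * log_one_minus_sig)
    return float(loss.mean())

# 20 frames, a single positive, and an uninformative model (all logits 0):
# the positive contributes 19*log(2) and the 19 negatives together also 19*log(2)
targets = np.zeros(20)
targets[0] = 1.0
print(weighted_bce_with_logits(np.zeros(20), targets))  # equals 1.9 * log(2)
```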

## Training on coarsely aligned subtitles

In [None]:
!python subtitle_align/main.py \
--features_path '/content/bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/' \
--gt_sub_path '/content/bobsl/subtitles/audio-aligned-heuristic-correction/' \
--pr_sub_path '/content/bobsl/subtitles/audio-aligned-heuristic-correction/' \
--gpu_id 0 \
--batch_size 64 \
--n_workers 32 \
--pr_subs_delta_bias 2.7 \
--gt_subs_delta_bias 2.7 \
--fixed_feat_len 20 \
--jitter_location \
--jitter_abs \
--jitter_loc_quantity 3. \
--load_words False \
--load_subtitles True \
--lr 5e-6 \
--save_path '/content/inference_output/train_coarse_subtitles' \
--train_videos_txt '/content/subtitle_align/data/bobsl_train_1658.txt' \
--val_videos_txt '/content/subtitle_align/data/bobsl_val_32.txt' \
--test_videos_txt '/content/subtitle_align/data/bobsl_test_250.txt' \
--n_epochs 100 \
--concatenate_prior True \
--min_sent_len_filter 0.5 \
--max_sent_len_filter 20 \
--shuffle_words_subs 0.5 \
--drop_words_subs 0.15 \
--resume '/content/inference_output/word_pretrain/checkpoints/model_0000191709.pt'
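Several flags in this command suggest data-processing steps whose exact semantics live in the repository's `main.py`, which is not shown here. Under the following assumptions they could be sketched as: `--pr_subs_delta_bias 2.7` delays the audio-aligned subtitle times by 2.7 s, plausibly because the interpreter signs after the speech; `--min_sent_len_filter 0.5` and `--max_sent_len_filter 20` keep subtitles lasting between 0.5 and 20 s; and `--shuffle_words_subs 0.5` and `--drop_words_subs 0.15` shuffle a subtitle's words with probability 0.5 and drop each word with probability 0.15 during training. All three helper names are hypothetical.

```python
import random

def shift_prior(start, end, delta=2.7):
    """Delay an audio-aligned subtitle interval (seconds) by `delta` seconds.
    Assumed meaning of --pr_subs_delta_bias."""
    return start + delta, end + delta

def keep_subtitle(start, end, min_len=0.5, max_len=20.0):
    """Assumed meaning of --min_sent_len_filter / --max_sent_len_filter (seconds)."""
    return min_len <= end - start <= max_len

def augment_words(words, rng, p_shuffle=0.5, p_drop=0.15):
    """Assumed meaning of --shuffle_words_subs / --drop_words_subs."""
    words = list(words)
    if rng.random() < p_shuffle:
        rng.shuffle(words)  # permute the word order
    kept = [w for w in words if rng.random() >= p_drop]  # drop each word with prob p_drop
    return kept or words[:1]  # keep at least one word

# Usage on a single subtitle
rng = random.Random(0)
start, end = shift_prior(10.0, 12.5)
if keep_subtitle(start, end):
    print(start, end, augment_words("the weather will be sunny".split(), rng))
```

Keeping at least one word in `augment_words` is a choice made for this sketch, so that an augmented subtitle is never empty.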

## Finetune using manually aligned subtitles

In [None]:
!python subtitle_align/main.py \
--features_path '/content/bobsl/features/i3d_c2281_16f_m8_-15_4_d0.8_-3_22/' \
--gt_sub_path '/content/bobsl/subtitles/manually-aligned/' \
--pr_sub_path '/content/bobsl/subtitles/audio-aligned-heuristic-correction/' \
--gpu_id 0 \
--batch_size 64 \
--n_workers 0 \
--pr_subs_delta_bias 2.7 \
--fixed_feat_len 20 \
--jitter_location \
--jitter_abs \
--jitter_loc_quantity 2. \
--load_words False \
--load_subtitles True \
--lr 1e-6 \
--save_path '/content/inference_output/finetune_subtitles' \
--train_videos_txt '/content/subtitle_align/data/bobsl_align_train.txt' \
--val_videos_txt '/content/subtitle_align/data/bobsl_align_val.txt' \
--test_videos_txt '/content/subtitle_align/data/bobsl_align_test.txt' \
--n_epochs 100 \
--concatenate_prior True \
--min_sent_len_filter 0.5 \
--max_sent_len_filter 20 \
--shuffle_words_subs 0.5 \
--drop_words_subs 0.15 \
--resume '/content/inference_output/train_coarse_subtitles/checkpoints/model_0000250341.pt'
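The notebook stops before evaluation. Two common ways to score subtitle alignment against the manually aligned subtitles are the temporal intersection-over-union (IoU) of each predicted interval, thresholded to count a subtitle as correctly aligned, and frame-level accuracy. The sketch below computes both for lists of (start, end) pairs in seconds. It is not the repository's evaluation code, the function names are hypothetical, and `fps=25` is an assumed default frame rate.

```python
import numpy as np

def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def accuracy_at_iou(preds, gts, threshold=0.5):
    """Fraction of subtitles whose predicted interval reaches the IoU threshold."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    return sum(iou >= threshold for iou in ious) / len(ious)

def frame_accuracy(preds, gts, duration, fps=25):
    """Fraction of frames assigned to the same subtitle (or to none) in both timelines.
    Assumes subtitles do not overlap; frames outside every subtitle are labelled -1."""
    n_frames = int(round(duration * fps))
    pred_labels = np.full(n_frames, -1)
    gt_labels = np.full(n_frames, -1)
    for k, (p, g) in enumerate(zip(preds, gts)):
        pred_labels[int(p[0] * fps):int(p[1] * fps)] = k
        gt_labels[int(g[0] * fps):int(g[1] * fps)] = k
    return float((pred_labels == gt_labels).mean())

# Toy example: the second subtitle is predicted one second too early
preds = [(0.0, 2.0), (2.0, 4.0)]
gts = [(0.0, 2.0), (3.0, 5.0)]
print(accuracy_at_iou(preds, gts))  # → 0.5
print(frame_accuracy(preds, gts, duration=5.0, fps=1))  # → 0.6
```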