
Sign Language Datasets

Datasets used by the sign_language_translator Python package.

See the download tree for quick links to videos, landmarks, word mappings & parallel corpus.

  1. Sign Language Datasets
    1. Problem Overview
      1. Sign Recording Options
      2. Translation Dataset Needs
    2. Datasets
    3. Download Tree
    4. How to Contribute
    5. Citation
    6. Glossary

Download via CLI (requires Python):

pip install sign-language-translator
slt download "datasets/.*landmarks.*csv.*\.zip"
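The same assets can also be fetched from Python. A minimal sketch, assuming the Assets helper exposed by recent sign-language-translator releases (the name and signature may differ in your installed version):

    import sign_language_translator as slt

    # Download every zipped CSV landmarks archive (same regex as the CLI).
    slt.Assets.download(r"datasets/.*landmarks.*csv.*\.zip")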

Problem Overview

Sign language is a gesture-based method of communication. In sign languages, the vocabulary is small and many spoken-language words map to the same sign. Sentences are short and contain only the keywords. Each person performs signs with their own accent. Every region has its own sign language, and there are few large-scale standardization efforts.

  1. We can obtain standard dictionaries from reputable organizations in each country and concatenate signs from them using standardized grammar rules to translate & synthesize datasets.
  2. We can record people performing the signs from the dictionary to capture diversity of accents.
  3. We can scrape sign language videos and use deep learning to generate their glosses & translations.

Because most regional languages have very few hours of data, the best approach is to train a many-to-many seq2seq translation model.

Sign Recording Options

Sign language can be represented as:

Videos
  • Videos can consist of individual words, phrases or sentences.
  • Each video can contain just one person or multiple people signing at the same time.
  • Using computer vision, videos can be decomposed into 3D motion vectors of the joints on the body; this preprocessing step reduces bias and noise in the dataset and enables more data augmentation (see the sketch after this list).
Token Sequence + Gesture Dictionary
  1. A sign sequence written out word-for-word in text is called a gloss; it captures the grammar of the sign language.
  2. There are other sign-writing notations, such as HamNoSys, that transcribe the individual movements of the hands, but this project currently uses only word-level tokens.
Motion capture gloves (costly for users & dataset makers)
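As a sketch of the video decomposition mentioned above, per-frame joint coordinates can be extracted with MediaPipe Holistic (assumes the mediapipe and opencv-python packages and a local file sign.mp4; the package's own preprocessing may differ):

    import cv2
    import mediapipe as mp

    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    frames = []

    video = cv2.VideoCapture("sign.mp4")
    while True:
        ok, frame = video.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV decodes frames as BGR.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 pose joints, each a normalized (x, y, z) coordinate per frame.
            frames.append([(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark])
    video.release()
    holistic.close()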

Translation Dataset Needs

A translation model requires a parallel corpus of sentences mapped to each other across languages.
  • For each sign language video or sequence of videos, save translations & glosses in multiple text languages.
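A purely illustrative sketch of one such mapping entry as a Python dict (the authoritative structure is schemas/mapping-schema.json; the label and texts below are hypothetical):

    entry = {
        "label": "pk-hfad-1_example",                 # hypothetical video label
        "glosses": {"en": "...", "ur": "..."},        # word-for-word sign order
        "translations": {"en": "...", "hi": "...", "ur": "..."},  # natural sentences
    }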
Sign languages can be modeled as semi-formal languages (a mixture of rule-based language & natural language), so there is an opportunity for synthetic dataset generation (see the sketch after this list):
  • Obtain sign language dictionaries.
  • List all the words, in several text languages, that can be mapped to those videos.
  • Train a language model to write sentences using only the supported words.
  • Translate the generated sentences into gloss (sign labels) using the grammar rules of that regional sign language or a deep learning model.
  • Concatenate the videos corresponding to the tokens in the gloss to synthesize a parallel video.
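The last two steps can be sketched in a few lines of Python, assuming moviepy 1.x and hypothetical dictionary filenames that follow the naming convention described below:

    from moviepy.editor import VideoFileClip, concatenate_videoclips

    # Map supported word tokens to dictionary clips (hypothetical paths; real
    # labels come from parallel_texts/*-dictionary-mapping.json).
    dictionary = {
        "apple": "pk-hfad-1_apple.mp4",
        "eat": "pk-hfad-1_eat(food).mp4",
    }

    def synthesize(gloss_tokens):
        # Concatenate the clip of each token into one parallel video.
        clips = [VideoFileClip(dictionary[token]) for token in gloss_tokens]
        return concatenate_videoclips(clips)

    synthesize(["apple", "eat"]).write_videofile("synthetic-sentence.mp4")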

Datasets

The datasets currently available in the sign_language_translator package are chunked, preprocessed and labeled appropriately. More details on assets can be found in the release description.

Naming conventions:
  1. Dictionaries: country-organization-number_sign-label.mp4
  2. Replications: c*-o*-n*_s*_person-code_camera-angle.mp4
  3. Sentences: c*-o*-n*_gloss[_p*_c*].mp4
  4. Archives: c*-o*-n*[_p*-c*]_category-subcategory-extension.zip
  5. Preprocessed videos: c*-o*-n*_s*[_p*_c*].category-model.ext
  6. Videos without Signs: wordless_wordless_person_camera.mp4
  • The sign labels, tokens & glosses may contain word sense disambiguation wrapped in parentheses, e.g. *_spring(coil).mp4 or *_spring(water-fountain).mp4.
  • Person Codes are of the format [dh][fm]\d+. For example, df0001 stands for deaf-female-0001 and hm0002 means hearing-male-0002.
  • Camera Angles are from (front|below|left|right|top-left|top-right)-\d+x\d+y\d+z. (not finalized yet)
  • Category in preprocessed videos and archives is from (videos|landmarks).
  • Subcategory in an archive name is from (dictionary(-replication)?|sentences(-replication)?|mediapipe-pose-2-hand-1); for preprocessed files it includes the model name.
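Filenames following these conventions can be validated and parsed mechanically. A small sketch for the replication pattern, with the regex assembled from the rules above (the camera-angle part is a guess since that convention is not finalized):

    import re

    REPLICATION = re.compile(
        r"^(?P<country>[a-z]+)-(?P<organization>[a-z0-9]+)-(?P<number>\d+)"
        r"_(?P<label>[^_]+)"                            # may embed a sense: spring(coil)
        r"(?:_(?P<person>[dh][fm]\d+))?"                # df0001 = deaf-female-0001
        r"(?:_(?P<camera>[a-z-]+(?:-\d+x\d+y\d+z)?))?"  # camera angle (not finalized)
        r"\.mp4$"
    )

    print(REPLICATION.match("pk-hfad-1_spring(coil)_df0001_front.mp4").groupdict())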

Statistics:

Sign Language: Pakistan
  Dictionary:          Signs: 776 (27 min) | Word Tokens: en 1584, hi 92, latn-ur 2, ur 2071
  Sentences:           Count: 13 (57 sec) | Translations: en 19, hi 14, latn-ur 13, ur 17 | Glosses: en 14, latn-ur 13, ur 15
  Synthetic Sentences: Count: 1 (7 sec) | Translations: en 2, hi 2, latn-ur 1, ur 2 | Glosses: en 2, latn-ur 1, ur 2
  Replications:        Dictionary: 22 hrs | Sentences: 45 min

Download Tree

sign-language-datasets
├── README.md
├── text-preprocessing.json
├── todo.json
│
├── asset_urls
│   ├── archive-urls.json
│   ├── extra-urls.json
│   └── pk-dictionary-urls.json
│
├── parallel_texts
│   ├── pk-dictionary-mapping.json
│   ├── pk-sentence-mapping.json
│   └── pk-synthetic-sentence-mapping.json
│
└── schemas
    └── mapping-schema.json

Releases
├── v0.0.4 (Landmark Datasets)
│   ├── pk-hfad-1_landmarks-mediapipe-pose-2-hand-1-csv.zip
│   └── pk-hfad-1_landmarks-mediapipe-pose-2-hand-1-json.zip
│
├── v0.0.3 (Video Datasets)
│   └── pk-hfad-1_videos-mp4.zip
│
├── v0.0.2 (Dictionary)
│   ├── pk-hfad-1_*.mp4 [788]
│   ├── pk-hfad-2_*.mp4 [1]
│   └── wordless_wordless.mp4 [1]
│
└── v0.0.1 (Language Models for Dataset generation)
    └── *

How to Contribute

Project Setup:
  1. Clone the repo

    git clone https://github.com/sign-language-translator/sign-language-datasets.git
  2. Configure the JSON schema in your VSCode workspace settings, especially for the *-mapping.json files.

Our Needs:
1. Compile dictionaries
  1. Rename files to follow the convention (country-organization-...; see the renaming sketch after this list).
  2. Upload individual files to v0.0.2 Dictionary release.
  3. Upload zip archive to v0.0.3 Video Datasets release.
  4. Link individual file URLs in asset_urls/*-dictionary-urls.json.
  5. Link archive URLs in asset_urls/archive-urls.json.
  6. Add the text tokens that have the same meaning and can be mapped to these dictionary videos to parallel_texts/*-dictionary-mapping.json.
2. Record Dictionary Videos to capture diverse accents
  1. Rename files to follow the convention (*_person-code_camera-angle*).
  2. Upload zip archive to v0.0.3 Video Datasets release.
  3. Link archive URLs in asset_urls/archive-urls.json.
3. Scrape or Record sign language Sentences.
  • Upload & Link the data
  • Add translations and glosses to the parallel corpus
4. Contribute to the Synthetic Parallel Corpus
  1. Write sentences using only the supported words.
  2. Compile a dataset for training a language model to automate the previous step.
  3. Translate them into other text languages.
5. Translate existing tokens, translations & glosses to other text languages.
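For the renaming steps above, a hedged sketch that bulk-renames raw clips to the country-organization-number_sign-label convention (hypothetical inputs: a folder raw_clips/ of files named after their sign labels, all from one source dictionary):

    from pathlib import Path

    COUNTRY, ORGANIZATION, NUMBER = "pk", "hfad", "1"  # assumed source dictionary

    for path in Path("raw_clips").glob("*.mp4"):
        # e.g. "Spring (coil).mp4" -> "pk-hfad-1_spring(coil).mp4"
        label = path.stem.lower().replace(" ", "")
        path.rename(path.with_name(f"{COUNTRY}-{ORGANIZATION}-{NUMBER}_{label}.mp4"))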

Note

Ensure uniqueness in sign labels before publishing anything.

Citation

Coming Soon!

Glossary

Label: Text identifier of a sign language video/data sample; a filename without its extension.
Accent: A particular style of performing a sign, such as the speed, position, and distance traveled by the hands.
Gloss: Word sequence corresponding to the signs performed in the source sign language video.
Translation: Valid text of a spoken language with the same meaning as the source sign language video.
Parallel Corpus: Collection of statements in sign language and their translations/glosses in spoken-language texts.
Supported Word: A text-language token (word or phrase) for which a sign language video is available in the dictionary.
Replication: Videos created using the dictionary videos or web-scraped sentences as reference clips. The performer can also be a hearing person, and multiple cameras at different angles can be used simultaneously.
Synthetic Sentence: A sign language sentence formed by concatenating videos corresponding to word tokens written in a particular order.
Word Sense Disambiguation: The task of determining the intended meaning (or a relevant synonym) of a word from its context.