
American Sign Language Real-time Recognition

This project demonstrates the feasibility of translating American Sign Language in real time.

The results of the study are showcased in the real-time demo application. The GIFs below were extracted from the webcam video stream.


A client-server web app has also been implemented. The following GIF shows how it works.

Getting Started

Follow the instructions below to get a clean installation.

Dataset

Download the WLASL dataset.

git clone https://github.com/dxli94/WLASL

Prerequisites

Create and activate a new virtual environment in the project folder.

~/project_folder$ virtualenv .env
~/project_folder$ source .env/bin/activate

Installation

  1. Clone the repo.
    (.env) git clone https://github.com/simonefinelli/ASL-Recognition-backup
  2. Install requirements.
    (.env) python -m pip install -r requirements.txt
  3. Split the WLASL dataset into the required format using the script in 'tools/dataset splitting/'.
    (.env) python k_gloss_splitting.py ./WLASL_full/ 2000
  4. Copy the pre-processed dataset into the 'data' folder (a quick sanity check is sketched below).
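
After copying, you can confirm the split landed where you expect. The sketch below is a minimal, hypothetical check that assumes a data/<split>/<gloss>/<video> layout; adjust the paths to the structure the splitting script actually produces.

    # Hypothetical sanity check: the data/<split>/<gloss>/ layout is an
    # assumption, not the repo's documented structure.
    import os

    DATA_DIR = "data"

    for split in ("train", "val", "test"):
        split_dir = os.path.join(DATA_DIR, split)
        if not os.path.isdir(split_dir):
            print(f"missing split: {split_dir}")
            continue
        glosses = [g for g in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, g))]
        videos = sum(len(os.listdir(os.path.join(split_dir, g)))
                     for g in glosses)
        print(f"{split}: {len(glosses)} glosses, {videos} videos")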

Usage

Now let's see how to use the neural network, the demo, and the web app.

Neural Net

  1. To start training, run:
    (.env) python train_model.py
  2. After training, to evaluate the best model on the test set, run:
    (.env) python evaluate_model.py
  3. The trained model can now be used in the demo or the web app (a minimal inference sketch follows).
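
Once training has produced a saved model, loading it back for a single prediction looks roughly like the sketch below. The filename 'best_model.h5', the use of Keras, and the 12-frame 224x224 RGB input shape are all assumptions; check models.py and train_model.py for the real values.

    # Hedged sketch: the model filename and input shape are assumptions.
    import numpy as np
    from tensorflow import keras

    model = keras.models.load_model("best_model.h5")

    # One clip: batch of 1, 12 frames, 224x224 RGB, scaled to [0, 1].
    clip = np.random.rand(1, 12, 224, 224, 3).astype("float32")

    probs = model.predict(clip)[0]
    print("predicted class index:", int(np.argmax(probs)))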

Tips

  • The WLASL dataset can be divided into four sub-datasets: WLASL100, WLASL300, WLASL1000, and WLASL2000. The models used for each sub-dataset can be found in the models.py file.
  • The custom frame generator used in the model needs at least 12 frames to work. However, videos 59958, 18223, 15144, 02914, and 55325 in the WLASL1000 and WLASL2000 datasets are shorter. To work around this, use the video_extender.py script (the sketch below shows the idea).
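
video_extender.py is the supported fix for those short clips; purely for illustration, one way to pad a clip is to repeat its last frame until the 12-frame minimum is reached. The OpenCV sketch below is an assumed approach, not a copy of the repo's script.

    # Hedged sketch: pads a clip to 12 frames by repeating the last frame.
    import cv2

    MIN_FRAMES = 12  # minimum required by the custom frame generator

    def extend_video(src_path, dst_path, min_frames=MIN_FRAMES):
        cap = cv2.VideoCapture(src_path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
        frames = []
        ok, frame = cap.read()
        while ok:
            frames.append(frame)
            ok, frame = cap.read()
        cap.release()
        # Repeat the last frame until the clip is long enough.
        while frames and len(frames) < min_frames:
            frames.append(frames[-1].copy())
        height, width = frames[0].shape[:2]
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        writer = cv2.VideoWriter(dst_path, fourcc, fps, (width, height))
        for f in frames:
            writer.write(f)
        writer.release()

    extend_video("59958.mp4", "59958_extended.mp4")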

Real-time demo

  1. To start the demo, run:
    (.env) python demo.py
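
For intuition, the heart of a loop like this is a rolling 12-frame window read from the webcam and re-classified as it fills. The sketch below is a simplified stand-in for demo.py; the model filename and 224x224 input size are assumptions.

    # Hedged sketch of the real-time idea; demo.py is the real program.
    from collections import deque

    import cv2
    import numpy as np
    from tensorflow import keras

    model = keras.models.load_model("best_model.h5")  # assumed filename
    window = deque(maxlen=12)  # the frame generator needs 12 frames
    cap = cv2.VideoCapture(0)  # default webcam

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        resized = cv2.resize(frame, (224, 224)) / 255.0  # assumed size
        window.append(resized)
        if len(window) == 12:
            clip = np.expand_dims(np.array(window, dtype="float32"), 0)
            pred = int(np.argmax(model.predict(clip, verbose=0)[0]))
            cv2.putText(frame, f"class {pred}", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow("ASL demo", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break

    cap.release()
    cv2.destroyAllWindows()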

Web app

  1. To start the web app, run:
    (.env) python serve.py
  2. Go to the following URL: http://127.0.0.1:5000/
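
The 127.0.0.1:5000 address suggests a Flask development server, though that is an inference; serve.py is the actual entry point. A minimal sketch of the client-server shape could look like the following, with the /predict endpoint being purely hypothetical.

    # Hedged sketch; the /predict endpoint is hypothetical, not serve.py's API.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # In the real app the server would classify frames sent by the
        # client; this stub just returns a fixed gloss.
        return jsonify({"gloss": "hello"})

    if __name__ == "__main__":
        app.run(host="127.0.0.1", port=5000)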

Tip

The model used in the demo and the web app was obtained by training the neural net on a custom dataset called WLASL20custom. This dataset consists of only 20 words: book, chair, clothes, computer, drink, drum, family, football, go, hat, hello, kiss, like, play, school, street, table, university, violin, and wall.
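
With only 20 classes, mapping a predicted index back to its gloss is straightforward. The sketch below assumes the classes are ordered alphabetically, which is a common convention but an assumption here.

    # Hedged sketch: alphabetical class order is an assumption.
    GLOSSES = [
        "book", "chair", "clothes", "computer", "drink", "drum", "family",
        "football", "go", "hat", "hello", "kiss", "like", "play", "school",
        "street", "table", "university", "violin", "wall",
    ]

    def index_to_gloss(index: int) -> str:
        return GLOSSES[index]

    print(index_to_gloss(10))  # -> "hello" (if classes are alphabetical)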

Results

I achieved the following accuracy with the proposed models:

  1. WLASL20c: 63% accuracy.
  2. WLASL100: 34% accuracy.
  3. WLASL300: 28% accuracy.
  4. WLASL1000: 19% accuracy.
  5. WLASL2000: 10% accuracy.

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements
