Repository code and models are now available at https://github.com/sign-language-processing/pose-to-video.
This repository aims to train a real-time, in-browser image-to-image translation model from pose estimation output to video frames.
The model code is a port of TensorFlow's pix2pix tutorial.
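For orientation, that tutorial builds the generator as a U-Net from stacked downsample/upsample blocks. The sketch below reproduces those building blocks from the public tutorial; the actual code in this repository may differ in details:

```python
import tensorflow as tf

def downsample(filters, size, apply_batchnorm=True):
    # Conv -> (BatchNorm) -> LeakyReLU, halving the spatial resolution
    initializer = tf.random_normal_initializer(0., 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                                     kernel_initializer=initializer, use_bias=False))
    if apply_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.LeakyReLU())
    return block

def upsample(filters, size, apply_dropout=False):
    # Transposed conv -> BatchNorm -> (Dropout) -> ReLU, doubling the spatial resolution
    initializer = tf.random_normal_initializer(0., 0.02)
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2, padding='same',
                                              kernel_initializer=initializer, use_bias=False))
    block.add(tf.keras.layers.BatchNormalization())
    if apply_dropout:
        block.add(tf.keras.layers.Dropout(0.5))
    block.add(tf.keras.layers.ReLU())
    return block
```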
We have recorded high-resolution green-screen videos of:
- Maayan Gazuli (`Maayan_1`, `Maayan_2`) - Israeli Sign Language interpreter
- Amit Moryossef (`Amit`) - project author
These videos are open for anyone to use for the purpose of sign language video generation.
This repository does not support any input other than images. By default, every image of Maayan is rendered on a red background (255, 200, 200), and every image of Amit is rendered on a blue background (200, 200, 255).
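A minimal sketch of how such person-specific backgrounds could be composited onto keyed (RGBA) frames before they reach the model; the helper name and pipeline here are illustrative, not the repository's actual code:

```python
import numpy as np

# Illustrative only: signer-specific background colors as described above
BACKGROUNDS = {"Maayan": (255, 200, 200), "Amit": (200, 200, 255)}

def composite_on_background(rgba_frame: np.ndarray, signer: str) -> np.ndarray:
    # Alpha-blend a keyed HxWx4 frame onto the signer's flat background color
    rgb = rgba_frame[..., :3].astype(np.float32)
    alpha = rgba_frame[..., 3:4].astype(np.float32) / 255.0
    background = np.array(BACKGROUNDS[signer], dtype=np.float32)
    return (rgb * alpha + background * (1.0 - alpha)).astype(np.uint8)
```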
- The videos were recorded in ProRes and converted to mp4 using `ffmpeg` (an illustrative command for this step is shown below).
- Then, using Final Cut Pro X, the green screen was removed with the keying effect, and the result was exported for "desktop".
- Finally, the FCPX export was processed again by `ffmpeg` to reduce its size (3.5GB -> 250MB):
```bash
ffmpeg -i CAM3_output.mp4 -qscale 0 CAM3_norm.mp4
```
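The exact flags used for the first step (ProRes to mp4) are not recorded here; a plausible invocation would look something like:

```bash
# Illustrative only: transcode a ProRes master to an H.264 mp4
ffmpeg -i CAM3.mov -c:v libx264 -crf 18 -c:a aac CAM3_output.mp4
```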
Download the data from https://nlp.biu.ac.il/~amit/datasets/GreenScreen/, or use the command line:
```bash
wget --no-clobber --convert-links --random-wait -r -p --level 3 -E -e robots=off --adjust-extension -U mozilla "https://nlp.biu.ac.il/~amit/datasets/GreenScreen/"
```
```bash
cd /home/nlp/amit/WWW/datasets/GreenScreen/mp4/Amit && gdown --folder --continue --id 1X1GuGMPHm4Sty9hr7Goxbbig5KpBE7p1
cd /home/nlp/amit/WWW/datasets/GreenScreen/mp4/Maayan_1 && gdown --folder --continue --id 1X4-LagvS2JWm9xyOg5t2QAvP1nDxt3Vr
cd /home/nlp/amit/WWW/datasets/GreenScreen/mp4/Maayan_2 && gdown --folder --continue --id 1XBz8NrRomAU506q7xYZUWkXEw_yVz5YD
```
Run `python -m everybody_sign_now.train`.
Training is currently performed on CPU, at roughly 5 minutes per 1000 steps.
It will train for a long while, logging each epoch's result in a `progress` directory.
Once you are satisfied with the results, the script can be killed.
- Add an LSTM to the `pix2pix` state, to introduce temporal coherence with very little additional compute
- Add another upsampler, from `256x256` to `512x512` (a sketch is shown after this list)
- Add a face-specific discriminator
- Add a hand-specific discriminator
- Position the body in a mostly fixed position
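As a sketch of the second item above, an extra `256x256` to `512x512` upsampler could be a small standalone Keras head appended to the generator output; this is illustrative and not part of the current codebase:

```python
import tensorflow as tf

def upscaler_256_to_512():
    # Illustrative 2x upsampling head: 256x256x3 -> 512x512x3
    inputs = tf.keras.layers.Input(shape=[256, 256, 3])
    x = tf.keras.layers.Conv2DTranspose(64, 4, strides=2, padding='same', use_bias=False)(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    outputs = tf.keras.layers.Conv2D(3, 4, padding='same', activation='tanh')(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```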
Run `./convert_to_tfjs.sh`.

This will create a `web_model` directory with the model in TensorFlow.js format, quantized to `float16`.
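The conversion presumably wraps `tensorflowjs_converter`; an invocation along these lines would produce a float16-quantized web model (the actual script may use different paths and input format):

```bash
# Illustrative only: convert a Keras model to TF.js with float16 weight quantization
tensorflowjs_converter \
  --input_format=keras \
  --quantize_float16 \
  model.h5 \
  web_model
```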
- Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses - evaluates generated videos quantitatively and qualitatively, showing that current models are not sufficient to generate adequate sign language videos
- Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video - a proposal for better pose-to-video generation models, with higher quality and control over the signer's appearance