Note: This repository is archived. Please visit https://github.com/Svito-zar/gesticulator for an up-to-date version of the Gesticulator repository, which includes all changes from here.
This repository contains my contributions to the implementation of Gesticulator during my summer internship at KTH Royal Institute of Technology, Division of Robotics, Perception and Learning. Furthermore, the `agent_integration` branch contains several adaptations (e.g. FastText embeddings instead of BERT and prosodic features instead of spectrograms) that make it possible to integrate the model into a conversational virtual agent in Unity.
The official (and up-to-date) repository of the Gesticulator project by Taras Kucherenko et al. can be found at https://github.com/Svito-zar/gesticulator.
- Python 3.6+
- ffmpeg (for visualization)
NOTE: during installation, there will be two error messages about conflicting packages (one for `bert-embedding` and one for `mxnet`); these can be safely ignored.
```bash
git clone git@github.com:Svito-zar/gesticulator.git
cd gesticulator
pip install -r gesticulator/requirements.txt
pip install -e .
pip install -e gesticulator/visualization
```
All of the scripts referred to in this repository description accept several command-line arguments, which you can list by calling them with the `--help` flag.
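For example, to list the options of the training script:

```bash
python train.py --help
```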
Head over to the demo folder for a quick demonstration if you're not interested in training the model yourself.
- Pretrained model files can be loaded with the following commands (a more concrete loading sketch is shown right after this list):

  ```python
  from gesticulator.model import GesticulatorModel

  loaded_model = GesticulatorModel.load_from_checkpoint(<PATH_TO_MODEL_FILE>)
  ```
- If the `--save_model_every_n_epochs` argument is provided to `train.py`, then the model will be saved regularly during training.
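As a minimal sketch of loading a trained model, assuming a checkpoint was saved under `results/last_run/` (the filename `trained_model.ckpt` below is a placeholder for whatever your run actually produced):

```python
from gesticulator.model import GesticulatorModel

# Placeholder path: substitute the checkpoint file produced by your training run
checkpoint_path = "results/last_run/trained_model.ckpt"

model = GesticulatorModel.load_from_checkpoint(checkpoint_path)
model.eval()  # switch to evaluation mode before generating gestures
```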
- Download the Trinity Speech-Gesture dataset
- Either obtain the transcriptions yourself:
  - Transcribe the audio using Automatic Speech Recognition (ASR), such as Google ASR
  - Manually correct the transcriptions and add punctuation
- Or obtain the already transcribed dataset as a participant of the GENEA Gesture Generation Challenge
- Place the dataset in the `dataset` folder next to the `gesticulator` folder, in three subfolders: `speech`, `motion` and `transcript` (a sketch of the expected layout is shown right after this list).
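As a rough sketch of the expected layout, assuming the default raw-data directory `dataset/raw` mentioned in the data-processing section below (the file formats in the comments are assumptions based on the processing scripts):

```
<repository root>
├── gesticulator/
└── dataset/
    └── raw/
        ├── speech/      # audio files (e.g. .wav)
        ├── motion/      # motion capture in BVH format
        └── transcript/  # speech transcriptions
```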
```bash
cd gesticulator/data_processing

# Encode motion from BVH files into the exponential map representation
python bvh2features.py

# Split the dataset into training and validation
python split_dataset.py

# Encode all the features
python process_dataset.py
```
By default, the model expects the dataset in the `dataset/raw` folder, and the processed dataset will be available in the `dataset/processed` folder. If your dataset is elsewhere, please provide the correct paths with the `--raw_data_dir` and `--proc_data_dir` command-line arguments.
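For example, a hypothetical invocation with custom dataset locations (the paths below are placeholders) might look like:

```bash
python process_dataset.py --raw_data_dir /path/to/my_dataset/raw --proc_data_dir /path/to/my_dataset/processed
```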
In order to train and evaluate the model, run

```bash
cd ..
python train.py
```
The model configuration and the training parameters are automatically read from the `gesticulator/config/default_model_config.yaml` file.
The results will be available in the `results/last_run/` folder, where you will find the TensorBoard logs along with the trained model file and the generated output on the semantic test segments (described in the paper).
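If you have TensorBoard installed, the training logs can be inspected with, for example:

```bash
tensorboard --logdir results/last_run/
```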
If the `--run_name <name>` command-line argument is provided, the `results/<name>` folder will be created and the results will be stored there. This can be very useful when you want to keep the logs and outputs of separate runs.
To train the model on the GPU, provide the `--gpus` argument. For details regarding the training parameters, please visit this link.
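As an illustrative example (the flag values below are placeholders; check `python train.py --help` for the exact expected formats), a GPU training run with a custom name and periodic checkpointing might look like:

```bash
python train.py --run_name my_experiment --gpus 1 --save_model_every_n_epochs 10
```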
The gestures generated during training, validation and testing can be found in the `results/<run_name>/generated_gestures` folder. By default, we only store the outputs on the semantic test segments, but other outputs can be saved as well - see the config file for the corresponding parameters.
In order to manually generate the videos from the raw coordinates, run

```bash
cd visualization/aamas20_visualizer
python generate_videos.py
```
If you changed the arguments of `train.py` (e.g. `run_name`), you might have to provide them for `generate_videos.py` as well.
Please check the required arguments by running

```bash
python generate_videos.py --help
```
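For instance, if you trained with a custom run name as in the hypothetical training command above, and assuming `generate_videos.py` mirrors the `--run_name` flag of `train.py`, the call might look like:

```bash
python generate_videos.py --run_name my_experiment
```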
For the quantitative evaluation (velocity histograms and jerk), you may use the scripts in the `gesticulator/obj_evaluation` folder.
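Those scripts are the reference implementation of the evaluation. Purely as an illustration of the metrics themselves (not of the repository's code), velocity and jerk can be estimated from generated joint coordinates with finite differences, assuming an array of 3D joint positions and a known frame rate:

```python
import numpy as np

def velocity_and_jerk(positions, fps=20):
    """Estimate per-frame speed and jerk magnitude via finite differences.

    positions: array of shape (n_frames, n_joints, 3) with joint coordinates.
    fps: frame rate of the motion (an assumption here; adjust to your data).
    """
    dt = 1.0 / fps
    velocity = np.diff(positions, n=1, axis=0) / dt    # first derivative of position
    jerk = np.diff(positions, n=3, axis=0) / dt**3     # third derivative of position
    speed = np.linalg.norm(velocity, axis=-1)          # shape: (n_frames - 1, n_joints)
    jerk_magnitude = np.linalg.norm(jerk, axis=-1)     # shape: (n_frames - 3, n_joints)
    return speed, jerk_magnitude
```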
If you use the dataset used in this work, please don't forget to cite the Trinity Speech-Gesture dataset using their IVA '18 paper.
If you use the code base itself, please cite the Gesticulator paper.
If you encounter any problems, bugs, or have questions or suggestions, please open an issue on GitHub or email me at tarask@kth.se. I prefer questions and bug reports on GitHub, as that gives visibility to others who might be encountering the same issues or have the same questions.