Advances in speech-driven animation now make it possible to create convincing animations of virtual characters from audio data alone. While many approaches focus on facial and lip motion, they often do not provide realistic animation of the inner mouth. Performance or motion capture of the tongue and jaw from video alone is difficult because the inner mouth is only partially observable during speech. In this work, we collected a large-scale speech-to-tongue mocap dataset that focuses on capturing tongue, jaw, and lip motion during speech. This dataset enables research on data-driven techniques for realistic inner-mouth animation. We present a method that leverages recent deep-learning-based audio feature representations to build a robust and generalizable speech-to-animation pipeline. We find that self-supervised deep-learning audio feature encoders are robust and generalize well to unseen speakers and content.
Links: [Project] | [Paper] | [Video] | [Data]
The data can be downloaded from this link. The dataset includes:
- Mono audio in `wav` format with a sample rate of 16 kHz
- EMA 3D landmark sequences at 50 FPS
- Audio transcripts
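Below is a minimal loading sketch for orientation. The file names and the NumPy `.npy` landmark format are assumptions for illustration only; check the downloaded data for the actual layout and formats.

```python
# Minimal loading sketch. File names and the .npy landmark format are
# assumptions for illustration; check the downloaded data for the real layout.
import numpy as np
import soundfile as sf

# 16 kHz mono speech waveform
audio, sr = sf.read("data/example_utterance.wav")
assert sr == 16000

# EMA landmark sequence at 50 FPS, assumed shape (num_frames, num_sensors, 3)
landmarks = np.load("data/example_utterance_ema.npy")

# Sanity check: audio and landmark durations should roughly agree
print(f"audio: {len(audio) / sr:.2f}s, ema: {landmarks.shape[0] / 50.0:.2f}s")
```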
👷👷👷 UNDER CONSTRUCTION 👷👷👷
Create the conda environment from the YAML file `envs/tongueanim.yaml`:
conda env create -f envs/tongueanim.yaml
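The environment name is defined inside the YAML file; assuming it matches the file name (an assumption), activate it with:
conda activate tongueanim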
Our best model uses wav2vec audio features. To use it, download the pretrained wav2vec model from the Fairseq repository and place it under the `models/` folder.
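The following sketch shows one way to load a wav2vec checkpoint and extract features with Fairseq, following the wav2vec 1.0 example from the Fairseq documentation. The checkpoint filename `models/wav2vec_large.pt` is an assumption; adapt it to the model you downloaded, and note that the loading API may differ across Fairseq versions and for wav2vec 2.0 checkpoints.

```python
# Sketch of wav2vec (1.0) feature extraction with Fairseq.
# The checkpoint path is an assumption; adapt it to your downloaded model.
import torch
import soundfile as sf
from fairseq.models.wav2vec import Wav2VecModel

cp = torch.load("models/wav2vec_large.pt", map_location="cpu")
model = Wav2VecModel.build_model(cp["args"], task=None)
model.load_state_dict(cp["model"])
model.eval()

wav, sr = sf.read("data/example_utterance.wav")    # 16 kHz mono audio
wav = torch.from_numpy(wav).float().unsqueeze(0)   # shape (1, num_samples)

with torch.no_grad():
    z = model.feature_extractor(wav)   # local latent features
    c = model.feature_aggregator(z)    # contextualized features, shape (1, dim, frames)
```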
Our pipeline consists of the following stages:
- Extract audio features from the wav2vec model
- Build the dataset to train the model
- Train the landmark prediction model (an illustrative sketch follows this list)
- Evaluate the model
- Visualize the results
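To make the shape of the learning problem concrete, here is an illustrative PyTorch regressor that maps a sequence of audio feature frames (resampled to 50 FPS to match the EMA data) to 3D landmark positions. This is not the architecture from the paper; the feature dimension, hidden size, and number of sensors are placeholder values.

```python
# Illustrative only: a simple GRU regressor from audio feature frames to 3D
# landmarks. NOT the model from the paper; dimensions are placeholders.
import torch
import torch.nn as nn

class AudioToLandmarks(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_sensors=6):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_sensors * 3)

    def forward(self, feats):
        # feats: (batch, num_frames, feat_dim), audio features at 50 FPS
        h, _ = self.rnn(feats)
        out = self.head(h)                      # (batch, num_frames, num_sensors * 3)
        return out.view(*out.shape[:2], -1, 3)  # (batch, num_frames, num_sensors, 3)

model = AudioToLandmarks()
feats = torch.randn(2, 100, 512)                # 2 clips, 100 feature frames each
pred = model(feats)                             # predicted landmark trajectories
loss = nn.functional.mse_loss(pred, torch.zeros_like(pred))  # placeholder target
```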
If you find this work useful in your research, please cite our paper:
@inproceedings{medina2022speechtongue,
title={Speech Driven Tongue Animation},
author={Medina, Salvador and Tomé, Denis and Stoll, Carsten and Tiede, Mark and Munhall, Kevin and Hauptmann, Alex and Matthews, Iain},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022},
organization={IEEE/CVF}
}
Our code is released under the MIT License.
The data license agreement requires citation of the paper. Please note that citing only the dataset URL instead of the publication does not comply with this license agreement.