S2L-S2D: Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

This repository contains the code for the paper "Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation" (link). The paper presents a novel approach that generates 3D talking heads from speech by first predicting the motion of facial landmarks. The code includes the implementation of the two models proposed in the paper: S2L and S2D. Check out some qualitative results in this video.

Installation

To run the code, you need to install the following dependencies:

  • Python 3.8
  • PyTorch-GPU 1.13.0
  • Trimesh 3.22.1
  • Librosa 0.9.2
  • Transformers 4.6.1 from Hugging Face
  • MPI-IS mesh library for mesh rendering (link)
  • Additional dependencies for running the demo: pysimplegui==4.60.5, sounddevice==0.4.6, soundfile==0.12.1
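
A minimal environment setup sketch, assuming pip on a CUDA-capable machine (the conda environment and the pinned torch wheel are assumptions; the MPI-IS mesh library is built separately from its own repository, see the link above):

conda create -n s2l-s2d python=3.8
conda activate s2l-s2d

# core dependencies, pinned to the versions listed above
pip install torch==1.13.0 trimesh==3.22.1 librosa==0.9.2 transformers==4.6.1

# extra packages needed only for the demo
pip install pysimplegui==4.60.5 sounddevice==0.4.6 soundfile==0.12.1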

Training Setup

  1. Clone the repository:
git clone https://github.com/FedeNoce/s2l-s2d.git
  2. Download the vocaset dataset from here (Training Data, 8GB).
  3. Put the downloaded file into the "S2L/vocaset" and "S2D/vocaset" directories.
  4. To train S2L, preprocess the data by running "preprocess_voca_data.py" in the "S2L/vocaset" directory, then run "train_S2L.py".
  5. To train S2D, preprocess the data by running "Data_processing.py" in the "S2D" directory, then run "train_S2D.py". The full sequence of commands is sketched after this list.
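
Putting the steps together, a minimal command walkthrough might look like this (the vocaset download requires registration and is done manually; the locations of the training scripts are assumed from the directory names above):

git clone https://github.com/FedeNoce/s2l-s2d.git
cd s2l-s2d

# place the manually downloaded vocaset training data
# in S2L/vocaset and S2D/vocaset before continuing

# S2L: preprocess, then train
cd S2L/vocaset
python preprocess_voca_data.py
cd ..
python train_S2L.py

# S2D: preprocess, then train
cd ../S2D
python Data_processing.py
python train_S2D.py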

Inference

  1. Download the pretrained models from here and place them in the "S2L/Results" and "S2D/Results" directories.
  2. Run the GUI demo with "demo.py".
  3. Optionally, an updated version of the demo reconstructs a user's face from a webcam photo using 3DMM fitting. Before running it with "demo_with_rec.py", download the required file from here and place it in the "Rec/Values" directory.
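
With the pretrained models in place, launching the demos reduces to the following (assuming the scripts sit at the repository root; adjust the paths if they live elsewhere):

# GUI demo
python demo.py

# demo with webcam photo and 3DMM face reconstruction
# (requires the extra file in "Rec/Values", see step 3)
python demo_with_rec.py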

Citation

If you use this code or find it helpful, please consider citing:

@misc{nocentini2023learning,
  title={Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation},
  author={Federico Nocentini and Claudio Ferrari and Stefano Berretti},
  year={2023},
  eprint={2306.01415},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Authors

  • Federico Nocentini
  • Claudio Ferrari
  • Stefano Berretti
