Audio-driven facial animation automatically synthesizes a talking-head video from a speech signal.
This project presents an end-to-end system that takes a single face image and an audio clip and generates a talking-head video. The system can simplify the film-animation pipeline by generating animation automatically from the voice acting, and it can also be applied in post-production to improve lip synchronization in movie dubbing.
This repository uses the model described in the paper *Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss*.
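At a high level, the model works in two stages: ATnet first maps the input audio to a sequence of facial landmarks, and VGnet then renders video frames from those landmarks together with the example image. The sketch below illustrates that flow only; the function signature and the `at_net`/`vg_net` wrappers are hypothetical, not this repo's actual API.

```python
# Minimal sketch of the two-stage inference flow. The names at_net and
# vg_net are illustrative stand-ins, not this repository's real API.
import torch

def synthesize_video(at_net, vg_net, audio_features, reference_image):
    """audio_features: (T, D) per-frame audio features (e.g. MFCC windows);
    reference_image: (C, H, W) example face image."""
    with torch.no_grad():
        # Stage 1: ATnet maps the audio to one set of facial landmarks
        # per output video frame.
        landmark_seq = at_net(audio_features)
        # Stage 2: VGnet renders each frame from the reference image,
        # conditioned on the landmarks for that time step.
        frames = [vg_net(reference_image, lm) for lm in landmark_seq]
    return torch.stack(frames)  # (T, C, H, W) video tensor
```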
- PyTorch environment: PyTorch 0.4.1 (`conda install pytorch=0.4.1 torchvision cuda90 -c pytorch`).
- Install the Python dependencies (`pip install -r requirement.txt`).
- Download the pretrained ATnet and VGnet weights from Google Drive and put them under the `model` folder.
- Run the demo code: `python demo.py` (see the example invocation after the option list below)
  - `-device_ids`: GPU id
  - `-cuda`: use CUDA or not
  - `-vg_model`: pretrained VGnet weight
  - `-at_model`: pretrained ATnet weight
  - `-lstm`: use LSTM or not
  - `-p`: input example image
  - `-i`: input audio file
  - `-sample_dir`: folder to save the outputs
  - ...
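Putting the options together, a full invocation might look like the following; all file names and paths here are placeholders, so substitute the actual weight files and inputs you downloaded:

```
python demo.py -device_ids 0 -cuda True \
    -at_model model/atnet.pth -vg_model model/vgnet.pth \
    -lstm True -p examples/face.jpg -i examples/audio.wav \
    -sample_dir results/
```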
This repository is based on the ATVGnet repository.
This project is licensed under the MIT License.