The official repository of the paper SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Paper | Project Page | Code
The proposed SyncTalk synthesizes synchronized talking head videos, employing tri-plane hash representations to maintain subject identity. It can generate synchronized lip movements, facial expressions, and stable head poses, and restores hair details to create high-resolution videos.
Thanks to okgpt, we have launched a Windows integration package. Download SyncTalk-Windows.zip, unzip it, and double-click inference.bat to run the demo.
Download link: Hugging Face || Baidu Netdisk
Tested on Ubuntu 18.04, PyTorch 1.12.1, and CUDA 11.3.
```bash
git clone https://github.com/ZiqiaoPeng/SyncTalk.git
cd SyncTalk
conda create -n synctalk python==3.8.8
conda activate synctalk
```
```bash
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1121/download.html
pip install ./freqencoder
pip install ./shencoder
pip install ./gridencoder
pip install ./raymarching
```
If you encounter problems installing PyTorch3D, you can use the following command to install it:
```bash
python ./scripts/install_pytorch3d.py
```
Please place May.zip in the `data` folder and trial_may.zip in the `model` folder, then unzip them.
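Before testing, it can help to confirm the environment matches the versions this README was tested with (PyTorch 1.12.1 + CUDA 11.3). The helper below is a convenience sketch, not part of the SyncTalk codebase:

```python
# Sanity-check the installed stack against the versions this README was
# tested with. EXPECTED comes from the "Tested on" line above.
EXPECTED = {"torch": "1.12.1", "cuda": "11.3"}

def mismatches(found, expected=EXPECTED):
    """Return {name: (found, expected)} for every version that differs."""
    return {k: (found.get(k), v) for k, v in expected.items() if found.get(k) != v}

def installed_versions():
    """Read versions from the live environment (requires torch to be installed)."""
    import torch  # imported lazily so a missing install surfaces here, not at import time
    return {"torch": torch.__version__.split("+")[0], "cuda": torch.version.cuda}
```

`print(mismatches(installed_versions()))` prints `{}` on a matching setup.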
```bash
python main.py data/May --workspace model/trial_may -O --test --asr_model ave
python main.py data/May --workspace model/trial_may -O --test --asr_model ave --portrait
```
Here, "ave" refers to our Audio-Visual Encoder, and "portrait" pastes the generated face back onto the original image, yielding higher quality. If everything runs correctly, you will get the following results.
| Setting | PSNR | LPIPS | LMD |
|---|---|---|---|
| SyncTalk (w/o Portrait) | 32.201 | 0.0394 | 2.822 |
| SyncTalk (Portrait) | 37.644 | 0.0117 | 2.825 |
This is for a single subject; the paper reports the average results for multiple subjects.
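PSNR in the table above follows the standard definition over 8-bit pixel values; a minimal sketch of the metric (the exact crops and averaging used for the paper's evaluation may differ):

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to ground truth."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * math.log10(max_val ** 2 / err)
```

For example, two patches that differ uniformly by 16 gray levels score about 24.05 dB, while identical patches score infinity.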
```bash
python main.py data/May --workspace model/trial_may -O --test --test_train --asr_model ave --portrait --aud ./demo/test.wav
```
Please use files with the ".wav" extension for inference; the results will be saved in `model/trial_may/results/`.
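To run inference over a whole folder of recordings, the single-file command above can be wrapped in a small driver. This is a convenience sketch, not part of the repository; the data path, workspace, and flags are the ones used in this README:

```python
import subprocess
from pathlib import Path

def build_cmd(wav, data="data/May", workspace="model/trial_may"):
    """Assemble the SyncTalk inference command for one .wav file."""
    return ["python", "main.py", data, "--workspace", workspace,
            "-O", "--test", "--test_train", "--asr_model", "ave",
            "--portrait", "--aud", str(wav)]

if __name__ == "__main__":
    # Render every .wav under ./demo; results land in model/trial_may/results/.
    for wav in sorted(Path("./demo").glob("*.wav")):
        subprocess.run(build_cmd(wav), check=True)
```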
```bash
# By default, we load data from disk on the fly.
# We can also preload all data to CPU/GPU for faster training, but this is very memory-hungry for large datasets.
# `--preload 0`: load from disk (default, slowest).
# `--preload 1`: preload to CPU (faster).
# `--preload 2`: preload to GPU (fastest).
python main.py data/May --workspace model/trial_may -O --iters 60000 --asr_model ave
python main.py data/May --workspace model/trial_may -O --iters 100000 --finetune_lips --patch_size 64 --asr_model ave

# Or you can use the script to train:
sh ./scripts/train_may.sh
```
After training, test with:
```bash
python main.py data/May --workspace model/trial_may -O --test --asr_model ave --portrait
```
- Release Training Code.
- Release Pre-trained Model.
- Release Google Colab.
- Release Preprocessing Code.
```
@InProceedings{peng2023synctalk,
  title     = {SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis},
  author    = {Ziqiao Peng and Wentao Hu and Yue Shi and Xiangyu Zhu and Xiaomei Zhang and Jun He and Hongyan Liu and Zhaoxin Fan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
}
```
This code is developed heavily based on ER-NeRF, and also draws on RAD-NeRF, GeneFace, DFRF, AD-NeRF, and Deep3DFaceRecon_pytorch.
Thanks to these great projects.
