The official repository of the paper SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Paper | Project Page | Code
The proposed SyncTalk synthesizes synchronized talking head videos, employing tri-plane hash representations to maintain subject identity. It can generate synchronized lip movements, facial expressions, and stable head poses, and restores hair details to create high-resolution videos.
Thanks to okgpt, we have launched a Windows integration package. Download SyncTalk-Windows.zip, unzip it, and double-click inference.bat to run the demo.
Download link: Hugging Face || Baidu Netdisk
Tested on Ubuntu 18.04, PyTorch 1.12.1, and CUDA 11.3.
```bash
git clone https://github.com/ZiqiaoPeng/SyncTalk.git
cd SyncTalk
conda create -n synctalk python==3.8.8
conda activate synctalk
```
```bash
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1121/download.html
pip install ./freqencoder
pip install ./shencoder
pip install ./gridencoder
pip install ./raymarching
```
If you encounter problems installing PyTorch3D, you can use the following command to install it:
```bash
python ./scripts/install_pytorch3d.py
```
Please place May.zip in the `data` folder and trial_may.zip in the `model` folder, then unzip them.
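Before testing, it can help to confirm the environment matches the versions this README was tested with (PyTorch 1.12.1 + CUDA 11.3). The helper below is a convenience sketch, not part of the SyncTalk codebase:

```python
# Sanity-check the installed stack against the versions this README was
# tested with. EXPECTED comes from the "Tested on" line above.
EXPECTED = {"torch": "1.12.1", "cuda": "11.3"}

def mismatches(found, expected=EXPECTED):
    """Return {name: (found, expected)} for every version that differs."""
    return {k: (found.get(k), v) for k, v in expected.items() if found.get(k) != v}

def installed_versions():
    """Read versions from the live environment (requires torch to be installed)."""
    import torch  # imported lazily so a missing install surfaces here, not at import time
    return {"torch": torch.__version__.split("+")[0], "cuda": torch.version.cuda}
```

`print(mismatches(installed_versions()))` prints `{}` on a matching setup.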
```bash
python main.py data/May --workspace model/trial_may -O --test --asr_model ave
python main.py data/May --workspace model/trial_may -O --test --asr_model ave --portrait
```
Here, "ave" refers to our Audio-Visual Encoder, and "portrait" pastes the generated face back onto the original image, yielding higher quality. If everything runs correctly, you will get the following results.
| Setting | PSNR | LPIPS | LMD |
|---|---|---|---|
| SyncTalk (w/o Portrait) | 32.201 | 0.0394 | 2.822 |
| SyncTalk (Portrait) | 37.644 | 0.0117 | 2.825 |
This is for a single subject; the paper reports the average results for multiple subjects.
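PSNR in the table above follows the standard definition over 8-bit pixel values; a minimal sketch of the metric (the exact crops and averaging used for the paper's evaluation may differ):

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to ground truth."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * math.log10(max_val ** 2 / err)
```

For example, two patches that differ uniformly by 16 gray levels score about 24.05 dB, while identical patches score infinity.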
```bash
python main.py data/May --workspace model/trial_may -O --test --test_train --asr_model ave --portrait --aud ./demo/test.wav
```
Please use files with the ".wav" extension for inference; the results will be saved in `model/trial_may/results/`.
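To run inference over a whole folder of recordings, the single-file command above can be wrapped in a small driver. This is a convenience sketch, not part of the repository; the data path, workspace, and flags are the ones used in this README:

```python
import subprocess
from pathlib import Path

def build_cmd(wav, data="data/May", workspace="model/trial_may"):
    """Assemble the SyncTalk inference command for one .wav file."""
    return ["python", "main.py", data, "--workspace", workspace,
            "-O", "--test", "--test_train", "--asr_model", "ave",
            "--portrait", "--aud", str(wav)]

if __name__ == "__main__":
    # Render every .wav under ./demo; results land in model/trial_may/results/.
    for wav in sorted(Path("./demo").glob("*.wav")):
        subprocess.run(build_cmd(wav), check=True)
```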
```bash
# By default, we load data from disk on the fly.
# We can also preload all data to CPU/GPU for faster training, but this is very memory-hungry for large datasets.
# `--preload 0`: load from disk (default, slowest).
# `--preload 1`: preload to CPU (faster).
# `--preload 2`: preload to GPU (fastest).
python main.py data/May --workspace model/trial_may -O --iters 60000 --asr_model ave
python main.py data/May --workspace model/trial_may -O --iters 100000 --finetune_lips --patch_size 64 --asr_model ave

# Or you can use the script to train:
sh ./scripts/train_may.sh
```
After training, test with:
```bash
python main.py data/May --workspace model/trial_may -O --test --asr_model ave --portrait
```
- Release Training Code.
- Release Pre-trained Model.
- Release Google Colab.
- Release Preprocessing Code.
```
@InProceedings{peng2023synctalk,
  title     = {SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis},
  author    = {Ziqiao Peng and Wentao Hu and Yue Shi and Xiangyu Zhu and Xiaomei Zhang and Jun He and Hongyan Liu and Zhaoxin Fan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
}
```
This code is developed heavily based on ER-NeRF, and also draws on RAD-NeRF, GeneFace, DFRF, AD-NeRF, and Deep3DFaceRecon_pytorch.
Thanks to these great projects.
