
ProbTalk: Towards Variable and Coordinated Holistic Co-Speech Motion Generation [CVPR2024]

The official PyTorch implementation of the CVPR2024 paper "Towards Variable and Coordinated Holistic Co-Speech Motion Generation".

Please visit our webpage for more details.

TODO

  • Training code.
  • Testing code.

Getting started

The training code was tested on Ubuntu 18.04.5 LTS, and the visualization code was tested on Windows 11. It requires:

  • Python 3.8
  • Anaconda3 or Miniconda3
  • CUDA-capable GPU (12GB+ GPU memory)

1. Setup environment

Clone the repo:

git clone https://github.com/feifeifeiliu/probtalk.git
cd probtalk

Create conda environment:

conda create --name probtalk python=3.8
conda activate probtalk
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 -c pytorch
pip install -r requirements.txt
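
As a quick sanity check (a minimal sketch, not part of the repository), the following snippet confirms that the expected PyTorch build and a CUDA-capable GPU are visible:

# Environment sanity check (a sketch; not part of ProbTalk).
import torch

print(torch.__version__)           # expected: 1.12.1
print(torch.cuda.is_available())   # expected: True on a CUDA-capable machine
if torch.cuda.is_available():
    # 12GB+ of GPU memory is recommended; report what this device has.
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB")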

Please install MPI-Mesh.
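
If the installation succeeded, the mesh package should be importable. The check below assumes MPI-Mesh refers to the MPI-IS mesh library, which is distributed under the psbody namespace; verify this against the MPI-Mesh installation instructions:

# Import check; assumes MPI-Mesh is the MPI-IS mesh library (psbody namespace).
from psbody.mesh import Mesh

m = Mesh()  # constructing an empty mesh means the package is importable
print(type(m))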

2. Get data (to do)

3. Download the pretrained models

Download the pretrained models, unzip them, and place them in the ProbTalk folder, i.e., path-to-ProbTalk/experiments.
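
A small sketch to confirm the checkpoints landed where the code expects them (only the experiments folder name from the step above is assumed):

# Verify the pretrained models were unpacked into ./experiments.
import os

experiments_dir = "experiments"  # relative to the ProbTalk root, per the step above
assert os.path.isdir(experiments_dir), "unzip the pretrained models into ./experiments"
print(sorted(os.listdir(experiments_dir)))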

4. Training (to do)

5. Testing (to do)

6. Visualization

Download the SMPL-X model (please register on the official SMPL-X webpage before using it) and place it in path-to-ProbTalk/visualise/smplx_model. To visualise the demo videos, run:

bash demo.sh

The videos and generated motion data are saved in ./visualise/video/demo.
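
To verify that the SMPL-X files are picked up, a minimal loading check with the smplx pip package might look as follows. The path matches the placement instruction above, but the exact folder layout expected by smplx.create may differ, so treat this as a sketch:

# Minimal SMPL-X loading check (a sketch; assumes the smplx pip package).
import smplx

# smplx.create appends a "smplx" subfolder when given a directory; adjust the
# path if the .npz/.pkl files live directly in visualise/smplx_model.
model = smplx.create("visualise/smplx_model", model_type="smplx")
output = model()              # forward pass with default (neutral) parameters
print(output.vertices.shape)  # torch.Size([1, 10475, 3]) for SMPL-X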

If you ssh into the Linux machine, a NotImplementedError about the OffscreenRenderer might occur. In this case, please refer to the issue for solving the error.
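
A common workaround for offscreen rendering over ssh (general pyrender advice, not specific to this repo) is to select a headless OpenGL platform before pyrender is imported:

# Select a headless OpenGL platform before pyrender is imported anywhere.
import os
os.environ["PYOPENGL_PLATFORM"] = "egl"  # or "osmesa" if EGL is unavailable

import pyrender  # import only after the platform is selected
renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480)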

Citation

If you find our work useful to your research, please consider citing:

@article{liu2024towards,
    title={Towards Variable and Coordinated Holistic Co-Speech Motion Generation},
    author={Liu, Yifei and Cao, Qiong and Wen, Yandong and Jiang, Huaiguang and Ding, Changxing},
    journal={arXiv preprint arXiv:2404.00368},
    year={2024}
}

@inproceedings{yi2023generating,
    title={Generating Holistic 3D Human Motion from Speech},
    author={Yi, Hongwei and Liang, Hualin and Liu, Yifei and Cao, Qiong and Wen, Yandong and Bolkart, Timo and Tao, Dacheng and Black, Michael J},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
    pages={469--480},
    month={June}, 
    year={2023} 
}

Acknowledgements

We thank Hongwei Yi for the insightful discussions and Hualin Liang for helping us conduct the user study.

For functions or scripts that are based on external sources, we acknowledge the origin individually in each file. We also benefited from several great open-source resources.

License

This code and model are available for non-commercial and commercial purposes as defined in the LICENSE (i.e., the MIT License). Note that to use ProbTalk you must register for SMPL-X and agree to its license, which is not the MIT License; you can check the SMPL-X license at https://github.com/vchoutas/smplx/blob/main/LICENSE. Enjoy your journey of exploring more beautiful avatars in your own application.
