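SadTalker LongVideos is a modified image-to-video algorithm that supports long videos without OOM (out-of-memory) errors. The original implementation consumes memory linearly with the duration of the audio: a 3-minute clip needs 8 GB of VRAM, and a 30-minute clip needs 80 GB. This project modifies the script to offload tensors from VRAM and RAM to disk, so the target video can be any length with no OOM errors. Author: AIGC666. Link: https://www.codewithgpu.com/i/mjavadpur/Sadtalker_LongVideos/Sadtalker_LongVideos. Source: CodeWithGpu. Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.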

javarike/sadtalker_longvideos

 
 

SadTalker-Unlocked is a modified version of the SadTalker image-to-video algorithm that supports long videos without OOM (out-of-memory) errors.

  • The original implementation consumes memory linearly with the duration of the audio. A 3-minute clip needs 8 GB of VRAM; a 30-minute clip needs 80 GB.
  • I have modified the script to offload tensors from both VRAM and RAM to disk. This way the target video can be any length, with no OOM errors.
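
The disk-offloading idea in the two points above can be illustrated with a minimal sketch. This is not the project's actual code: the function names (`split_into_chunks`, `render_to_disk`, `stream_from_disk`) are hypothetical, plain Python lists stand in for tensors, and `pickle` stands in for whatever serialization the real script uses. The point is only that each chunk's output is written to disk as soon as it is produced, so peak memory depends on the chunk size rather than on the audio length.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Sequence


def split_into_chunks(items: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def render_to_disk(
    chunks: Iterable[Sequence],
    render: Callable[[Sequence], list],
    spill_dir: Path,
) -> List[Path]:
    """Render one chunk at a time and write each result to disk immediately."""
    paths = []
    for index, chunk in enumerate(chunks):
        frames = render(chunk)  # only this chunk's output is held in memory
        path = spill_dir / f"chunk_{index:06d}.pkl"
        with path.open("wb") as handle:
            pickle.dump(frames, handle)
        paths.append(path)
        del frames  # drop the reference before rendering the next chunk
    return paths


def stream_from_disk(paths: Iterable[Path]) -> Iterator:
    """Read the spilled chunks back in order, one chunk at a time."""
    for path in paths:
        with path.open("rb") as handle:
            yield from pickle.load(handle)


if __name__ == "__main__":
    audio_frames = list(range(10))  # placeholder for per-frame audio features
    with tempfile.TemporaryDirectory() as tmp:
        spilled = render_to_disk(
            split_into_chunks(audio_frames, 4),
            lambda chunk: [value * 2 for value in chunk],  # placeholder renderer
            Path(tmp),
        )
        print(len(spilled), "chunks written;", len(list(stream_from_disk(spilled))), "frames read back")
```

The trade-off is extra disk I/O in exchange for bounded memory use, which is what lets the target length grow without hitting OOM errors.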

1. Installation.

Community tutorials: 中文Windows教程 (Chinese Windows tutorial) | 日本語コース (Japanese tutorial).

Linux/Unix

  1. Install Anaconda, Python and git.

  2. Create the environment and install the requirements.

git clone https://github.com/OpenTalker/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### Coqui TTS is optional for gradio demo. 
### pip install TTS
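
After running the commands above, a quick sanity check can confirm that the pieces are in place. The following is a small sketch rather than part of the project: `check_environment` is a hypothetical helper that only looks for the packages and tools named in the steps above, without importing them or validating their versions.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Sequence, Union


def check_environment(
    modules: Sequence[str] = ("torch", "torchvision", "torchaudio"),
    tools: Sequence[str] = ("ffmpeg", "git"),
) -> Dict[str, Union[str, bool]]:
    """Report the Python version and whether each module and tool can be found."""
    report: Dict[str, Union[str, bool]] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}"
    }
    for name in modules:
        # find_spec locates a top-level package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `sadtalker` environment; any `False` entry points to a step that needs to be repeated.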

Windows

A video tutorial in Chinese is available here. Alternatively, follow these steps:

  1. Install Python 3.8 and check "Add Python to PATH".
  2. Install git manually or using Scoop: scoop install git.
  3. Install ffmpeg, following this tutorial or using scoop: scoop install ffmpeg.
  4. Download the SadTalker repository by running git clone https://github.com/davoodwadi/SadTalker-Unlocked.git.
  5. Download the checkpoints and gfpgan models in the downloads section.
  6. Run start.bat from Windows Explorer as a normal (non-administrator) user. This starts a Gradio-powered WebUI demo.

macOS

A tutorial on installing SadTalker on macOS can be found here.

Docker, WSL, etc

Please check out additional tutorials here.

2. Download Models

You can run the following script on Linux/macOS to automatically download all the models:

bash scripts/download_models.sh

We also provide an offline patch (gfpgan/), so no model will be downloaded when generating.

Pre-Trained Models

GFPGAN Offline Patch

Model Details

Model descriptions:

New version

| Model | Description |
| --- | --- |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints of the old version (256 face render). |
| checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints of the old version (512 face render). |
| gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |

Old version

| Model | Description |
| --- | --- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the reproduction of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
| checkpoints/BFM | 3DMM library file. |
| checkpoints/hub | Face detection models used in face alignment. |
| gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |

The final folder structure should look like this:

image
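
Since the folder screenshot is not always available, a short script can check the layout instead. This is only a sketch under stated assumptions: `missing_models` is a hypothetical helper, and the expected paths are copied from the "New version" model list in this section; adjust the tuple if you use the old checkpoints.

```python
from pathlib import Path
from typing import List, Sequence

# Paths taken from the "New version" model list in this README.
NEW_VERSION_MODELS = (
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
    "gfpgan/weights",
)


def missing_models(root: str, expected: Sequence[str] = NEW_VERSION_MODELS) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [relative for relative in expected if not (base / relative).exists()]


if __name__ == "__main__":
    # Run from the repository root after downloading the models.
    missing = missing_models(".")
    if missing:
        print("Missing models:")
        for relative in missing:
            print("  " + relative)
    else:
        print("All expected models are present.")
```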

3. Quick Start

Usage is similar to the original implementation. Please read the best practices and configuration tips document.


Wenxuan Zhang *,1,2, Xiaodong Cun *,2, Xuan Wang 3, Yong Zhang 2, Xi Shen 2, Yu Guo 1, Ying Shan 2, Fei Wang 1

1 Xi'an Jiaotong University   2 Tencent AI Lab   3 Ant Group  

CVPR 2023

TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞.


Highlights

  • The license has been updated to Apache 2.0, and we've removed the non-commercial restriction.

  • SadTalker has now officially been integrated into Discord, where you can use it for free by sending files. You can also generate high-quality videos from text prompts. Join: Discord

  • We've published a stable-diffusion-webui extension. Check out more details here. Demo Video

  • Full image mode is now available! More details...

| still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image @bagbag1815 |
| --- | --- | --- |
| still_e_n.mp4 | full_body_2.bus_chinese_enhanced.mp4 | |
  • Several new modes (Still, reference, and resize modes) are now available!

  • We're happy to see more community demos on bilibili, YouTube and X (#sadtalker).

Changelog

The previous changelog can be found here.

  • [2023.06.12]: Added more new features in WebUI extension, see the discussion here.

  • [2023.06.05]: Released a new 512x512px (beta) face model. Fixed some bugs and improved performance.

  • [2023.04.15]: Added a WebUI Colab notebook by @camenduru: sd webui-colab

  • [2023.04.12]: Added a more detailed WebUI installation document and fixed a problem when reinstalling.

  • [2023.04.12]: Fixed WebUI safety issues caused by third-party packages, and optimized the output path in sd-webui-extension.

  • [2023.04.08]: In v0.0.2, we added a logo watermark to the generated video to prevent abuse. This watermark has since been removed in a later release.

  • [2023.04.08]: In v0.0.2, we added features for full image animation and a link to download checkpoints from Baidu. We also optimized the enhancer logic.

To-Do

We're tracking new updates in issue #280.

Troubleshooting

If you have any problems, please read our FAQs before opening an issue.

WebUI Demos

Online Demo: HuggingFace | SDWebUI-Colab | Colab

Local WebUI extension: Please refer to WebUI docs.

Local Gradio demo (recommended): A Gradio instance similar to our Hugging Face demo can be run locally:

## You need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install tts` in advance.
python app_sadtalker.py

You can also start it more easily:

  • Windows: double-click webui.bat; the requirements will be installed automatically.
  • Linux/macOS: run bash webui.sh to start the WebUI.

CLI usage

Animate a portrait image using the default config:
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --enhancer gfpgan 

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
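
To pick up the output programmatically, you can search the results folder for the most recently written video. The sketch below is illustrative only: `latest_result` is a hypothetical helper, and it searches recursively because the exact nesting of the timestamped output may differ between versions.

```python
from pathlib import Path
from typing import Optional


def latest_result(results_dir: str = "results") -> Optional[Path]:
    """Return the most recently modified .mp4 under results_dir, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    # rglob covers both results/<timestamp>/*.mp4 and files saved directly in results/
    videos = [path for path in root.rglob("*.mp4") if path.is_file()]
    return max(videos, key=lambda path: path.stat().st_mtime, default=None)


if __name__ == "__main__":
    newest = latest_result()
    print(newest if newest is not None else "No results found yet.")
```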

Full body/image Generation:

Use --still to generate a natural full-body video. You can add --enhancer to improve the quality of the generated video.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a directory to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan 
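
If you want to drive these commands from a script (for example, to batch several audio files), the argument list can be assembled in Python. This is a sketch, not part of the project: `build_inference_command` is a hypothetical helper, and it only uses the flags shown in the two commands above.

```python
import shlex
from typing import List, Optional


def build_inference_command(
    driven_audio: str,
    source_image: str,
    result_dir: Optional[str] = None,
    still: bool = False,
    preprocess: Optional[str] = None,
    enhancer: Optional[str] = None,
    python: str = "python",
) -> List[str]:
    """Assemble the inference.py argument list from the flags shown above."""
    command = [
        python, "inference.py",
        "--driven_audio", driven_audio,
        "--source_image", source_image,
    ]
    if result_dir:
        command += ["--result_dir", result_dir]
    if still:
        command.append("--still")
    if preprocess:
        command += ["--preprocess", preprocess]
    if enhancer:
        command += ["--enhancer", enhancer]
    return command


if __name__ == "__main__":
    # Placeholder file names; replace them with your own inputs.
    command = build_inference_command(
        "speech.wav", "portrait.png",
        result_dir="results", still=True, preprocess="full", enhancer="gfpgan",
    )
    print(shlex.join(command))
    # To execute it from the repository root: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when file names contain spaces.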

More examples, configuration options, and tips can be found in the best practice document.

Citation

If you find our work useful in your research, please consider citing:

@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}

Acknowledgements

Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and PIRender. We thank the authors for sharing their wonderful code. In the training process, we also used models from Deep3DFaceReconstruction and Wav2lip. We thank the authors for their wonderful work.

We also use the following 3rd-party libraries:

Extensions:

Related Works

Disclaimer

This is not an official product of Tencent.

1. Please carefully read and comply with the open-source license applicable to this code before using it. 
2. Please carefully read and comply with the intellectual property declaration applicable to this code before using it.
3. This open-source code runs completely offline and does not collect any personal information or other data. If you use this code to provide services to end-users and collect related data, please take necessary compliance measures according to applicable laws and regulations (such as publishing privacy policies, adopting necessary data security strategies, etc.). If the collected data involves personal information, user consent must be obtained (if applicable). Any legal liabilities arising from this are unrelated to Tencent.
4. Without Tencent's written permission, you are not authorized to use the names or logos legally owned by Tencent, such as "Tencent." Otherwise, you may be liable for legal responsibilities.
5. This open-source code does not have the ability to directly provide services to end-users. If you need to use this code for further model training or demos, as part of your product to provide services to end-users, or for similar use, please comply with applicable laws and regulations for your product or service. Any legal liabilities arising from this are unrelated to Tencent.
6. It is prohibited to use this open-source code for activities that harm the legitimate rights and interests of others (including but not limited to fraud, deception, infringement of others' portrait rights, reputation rights, etc.), or other behaviors that violate applicable laws and regulations or go against social ethics and good customs (including providing incorrect or false information, spreading pornographic, terrorist, and violent information, etc.). Otherwise, you may be liable for legal responsibilities.

LOGO: color and font suggestion: ChatGPT, logo font: Montserrat Alternates.

The demo images and audio come from community users or were generated with Stable Diffusion, and their copyrights belong to the respective owners. Feel free to contact us if you would like us to remove them.
