Digital Human Conversation Demo

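A digital human conversation demo built on open-source ASR, LLM, TTS, and THG components, with a first-packet latency of 3-5 s.

Online demo: https://www.modelscope.cn/studios/AI-ModelScope/video_chat

For a detailed technical introduction, see this article.

Simplified Chinese | English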

Technology Stack

ASR (Automatic Speech Recognition): FunASR
LLM (Large Language Model): Qwen
TTS (Text-to-Speech): GPT-SoVITS, CosyVoice
THG (Talking Head Generation): MuseTalk
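
The four components above form a chain: user speech goes in, a transcript comes out of ASR, the LLM produces a reply, TTS turns the reply into audio, and THG renders that audio as a talking-head video. The sketch below only illustrates this data flow. The `Pipeline` class and the `asr`, `llm`, `tts`, and `thg` callables are hypothetical stand-ins, not the repository's actual API, and the stub stages exist only so the example runs without any model installed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Chains the four stages; every callable here is a hypothetical stand-in."""

    asr: Callable[[bytes], str]    # user audio -> transcript (e.g. FunASR)
    llm: Callable[[str], str]      # transcript -> reply text (e.g. Qwen)
    tts: Callable[[str], bytes]    # reply text -> reply audio (e.g. GPT-SoVITS, CosyVoice)
    thg: Callable[[bytes], bytes]  # reply audio -> talking-head video (e.g. MuseTalk)

    def run(self, user_audio: bytes) -> bytes:
        transcript = self.asr(user_audio)
        reply_text = self.llm(transcript)
        reply_audio = self.tts(reply_text)
        return self.thg(reply_audio)


# Stub stages: plain byte/string transforms standing in for the real models.
demo = Pipeline(
    asr=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"echo: {text}",
    tts=lambda text: text.encode("utf-8"),
    thg=lambda audio: b"video:" + audio,
)
result = demo.run(b"hello")
print(result)
```

Keeping each stage behind a plain callable makes it easy to swap one model for another (for example GPT-SoVITS for CosyVoice) without touching the rest of the chain.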

Local Deployment

Environment Setup

Ubuntu 22.04
Python 3.10
torch 2.1.2

$ git lfs install
$ git clone https://www.modelscope.cn/studios/AI-ModelScope/video_chat.git
$ conda create -n metahuman python=3.10
$ conda activate metahuman
$ cd video_chat
$ pip install -r requirements.txt
$ pip install --upgrade gradio # install Gradio 5
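
After installation, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of the repository: `installed_version` and `check_environment` are illustrative helper names, and the version numbers are simply the ones this README lists, treated as a tested baseline rather than hard limits.

```python
import sys
from importlib import metadata
from typing import Optional

# Versions listed in this README.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "2.1.2"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> list:
    """Collect human-readable warnings about version mismatches."""
    warnings = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        warnings.append(f"Python {found} found, 3.10 expected")

    torch_version = installed_version("torch")
    if torch_version is None:
        warnings.append("torch is not installed")
    elif not torch_version.startswith(EXPECTED_TORCH):
        # startswith tolerates local build suffixes such as "+cu121".
        warnings.append(f"torch {torch_version} found, {EXPECTED_TORCH} expected")

    gradio_version = installed_version("gradio")
    if gradio_version is None:
        warnings.append("gradio is not installed")
    elif int(gradio_version.split(".")[0]) < 5:
        warnings.append(f"gradio {gradio_version} found, 5.x expected")
    return warnings


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the README"]:
        print(line)
```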

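Weight Download

1. Download from the ModelScope Studio (recommended)

The Studio repository already tracks the weight files with Git LFS. If you cloned it with git clone https://www.modelscope.cn/studios/AI-ModelScope/video_chat.git, no further configuration is needed.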
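
If the repository was cloned before `git lfs install` was run, the weight files may still be small Git LFS pointer files rather than the real weights. The sketch below shows one way to detect that, assuming the weights live under a `weights` directory at the repository root. The helper names `is_lfs_pointer` and `find_unfetched_weights` are illustrative; the pointer prefix is the one defined by the Git LFS pointer format.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is still an LFS pointer instead of real content."""
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched_weights(root: Path) -> list:
    """List files under root that Git LFS has not replaced with real content."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p))


if __name__ == "__main__":
    for pointer in find_unfetched_weights(Path("weights")):
        print(f"not downloaded yet: {pointer}")
```

If any files are reported, running `git lfs pull` inside the repository should fetch the real content.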

2. Manual download

2.1 MuseTalk

See this link.

The directory layout is as follows:

./weights/
├── dwpose
│   └── dw-ll_ucoco_384.pth
├── face-parse-bisent
│   ├── 79999_iter.pth
│   └── resnet18-5c106cde.pth
├── musetalk
│   ├── musetalk.json
│   └── pytorch_model.bin
├── sd-vae-ft-mse
│   ├── config.json
│   └── diffusion_pytorch_model.bin
└── whisper
    └── tiny.pt
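
After a manual download, the layout can be checked against the tree above before starting the service. This is a minimal sketch rather than a script shipped with the repository: `missing_weights` is an illustrative helper name, and the file list is copied from the directory tree.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_WEIGHTS = [
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
    "musetalk/musetalk.json",
    "musetalk/pytorch_model.bin",
    "sd-vae-ft-mse/config.json",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "whisper/tiny.pt",
]


def missing_weights(weights_dir: Path) -> list:
    """Return the expected files that are absent from weights_dir."""
    return [rel for rel in EXPECTED_WEIGHTS if not (weights_dir / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights(Path("weights"))
    if missing:
        print("missing weight files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("all MuseTalk weight files are in place")
```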

2.2 GPT-SoVITS

See this link.

Start the Service

$ python app.py

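Using a Custom Digital Human Avatar

Add the recorded digital human avatar video to /data/video/.
In /src/thg.py, add an (avatar name, bbox_shift) entry to the avatar_list of the Muse_Talk class. For an explanation of bbox_shift, see this link.
In /app.py, add the avatar name to the Gradio avatar_name option, then restart the service and wait for initialization to finish.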
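
The steps above can be sanity-checked before restarting the service. The sketch below is illustrative only. The README states that each `avatar_list` entry is an (avatar name, bbox_shift) pair, but the entries shown here are placeholders rather than the repository's defaults, `avatars_without_video` is a hypothetical helper, and it assumes (unverified) that each video file in the video directory is named after its avatar.

```python
from pathlib import Path

# Placeholder entries in the (avatar name, bbox_shift) form described above.
# The names and shift values are hypothetical, not the repository's defaults.
avatar_list = [
    ("Avatar1", 6),
    ("MyAvatar", 0),
]


def avatars_without_video(avatars, video_dir: Path) -> list:
    """Return avatar names that have no same-named video file in video_dir.

    Assumption: the video file stem equals the avatar name.
    """
    if not video_dir.is_dir():
        return [name for name, _bbox_shift in avatars]
    stems = {p.stem for p in video_dir.iterdir() if p.is_file()}
    return [name for name, _bbox_shift in avatars if name not in stems]


if __name__ == "__main__":
    for name in avatars_without_video(avatar_list, Path("data/video")):
        print(f"no video found for avatar: {name}")
```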

TODO

Voice cloning ✅
Pipeline optimization (end-to-end speech API)
