zhrtvc

Chinese Real Time Voice Cloning

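Tip: zh is the language code for Chinese.

Follow the WeChat official account 【啊啦嘻哈】 and reply with the single character 【听】; a little girl has something to say to you ^v^

Version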

v1.1.5

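See the readme for usage instructions and notes.

Tip: run the code from the code subdirectory 【zhrtvc】 inside the zhrtvc project.

Comparison samples of original and cloned speech

Link: https://pan.baidu.com/s/1TQwgzEIxD2VBrVZKCblN1g

Extraction code: 8ucd

Chinese speech corpus

The Chinese speech corpus zhvoice offers clearer, more natural speech. It combines 8 open-source datasets and covers 3,200 speakers, 900 hours of speech, and 13 million characters.

The zhvoice corpus can be used to train base models for voice cloning.

Synthesizer model trained on the Chinese speech corpus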

name: logs-synx.zip

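Trained and shared by 智浪淘沙.

A speech synthesizer model trained on a parallel Chinese text-speech corpus.

Link: https://pan.baidu.com/s/1ovtu1n3eF7y0JzSxstQC7w

Extraction code: f4jx

Speech encoder model trained on open-source Chinese speech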

name: ge2e_pretrained_iwater.pt

Trained and shared by iwater.

A speech encoder model trained on open-source Chinese speech corpora.

Link: https://pan.baidu.com/s/1-5r_YXQOg2vZnuEh1Slxaw

Extraction code: 19kh

toolbox

Synthesis samples

aliaudio-Aibao-004113.wav

aliaudio-Aimei-007261.wav

aliaudio-Aina-000819.wav

aliaudio-Aiqi-009619.wav

aliaudio-Aitong-003149.wav

aliaudio-Aiwei-009461.wav

Note

When running the provided models, the Griffin-Lim vocoder is recommended. MelGAN and WaveRNN are not yet fully adapted.

Directory overview

zhrtvc

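Code, comprising the encoder, synthesizer, vocoder, and toolbox modules. It covers both model training and the visual speech-synthesis tool.

Scripts must be run from inside the zhrtvc directory.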
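A minimal sketch of a guard that fails early when a script is launched from the wrong directory. It assumes the four modules named above (encoder, synthesizer, vocoder, toolbox) exist as folders inside the zhrtvc code directory; the helper names are hypothetical and not part of the project.

```python
from pathlib import Path

# Assumption: the modules named in this readme are folders inside the code directory.
EXPECTED_SUBDIRS = ("encoder", "synthesizer", "vocoder", "toolbox")


def looks_like_code_dir(path: Path) -> bool:
    """Return True if `path` is named zhrtvc and holds the expected module folders."""
    return path.name == "zhrtvc" and all(
        (path / name).is_dir() for name in EXPECTED_SUBDIRS
    )


def require_code_dir(path=None) -> Path:
    """Return the resolved directory, or raise if it is not the zhrtvc code directory."""
    current = Path(path) if path is not None else Path.cwd()
    current = current.resolve()
    if not looks_like_code_dir(current):
        raise RuntimeError(
            f"Run scripts from the zhrtvc code directory, not from {current}"
        )
    return current
```

Calling `require_code_dir()` at the top of a launcher script turns a confusing import or path error into a direct message about the working directory.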

For details about the code, see the readme file in the zhrtvc directory.

models

Pretrained models for the encoder, synthesizer, and vocoder.

Download the pretrained models from Baidu Netdisk, unzip them, and replace the models folder.
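The unzip-and-replace step can be sketched in Python as follows. This is an illustration under stated assumptions: the archive is assumed to contain a top-level directory with the same name as the folder being replaced, and the function name and staging directory are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def replace_folder_from_zip(archive, project_root, folder="models") -> Path:
    """Unzip `archive` into a staging directory, then swap `folder` into place.

    Assumption: the archive has a top-level directory named `folder`.
    """
    archive, project_root = Path(archive), Path(project_root)
    staging = project_root / f".{folder}_staging"
    if staging.exists():
        shutil.rmtree(staging)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staging)
    extracted = staging / folder
    if not extracted.is_dir():
        shutil.rmtree(staging)
        raise FileNotFoundError(f"{archive} has no top-level '{folder}' directory")
    target = project_root / folder
    if target.exists():
        shutil.rmtree(target)  # remove the old folder only after extraction succeeded
    shutil.move(str(extracted), str(target))
    shutil.rmtree(staging)
    return target
```

Extracting to a staging directory first means a corrupt or unexpected archive leaves the existing folder untouched. Passing `folder="data"` would handle a data archive the same way, under the same layout assumption.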

Sample models

Link: https://pan.baidu.com/s/14hmJW7sY5PYYcCFAbqV0Kw

Extraction code: zl9i

data

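Sample corpus, including aligned speech-text data and preprocessed sample data for training the synthesizer.

You can run synthesizer_preprocess_audio.py and synthesizer_preprocess_embeds.py directly to convert the aligned speech-text corpus in samples into the SV2TTS data used to train the synthesizer.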
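A hedged sketch of running the two preprocessing scripts in sequence from Python. The script names come from this readme and the order follows the order in which it lists them. The readme does not list their command-line arguments, so none are assumed: `args_by_script` is a hypothetical pass-through that stays empty unless you supply arguments you have confirmed from the scripts themselves.

```python
import subprocess
import sys
from pathlib import Path

# Script names as given in this readme, in the order it lists them.
PREPROCESS_SCRIPTS = (
    "synthesizer_preprocess_audio.py",
    "synthesizer_preprocess_embeds.py",
)


def build_preprocess_commands(args_by_script=None):
    """Return one argv list per script, using the current Python interpreter.

    `args_by_script` maps a script name to its extra arguments (hypothetical;
    the readme does not document any, so the default adds none).
    """
    args_by_script = args_by_script or {}
    return [
        [sys.executable, script, *args_by_script.get(script, ())]
        for script in PREPROCESS_SCRIPTS
    ]


def run_preprocessing(code_dir, args_by_script=None):
    """Run both scripts in order from `code_dir`, stopping at the first failure."""
    for command in build_preprocess_commands(args_by_script):
        subprocess.run(command, cwd=Path(code_dir), check=True)
```

Setting `cwd` to the zhrtvc code directory matches the requirement that scripts run from inside that directory, and `check=True` stops the second step if the first one fails.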

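Download the sample corpus from Baidu Netdisk, unzip it, and replace the data folder.

Sample data

Link: https://pan.baidu.com/s/1Q_WUrmb7MW_6zQSPqhX9Vw

Extraction code: bivr

Note: this sample corpus is only for checking that the models run end to end. It is far too small for a model to converge, so it will not produce a usable model. Once everything runs, convert your own data into the sample corpus format and train the models on your own data.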

Discussion

QQ group 【AI解决方案交流群】 (AI solutions discussion group): 925294583

Click the link to join the group chat: https://jq.qq.com/?_wv=1027&k=wlQzvT0N

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented yet (don't hesitate to open an issue for that too). Mostly I would recommend taking a quick look at the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that makes it possible to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.
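The data flow between the three stages can be sketched as below. This is only an illustration of how the stages chain together, not this repository's API: `clone_voice` and the three callables are hypothetical stand-ins for a speaker encoder, a synthesizer, and a vocoder.

```python
from typing import Callable, Sequence

Waveform = Sequence[float]
Embedding = Sequence[float]
MelSpectrogram = Sequence[Sequence[float]]


def clone_voice(
    reference_audio: Waveform,
    text: str,
    encode: Callable[[Waveform], Embedding],
    synthesize: Callable[[str, Embedding], MelSpectrogram],
    vocode: Callable[[MelSpectrogram], Waveform],
) -> Waveform:
    """Chain the three stages: speaker encoder, synthesizer, vocoder."""
    embedding = encode(reference_audio)  # stage 1: numerical representation of the voice
    mel = synthesize(text, embedding)    # stage 2: text-to-speech conditioned on that voice
    return vocode(mel)                   # stage 3: spectrogram to audio waveform
```

Because the embedding is the only thing passed from the reference audio to the synthesizer, the same text can be rendered in a different voice by swapping the reference clip.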

Papers implemented

arXiv ID	Designation	Title	Implementation source
1806.04558	SV2TTS	Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	This repo
1802.08435	WaveRNN (vocoder)	Efficient Neural Audio Synthesis	fatchord/WaveRNN
1712.05884	Tacotron 2 (synthesizer)	Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions	Rayhane-mamah/Tacotron-2
1710.10467	GE2E (encoder)	Generalized End-To-End Loss for Speaker Verification	This repo
