CycleGAN-VC2-PyTorch

standard-readme compliant

Chinese | English

This project is a PyTorch reproduction of the paper CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion, a model that performs very well at timbre conversion and voice cloning.

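This project uses CycleGAN for voice conversion (VC): converting one speaker's voice into another's, for example a male voice into a female voice or vice versa. CycleGAN is a model based on generative adversarial networks (GANs) that automatically learns how to translate between two different data domains, such as turning photos into artwork. In this project, CycleGAN learns the mapping between the voices of two different speakers, which is what makes the conversion possible. The implementation is built on PyTorch and uses Mel-spectrogram feature extraction and a WaveNet vocoder to generate the converted speech.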
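The cycle-consistency idea behind CycleGAN can be illustrated with a toy sketch. It is not the project's code: the two "generators" are hypothetical stand-in functions on plain lists of floats, chosen only to show that an A -> B -> A round trip should reproduce the input, and that the loss measures how far it falls short.

```python
from typing import Callable, List

Frame = List[float]
Mapping = Callable[[Frame], Frame]


def cycle_consistency_loss(frames: List[Frame], g_ab: Mapping, g_ba: Mapping) -> float:
    """Mean absolute error between each frame and its A -> B -> A reconstruction."""
    total, count = 0.0, 0
    for frame in frames:
        reconstructed = g_ba(g_ab(frame))
        for original, restored in zip(frame, reconstructed):
            total += abs(original - restored)
            count += 1
    if count == 0:
        raise ValueError("no feature values to compare")
    return total / count


# Hypothetical stand-ins for the two generators (not the project's networks).
def toy_g_ab(frame: Frame) -> Frame:
    return [2.0 * v for v in frame]


def toy_g_ba_exact(frame: Frame) -> Frame:
    # Exact inverse of toy_g_ab, so the round trip is lossless.
    return [0.5 * v for v in frame]


def toy_g_ba_biased(frame: Frame) -> Frame:
    # Inverse with a constant offset, so the round trip is off by 0.25.
    return [0.5 * v + 0.25 for v in frame]


frames_a = [[1.0, -2.0, 0.5], [0.0, 4.0, -1.0]]
print(cycle_consistency_loss(frames_a, toy_g_ab, toy_g_ba_exact))   # 0.0
print(cycle_consistency_loss(frames_a, toy_g_ab, toy_g_ba_biased))  # 0.25
```

In the real model the two mappings are neural networks trained jointly, and this term is only one part of the objective alongside the adversarial losses.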


CycleGAN-VC2

To advance the research on non-parallel VC, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (Patch GAN).
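As a rough sketch of what "two-step adversarial losses" means, the formulas below follow our reading of the paper and should be checked against it. The notation is an assumption of this sketch: $X$ and $Y$ are the source and target speakers' feature domains, $G_{X \to Y}$ and $G_{Y \to X}$ are the generators, $D_Y$ is the usual discriminator, and $D'_X$ is an additional discriminator applied to cyclically reconstructed features.

```latex
\begin{align}
\mathcal{L}_{adv}(G_{X \to Y}, D_Y)
  &= \mathbb{E}_{y \sim P_Y}\big[\log D_Y(y)\big]
   + \mathbb{E}_{x \sim P_X}\big[\log\big(1 - D_Y(G_{X \to Y}(x))\big)\big] \\
\mathcal{L}_{adv2}(G_{X \to Y}, G_{Y \to X}, D'_X)
  &= \mathbb{E}_{x \sim P_X}\big[\log D'_X(x)\big]
   + \mathbb{E}_{x \sim P_X}\big[\log\big(1 - D'_X(G_{Y \to X}(G_{X \to Y}(x)))\big)\big]
\end{align}
```

The first term is the one-step adversarial loss on converted features; the second applies an adversarial loss to the round-trip reconstruction. The symmetric terms for the $Y \to X$ direction are obtained by swapping the roles of $X$ and $Y$.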

network


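This project includes:

  1. Model code: a reproduction of the model described in the paper.
  2. Audio preprocessing: processing of the training data.
  3. Training code: training the model.
  4. Examples of voice conversion: samples converted by the trained model.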

Table of Contents


Requirements

pip install -r requirements.txt

Usage

Preprocessing

python preprocess_training.py

Run with custom arguments:

python preprocess_training.py --train_A_dir ./data/S0913/ --train_B_dir ./data/gaoxiaosong/ --cache_folder ./cache/
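Before preprocessing, it can help to confirm that both speaker directories actually contain audio. The sketch below is a hypothetical helper, not part of this repository; it assumes the training data are `.wav` files placed directly inside each directory, which matches the sample paths shown in the Demo section but is not confirmed by the scripts themselves.

```python
from pathlib import Path
from typing import Dict, List


def list_wav_files(directory: str) -> List[Path]:
    """Return the .wav files directly inside `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        # A missing directory is reported as "no files" rather than an error.
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".wav")


def summarize_speakers(speaker_dirs: Dict[str, str]) -> Dict[str, int]:
    """Map each speaker label to the number of .wav files found for it."""
    return {name: len(list_wav_files(path)) for name, path in speaker_dirs.items()}


if __name__ == "__main__":
    # Directory names are the ones used in the command above.
    counts = summarize_speakers({"A": "./data/S0913/", "B": "./data/gaoxiaosong/"})
    for speaker, n in counts.items():
        print(f"speaker {speaker}: {n} wav file(s)")
```

A count of zero for either speaker usually means the path passed to `--train_A_dir` or `--train_B_dir` is wrong.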

Training

python train.py

Run with custom arguments:

python train.py --logf0s_normalization ./cache/logf0s_normalization.npz --mcep_normalization ./cache/mcep_normalization.npz --coded_sps_A_norm ./cache/coded_sps_A_norm.pickle --coded_sps_B_norm ./cache/coded_sps_B_norm.pickle --model_checkpoint ./model_checkpoint/ --resume_training_at ./model_checkpoint/_CycleGAN_CheckPoint --validation_A_dir ./data/S0913/ --output_A_dir ./converted_sound/S0913 --validation_B_dir ./data/gaoxiaosong/ --output_B_dir ./converted_sound/gaoxiaosong/
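The training command expects four cache files produced by preprocessing. The following sketch is a hypothetical convenience wrapper, not part of this repository: it checks that those files exist and assembles the corresponding part of the command. The flag names and file names are copied from the command above; the assumption that all four files live in a single cache folder comes from the default paths shown there.

```python
from pathlib import Path
from typing import List

# Flag -> file name, copied from the default paths in the command above.
CACHE_FILES = {
    "--logf0s_normalization": "logf0s_normalization.npz",
    "--mcep_normalization": "mcep_normalization.npz",
    "--coded_sps_A_norm": "coded_sps_A_norm.pickle",
    "--coded_sps_B_norm": "coded_sps_B_norm.pickle",
}


def missing_cache_files(cache_folder: str) -> List[str]:
    """Return the expected cache file names that are absent from `cache_folder`."""
    folder = Path(cache_folder)
    return [name for name in CACHE_FILES.values() if not (folder / name).is_file()]


def build_train_command(cache_folder: str) -> List[str]:
    """Build the argv list for train.py, pointing each flag at its cache file."""
    folder = Path(cache_folder)
    command = ["python", "train.py"]
    for flag, name in CACHE_FILES.items():
        command += [flag, str(folder / name)]
    return command
```

If `missing_cache_files("./cache/")` returns an empty list, the result of `build_train_command("./cache/")` can be passed to `subprocess.run(..., check=True)`; the checkpoint, validation, and output flags from the full command can be appended to the list in the same way.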

Pretrained Model

A pretrained model that converts between S0913 and GaoXiaoSong.

Download from Google Drive (735 MB)


Demo

Samples converted with the pretrained model:

Speaker A: S0913 (./data/S0913/BAC009S0913W0351.wav)

Speaker B: GaoXiaoSong (./data/gaoxiaosong/gaoxiaosong_1.wav)

Speaker A's speech converted to Speaker B's timbre: converted from S0913 to GaoXiaoSong (./converted_sound/S0913/BAC009S0913W0351.wav)
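A quick sanity check on a converted sample is to compare its basic properties with the source recording. The sketch below uses only Python's standard `wave` module and assumes the files are uncompressed PCM WAV, which is an assumption about the data rather than something the project documents; `wav_info` is a hypothetical helper name.

```python
import wave
from typing import Dict


def wav_info(path: str) -> Dict[str, float]:
    """Read the channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Calling `wav_info` on `./data/S0913/BAC009S0913W0351.wav` and on `./converted_sound/S0913/BAC009S0913W0351.wav` should give similar durations, since conversion changes the timbre rather than the spoken content.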


Star-History

star-history


References

  1. CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion. Paper, Project
  2. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. Paper, Project
  3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Paper, Project, Code
  4. Image-to-Image Translation with Conditional Adversarial Nets. Paper, Project, Code

Donation

If this project saves you development time, you can buy me a cup of coffee :)

AliPay(支付宝)

ali_pay

WechatPay(微信)

wechat_pay

PayPal


License

MIT © Kun