
CCGAN-VC

Shindong Lee

Korea University

Overview

Conditional CycleGAN Voice Converter which conducts non-parallel many-to-many voice conversion with single generator and discriminator

CycleGAN-VC was successful at non-parallel voice conversion between 2 speakers, but it needs 2 generators and 2 discriminators for one-to-one voice conversion.

Conditional CycleGAN-VC (CCGAN-VC) utilizes speaker identity vectors as additional information and conditions the generator and discriminator on them. CCGAN-VC therefore needs only 1 generator and 1 discriminator to conduct many-to-many voice conversion.
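As a rough illustration of the conditioning idea, here is a minimal NumPy sketch (not this repository's actual code; `condition_features` and its arguments are hypothetical names and shapes): the source and target speaker identities are encoded as one-hot vectors, tiled along the time axis, and concatenated with the acoustic features before they enter the network.

```python
import numpy as np

def condition_features(features, src_id, tar_id, num_speakers):
    """Append tiled source/target one-hot speaker codes to acoustic features.

    features: (num_features, num_frames) array, e.g. MCEP frames.
    Returns an array of shape (num_features + 2 * num_speakers, num_frames).
    """
    num_frames = features.shape[1]
    src_onehot = np.eye(num_speakers)[src_id]        # (num_speakers,)
    tar_onehot = np.eye(num_speakers)[tar_id]        # (num_speakers,)
    code = np.concatenate([src_onehot, tar_onehot])  # (2 * num_speakers,)
    # Tile the code across every frame so it can be concatenated feature-wise.
    code_tiled = np.repeat(code[:, None], num_frames, axis=1)
    return np.concatenate([features, code_tiled], axis=0)

x = np.random.randn(24, 128)  # 24 coefficients, 128 frames
y = condition_features(x, src_id=0, tar_id=9, num_speakers=10)
print(y.shape)  # (44, 128)
```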

This code was modified from Lei Mao's CycleGAN-VC implementation (https://github.com/leimao/Voice_Converter_CycleGAN).

Performance

Voice Quality

Sample voices are uploaded in the 'sample' directory.

Training Time and Model Size

|                       | 1 CycleGAN-VC | 50 CycleGAN-VCs | 1 CCGAN-VC |
|-----------------------|---------------|-----------------|------------|
| Number of speakers    | 2             | 10              | 10         |
| Converting directions | 2             | 100             | 100        |
| Number of G and D     | 2             | 100             | 1          |
| Training time         | 0.7 days      | 34.6 days       | 3.9 days   |
| Model size            | 1.35 GB       | 67.50 GB        | 0.69 GB    |

GPU: RTX 2080 Ti
CycleGAN-VC: 2500 epochs; CCGAN-VC: 250 epochs

Reference

CycleGAN-VC

"Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks" by Takuhiro Kaneko and Hirokazu Kameoka (paper: https://arxiv.org/abs/1711.11293)
(demo: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc/)
(V2: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc2/index.html)
(Tensorflow implementation by Lei Mao: https://github.com/leimao/Voice_Converter_CycleGAN)

Dataset

VCC2016 Dataset (already included in this repo):
(http://vc-challenge.org/vcc2016/summary.html)

How to Run

Train

$ python train.py --speaker_name_list [speaker_name_list] --num_epochs [num_epochs] --train_dir [train_dir] --model_dir [model_dir] --model_name [model_name] --random_seed [random_seed] --validation_dir [validation_dir] --output_dir [output_dir] --tensorboard_log_dir [tensorboard_log_dir]

ex)

$ python train.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3 --num_epochs 251 --train_dir ./data/vcc2016_training/ --model_dir ./model --model_name CCGAN.ckpt --random_seed 0 --validation_dir ./data/evaluation_all/ --output_dir ./validation_output --tensorboard_log_dir ./log

All arguments are optional; you can omit any of them.

ex)

$ python train.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3 --num_epochs 251

or

$ python train.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 --num_epochs 501

or simply,

$ python train.py

During training, validation is conducted every 100 epochs, converting between the first and the last speaker in speaker_name_list.

Test

For 1 X 1 voice conversion (1 direction)

$ python convert.py --speaker_name_list [speaker_name_list] --src [src] --tar [tar] --model_dir [model_dir] --model_name [model_name] --data_dir [data_dir] --output_dir [output_dir]

ex)

$ python convert.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3 --src SF1 --tar TM3 --model_dir ./model --model_name CCGAN.ckpt --data_dir ./data/evaluation_all/ --output_dir ./converted_voices/

or simply,

$ python convert.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3 --src SF1 --tar TM3

You must put all speakers' names into speaker_name_list in the same order used for training, since the one-hot vectors are created from this list (see the sketch below).
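For reference, the mapping from a speaker name to its one-hot vector presumably follows the list order, as in this illustrative sketch (not the repository's exact code):

```python
import numpy as np

speaker_name_list = ['SF1', 'SF2', 'SF3', 'SM1', 'SM2',
                     'TF1', 'TF2', 'TM1', 'TM2', 'TM3']

def speaker_onehot(name, speakers):
    # The vector's hot index is the speaker's position in the list,
    # so reordering the list at test time would scramble identities.
    vec = np.zeros(len(speakers))
    vec[speakers.index(name)] = 1.0
    return vec

print(speaker_onehot('TM3', speaker_name_list))  # 1.0 in the last position
```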

For n X n voice conversion (n^2 directions)

$ python convert_all.py --speaker_name_list [speaker_name_list] --model_dir [model_dir] --model_name [model_name] --data_dir [data_dir] --output_dir [output_dir]

ex)

$ python convert_all.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3 --model_dir ./model --model_name CCGAN.ckpt --data_dir ./data/evaluation_all/ --output_dir ./converted_voices/

or simply,

$ python convert_all.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3

or, for a model trained with a different speaker list:

$ python convert_all.py --speaker_name_list SF1 SF2 SM1 TF1 TM1 TM3

You must put all speakers' names into speaker_name_list in the same order used for training, since the one-hot vectors are created from this list.

For n X n voice conversion (n^2 directions) with only selected sentences

$ python convert_all_sample.py --speaker_name_list [speaker_name_list] --wav_list [wav_list] --model_dir [model_dir] --model_name [model_name] --data_dir [data_dir] --output_dir [output_dir]

ex)

$ python convert_all_sample.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3 --wav_list 200001.wav 200002.wav 200003.wav --model_dir ./model --model_name CCGAN.ckpt --data_dir ./data/evaluation_all/ --output_dir ./converted_voices/

or simply,

$ python convert_all_sample.py --speaker_name_list SF1 SF2 SF3 SM1 SM2 TF1 TF2 TM1 TM2 TM3

or, for a model trained with a different speaker list:

$ python convert_all_sample.py --speaker_name_list SF1 SF2 SM1 TF1 TM1 TM3

You must put all speakers' names into speaker_name_list in the same order used for training, since the one-hot vectors are created from this list.

Data Download

(The data are already included in this repo; run this only if you have lost them.)

$ python download.py

For debugging

$ python weight.py

It prints the max and min of the weights connected to the one-hot vectors.
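As a rough idea of what such a check can look like, the sketch below reads one variable out of a TensorFlow checkpoint and prints its extremes. The variable name here is hypothetical; the real names depend on how the graph is defined:

```python
import tensorflow as tf

# Load the trained checkpoint (path matches the --model_dir/--model_name
# values used in the examples above).
reader = tf.train.load_checkpoint('./model/CCGAN.ckpt')

# Hypothetical variable name for weights fed by the one-hot speaker codes.
w = reader.get_tensor('generator/input_conv/kernel')
print('max:', w.max(), 'min:', w.min())
```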
