BetaVAE_VC

This repo contains the code for the paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE", presented at SLT 2022.

Samples | Paper | Pretrained Models

0. Setup Conda Environment

conda env create -f environment.yaml
conda activate betavae-vc-env

1. Data preprocessing

Download corpus

English: VCTK
Mandarin: AISHELL3

Modify the paths specified in configs/hparams.py: corpus_dir for both VCTK and AISHELL3, and dataset_dir for the extracted features and TFRecord files.
Prepare the dataset for training:

python preprocess.py

2. Training

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --out_dir ./outputs --data_dir /path/to/save/features/tfrecords
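The two environment variables in the command above control GPU usage: CUDA_VISIBLE_DEVICES selects which GPUs the process can see, and TF_FORCE_GPU_ALLOW_GROWTH=true asks TensorFlow to allocate GPU memory on demand instead of reserving it all at start-up. If you prefer to launch training from a Python script, the following is a minimal sketch of the same invocation. The helper names `gpu_env` and `train_command` are hypothetical and not part of this repo; only the `--out_dir` and `--data_dir` flags come from the command above.

```python
import os
import subprocess
import sys


def gpu_env(gpu_ids="0", allow_growth=True):
    """Copy the current environment and add the two GPU-related variables."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true" if allow_growth else "false"
    return env


def train_command(out_dir, data_dir):
    """Build the argument list for train.py (hypothetical helper)."""
    return [sys.executable, "train.py", "--out_dir", out_dir, "--data_dir", data_dir]


if __name__ == "__main__" and os.path.exists("train.py"):
    # Only runs when executed from the repository root, where train.py lives.
    subprocess.run(
        train_command("./outputs", "/path/to/save/features/tfrecords"),
        env=gpu_env("0"),
        check=True,
    )
```

The variables are set on the child process only, so the parent shell environment stays unchanged.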

3. Inference

# inference from mels
# test-mels.txt contains a list of paths to mel-spectrograms in *.npy format, one path per line
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference-from-mel.py --ckpt_path ./outputs/models/ckpt-500 --test_dir outputs/tests --src_mels test-mels.txt --ref_mels test-mels.txt

# inference from wavs
# test-wavs.txt contains a list of paths to speech files in *.wav format, one path per line
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference-from-wav.py --ckpt_path ./outputs/models/ckpt-500 --test_dir outputs/tests --src_wavs test-wavs.txt --ref_wavs test-wavs.txt
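Both inference scripts read a plain-text list file with one path per line. As a convenience, here is a minimal sketch of one way to generate such a file from a directory. The function `write_path_list` and the directories `./data/test-mels` and `./data/test-wavs` are hypothetical placeholders, not part of this repo.

```python
from pathlib import Path


def write_path_list(root, suffix, list_file):
    """Write every file under `root` ending in `suffix` to `list_file`, one path per line."""
    paths = sorted(
        str(p.resolve()) for p in Path(root).rglob(f"*{suffix}") if p.is_file()
    )
    Path(list_file).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return paths


if __name__ == "__main__":
    # Hypothetical input directories; replace them with your own data locations.
    jobs = [
        ("./data/test-mels", ".npy", "test-mels.txt"),
        ("./data/test-wavs", ".wav", "test-wavs.txt"),
    ]
    for root, suffix, list_file in jobs:
        if Path(root).is_dir():
            count = len(write_path_list(root, suffix, list_file))
            print(f"wrote {count} paths to {list_file}")
```

The sketch writes absolute paths so the list file works regardless of the directory from which the inference script is started.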

4. Latent extraction

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python latent_extraction.py --data_dir /path/to/save/features/tfrecords --save_dir ./outputs/features --ckpt_path ./outputs/models/ckpt-300

5. EER computation based on the extracted latents

# compute EER using content embeddings
python tests/compute_eer.py --data_dir ./outputs/features/EN --mode content
# compute EER using speaker embeddings
python tests/compute_eer.py --data_dir ./outputs/features/EN --mode spk
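For readers unfamiliar with the metric: the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below illustrates the generic computation from two lists of similarity scores. It is not the implementation in tests/compute_eer.py, and it assumes that a higher score means "same speaker"; how the repo builds trials and scores from the extracted latents is not shown here.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) for scores where higher means 'same speaker'."""
    best = None
    # Every observed score is a candidate decision threshold.
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is >= thr.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores for illustration only; they are not results from the paper.
    same_speaker = [0.9, 0.8, 0.7, 0.4]
    different_speaker = [0.6, 0.3, 0.2, 0.1]
    eer, thr = compute_eer(same_speaker, different_speaker)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With a finite set of trials the two error rates rarely cross exactly, so the sketch reports their average at the closest threshold.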

Cite this work

@inproceedings{slt2022_hui_disentanle,
  author    = {Hui Lu and
               Disong Wang and
               Xixin Wu and
               Zhiyong Wu and
               Xunying Liu and
               Helen Meng},
  title     = {Disentangled Speech Representation Learning for One-Shot Cross-Lingual
               Voice Conversion Using Beta-VAE},
  booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2022, Doha, Qatar,
               January 9-12, 2023},
  pages     = {814--821},
  publisher = {{IEEE}},
  year      = {2022},
  doi       = {10.1109/SLT54892.2023.10022787},
}
