This is a PyTorch implementation of StyleVC: Non-Parallel Voice Conversion with Adversarial Style Generalization. Feel free to use and modify the code, and please refer to our repo.
- 2022/02/04 Released the official StyleVC code.
Audio samples generated by this implementation can be found here.
Run the 'inference.ipynb' notebook in Colab here.
(Optional) You can create an environment using Anaconda:
conda create -n py37torch17 python=3.7.9
(Optional) Then activate your conda environment and install PyTorch and TensorFlow:
conda activate py37torch17
conda install pytorch=1.7 torchvision torchaudio cudatoolkit=10.1 -c pytorch
pip install --upgrade tensorflow-gpu==1.15
You can install the python dependencies with
pip install -r requirements.txt
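After installation, a quick check can confirm that the main packages are visible to the interpreter. The following is a minimal sketch and not part of the repository: the helper name `check_modules` is hypothetical, and it uses only the standard library so that it also runs on Python 3.7.

```python
from importlib import util


def check_modules(names):
    """Return {module name: True if the module can be found, else False}."""
    # find_spec locates a top-level module without importing it.
    return {name: util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names of the packages installed by the commands above.
    for name, found in check_modules(["torch", "torchaudio", "tensorflow"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A module reported as missing usually means the conda environment is not activated or one of the install commands did not complete.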
Preprocessing is supported for the VCTK dataset.
- VCTK: CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92) https://datashare.ed.ac.uk/handle/10283/3443
Refer to the sample file on GitHub and the file structure below. For preprocessing, use the following command:
python prepare_dataset.py --in_dir data/VCTK/original/ --out_dir_name VCTK_16K --dataset VCTK
The file structure after preprocessing is as follows:
├── data
│ ├── VCTK
│ │ ├── original
│ │ │ ├── wav48
│ │ │ │ ├── wavs
│ │ │ ├── metadata.csv
│ │ ├── VCTK_16K
│ │ │ ├── train
│ │ │ │ ├── p225
│ │ │ │ │ ├── p225_021.npz
│ │ │ │ │ ├── ...
│ │ │ │ │ ├── p225_423.npz
│ │ │ │ ├── ...
│ │ │ │ ├── p376
│ │ │ ├── val
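To sanity-check the preprocessing output, the layout above can be walked with a short script. This is a sketch under assumptions: the helper `count_npz_per_speaker` is hypothetical, and it relies only on what the tree shows, namely one sub-directory per speaker inside a split folder such as `train`, each holding `.npz` files. The contents of the `.npz` files are not inspected.

```python
from pathlib import Path


def count_npz_per_speaker(split_dir):
    """Map each speaker sub-directory name to its number of .npz files."""
    split_dir = Path(split_dir)
    return {
        speaker.name: sum(1 for _ in speaker.glob("*.npz"))
        for speaker in sorted(split_dir.iterdir())
        if speaker.is_dir()  # skip stray files next to the speaker folders
    }
```

Calling the function with the path of the preprocessed `train` folder returns a dictionary keyed by speaker ID (for example `p225`). A speaker with a count of zero suggests preprocessing did not finish for that speaker.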
To train, set the hyperparameters in model/hparams.py and run the following command:
python trainer.py --dataset VCTK --dataset_name VCTK_16K --log_dir StyleVC_VCTK_test01
We used a fine-tuned HiFi-GAN vocoder. Download the checkpoint and config file below and save them in 'vocoder/checkpoint'.
Model | Checkpoint file | Config file |
---|---|---|
VCTK | Download | Download |
python inference.py
We provide a pretrained checkpoint. Download the checkpoint file below and put it in 'outputs/StyleVC_VCTK'.
Model | Checkpoint file |
---|---|
VCTK | Download |
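Before running inference, it may help to confirm that both download locations named in this README are populated. The following is a minimal sketch: `missing_checkpoint_dirs` is a hypothetical helper, it assumes only the two directory names given above, and it makes no assumption about the file names inside them.

```python
from pathlib import Path

# Directories named in this README; the file names inside them are not assumed.
CHECKPOINT_DIRS = ["vocoder/checkpoint", "outputs/StyleVC_VCTK"]


def missing_checkpoint_dirs(dirs=CHECKPOINT_DIRS, root="."):
    """Return the directories that do not exist or contain no files."""
    missing = []
    for rel in dirs:
        path = Path(root) / rel
        if not path.is_dir() or not any(p.is_file() for p in path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_checkpoint_dirs():
        print(f"No files found in '{rel}'")
```

An empty result only shows that each directory contains at least one file. It does not verify that the files are the correct checkpoints.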
Please cite the paper if you find StyleVC useful.
@inproceedings{hwang2022stylevc,
title={StyleVC: Non-Parallel Voice Conversion with Adversarial Style Generalization},
author={Hwang, In-Sun and Lee, Sang-Hoon and Lee, Seong-Whan},
booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},
pages={23--30},
year={2022},
organization={IEEE}
}