Conformer-VC is inspired by Non-autoregressive sequence-to-sequence voice conversion, a parallel voice conversion method powered by Conformer.
The differences from the original paper are:
- No reduction factor is used.
- Mel-spectrograms are not normalized by speaker statistics.
- Durations are extracted by DTW, not by a pretrained autoregressive model.
- HiFi-GAN is used instead of ParallelWaveGAN.
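To illustrate the DTW-based duration extraction mentioned above, here is a minimal NumPy sketch (a hypothetical helper, not the repo's actual implementation): it aligns a source and target mel-spectrogram with plain DTW and counts how many target frames map to each source frame.

```python
import numpy as np

def dtw_durations(src, tgt):
    """Toy DTW-based duration extraction (illustrative only).

    src: (T_src, D) source mel-spectrogram.
    tgt: (T_tgt, D) target mel-spectrogram.
    Returns an int array of length T_src whose entries sum to T_tgt.
    """
    T, U = len(src), len(tgt)
    # frame-wise Euclidean distance matrix
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    # accumulated cost with match / source-skip / target-skip moves
    acc = np.full((T + 1, U + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # backtrack the optimal warping path
    path, i, j = [], T, U
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        i, j = (i - 1, j - 1) if k == 0 else (i - 1, j) if k == 1 else (i, j - 1)
    # assign each target frame to the first source frame it aligns with,
    # then count assignments per source frame
    first_src = {}
    for si, tj in reversed(path):
        first_src.setdefault(tj, si)
    return np.bincount([first_src[t] for t in range(U)], minlength=T)
```

The real preprocessing uses a Cython DTW (built in the Preprocess step below), but the idea is the same: the warping path turns frame-level alignment into per-frame duration targets.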
- Requirements
- pytorch
- numpy
- pyworld
- accelerate
- soundfile
- librosa
- cython
- omegaconf
- tqdm
- resemblyzer
- matplotlib
- scipy
If you get an error about a missing package, please install it.
- Preprocess
If you want to train on your own dataset, edit configs/preprocess.yaml and preprocess.py accordingly.
Note that the numbers of source and target files must be equal, and their file IDs must match.
$ cd dtw && python setup.py build_ext --inplace && cd ..
$ python preprocess.py
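The source and target corpora must contain exactly the same file IDs. A small sanity check you could run before preprocessing (a hypothetical helper, not part of the repo) looks like this:

```python
from pathlib import Path

def check_paired_ids(src_dir, tgt_dir, ext=".wav"):
    """Raise if source and target directories do not contain
    files with exactly the same ids (file-name stems)."""
    src_ids = {p.stem for p in Path(src_dir).glob(f"*{ext}")}
    tgt_ids = {p.stem for p in Path(tgt_dir).glob(f"*{ext}")}
    unpaired = src_ids ^ tgt_ids  # ids present on only one side
    if unpaired:
        raise ValueError(f"unpaired file ids: {sorted(unpaired)}")
    return sorted(src_ids)
```

Running this before `python preprocess.py` catches mismatched pairs early instead of failing mid-extraction.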
- Training
Single-GPU training:
$ ln -s ./dataset/feats DATA
$ python train.py
Multi-GPU training:
$ ln -s ./dataset/feats DATA
$ accelerate config
Answer the questions about your machine.
$ accelerate launch train.py
- Validation
$ python validate.py --model_dir {MODEL_DIR} --hifi_gan {HIFI_GAN_DIR} --data_dir DATA
If this script runs correctly, an outputs directory is generated containing the synthesized wav files.
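To inspect the results, you can enumerate the generated wav files (the directory name `outputs` is assumed from the script's described behavior):

```python
from pathlib import Path

def list_synthesized(out_dir="outputs"):
    """Return all synthesized wav files under the outputs directory,
    sorted by path, searching subdirectories recursively."""
    return sorted(Path(out_dir).rglob("*.wav"))
```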