An implementation of "Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training"
- PPG features (10ms frameshift)
- F0 features (10ms frameshift)
- Speaker embedding (One embedding per wav file)
- Audio files (WAV format, 24000 Hz sample rate, mono)
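
With a 10 ms frameshift and 24000 Hz audio, one PPG or F0 frame corresponds to 240 audio samples. The following is a minimal sketch of a consistency check you could run on your own features before training. The function names and the `tolerance` parameter are hypothetical and not part of this repository; the tolerance is only an assumed allowance for extractors that pad differently.

```python
SAMPLE_RATE = 24000   # Hz, from the audio requirement above
FRAME_SHIFT_MS = 10   # ms, shared by the PPG and F0 features


def hop_length(sample_rate=SAMPLE_RATE, frame_shift_ms=FRAME_SHIFT_MS):
    """Number of audio samples covered by one feature frame."""
    return sample_rate * frame_shift_ms // 1000


def check_alignment(num_samples, num_ppg_frames, num_f0_frames, tolerance=2):
    """Return True if PPG and F0 frame counts agree with each other
    and with the audio length, within `tolerance` frames.

    `tolerance` is an assumption: feature extractors often differ by a
    frame or two depending on how they pad the signal edges.
    """
    expected = num_samples // hop_length()
    frames_match = abs(num_ppg_frames - num_f0_frames) <= tolerance
    length_match = abs(num_ppg_frames - expected) <= tolerance
    return frames_match and length_match
```

For example, a one-second file (24000 samples) should have roughly 100 PPG frames and 100 F0 frames.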
Set paths, directories, and other options in the .json files in the "configs" directory.
Rewrite the data loading function in utils/dataset.py to match your own data.
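
Before training, it can help to confirm that every path in a config file actually exists. The sketch below is one possible way to do that. Since the key names used in the JSON files are not listed here, it makes no assumption about them: it walks every string value and reports those that look like a path (contain a separator) but are missing on disk. The helper names `iter_strings` and `missing_paths` are hypothetical and not part of this repository.

```python
import json
import os


def iter_strings(node, prefix=""):
    """Yield (dotted_key, value) for every string value in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_strings(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def missing_paths(config_file):
    """Return (key, value) pairs for path-like strings that do not exist."""
    with open(config_file, encoding="utf-8") as handle:
        config = json.load(handle)
    return [
        (key, value)
        for key, value in iter_strings(config)
        # Heuristic (an assumption): a string with a separator is a path.
        if ("/" in value or os.sep in value) and not os.path.exists(value)
    ]
```

Calling `missing_paths("configs/stage1.json")` returns an empty list when every path-like value resolves.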
Single GPU
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/stage1.json
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/stage2.json
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/stage3.json
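
If you prefer to launch the three stages from a single script, the commands above can be wrapped as in the sketch below. It assumes the stages should run in order and that a later stage should not start if an earlier one fails, which the numbering suggests but this README does not state. `build_command` and `run_stages` are hypothetical helper names, not part of this repository.

```python
import os
import subprocess
import sys

# Config files in the order given by the commands above.
STAGE_CONFIGS = [
    "configs/stage1.json",
    "configs/stage2.json",
    "configs/stage3.json",
]


def build_command(config_path, python=sys.executable):
    """Build the argument list equivalent to `python train.py -c <config>`."""
    return [python, "train.py", "-c", config_path]


def run_stages(configs=STAGE_CONFIGS, gpu="0"):
    """Run each stage in turn on one GPU; stop at the first failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for config_path in configs:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(config_path), env=env, check=True)
```

Call `run_stages()` from the repository root so that `train.py` and the `configs` directory resolve correctly.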