This repository contains the open-source code, audio samples, and pretrained models for my paper: QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Put the pretrained model into logs/quickvc
python convert.py
You can edit convert.txt to select the source and target
- Hubert-Soft
cd dataset
python encode.py soft dataset/VCTK-16K dataset/VCTK-16K
- Spectrogram resize data augmentation: please refer to FreeVC.
python train.py
If you want to change the config and model name, change:
parser.add_argument('-c', '--config', type=str, default="./configs/quickvc.json",help='JSON file for configuration')
parser.add_argument('-m', '--model', type=str,default="quickvc",help='Model name')
in utils.py
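
As a rough illustration of what those two defaults do, here is a minimal standalone sketch that rebuilds the same two arguments with `argparse`. The helper name `build_parser` and the values `./configs/my_config.json` and `my_model` are hypothetical and only serve the example. It also assumes that train.py passes its command-line arguments through to this parser, which would let you override the values at launch instead of editing utils.py; check utils.py before relying on that.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the two arguments quoted from utils.py.
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config', type=str,
                        default="./configs/quickvc.json",
                        help='JSON file for configuration')
    parser.add_argument('-m', '--model', type=str,
                        default="quickvc",
                        help='Model name')
    return parser


# No flags given: the defaults apply.
defaults = build_parser().parse_args([])
print(defaults.config, defaults.model)

# Flags given: the defaults are overridden (hypothetical values).
override = build_parser().parse_args(
    ['-c', './configs/my_config.json', '-m', 'my_model'])
print(override.config, override.model)
```

If that assumption holds, the equivalent launch command would be `python train.py -c ./configs/my_config.json -m my_model`.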
To use the SR (spectrogram resize) augmentation during training, change this part to:
i = random.randint(68,92)
c_filename = filename.replace(".wav", f"_{i}.npy")
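
To make the effect of these two lines concrete, the sketch below wraps them in a small function and runs it on a made-up file name. The function name `pick_sr_feature` and the example path `p225_001.wav` are hypothetical. The only behaviour taken from the lines above is that an integer between 68 and 92 (inclusive) is drawn at random and appended to the file stem, with the `.wav` extension swapped for `.npy`.

```python
import random


def pick_sr_feature(filename, low=68, high=92):
    # random.randint includes both endpoints, so i is in [68, 92].
    i = random.randint(low, high)
    # Map e.g. "x.wav" to "x_<i>.npy", one of the augmented feature files.
    return filename.replace(".wav", f"_{i}.npy")


# Hypothetical file name, for illustration only.
c_filename = pick_sr_feature("p225_001.wav")
print(c_filename)
```

Because a new suffix is drawn on every call, the feature files for every suffix in that range presumably need to exist on disk for each training utterance before training starts.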
If you have any questions about the decoder, refer to MS-ISTFT-VITS.
If you have any questions about Hubert-Soft, refer to Soft-VC.
If you have any questions about the data augmentation, refer to FreeVC.