A PyTorch implementation of FFTNet: A Real-Time Speaker-Dependent Neural Vocoder
- Install requirements:
pip install -r requirements.txt
- Download dataset:
wget http://festvox.org/cmu_arctic/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2
tar xf cmu_us_slt_arctic-0.95-release.tar.bz2
- Extract features:
python preprocess.py
- Train with default hyperparameters:
python train.py
- Synthesize from model:
python generate.py --checkpoint=/path/to/model --lc_file=/path/to/local_condition
- Test and modify conditional sampling.
- Fast generation. (Without it, generation runs at about 230 samples per second on an otherwise idle TITAN Xp.)
- Post-synthesis denoising.
- This is not an official implementation; some details differ from the paper.
- Currently, the speech generated by this repo is not as good as that of the WaveNet vocoder.
- Work in progress.
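
To put the figure of about 230 samples per second quoted above in perspective, the following is a rough back-of-the-envelope sketch, not part of this repo. It assumes 16 kHz audio (the rate at which CMU ARCTIC is distributed; the rate actually used after preprocessing may differ), and the function names are hypothetical.

```python
# Rough estimate of how far the quoted generation speed is from real time.
# Assumptions (not taken from this repo's code): 16 kHz output audio and a
# constant throughput of 230 samples per second.

def seconds_to_generate(audio_seconds, sample_rate=16000, samples_per_second=230):
    """Estimate wall-clock seconds needed to synthesize `audio_seconds` of audio."""
    total_samples = audio_seconds * sample_rate
    return total_samples / samples_per_second


def real_time_factor(sample_rate=16000, samples_per_second=230):
    """Return how many times slower than real time generation is."""
    return sample_rate / samples_per_second


if __name__ == "__main__":
    print(f"{seconds_to_generate(1.0):.1f} s of compute per 1 s of audio")
    print(f"real-time factor: {real_time_factor():.1f}x")
```

Under these assumptions, one second of speech takes roughly 70 seconds to synthesize, which is why fast generation appears in the list above.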