The model used in this repo is a modified 2D version of WaveGrad, a denoising diffusion probabilistic model (or diffusion model, for short) for speech synthesis.
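For readers new to diffusion models, the forward (noising) process that such a model learns to invert can be sketched as follows. This is a generic DDPM-style illustration in NumPy, not code from this repo: the function names `make_noise_schedule` and `diffuse`, the linear schedule values, and the `(1, 64, 64)` input shape are all assumptions chosen for the example.

```python
import numpy as np


def make_noise_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed values) and the cumulative product of (1 - beta)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod


def diffuse(x0, t, alphas_cumprod, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise


rng = np.random.default_rng(0)
x0 = rng.standard_normal((1, 64, 64))  # hypothetical 2D input, shape is an assumption
betas, alphas_cumprod = make_noise_schedule()
x_t, noise = diffuse(x0, 500, alphas_cumprod, rng)
```

During training, a network is typically asked to predict `noise` from `x_t` and the noise level; sampling then runs the process in reverse, starting from pure Gaussian noise.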
pip install -r requirements.txt
zouqi config/default.yml preprocess
zouqi config/default.yml train --print-args
tensorboard --logdir logs
Samples from 46.2k iterations:
The diffusion model implementation is based mainly on the WaveGrad implementation from ivanvovk and is also inspired by the implementation from lmnt-com.