Hi @hertz-pj, good point. I would say it depends on the purpose. For example, you'd choose FastSpeech2 if you need fast and safe performance. Go with DiffSpeech if you want randomness and non-metallic speech in the output. If you are interested in both speed and randomness, PortaSpeech may suit you.
@hertz-pj This is old, but just putting it there in case someone is searching for a comparison.
If you want to compare inference only, you can simply download the pretrained models and run inference (even better if they are hosted on HuggingFace, since you can try them directly).
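For the speed side of that comparison, here is a rough sketch of a timing harness. It assumes each model is wrapped in a hypothetical `synthesize(text)` callable that returns a sequence of audio samples. The `Synthesizer` type, the function names, and the 22050 Hz sample rate are placeholders of mine, not anything taken from these repos.

```python
import time
from typing import Callable, Dict, Sequence

# Hypothetical interface: each model is wrapped as text -> sequence of audio samples.
Synthesizer = Callable[[str], Sequence[float]]


def real_time_factor(synthesize: Synthesizer, texts: Sequence[str],
                     sample_rate: int, repeats: int = 3) -> float:
    """Best-of-N wall-clock synthesis time divided by generated audio duration.

    Values below 1.0 mean the model generates audio faster than real time.
    """
    best = float("inf")
    total_samples = 0
    for _ in range(repeats):
        start = time.perf_counter()
        total_samples = sum(len(synthesize(text)) for text in texts)
        best = min(best, time.perf_counter() - start)
    if total_samples == 0:
        raise ValueError("synthesizer produced no audio")
    return best / (total_samples / sample_rate)


def compare(models: Dict[str, Synthesizer], texts: Sequence[str],
            sample_rate: int = 22050) -> Dict[str, float]:
    """Return {model name: real-time factor}, fastest model first."""
    scores = {name: real_time_factor(fn, texts, sample_rate)
              for name, fn in models.items()}
    return dict(sorted(scores.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Stand-in synthesizer so the sketch runs without any model installed;
    # replace it with real wrappers around the pretrained checkpoints.
    def dummy(text: str) -> Sequence[float]:
        return [0.0] * (2205 * len(text))

    print(compare({"dummy": dummy}, ["hello world", "a second test sentence"]))
```

If the models run on a GPU, the wrapper should block until the audio is actually ready, otherwise the timings will look better than they are. This only measures speed; audio quality and prosody still need listening tests.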
For training, I haven't trained DiffSpeech, but FastSpeech2 trains 5-10x faster for comparable audio quality. FS2 takes under 2 hours on a single RTX 3090 to produce fully intelligible speech. However, PortaSpeech has more prosody variation.
From your experience, how would you rank the results of these models?