Is PortaSpeech a better choice than FastSpeech2 or DiffSpeech? #26

hertz-pj · 2022-07-19T08:23:19Z

From your experience, how would you rank the results of these models?

keonlee9420 · 2022-07-31T05:27:11Z

Hi @hertz-pj, good point. I would say it depends on the purpose. For example, you'd choose FastSpeech2 if you need fast and safe performance. DiffSpeech is the choice if you want randomness and non-metallic speech in the output. If you are interested in both speed and randomness, PortaSpeech can be a satisfying option.

iamanigeeit · 2024-01-28T14:43:17Z

@hertz-pj This is old, but I'm just putting this here in case someone is searching for a comparison.

If you want to compare inference only, you can simply download pretrained models and run inference (even better if they are hosted on HuggingFace, where you can try them directly).
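A minimal sketch of how such an inference-only speed comparison could be timed. It assumes each model is wrapped in a hypothetical `synthesize(text)` callable that returns a sequence of audio samples; that wrapper and the names below are illustrative and not part of any of these repositories. The function measures the real-time factor, i.e. seconds of compute per second of generated audio.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical model wrapper
    texts: Sequence[str],
    sample_rate: int,
) -> float:
    """Return wall-clock seconds spent per second of audio generated."""
    start = time.perf_counter()
    total_samples = sum(len(synthesize(text)) for text in texts)
    elapsed = time.perf_counter() - start
    if total_samples == 0:
        raise ValueError("no audio was generated, cannot compute a ratio")
    return elapsed / (total_samples / sample_rate)


if __name__ == "__main__":
    # Stand-in synthesizer: one second of silence per input sentence.
    def dummy_synthesize(text: str) -> list:
        return [0.0] * 22050

    rtf = real_time_factor(dummy_synthesize, ["hello", "world"], 22050)
    print(f"real-time factor: {rtf:.6f}")
```

A lower value means faster synthesis. It says nothing about audio quality or prosody, which still need listening tests on the same sentences.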

For training, I haven't trained DiffSpeech, but FastSpeech2 trains 5-10x faster than PortaSpeech for comparable audio quality. FS2 takes under 2 hours on a single RTX 3090 to produce totally intelligible speech. However, PortaSpeech has more prosody variation.
