Is PortaSpeech a better choice than FastSpeech2 or DiffSpeech? #26

Open
hertz-pj opened this issue Jul 19, 2022 · 2 comments

Comments

@hertz-pj

From your experience, how would you rank the results of these models?

@keonlee9420
Owner

Hi @hertz-pj, good point. I would say it depends on the purpose. For example, you'd choose FastSpeech2 if you need fast and safe performance. Go with DiffSpeech if you want randomness and non-metallic speech in the output. If you are interested in both speed and randomness, PortaSpeech may suit you.

@iamanigeeit

@hertz-pj This is old, but I'm just putting it out there in case someone is searching for a comparison.

If you want to compare inference only, you can simply download pretrained models and run inference (even better if they are hosted on HuggingFace, where you can try them directly).
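For a rough speed comparison, something like the sketch below could be used to measure the real-time factor (wall-clock seconds per second of generated audio) of each model on the same sentences. Everything here is an assumption for illustration: `real_time_factor`, `compare_models`, and `fake_synthesize` are hypothetical names, not part of any of these repositories. You would replace `fake_synthesize` with your own wrapper around each model's inference code that takes a string and returns a sequence of audio samples.

```python
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a function that maps text to a sequence of audio samples.
Synthesizer = Callable[[str], Sequence[float]]


def real_time_factor(synthesize: Synthesizer, texts: List[str], sample_rate: int) -> float:
    """Return wall-clock seconds spent per second of audio generated."""
    total_samples = 0
    start = time.perf_counter()
    for text in texts:
        total_samples += len(synthesize(text))
    elapsed = time.perf_counter() - start
    if total_samples == 0:
        raise ValueError("synthesizer produced no audio")
    audio_seconds = total_samples / sample_rate
    return elapsed / audio_seconds


def compare_models(models: Dict[str, Synthesizer], texts: List[str],
                   sample_rate: int = 22050) -> Dict[str, float]:
    """Measure the real-time factor of every model on the same texts."""
    return {name: real_time_factor(fn, texts, sample_rate) for name, fn in models.items()}


def fake_synthesize(text: str) -> List[float]:
    """Placeholder only: returns silence whose length scales with the text."""
    return [0.0] * (1000 * len(text))


if __name__ == "__main__":
    sentences = ["Hello world.", "This is a test sentence."]
    # Replace the placeholder with real wrappers for FastSpeech2, DiffSpeech, PortaSpeech.
    results = compare_models({"placeholder": fake_synthesize}, sentences)
    for name, rtf in results.items():
        print(f"{name}: RTF = {rtf:.4f}")
```

A lower real-time factor means faster synthesis. This only covers speed; audio quality and prosody variation still need listening tests on the same sentences.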

For training, I haven't trained DiffSpeech, but FastSpeech2 trains 5-10x faster for comparable audio quality. FS2 takes under 2 hours on a single RTX 3090 to produce totally intelligible speech. However, PortaSpeech has more prosody variation.


3 participants