Long-form synthesis #9
Currently we train on audio clips of at most 30 seconds. With @ylacombe we're looking at increasing the context length to support longer audio. ALiBi embeddings (or a variant thereof) look promising for this: https://arxiv.org/abs/2108.12409. As future work, it would be amazing if you could feed an entire chapter of an audiobook to the model and have it learn the prosody and intonation directly from training examples (with no guidance from the text prompt).
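For readers unfamiliar with ALiBi, here is a minimal sketch of the attention bias described in the linked paper. It is not code from this project: the function names `alibi_slopes` and `alibi_bias` are made up for illustration, and the slope formula shown is the simple case that assumes the number of heads is a power of two.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Assumption: num_heads is a power of two (the simple case in the paper).
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Return bias[h][i][j], to be added to attention logits before softmax.

    For a causal model, query i attends to keys j <= i with a penalty that
    grows linearly with the distance (i - j). Future positions are masked.
    """
    bias = []
    for slope in alibi_slopes(num_heads):
        head = [
            [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        bias.append(head)
    return bias
```

Because the penalty depends only on the distance between positions and not on learned position embeddings, the same bias can be computed for sequence lengths longer than those seen in training, which is why it is attractive for extending the 30-second limit.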
That would be nice. I was wondering whether it would be possible to use chunking, with previous chunks passed as context, so that the speech sounds natural across different speakers. (This would be useful for audiobooks with multiple characters.)
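A rough sketch of the text side of that chunking idea, assuming sentence-boundary splitting and a fixed number of previous chunks as context. The helper names `split_into_chunks` and `with_context` are hypothetical, and how a model would actually consume the context (text, audio tokens, or speaker state) is left open, since the thread does not specify it.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks


def with_context(chunks, num_context=1):
    """Pair each chunk with the (up to) num_context chunks that precede it."""
    return [
        (chunks[max(0, i - num_context):i], chunk)
        for i, chunk in enumerate(chunks)
    ]
```

Each `(context, chunk)` pair could then be handed to whatever synthesis call is used, with the context serving as conditioning for the current chunk.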
Are there any updates about long-form speech synthesis? I'm looking forward to the results.
Hi,
Congrats on the release!! Is long form synthesis planned?
Thank you!