Tacotron (2?) based models appear to be limited to rather short input #739
Comments
I'm confused because sometimes this works and other times it doesn't. Now I've accidentally reproduced the bad sample: instead of just "test", you can hear something like "test-t-t-t-t-t-t-t....". All I changed is
I get the same thing: if a sentence is past a certain length, it is cut off in the produced wav. Here's a simple example:
❯ tts --text "This sentence, being as long as it is, most unfortunately, will not be fully stated." --out_path test.wav
> tts_models/en/ljspeech/tacotron2-DDC is already downloaded.
> vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
> Using model: Tacotron2
> Model's reduction rate `r` is set to: 1
> Vocoder Model: hifigan
> Generator Model: hifigan_generator
> Discriminator Model: hifigan_discriminator
Removing weight norm...
> Text: This sentence, being as long as it is, most unfortunately, will not be fully stated.
> Text splitted to sentences.
['This sentence, being as long as it is, most unfortunately, will not be fully stated.']
> Decoder stopped with `max_decoder_steps` 500
> Processing time: 3.1818737983703613
> Real-time factor: 0.49914852912682467
> Saving output to test.wav
In this example, the speaker is cut off before saying "stated". How can we synthesize arbitrarily long sentences?
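One workaround (a sketch, not something from this thread) is to pre-split long input into clause-sized chunks so each piece stays under the decoder's step limit, then synthesize the chunks separately and concatenate the audio. The tool already splits on sentence boundaries, but a single long sentence still blows the limit, so this splits on commas and other clause punctuation as well. The `chunk_text` helper below is hypothetical, not part of the TTS package:

```python
import re

def chunk_text(text, max_chars=80):
    """Split text on sentence/clause punctuation so each chunk stays
    short enough for the decoder's step limit (hypothetical workaround)."""
    parts = re.split(r'(?<=[.,;:!?])\s+', text)
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + 1 + len(part) > max_chars:
            chunks.append(current)
            current = part
        else:
            current = f"{current} {part}".strip()
    if current:
        chunks.append(current)
    return chunks

sentence = ("This sentence, being as long as it is, most unfortunately, "
            "will not be fully stated.")
for chunk in chunk_text(sentence, max_chars=40):
    print(chunk)
```

The downside is that prosody breaks at each chunk boundary, so it is a stopgap rather than a fix.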
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discourse page for further help. https://discourse.mozilla.org/c/tts
It may be stale, but this issue is not fixed. It's easy to reproduce and a blocker for any serious work with TTS.
Suffering this issue too. Unsure what to do to resolve it. Will try other models to see what happens, I suppose.
I have the same problem here: long sentences get truncated. It seems to be just a configuration issue; as they say in thorstenMueller/Thorsten-Voice#22, setting "max_decoder_steps": 10000 in the model's config.json solved the problem.
Running
tts --text
on some meaningful sentences results in the following: the audio file is truncated with respect to the text.
If I hack the config file at
TTS/tts/configs/tacotron_config.py
to have a larger max_decoder_steps value, the output does seem to successfully get longer, but I'm not sure how safe this is. Are there any better solutions? Should I use a different model?