wav output has no sound #1

Open
kudzaijaure-dot opened this issue Sep 14, 2022 · 3 comments
Comments

@kudzaijaure-dot

kudzaijaure-dot commented Sep 14, 2022

@souvikg544
I've tried running this part of the script, but it raises a no-argument error for these two:

  --model_path $test_ckpt \
  --config_path $test_config \

I manually copied the paths of best_model.pth and config.json under tts_train_dir, and then it runs with no error, but the output wav file has no speech, just a monotone buzzing sound.
Also, TensorBoard wouldn't launch, so I skipped that step; this could be related.
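
One possible reading of the no-argument error is that `$test_ckpt` and `$test_config` were empty when the command ran, so the flags received no value. That is an assumption, not something confirmed in this thread. A minimal sketch of looking the two files up programmatically instead of copying paths by hand, assuming each run folder under tts_train_dir contains best_model.pth and config.json (the helper name `find_latest_run` is made up for illustration):

```python
from pathlib import Path
from typing import Tuple


def find_latest_run(train_dir: str) -> Tuple[Path, Path]:
    """Return (checkpoint, config) from the most recently modified run folder."""
    runs = [p for p in Path(train_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run folders found in {train_dir}")
    # Pick the folder with the newest modification time.
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    ckpt = latest / "best_model.pth"
    config = latest / "config.json"
    # Fail early with a clear message instead of passing an empty path on.
    for required in (ckpt, config):
        if not required.exists():
            raise FileNotFoundError(f"missing {required}")
    return ckpt, config


if __name__ == "__main__":
    ckpt, config = find_latest_run("/content/tts_train_dir")
    print(ckpt)
    print(config)
```

The printed paths can then be passed to `--model_path` and `--config_path`.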

@souvikg544
Owner

Thank you for raising the issue. You have used the right paths. The problem is that generating speech from text with TTS requires at least 100000 epochs of training to get a suitable output. It also requires a large audio dataset. You can use Colab Pro or AWS to achieve the results.

This is the same issue you are describing. Refer to the comments on the answer here:

https://stackoverflow.com/questions/66307611/how-do-i-get-started-training-a-custom-voice-model-with-mozilla-tts-on-ubuntu-20

@souvikg544
Owner

Also, if anyone gets this working on Colab, please let me know how you did it.

@kudzaijaure-dot
Author

kudzaijaure-dot commented Sep 20, 2022

@souvikg544 I tried using over an hour of cleaned data. Training took about 50 minutes, but out.wav still has no speech, just a buzzing sound for a second. Every text extraction was successful, with 483 extracted 10-second clips. TensorBoard would not launch, so I skipped that stage. AudioProcessor from TTS.utils.audio raises an error the first time I run the command but runs normally, with no changes, the second time. Inference command below:

  !tts --text "Text for TTS, to test how well the president of the united states speaks. Maybe what it requires is a verly long sentence that does the job" \
    --model_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/best_model.pth' \
    --config_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/config.json' \
    --out_path out.wav
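
To put numbers on "a buzzing sound for a second", it may help to measure the file instead of judging it by ear. The sketch below uses only the Python standard library and assumes out.wav is 16-bit PCM, which is an assumption about the output format; `wav_stats` is a hypothetical helper name. A duration far shorter than the input sentence would suggest the model is producing almost no frames, rather than a playback problem.

```python
import array
import math
import sys
import wave


def wav_stats(path: str) -> dict:
    """Report duration, peak and RMS amplitude for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        samples = array.array("h", wf.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {
        "seconds": n_frames / rate,
        "sample_rate": rate,
        "peak": peak,
        "rms": rms,
    }


if __name__ == "__main__":
    print(wav_stats("out.wav"))
```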

The model recorded 100 epochs of training on 1 hour of data, so would the suggested 100000 epochs require 1000 hours of audio?
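
On the arithmetic: an epoch is one pass over the same dataset, so the epoch count does not scale with hours of audio. What grows with more epochs is the number of optimizer steps and the training time. A rough sketch using the figures from this thread (483 clips of 10 seconds, 100 epochs in about 50 minutes); the batch size of 32 is an assumption, since the real value lives in config.json and is not quoted here:

```python
import math


def total_steps(num_clips: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for a run, counting a final partial batch as a step."""
    steps_per_epoch = math.ceil(num_clips / batch_size)
    return steps_per_epoch * epochs


clips = 483          # from the thread
clip_seconds = 10    # from the thread
batch_size = 32      # assumption, not taken from the thread
minutes_per_100_epochs = 50  # from the thread

print(clips * clip_seconds / 60)             # minutes of audio: 80.5
print(total_steps(clips, batch_size, 100))   # steps in the 100-epoch run: 1600
print(total_steps(clips, batch_size, 100_000))

# Same data, 1000x the epochs: the wall-clock time scales, not the dataset.
days = minutes_per_100_epochs * (100_000 / 100) / 60 / 24
print(round(days, 1))                        # about 34.7 days at the same speed
```

Under these assumptions the 100-epoch run amounts to roughly 1600 steps, so the dataset size and the epoch count are separate knobs.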
