Tacotron: Train TWEB dataset #22

Closed
erogol opened this issue Apr 23, 2018 · 5 comments

@erogol
Contributor

erogol commented Apr 23, 2018

Dataset: https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset

@erogol
Contributor Author

erogol commented Apr 23, 2018

Training from the master branch gave very poor results because the sequences in this dataset are very long. To alleviate the problem, I am trying Truncated Backpropagation Through Time (TBPTT).
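
For readers unfamiliar with TBPTT, here is a minimal sketch of its windowing step only: a long sequence is cut into fixed-length windows that are processed in order. The names `tbptt_windows` and `window_lengths` are hypothetical and are not taken from this repository. In an actual training loop, each window would get its own forward pass, backward pass, and optimizer step, and the recurrent state passed on to the next window would be detached so that gradients stop at the window boundary.

```python
from typing import Iterator, List, Sequence, Tuple


def tbptt_windows(num_frames: int, window: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that cover [0, num_frames) in order.

    Hypothetical helper for illustration; `window` is the truncation length.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    for start in range(0, num_frames, window):
        # The last window may be shorter than `window`.
        yield start, min(start + window, num_frames)


def window_lengths(frames: Sequence[float], window: int) -> List[int]:
    """Return the length of each window a long sequence would be split into.

    A real loop would run forward/backward on frames[start:end] here and
    carry a detached recurrent state into the next window.
    """
    return [end - start for start, end in tbptt_windows(len(frames), window)]
```

For example, a 10-frame sequence with a window of 4 is split into windows of 4, 4, and 2 frames, so no single backward pass spans more than 4 steps.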

@erogol
Contributor Author

erogol commented Apr 23, 2018

The dataset has an interesting frequency distribution; it seems to have been post-processed after recording.
[attached image]

@erogol erogol added this to In Progress in v0.0.1 Apr 25, 2018
@erogol erogol moved this from In Progress to Done in v0.0.1 May 28, 2018
@erogol
Contributor Author

erogol commented Jan 6, 2019

This dataset is of very low quality. It has been low-pass filtered. This causes poor stop-token prediction and pronunciation errors, especially for novel words. Training with phonemes might improve the results.

I also replaced ReLU with RReLU and removed Dropout in the prenet. These changes improved the results but have yet to be tested on other datasets.
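
As a reminder of what RReLU (randomized leaky ReLU) computes, here is a small pure-Python sketch for a single value. It is not the code used in this repository. The default bounds 1/8 and 1/3 are the defaults of PyTorch's `torch.nn.RReLU`; the function name `rrelu` and the `rng` parameter are assumptions made for illustration.

```python
import random
from typing import Optional


def rrelu(
    x: float,
    lower: float = 1 / 8,
    upper: float = 1 / 3,
    training: bool = False,
    rng: Optional[random.Random] = None,
) -> float:
    """Randomized leaky ReLU for one scalar (illustrative sketch)."""
    if x >= 0:
        # Non-negative inputs pass through unchanged, as with ReLU.
        return x
    if training:
        # During training the negative slope is sampled uniformly.
        slope = (rng or random).uniform(lower, upper)
    else:
        # At evaluation time the slope is fixed to the midpoint.
        slope = (lower + upper) / 2
    return slope * x
```

Unlike ReLU, negative inputs keep a small nonzero output, and the random slope during training adds some noise to the activations.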

Sound example: https://soundcloud.com/user-565970875/tweb-example-108k-iters-2810d57
Model: https://drive.google.com/open?id=1deQ2akq9cuyreda0DgZOiBdydkbgseWP

@erogol erogol closed this as completed Jan 6, 2019
@erogol
Contributor Author

erogol commented Jan 7, 2019

As I just discovered, I trained the model with a sampling rate of 22050, which is the default for LJSpeech, but TWEB has a sampling rate of 12000. That might be an important bug.

@viveknad94

viveknad94 commented Nov 5, 2019

@erogol Is there any pretrained TTS model that has been trained with a male voice? I have been trying to synthesize audio with the pretrained model (Tacotron-iter-108K on the TWEB dataset), but the commit (2810d57) is no longer present. I get this error: error: pathspec '2810d57' did not match any file(s) known to git.
