New Tacotron2 model release with WaveRNN vocoder. #153
@erogol - tried to load the checkpoint with the latest code on the
Solved - just make sure you use the right config.json files :)
@erogol - I tried to train a new WaveRNN model (from scratch and finetuned on top of yours), as well as use my previous implementation of WaveRNN. For each one, the output is very scrambled: https://drive.google.com/open?id=1iHo-b3WwGrvRUc-RjhpQA_G0GgycsENW When I point the vocoder to the MOLD model that you published, I get clearer speech (I can make out all of the words), but with noise. Any ideas?
You need to train more to get cleaner output, but LJSpeech itself is noisy, so to a level it is acceptable.
@erogol - thanks. Is this the case even when I'm fine-tuning? By training more, do you mean training Tacotron more, or WaveRNN? After how many steps should it generally start to get better? I checked the alignment of what Tacotron produces, and it seems like the alignment is there.
I meant training WaveRNN. If you train from the start, it sounds good after 300K iterations, but it depends on the dataset.
@erogol Thanks. From your experience, do you think it's possible to fine-tune WaveRNN like we can fine-tune tacotron? My dataset is just a couple of hours so it might not be enough to train from scratch. I've also tried to use my own implementation of WaveRNN (very similar to yours) and after 900k steps, it works well with Rayhane's tacotron implementation but not yours. |
Finetuning WaveRNN works, but I haven't tried finetuning on a small dataset.
@erogol - I tried to finetune to 731k steps, and the output still sounds scrambled: https://drive.google.com/file/d/1niGB9-IvkjW-Q7MTrgTtwa96Sp8Bu6Ub/view?usp=sharing Any tips on what I can do to debug or see what might be wrong?
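One common cause of scrambled vocoder output is a mismatch in audio parameters between the Tacotron config and the WaveRNN config (sample rate, number of mel bands, hop length, etc.), since the vocoder then receives spectrograms it was never trained on. A quick sanity check might look like the sketch below; the field names under `"audio"` are illustrative and should be adjusted to the actual layout of your config.json files:

```python
import json

# Audio fields that must agree between the TTS model and the vocoder.
# (Key names are illustrative; check your own config.json files.)
KEYS = ["sample_rate", "num_mels", "fft_size", "hop_length", "win_length"]

def compare_audio_configs(tts_config_path, vocoder_config_path):
    """Return a dict of mismatched audio parameters; empty means they agree."""
    with open(tts_config_path) as f:
        tts_audio = json.load(f)["audio"]
    with open(vocoder_config_path) as f:
        voc_audio = json.load(f)["audio"]
    return {k: (tts_audio.get(k), voc_audio.get(k))
            for k in KEYS
            if tts_audio.get(k) != voc_audio.get(k)}
```

If this returns anything non-empty, the vocoder's input distribution doesn't match what Tacotron produces, and retraining with matching settings is usually the fix.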
How do I use the WaveRNN model? Update:
I have tried Tacotron 2 + WaveRNN and found that the quality is good, but WaveRNN is too slow on CPU: about 3 sec for Tacotron 2 and about 30 sec for WaveRNN. So it's comparable with the WaveGlow model in terms of speed, but the WaveRNN model size is smaller. Also, Tacotron 2 processing time depends on sentence length (i.e. shorter sentences are processed faster, ~1 sec), but WaveRNN stays slow even for short sentences, ~25 sec. Why? Model size:
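The asymmetry follows from how the two models generate output: Tacotron 2 decodes one mel frame per step (hundreds of steps per sentence), while WaveRNN decodes one raw audio sample per step, so even a short utterance requires tens of thousands of strictly sequential RNN steps on CPU. A rough back-of-envelope calculation, assuming typical LJSpeech-style settings (22050 Hz sample rate, hop length 256):

```python
# Rough sequential-step counts for a 3-second utterance, assuming a
# 22050 Hz sample rate and 256 samples per mel frame (hop length).
# These are typical LJSpeech-style values; adjust to your config.
sample_rate = 22050
hop_length = 256
duration_s = 3.0

wavernn_steps = int(duration_s * sample_rate)  # one RNN step per audio sample
tacotron_steps = wavernn_steps // hop_length   # one decoder step per mel frame

print(wavernn_steps)   # 66150 sequential steps for WaveRNN
print(tacotron_steps)  # 258 decoder steps for Tacotron 2
```

Since none of those 66k+ steps can be parallelized across time, WaveRNN's runtime is dominated by audio duration rather than sentence complexity, which is why even short sentences stay slow.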
@erogol Do you have a model trained on the latest commit? |
May I know which config.json file solved your issue? @ZohaibAhmed
@CorentinJ not yet but I'll be releasing new models soon. |
@erogol Is there any way to just run the pre-trained model with custom inputs in an "easy" way? (I don't really understand most of the code just yet, as I'm still learning about ML.)
@reuben I actually tried that yesterday, but unfortunately there is a conflict of PyTorch versions in the requirements (I had to manually download an older PyTorch .whl to be able to install the TTS-0.0.1 package, which then throws a dependency requirements error when I try to run it). EDIT:
You should be able to create and use a fresh virtualenv to avoid any conflicts.
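A minimal sketch of that setup (the package name and version are the ones mentioned in this thread; adjust as needed):

```shell
# Create an isolated environment so the package's pinned dependencies
# don't conflict with any system-wide PyTorch install.
python3 -m venv tts-env
. tts-env/bin/activate

# Then install the package inside the environment, e.g.:
# pip install TTS==0.0.1   # version mentioned in this thread
```

Everything installed while the environment is active stays inside `tts-env/`, so a conflicting global PyTorch no longer matters.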
A new TTS Tacotron2 model trained on LJSpeech is released. It should work well with the MOLD WaveRNN model.
The model has been trained for 260K iterations and has the best validation loss so far on LJSpeech.
It was first trained with a dropout prenet as in the original paper, then switched to the BN prenet described above. Finally, it was trained with "forward attention", just for experimental reasons.
At inference time you can try different attention-related parameters and pick the ones that work best for you: you can switch forward attention on/off, use "sigmoid" or "softmax" attention norm, or enable attention windowing. The default settings are given in the model's config.json.
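As an illustration, the inference-time attention options described above might appear in config.json roughly like this (the key names here are illustrative; check the released model's config.json for the exact keys and defaults):

```json
{
  "use_forward_attn": true,
  "attention_norm": "sigmoid",
  "windowing": false
}
```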
I think both the WaveRNN and TTS models have more room for finetuning (especially WaveRNN) for better results.
You can also read more here #26