
Initial Glow-TTS implementation. #500

Closed
erogol opened this issue Aug 17, 2020 · 1 comment


erogol commented Aug 17, 2020

Paper: https://arxiv.org/pdf/2005.11129.pdf
Implementation: https://github.com/jaywalnut310/glow-tts

Initially, I plan to adapt the original implementation.
I think the encoder can be simplified with a convolutional encoder. I'll try a couple of different architectures.

@erogol erogol created this issue from a note in v0.0.5 (In progress) Aug 17, 2020
@erogol erogol added the improvement, new feature, and new-model labels Aug 17, 2020
@erogol erogol moved this from In progress to Done in v0.0.5 Sep 22, 2020
@erogol erogol moved this from Done to In progress in v0.0.5 Sep 22, 2020

erogol commented Sep 22, 2020

Sample notebook: https://colab.research.google.com/drive/1NC4eQJFvVEqD8L4Rd8CVK25_Z-ypaBHD?usp=sharing
Model: https://github.com/mozilla/TTS/wiki/Released-Models

I finished the first version of the model. Here are some details.

The model is able to produce good-quality results, but my observation is that it is still less natural than our TacotronDDC model. However, it is easier to train because it replaces the attention module with a greedy search mechanism that learns the text-to-spectrogram alignment. The learned alignment is then used to train a duration predictor, which provides the alignment at inference time.
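For anyone curious how the greedy alignment search works, here is a toy NumPy sketch of the dynamic-programming idea from the Glow-TTS paper (my function and variable names are illustrative, not the actual repo API): given per-(token, frame) log-likelihoods, find the best monotonic path and read token durations off it.

```python
import numpy as np

def monotonic_alignment_search(log_p):
    """Toy monotonic alignment search.
    log_p[i, j] = log-likelihood that mel frame j belongs to text token i.
    Returns align[j] = token index of frame j (non-decreasing)."""
    n_tokens, n_frames = log_p.shape
    Q = np.full((n_tokens, n_frames), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, n_frames):
        # Token i is unreachable before frame i, hence the min().
        for i in range(min(j + 1, n_tokens)):
            stay = Q[i, j - 1]                       # keep the same token
            advance = Q[i - 1, j - 1] if i > 0 else -np.inf  # move to next token
            Q[i, j] = max(stay, advance) + log_p[i, j]
    # Backtrack from the last token at the last frame.
    align = np.empty(n_frames, dtype=int)
    i = n_tokens - 1
    align[-1] = i
    for j in range(n_frames - 1, 0, -1):
        if i > 0 and Q[i - 1, j - 1] >= Q[i, j - 1]:
            i -= 1
        align[j - 1] = i
    return align

# Per-token durations for the duration predictor are just the frame counts:
# durations = np.bincount(align, minlength=n_tokens)
```

Because the search only ever stays on a token or advances by one, the resulting alignment is monotonic by construction, which is what removes the attention failure modes.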

Glow-TTS lets you set the speed and the variation of the speech with certain parameters. It also does not rely on auto-regression, so it computes the output in a single pass, which yields faster execution than Tacotron models with a reduction factor smaller than 3. (Our released TacotronDDC model therefore has a very similar real-time factor on both GPU and CPU.)
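The speed control comes down to scaling the predicted durations before expanding the encoder states. A minimal sketch (names like `regulate_length` and `length_scale` are my own, chosen to mirror the usual convention, not necessarily the repo's):

```python
import numpy as np

def regulate_length(encoder_states, durations, length_scale=1.0):
    """Expand encoder states by (scaled) per-token durations.
    encoder_states: [n_tokens, dim]; durations: frames per token.
    length_scale > 1 slows the speech down, < 1 speeds it up."""
    scaled = np.round(np.asarray(durations) * length_scale)
    scaled = np.maximum(scaled, 0).astype(int)
    # Repeat each token's state for its (scaled) number of frames.
    return np.repeat(encoder_states, scaled, axis=0)
```

Since the whole expanded sequence is produced at once (no frame-by-frame recurrence), the decoder runs in a single pass, which is where the speed advantage over auto-regressive Tacotron comes from.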

There are a couple of differences in my implementation. I tried convolutional encoder models to enable faster execution, and so far a Gated Convolutional encoder gives results comparable to the original model's. Using Gated Convolution speeds up the model ~1.3x on a CPU.
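For reference, the gating idea behind a gated convolutional layer can be sketched in a few lines of NumPy (this is the generic WaveNet-style gated activation unit, not the exact layer used in the encoder):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv1d(x, w_filter, w_gate):
    """One gated 1-D convolution over a single-channel signal x.
    Two parallel convolutions: a filter branch squashed by tanh,
    modulated elementwise by a sigmoid gate branch."""
    f = np.convolve(x, w_filter, mode="same")
    g = np.convolve(x, w_gate, mode="same")
    return np.tanh(f) * sigmoid(g)
```

The appeal over a transformer-style encoder is that everything is plain convolutions, which parallelize well and avoid the quadratic attention cost, hence the CPU speedup.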

Tensorboard outputs (please ignore the empty loss plots, which are from different runs on the same machine):

[training-curve screenshots omitted]

@erogol erogol closed this as completed Sep 22, 2020
@erogol erogol moved this from In progress to Done in v0.0.5 Sep 22, 2020