Tacotron 2 #11

Closed
DarkDefender opened this issue Dec 20, 2017 · 41 comments
Comments

@DarkDefender

DarkDefender commented Dec 20, 2017

Sorry if this is off-topic (deepvoice vs tacotron), but it seems the Tacotron 2 paper has now been released.
The speech samples sound better than ever (I think):
https://google.github.io/tacotron/publications/tacotron2/index.html

I must admit that I'm not too well versed in how much this differs from the original Tacotron. But perhaps the changes could also be used in your projects?

@r9y9
Owner

r9y9 commented Dec 20, 2017

The paper (https://arxiv.org/abs/1712.05884) seems to have been submitted to ICASSP 2018. I read it today. It is very nice! I plan to implement a WaveNet vocoder when I finish the multi-speaker work (#10). DeepVoice3 and Tacotron2 both use a WaveNet vocoder.

@DarkDefender
Author

Great! I wish you luck with the multi-speaker and vocoder work! :D

@r9y9
Owner

r9y9 commented Dec 31, 2017

I have started implementing a WaveNet vocoder. It's still very much a WIP, but I think I have implemented all the basic features. If you are interested, check out https://github.com/r9y9/wavenet_vocoder. Audio samples from a model trained on CMU Arctic (16kHz, ~1200 utterances) can be found at r9y9/wavenet_vocoder#1 (comment).

@r9y9
Owner

r9y9 commented Jan 1, 2018

It turned out to be easy to implement a WaveNet vocoder. I think my implementation is already feature complete. The problem is that I don't have 32 GPUs :(

@DarkDefender
Author

Yeah, it is really a bummer that the vocoder requires that much compute power to be able to train in a reasonable amount of time. :C

Perhaps you could try the WORLD vocoder method they used here? http://www.dtic.upf.edu/~mblaauw/NPSS/
However the quality will probably not be as good as the wavenet vocoder...

@DarkDefender
Author

DarkDefender commented Jan 7, 2018

@r9y9 I had a listen to https://r9y9.github.io/wavenet_vocoder/ and I think that they sound really quite good!

The samples are much better (to me at least) than the Tacotron samples, as they do not seem to have the same harsh "sound compression artifact" noise. Instead, they sound as if they were recorded with lower-quality microphones or on lower-quality analog tape. (I guess most of that has to do with the 16kHz sampling frequency.)

So anyway, what changed? Did you buy 32 GPUs, or did I misunderstand, and it is not the WaveNet vocoder itself that needs that much compute power? (I.e., is it Tacotron + WaveNet vocoder that requires that much?)

@r9y9
Owner

r9y9 commented Jan 7, 2018

@DarkDefender Nothing changed :) I just trained WaveNets with my single GPU (GTX 1080Ti). As I noted in the demo page, it took 22 hours to train the single-speaker version and 44 hours for the multi-speaker version. I used 1 to 7 hours of audio sampled at 16kHz. For larger datasets, or data sampled at a higher rate, it will take more time to train.
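
To make the scaling concrete: WaveNet models the waveform one sample at a time, so the number of training targets grows linearly with both the audio duration and the sampling rate. Below is a rough sketch of that arithmetic. The function name is illustrative, the 24 kHz row is a hypothetical comparison, and sample count is only a proxy for training cost, not a prediction of wall-clock time.

```python
def num_samples(hours: float, sample_rate_hz: int) -> int:
    """Number of waveform samples in `hours` of audio at `sample_rate_hz`."""
    return int(hours * 3600 * sample_rate_hz)


# The 1 h and 7 h figures at 16 kHz come from the comment above;
# the 24 kHz row is a hypothetical comparison, not something that was trained.
for hours, rate in [(1, 16000), (7, 16000), (7, 24000)]:
    print(f"{hours} h at {rate} Hz -> {num_samples(hours, rate):,} samples")
```

Going from 16 kHz to 24 kHz on the same recordings would mean 1.5x as many samples to model.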

@rraallvv

rraallvv commented Feb 11, 2018

@r9y9 Wow! The output audio in the sample files is very impressive. If you don't mind, I'd like to ask a couple of questions. Recently I was browsing some repos that do style transfer with deepvoice; in particular this one does a nice job. Have you tried that kind of thing? Also, do you know of an already-trained network that I can run locally or online in a Jupyter notebook to generate speech from text?

Keep up the good work!

@r9y9
Owner

r9y9 commented Feb 12, 2018

Hi, @rraallvv. I currently have lots of other things to do and have not tried it yet, sadly. WN for TTS is still WIP at #21. There are no pre-trained models at the moment.

@rafaelvalle

rafaelvalle commented Feb 12, 2018

We're very close to issuing a pull request with an implementation of Tacotron 2 that is compatible with @r9y9's repo.

@rraallvv

@r9y9 thanks for your quick reply.

@rafaelvalle

@r9y9 we'll probably issue a PR with Taco 2 at the beginning of next week. Here are the attention and predicted mel after 7k iterations.
[two images: attention alignment and predicted mel-spectrogram]

@r9y9
Owner

r9y9 commented Feb 24, 2018

@rafaelvalle Great! I cannot wait for next week :)

@neverjoe

Hi @rafaelvalle, I'm working on Taco 2 too. Can you explain how to reproduce your result, which looks like it is working?

@rafaelvalle

@neverjoe hold on tight, we're very close to a release of Tacotron 2 with FP16 and Distributed.

@neverjoe

great job!

@PetrochukM

PetrochukM commented Mar 22, 2018

@rafaelvalle What does the timeline look like? Any samples you can share?

@rafaelvalle

rafaelvalle commented Mar 23, 2018 via email

@neverjoe

neverjoe commented Mar 23, 2018 via email

@rafaelvalle

rafaelvalle commented Mar 23, 2018

Mel-spectrogram and alignment during FP16 and DistributedDataParallel training.
[two screenshots, 2018-03-22: mel-spectrogram and alignment]

Short demo sample (not in the training set), generated with the Griffin-Lim algorithm, not the WaveNet decoder.
taco2_fp16_sample.aiff.zip

@PetrochukM

@rafaelvalle What is required to get to samples that sound similar to Google's Tacotron 2? Do you think you're getting close?

@rafaelvalle

rafaelvalle commented Mar 23, 2018 via email

@maozhiqiang

@rafaelvalle Great job! How does your work improve on https://github.com/Rayhane-mamah/Tacotron-2? Would you explain it? Thank you.

@rafaelvalle

We'll release the code soon and everything will become evident. I'm sorry this is taking some time, but we're going through many layers of bureaucracy.

@maozhiqiang

@rafaelvalle Thank you for your contributions! I'm looking forward to seeing the performance of Tacotron 2.

@duvtedudug

@rafaelvalle any more updates for your tacotron 2?!

@rafaelvalle

rafaelvalle commented Apr 14, 2018 via email

@duvtedudug

duvtedudug commented Apr 14, 2018

Sounds great @rafaelvalle !!!

Open source?

For real time inference are you using something like the RNN based 'Efficient Neural Audio Synthesis' by Kalchbrenner et al. ?

@maozhiqiang

maozhiqiang commented Apr 26, 2018

@rafaelvalle Did you implement Parallel WaveNet to get real-time inference?

@rafaelvalle

We use the first wavenet for real-time inference.
Here's NVIDIA's "CUDA alien code" that makes wavenet run faster than real-time.
https://github.com/NVIDIA/nv-wavenet/
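
"Faster than real-time" here means the model generates audio samples more quickly than they would play back. Below is a minimal sketch of that check. The function name is hypothetical and the timing numbers are made up for illustration; nothing here is measured from nv-wavenet.

```python
def speedup_over_real_time(num_samples: int, sample_rate_hz: int, synth_seconds: float) -> float:
    """Seconds of audio produced per second of compute; > 1.0 is faster than real time."""
    audio_seconds = num_samples / sample_rate_hz
    return audio_seconds / synth_seconds


# Hypothetical example: 5 s of 16 kHz audio (80,000 samples) synthesized in 4 s of compute.
speedup = speedup_over_real_time(num_samples=80_000, sample_rate_hz=16_000, synth_seconds=4.0)
print(f"speedup: {speedup:.2f}x, faster than real time: {speedup > 1.0}")
```

Put another way, an autoregressive vocoder running at 16 kHz has to produce at least 16,000 samples per second of compute to keep up with playback.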

@neverjoe

neverjoe commented Apr 26, 2018 via email

@rafaelvalle

Keep an eye on it.
Soon there will be a full TTS stack with faster than real-time inference open sourced.

@rafaelvalle

Here's the link to the PyTorch implementation of Tacotron 2.
https://github.com/NVIDIA/tacotron2

@DarkDefender
Author

@rafaelvalle Nice!
Do you have any speech samples that you can share?

@rafaelvalle

Not yet but soon. We're currently focusing on another release.

@PetrochukM

@rafaelvalle How much can we expect from Nvidia in maintaining these repositories? The comments and tests are a bit lacking! It does not seem like there have been further updates to: https://github.com/NVIDIA/nv-wavenet

@rafaelvalle

Hey @PetrochukM, please post any requests, issues, or suggestions on the specific repos, and the team responsible for each will address them.

@PetrochukM

@rafaelvalle Okay! I'm curious, how large of an effort is this? I assume it is not on the scale of Facebook's PyTorch effort or of the TensorFlow effort.

@rafaelvalle

You can probably find that information on the github repos as well...

@r9y9
Owner

r9y9 commented May 9, 2018

r9y9/wavenet_vocoder#30 (comment)

A Tacotron2 + WaveNet online TTS demo is coming soon.

r9y9 added a commit to r9y9/wavenet_vocoder that referenced this issue May 10, 2018
@r9y9
Owner

r9y9 commented May 10, 2018

I think I can finally close the issue.

r9y9 closed this as completed May 10, 2018

8 participants