Materials on deep-learning-based speech synthesis
- Deep Learning (Kim Tae-hoon, 2017.11)
- Talk video from DEVIEW 2017 that explains Tacotron in an easy-to-follow way
- Modulabs (모두의연구소) WaveNet study video (Kim Seung-il, 2017.10)
- A video explaining WaveNet as the presenter understood it, along with an online discussion
- Generative Model-Based Text-to-Speech Synthesis (Heiga Zen, 2017.02)
- A video in which Heiga Zen, one of the authors of the WaveNet paper, surveys TTS technology overall and introduces WaveNet
- Deep Learning, Speaking in the Voice of a Loved One - Popok Blog, 2018.03.27.
- Blog post about AIA Life's campaign video 'Last Greetings' and the speech synthesis technology behind it
- CMU_ARCTIC (en)
- A US English dataset created for speech synthesis research at CMU's Language Technologies Institute
- The LJ Speech Dataset (en)
- Hosted on Keith Ito's website, but I could not find where the data originally came from or why it was created
- Blizzard 2012 (en)
- Dataset used in the Blizzard Challenge 2012, a corpus-based speech synthesis challenge
- CSTR VCTK Corpus (en)
- English Multi-speaker Corpus for CSTR Voice Cloning Toolkit
- https://github.com/ibab/tensorflow-wavenet
- https://github.com/r9y9/wavenet_vocoder (PyTorch)
- https://github.com/kan-bayashi/PytorchWaveNetVocoder (PyTorch)
WaveNet takes so long to train that getting results seems impractical without multiple GPUs. Related code links are collected below.
- https://github.com/nakosung/tensorflow-wavenet/tree/multigpu (Tensorflow)
- Multi-GPU implementation of WaveNet
- https://github.com/nakosung/tensorflow-wavenet/tree/model_parallel (Tensorflow)
- Model-parallel implementation of WaveNet
- https://github.com/tomlepaine/fast-wavenet
- https://github.com/dhpollack/fast-wavenet.pytorch (PyTorch)
- https://github.com/kensun0/Parallel-Wavenet (incomplete implementation)
- https://github.com/keithito/tacotron
- https://github.com/Kyubyong/tacotron
- https://github.com/barronalex/Tacotron
- https://carpedm20.github.io/tacotron/ (Multi-speaker Tacotron in TensorFlow)
- Multi-speaker implementation of Tacotron 1 and Deep Voice 2
- https://github.com/riverphoenix/tacotron2 (implemented)
- https://github.com/Rayhane-mamah/Tacotron-2 (implemented)
- https://github.com/selap91/Tacotron2 (implemented)
- https://github.com/CapstoneInha/Tacotron2-rehearsal
- https://github.com/A-Jacobson/tacotron2 (PyTorch)
- https://github.com/maozhiqiang/tacotron_cn (implementation verification required / Chinese)
- https://github.com/LGizkde/Tacotron2_Tao_Shujie (check required)
- https://github.com/ruclion/tacotron_with_style_control (Style Control)
- HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models (2018.02) - Yanqi Zhou et al.
- Uses a WaveNet to extract context from the audio, then an LSTM conditioned on that context to generate the following samples faster. Reports a higher MOS than WaveNet, and 2~4x faster audio generation at the same quality level (e.g., a 40-layer WaveNet vs. a 20-layer WaveNet + 1 LSTM).
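As a rough illustration of the HybridNet idea above, the toy NumPy sketch below uses a stack of dilated causal layers to summarize past audio into a context vector, then a small LSTM conditioned on that context to generate the next samples autoregressively. All weights are random and the sizes are chosen for readability; none of this matches the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8  # toy channel width for both networks

# --- Stage 1: WaveNet-style context extractor (dilated causal layers) ---
def extract_context(audio, n_layers=4):
    """Summarize past audio; dilation doubles per layer, so the
    receptive field grows as 2**n_layers."""
    h = np.tile(audio, (C, 1))                      # (C, T)
    for l in range(n_layers):
        d = 2 ** l
        delayed = np.pad(h, ((0, 0), (d, 0)))[:, :h.shape[1]]  # shift right by d
        W_cur = rng.standard_normal((C, C)) * 0.1
        W_del = rng.standard_normal((C, C)) * 0.1
        h = np.tanh(W_cur @ h + W_del @ delayed)
    return h[:, -1]                                 # context = last time step, (C,)

# --- Stage 2: small LSTM generates the next samples from that context ---
def lstm_generate(context, n_samples):
    Wx = rng.standard_normal((4 * C, 1)) * 0.1      # input-to-gates
    Wh = rng.standard_normal((4 * C, C)) * 0.1      # hidden-to-gates
    Wc = rng.standard_normal((4 * C, C)) * 0.1      # context conditioning
    w_out = rng.standard_normal(C) * 0.1
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c, x, out = np.zeros(C), np.zeros(C), 0.0, []
    for _ in range(n_samples):
        gates = Wx @ np.array([x]) + Wh @ h + Wc @ context
        i, f, g, o = np.split(gates, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
        x = float(w_out @ h)                        # next sample (autoregressive)
        out.append(x)
    return np.array(out)

context = extract_context(rng.standard_normal(64))
samples = lstm_generate(context, 16)
print(samples.shape)  # (16,)
```

The speedup claim follows from this split: the expensive dilated stack runs once over the history, while each new sample only costs one LSTM step.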
- ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech (2018.07) - Wei Ping et al.
- Uses a Gaussian autoregressive WaveNet as the teacher network and a Gaussian inverse autoregressive flow (IAF) as the student network, minimizing a regularized KL divergence between their highly peaked distributions.
- Also proposes a text-to-wave architecture that generates speech end-to-end.
- ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech - Baidu Research, 2018.07.20.
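The distillation loss described above has a simple closed form, since both the teacher and student output Gaussians per time step. The sketch below shows the standard KL divergence between two univariate Gaussians plus a penalty on the log-scale gap; the `lam` default is illustrative, not necessarily the paper's setting.

```python
import numpy as np

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) )."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
            - 0.5)

def regularized_kl(mu_q, sigma_q, mu_p, sigma_p, lam=4.0):
    """KL plus a penalty on the log-scale difference, which keeps the
    loss informative when both distributions become highly peaked
    (plain KL is numerically ill-behaved as the sigmas shrink)."""
    return (kl_gaussians(mu_q, sigma_q, mu_p, sigma_p)
            + lam * (np.log(sigma_p) - np.log(sigma_q))**2)

# Identical distributions -> plain KL is zero.
print(kl_gaussians(0.0, 1.0, 0.0, 1.0))  # 0.0
```

In training, `(mu_q, sigma_q)` would come from the IAF student and `(mu_p, sigma_p)` from the frozen teacher WaveNet at each time step.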
- ISPEECH VOICE CLONING DEMOS
- Demos where you can listen to cloned voices of famous people
- Fast Generation for Convolutional Autoregressive Models (2017.04) - Prajit Ramachandran et al.
- Applied to WaveNet and PixelCNN++ models, this technique reportedly speeds up generation by up to 21x and 183x, respectively. Note that these are best-case figures for specific situations, so the improvement in a real environment may be smaller than expected.
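The caching idea behind this speedup can be sketched in a few lines: naive generation re-runs the whole dilated stack over the full history for every new sample, while the fast path keeps a per-layer queue of past activations so each step costs one multiply-add per layer. This is a toy scalar-channel sketch with random weights (real implementations cache full channel vectors per layer):

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
DILATIONS = [1, 2, 4, 8]
# One kernel-size-2 weight pair per layer; scalar channels for clarity.
W = [rng.standard_normal(2) * 0.5 for _ in DILATIONS]

def naive_step(history):
    """Recompute the whole dilated stack over the full history (slow path)."""
    h = np.asarray(history, dtype=float)
    for d, (w_old, w_cur) in zip(DILATIONS, W):
        pad = np.concatenate([np.zeros(d), h])
        h = np.tanh(w_old * pad[:-d] + w_cur * h)   # causal conv, kernel 2
    return h[-1]

class FastGenerator:
    """Caches each layer's past activations in a fixed-length queue, so a
    new sample needs one multiply-add per layer instead of a full pass."""
    def __init__(self):
        self.queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]

    def step(self, x):
        h = float(x)
        for q, (w_old, w_cur) in zip(self.queues, W):
            old = q[0]              # this layer's input from d steps ago
            q.append(h)             # push current input, evict the oldest
            h = np.tanh(w_old * old + w_cur * h)
        return h

# The cached fast path matches the naive recomputation sample-for-sample.
gen, history = FastGenerator(), []
for x in rng.standard_normal(20):
    history.append(x)
    assert np.isclose(gen.step(x), naive_step(history))
print("cached fast path matches naive recomputation")
```

With L layers the naive path costs O(T * L) work per sample over a history of length T, while the cached path is O(L) per sample, which is where the large reported speedups come from.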