Materials on deep-learning-based speech synthesis
- Deep Learning (Kim Tae-hoon, 2017.11)
- Talk video from DEVIEW 2017 that explains Tacotron in an easy-to-follow way
- Modulabs (모두의연구소) WaveNet study video (Kim Seung-il, 2017.10)
- A video explaining WaveNet as the presenter understood it, along with an online discussion
- Generative Model-Based Text-to-Speech Synthesis (Heiga Zen, 2017.02)
- A video in which Heiga Zen, one of the authors of the WaveNet paper, surveys TTS technology overall and introduces WaveNet
- Deep Learning, Speaking in the Voice of a Loved One - Popok Blog, 2018.03.27.
- Blog post about AIA Life's campaign video 'Last Greetings' and the speech synthesis technology behind it
- CMU_ARCTIC (en)
- A US English dataset created for speech synthesis research at CMU's Language Technologies Institute
- The LJ Speech Dataset (en)
- Hosted on Keith Ito's website, but I could not find where the data originally came from or why it was created
- Blizzard 2012 (en)
- Dataset used in the Blizzard Challenge 2012, a corpus-based speech synthesis challenge
- CSTR VCTK Corpus (en)
- English Multi-speaker Corpus for CSTR Voice Cloning Toolkit
- https://github.com/ibab/tensorflow-wavenet
- https://github.com/r9y9/wavenet_vocoder (PyTorch)
- https://github.com/kan-bayashi/PytorchWaveNetVocoder (PyTorch)
WaveNet takes so long to train that getting results seems impractical without multiple GPUs. Related code links are collected below.
- https://github.com/nakosung/tensorflow-wavenet/tree/multigpu (Tensorflow)
- Multi-GPU implementation of WaveNet
- https://github.com/nakosung/tensorflow-wavenet/tree/model_parallel (Tensorflow)
- Model-parallel implementation of WaveNet
- https://github.com/tomlepaine/fast-wavenet
- https://github.com/dhpollack/fast-wavenet.pytorch (PyTorch)
- https://github.com/kensun0/Parallel-Wavenet (incomplete implementation)
- https://github.com/keithito/tacotron
- https://github.com/Kyubyong/tacotron
- https://github.com/barronalex/Tacotron
- https://carpedm20.github.io/tacotron/ (Multi-speaker Tacotron in TensorFlow)
- Multi-speaker implementation of Tacotron 1 and Deep Voice 2
- https://github.com/riverphoenix/tacotron2 (implemented)
- https://github.com/Rayhane-mamah/Tacotron-2 (implemented)
- https://github.com/selap91/Tacotron2 (implemented)
- https://github.com/CapstoneInha/Tacotron2-rehearsal
- https://github.com/A-Jacobson/tacotron2 (PyTorch)
- https://github.com/maozhiqiang/tacotron_cn (implementation verification required / Chinese)
- https://github.com/LGizkde/Tacotron2_Tao_Shujie (check required)
- https://github.com/ruclion/tacotron_with_style_control (Style Control)
- HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models (2018.02) - Yanqi Zhou et al.
- Uses a WaveNet to extract context from the audio, then an LSTM conditioned on that context to generate the following samples faster. Reports a higher MOS than WaveNet, and 2~4x faster audio generation at the same quality level (e.g., a 40-layer WaveNet vs. a 20-layer WaveNet + 1 LSTM).
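As a rough illustration of the HybridNet idea above, the toy NumPy sketch below uses a stack of dilated causal layers to summarize past audio into a context vector, then a small LSTM conditioned on that context to generate the next samples autoregressively. All weights are random and the sizes are chosen for readability; none of this matches the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8  # toy channel width for both networks

# --- Stage 1: WaveNet-style context extractor (dilated causal layers) ---
def extract_context(audio, n_layers=4):
    """Summarize past audio; dilation doubles per layer, so the
    receptive field grows as 2**n_layers."""
    h = np.tile(audio, (C, 1))                      # (C, T)
    for l in range(n_layers):
        d = 2 ** l
        delayed = np.pad(h, ((0, 0), (d, 0)))[:, :h.shape[1]]  # shift right by d
        W_cur = rng.standard_normal((C, C)) * 0.1
        W_del = rng.standard_normal((C, C)) * 0.1
        h = np.tanh(W_cur @ h + W_del @ delayed)
    return h[:, -1]                                 # context = last time step, (C,)

# --- Stage 2: small LSTM generates the next samples from that context ---
def lstm_generate(context, n_samples):
    Wx = rng.standard_normal((4 * C, 1)) * 0.1      # input-to-gates
    Wh = rng.standard_normal((4 * C, C)) * 0.1      # hidden-to-gates
    Wc = rng.standard_normal((4 * C, C)) * 0.1      # context conditioning
    w_out = rng.standard_normal(C) * 0.1
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c, x, out = np.zeros(C), np.zeros(C), 0.0, []
    for _ in range(n_samples):
        gates = Wx @ np.array([x]) + Wh @ h + Wc @ context
        i, f, g, o = np.split(gates, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
        x = float(w_out @ h)                        # next sample (autoregressive)
        out.append(x)
    return np.array(out)

context = extract_context(rng.standard_normal(64))
samples = lstm_generate(context, 16)
print(samples.shape)  # (16,)
```

The speedup claim follows from this split: the expensive dilated stack runs once over the history, while each new sample only costs one LSTM step.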
- ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech (2018.07) - Wei Ping et al.
- Uses a Gaussian autoregressive WaveNet as the teacher network and a Gaussian inverse autoregressive flow (IAF) as the student network, minimizing a regularized KL divergence between their highly peaked distributions.
- Also proposes a text-to-wave architecture that generates speech end-to-end.
- ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech - Baidu Research, 2018.07.20.
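The distillation loss described above has a simple closed form, since both the teacher and student output Gaussians per time step. The sketch below shows the standard KL divergence between two univariate Gaussians plus a penalty on the log-scale gap; the `lam` default is illustrative, not necessarily the paper's setting.

```python
import numpy as np

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) )."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
            - 0.5)

def regularized_kl(mu_q, sigma_q, mu_p, sigma_p, lam=4.0):
    """KL plus a penalty on the log-scale difference, which keeps the
    loss informative when both distributions become highly peaked
    (plain KL is numerically ill-behaved as the sigmas shrink)."""
    return (kl_gaussians(mu_q, sigma_q, mu_p, sigma_p)
            + lam * (np.log(sigma_p) - np.log(sigma_q))**2)

# Identical distributions -> plain KL is zero.
print(kl_gaussians(0.0, 1.0, 0.0, 1.0))  # 0.0
```

In training, `(mu_q, sigma_q)` would come from the IAF student and `(mu_p, sigma_p)` from the frozen teacher WaveNet at each time step.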
- ISPEECH VOICE CLONING DEMOS
- Demos where you can listen to cloned voices of famous people
- Fast Generation for Convolutional Autoregressive Models (2017.04) - Prajit Ramachandran et al.
- Applied to WaveNet and PixelCNN++ models, this technique reportedly speeds up generation by up to 21x and 183x, respectively. Note that these are best-case figures for specific situations, so the improvement in a real environment may be smaller than expected.
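The caching idea behind this speedup can be sketched in a few lines: naive generation re-runs the whole dilated stack over the full history for every new sample, while the fast path keeps a per-layer queue of past activations so each step costs one multiply-add per layer. This is a toy scalar-channel sketch with random weights (real implementations cache full channel vectors per layer):

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
DILATIONS = [1, 2, 4, 8]
# One kernel-size-2 weight pair per layer; scalar channels for clarity.
W = [rng.standard_normal(2) * 0.5 for _ in DILATIONS]

def naive_step(history):
    """Recompute the whole dilated stack over the full history (slow path)."""
    h = np.asarray(history, dtype=float)
    for d, (w_old, w_cur) in zip(DILATIONS, W):
        pad = np.concatenate([np.zeros(d), h])
        h = np.tanh(w_old * pad[:-d] + w_cur * h)   # causal conv, kernel 2
    return h[-1]

class FastGenerator:
    """Caches each layer's past activations in a fixed-length queue, so a
    new sample needs one multiply-add per layer instead of a full pass."""
    def __init__(self):
        self.queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]

    def step(self, x):
        h = float(x)
        for q, (w_old, w_cur) in zip(self.queues, W):
            old = q[0]              # this layer's input from d steps ago
            q.append(h)             # push current input, evict the oldest
            h = np.tanh(w_old * old + w_cur * h)
        return h

# The cached fast path matches the naive recomputation sample-for-sample.
gen, history = FastGenerator(), []
for x in rng.standard_normal(20):
    history.append(x)
    assert np.isclose(gen.step(x), naive_step(history))
print("cached fast path matches naive recomputation")
```

With L layers the naive path costs O(T * L) work per sample over a history of length T, while the cached path is O(L) per sample, which is where the large reported speedups come from.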