Awesome Natural TTS

Evaluation

TTS Arena: Benchmarking Text-to-Speech Models in the Wild
TTS-Arena - TTS-AGI 🤗

Vocoder

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks, arXiv, 2402.00892, arxiv, pdf, citation: -1

Shijia Liao, Shiyi Lan, Arun George Zachariah · (double-blind-eva-gan)
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder, arXiv, 2311.14957, arxiv, pdf, citation: -1

Yicheng Gu, Xueyao Zhang, Liumeng Xue, Zhizheng Wu · (Amphion - open-mmlab) · (vocodexelysium.github)

Emotional TTS

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech, arXiv, 2406.07803, arxiv, pdf, citation: -1

Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee
Exploring speech style spaces with language models: Emotional TTS without emotion labels, arXiv, 2405.11413, arxiv, pdf, citation: -1

Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman
Fine-Grained Quantitative Emotion Editing for Speech Generation, arXiv, 2403.02002, arxiv, pdf, citation: -1

Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis, arXiv, 2401.08166, arxiv, pdf, citation: -1

Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
EmotiVoice - netease-youdao

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
emotional-vits - innnky

An emotion-controllable speech synthesis model that requires no emotion labels, based on VITS
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models, arXiv, 2305.13831, arxiv, pdf, citation: -1

Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

Natural TTS

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback, arXiv, 2406.00654, arxiv, pdf, citation: -1

Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data, arXiv, 2402.08093, arxiv, pdf, citation: -1

Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski · (amazon-ltts-paper)

· (jiqizhixin)
Pheme: Efficient and Conversational Speech Generation, arXiv, 2401.02839, arxiv, pdf, citation: -1

Paweł Budzianowski, Taras Sereda, Tomasz Cichy, Ivan Vulić · (polyai-ldn.github) · (pheme - PolyAI-LDN) · (huggingface)
Incremental FastPitch: Chunk-based High Quality Text to Speech, arXiv, 2401.01755, arxiv, pdf, citation: -1

Muyang Du, Chuan Liu, Junjie Lai
Boosting Large Language Model for Speech Synthesis: An Empirical Study, arXiv, 2401.00246, arxiv, pdf, citation: -1

Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis, arXiv, 2312.03491, arxiv, pdf, citation: -1

Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu · (bridge-tts.github) · (Bridge-TTS - thu-ml)
Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis, arXiv, 2311.11745, arxiv, pdf, citation: -1

Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sangjin Kim · (speechelf.github)
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech, arXiv, 2206.02147, arxiv, pdf, citation: -1

Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, Jinglin Liu, Zhenhui Ye
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes, arXiv, 2310.14663, arxiv, pdf, citation: -1

Seongho Joo, Hyukhun Koh, Kyomin Jung
Matcha-TTS: A fast TTS architecture with conditional flow matching, arXiv, 2309.03199, arxiv, pdf, citation: -1

Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter · (shivammehta25.github)
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer, arXiv, 2305.19567, arxiv, pdf, citation: -1

Yerin Choi, Myoung-Wan Koo · (dc-comix-tts - lakahaga)
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech, arXiv, 2302.04215, arxiv, pdf, citation: -1

Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky · (MQTTS - b04901014)
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech, arXiv, 2203.16852, arxiv, pdf, citation: -1

Dan Lim, Sunghee Jung, Eesung Kim · (jets - imdanboy) · (imdanboy.github)
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models, arXiv, 2306.07691, arxiv, pdf, citation: -1

Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani · (StyleTTS2 - yl4579) · (styletts2.github)
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training, arXiv, 2305.10763, arxiv, pdf, citation: -1

Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
CLAPSpeech
PortaSpeech: Portable and High-Quality Generative Text-to-Speech, arXiv, 2109.15166, arxiv, pdf, citation: -1

Yi Ren, Jinglin Liu, Zhou Zhao
- Abstract | PortaSpeech: Portable and High-Quality Generative Text-to-Speech
PortaSpeech - keonlee9420

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
NATSpeech - NATSpeech

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

MultiLingual TTS

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data, arXiv, 2402.18932, arxiv, pdf, citation: -1

Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech, arXiv, 2305.19709, arxiv, pdf, citation: -1

Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen
- GitHub - VinAIResearch/XPhoneBERT: XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
- Trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales.
Scaling Speech Technology to 1,000+ Languages, arXiv, 2305.13516, arxiv, pdf, citation: -1

Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale, arXiv, 2306.15687, arxiv, pdf, citation: -1

Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar
UMM-TTS: Unified Multilingual Multispeaker Text to Speech Synthesis in Low Resource Setting
Adapting TTS models For New Speakers using Transfer Learning, arXiv, 2110.05798, arxiv, pdf, citation: -1

Paarth Neekhara, Jason Li, Boris Ginsburg
Better speech synthesis through scaling, arXiv, 2305.07243, arxiv, pdf, citation: -1

James Betker
- TorToiSe - These words were never spoken.
- GitHub - neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality

Unsupervised TTS

Bag of Tricks for Unsupervised Text-to-Speech | OpenReview
- Bag of Tricks for Unsupervised TTS
- https://openreview.net/pdf?id=SbR9mpTuBn

Prompt TTS

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions, arXiv, 2309.08140, arxiv, pdf, citation: -1

Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt, arXiv, 2301.13662, arxiv, pdf, citation: -1

Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng
- InstructTTS | The demo page of InstructTTS
PromptTTS: Controllable Text-to-Speech with Text Descriptions, arXiv, 2211.12171, arxiv, pdf, citation: -1

Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan
- PromptTTS: controllable text-to-speech with text descriptions - Speech Research

Efficient TTS

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models, arXiv, 2406.02328, arxiv, pdf, citation: -1

Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng
Speak While You Think: Streaming Speech Synthesis During Text Generation, ICASSP 2024 - 2024 IEEE International Conference on Acoustics …, 2024, arxiv, pdf, citation: 1

Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models, arXiv, 2404.00569, arxiv, pdf, citation: -1

Xiang Li, Fan Bu, Ambuj Mehrish, Yingting Li, Jiale Han, Bo Cheng, Soujanya Poria

· (CM-TTS - XiangLi2022)

Misc

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations, arXiv, 2311.01260, arxiv, pdf, citation: -1

Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu
E3 TTS: Easy End-to-End Diffusion-based Text to Speech, arXiv, 2311.00945, arxiv, pdf, citation: -1

Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen
edge-tts - rany2

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
VoiceLDM: Text-to-Speech with Environmental Context, arXiv, 2309.13664, arxiv, pdf, citation: -1

Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung · (voiceldm.github)
Large-Scale Automatic Audiobook Creation, arXiv, 2309.03926, arxiv, pdf, citation: -1

Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios, arXiv, 2305.12200, arxiv, pdf, citation: -1

Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song
Controllable Speaking Styles Using a Large Language Model, arXiv, 2305.10321, arxiv, pdf, citation: -1

Atli Thor Sigurgeirsson, Simon King

OpenTTS

Awesome-ChatTTS - libukai

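The officially recommended getting-started guide for ChatTTS, collecting and summarizing frequently asked questions and related resources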
IMS-Toucan - DigitalPhonetics

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
ChatTTS - 2noise

TTS
- GitHub - jianchang512/ChatTTS-ui: A simple local web interface that uses ChatTTS directly to synthesize text into speech, and also exposes an API for external use.
MassTTS - anyvoiceai

A TTS demo for training new characters.
parler-tts - huggingface

Inference and training library for high-quality TTS models.

· (huggingface)
- parler-tts/parler-tts-mini-expresso · Hugging Face
StableTTS - KdaiP

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
MeloTTS - myshell-ai

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
piper - rhasspy

A fast, local neural text to speech system
xVA-Synth - DanRuta

Machine learning based speech synthesis Electron app, with voices from specific characters from video games · (huggingface)
