Awesome Natural TTS

Evaluation

Vocoder

  • EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks, arXiv, 2402.00892, arxiv, pdf, citation: -1

    Shijia Liao, Shiyi Lan, Arun George Zachariah · (double-blind-eva-gan)

  • Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder, arXiv, 2311.14957, arxiv, pdf, citation: -1

    Yicheng Gu, Xueyao Zhang, Liumeng Xue, Zhizheng Wu · (Amphion - open-mmlab) Star · (vocodexelysium.github)

Emotional TTS

  • EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech, arXiv, 2406.07803, arxiv, pdf, citation: -1

    Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

  • Exploring speech style spaces with language models: Emotional TTS without emotion labels, arXiv, 2405.11413, arxiv, pdf, citation: -1

    Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman

  • Fine-Grained Quantitative Emotion Editing for Speech Generation, arXiv, 2403.02002, arxiv, pdf, citation: -1

    Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

  • ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis, arXiv, 2401.08166, arxiv, pdf, citation: -1

    Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

  • EmotiVoice - netease-youdao Star

    EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

  • emotional-vits - innnky Star

    An emotion-controllable speech synthesis model based on VITS that requires no emotion labels

  • ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models, arXiv, 2305.13831, arxiv, pdf, citation: -1

    Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

Natural TTS

  • Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback, arXiv, 2406.00654, arxiv, pdf, citation: -1

    Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

  • BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data, arXiv, 2402.08093, arxiv, pdf, citation: -1

    Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski · (amazon-ltts-paper) · (jiqizhixin)

  • Pheme: Efficient and Conversational Speech Generation, arXiv, 2401.02839, arxiv, pdf, citation: -1

    Paweł Budzianowski, Taras Sereda, Tomasz Cichy, Ivan Vulić · (polyai-ldn.github) · (pheme - PolyAI-LDN) Star · (huggingface)

  • Incremental FastPitch: Chunk-based High Quality Text to Speech, arXiv, 2401.01755, arxiv, pdf, citation: -1

    Muyang Du, Chuan Liu, Junjie Lai

  • Boosting Large Language Model for Speech Synthesis: An Empirical Study, arXiv, 2401.00246, arxiv, pdf, citation: -1

    Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei

  • Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis, arXiv, 2312.03491, arxiv, pdf, citation: -1

    Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu · (bridge-tts.github) · (Bridge-TTS - thu-ml) Star

  • Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis, arXiv, 2311.11745, arxiv, pdf, citation: -1

    Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sangjin Kim · (speechelf.github)

  • Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech, arXiv, 2206.02147, arxiv, pdf, citation: -1

    Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, Jinglin Liu, Zhenhui Ye

  • DPP-TTS: Diversifying prosodic features of speech via determinantal point processes, arXiv, 2310.14663, arxiv, pdf, citation: -1

    Seongho Joo, Hyukhun Koh, Kyomin Jung

  • Matcha-TTS: A fast TTS architecture with conditional flow matching, arXiv, 2309.03199, arxiv, pdf, citation: -1

    Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter · (shivammehta25.github)

  • DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer, arXiv, 2305.19567, arxiv, pdf, citation: -1

    Yerin Choi, Myoung-Wan Koo · (dc-comix-tts - lakahaga) Star

  • A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech, arXiv, 2302.04215, arxiv, pdf, citation: -1

    Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky · (MQTTS - b04901014) Star

  • JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech, arXiv, 2203.16852, arxiv, pdf, citation: -1

    Dan Lim, Sunghee Jung, Eesung Kim · (jets - imdanboy) Star · (imdanboy.github)

  • StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models, arXiv, 2306.07691, arxiv, pdf, citation: -1

    Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani · (StyleTTS2 - yl4579) Star · (styletts2.github)

  • CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training, arXiv, 2305.10763, arxiv, pdf, citation: -1

    Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

  • CLAPSpeech

  • PortaSpeech: Portable and High-Quality Generative Text-to-Speech, arXiv, 2109.15166, arxiv, pdf, citation: -1

    Yi Ren, Jinglin Liu, Zhou Zhao

  • PortaSpeech - keonlee9420 Star

    PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

  • NATSpeech - NATSpeech Star

    A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

MultiLingual TTS

Unsupervised TTS

Prompt TTS

  • PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions, arXiv, 2309.08140, arxiv, pdf, citation: -1

    Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana

  • InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt, arXiv, 2301.13662, arxiv, pdf, citation: -1

    Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng

  • PromptTTS: Controllable Text-to-Speech with Text Descriptions, arXiv, 2211.12171, arxiv, pdf, citation: -1

    Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan

Efficient TTS

  • SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models, arXiv, 2406.02328, arxiv, pdf, citation: -1

    Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng

  • Speak While You Think: Streaming Speech Synthesis During Text Generation, ICASSP 2024 - 2024 IEEE International Conference on Acoustics …, 2024, arxiv, pdf, citation: 1

    Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory

  • CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models, arXiv, 2404.00569, arxiv, pdf, citation: -1

    Xiang Li, Fan Bu, Ambuj Mehrish, Yingting Li, Jiale Han, Bo Cheng, Soujanya Poria · (CM-TTS - XiangLi2022) Star

Misc

  • Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations, arXiv, 2311.01260, arxiv, pdf, citation: -1

    Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu

  • E3 TTS: Easy End-to-End Diffusion-based Text to Speech, arXiv, 2311.00945, arxiv, pdf, citation: -1

    Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen

  • edge-tts - rany2 Star

    Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

  • VoiceLDM: Text-to-Speech with Environmental Context, arXiv, 2309.13664, arxiv, pdf, citation: -1

    Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung · (voiceldm.github)

  • Large-Scale Automatic Audiobook Creation, arXiv, 2309.03926, arxiv, pdf, citation: -1

    Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman

  • ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios, arXiv, 2305.12200, arxiv, pdf, citation: -1

    Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song

  • Controllable Speaking Styles Using a Large Language Model, arXiv, 2305.10321, arxiv, pdf, citation: -1

    Atli Thor Sigurgeirsson, Simon King

OpenTTS