-
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks,
arXiv, 2402.00892
, arxiv, pdf, citation: -1 · Shijia Liao, Shiyi Lan, Arun George Zachariah · (double-blind-eva-gan)
-
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder,
arXiv, 2311.14957
, arxiv, pdf, citation: -1 · Yicheng Gu, Xueyao Zhang, Liumeng Xue, Zhizheng Wu · (Amphion - open-mmlab)
· (vocodexelysium.github)
-
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech,
arXiv, 2406.07803
, arxiv, pdf, citation: -1 · Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee
-
Exploring speech style spaces with language models: Emotional TTS without emotion labels,
arXiv, 2405.11413
, arxiv, pdf, citation: -1 · Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman
-
Fine-Grained Quantitative Emotion Editing for Speech Generation,
arXiv, 2403.02002
, arxiv, pdf, citation: -1 · Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li
-
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis,
arXiv, 2401.08166
, arxiv, pdf, citation: -1 · Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
-
EmotiVoice - netease-youdao
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
-
emotional-vits - innnky
An emotion-controllable speech synthesis model that requires no emotion labels, based on VITS
-
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models,
arXiv, 2305.13831
, arxiv, pdf, citation: -1 · Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang
-
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback,
arXiv, 2406.00654
, arxiv, pdf, citation: -1 · Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang
-
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data,
arXiv, 2402.08093
, arxiv, pdf, citation: -1 · Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski · (amazon-ltts-paper)
· (jiqizhixin)
-
Pheme: Efficient and Conversational Speech Generation,
arXiv, 2401.02839
, arxiv, pdf, citation: -1 · Paweł Budzianowski, Taras Sereda, Tomasz Cichy, Ivan Vulić · (polyai-ldn.github) · (pheme - PolyAI-LDN)
· (huggingface)
-
Incremental FastPitch: Chunk-based High Quality Text to Speech,
arXiv, 2401.01755
, arxiv, pdf, citation: -1 · Muyang Du, Chuan Liu, Junjie Lai
-
Boosting Large Language Model for Speech Synthesis: An Empirical Study,
arXiv, 2401.00246
, arxiv, pdf, citation: -1 · Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei
-
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis,
arXiv, 2312.03491
, arxiv, pdf, citation: -1 · Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu · (bridge-tts.github) · (Bridge-TTS - thu-ml)
-
Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis,
arXiv, 2311.11745
, arxiv, pdf, citation: -1 · Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sangjin Kim · (speechelf.github)
-
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech,
arXiv, 2206.02147
, arxiv, pdf, citation: -1 · Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, Jinglin Liu, Zhenhui Ye
-
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes,
arXiv, 2310.14663
, arxiv, pdf, citation: -1 · Seongho Joo, Hyukhun Koh, Kyomin Jung
-
Matcha-TTS: A fast TTS architecture with conditional flow matching,
arXiv, 2309.03199
, arxiv, pdf, citation: -1 · Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter · (shivammehta25.github)
-
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer,
arXiv, 2305.19567
, arxiv, pdf, citation: -1 · Yerin Choi, Myoung-Wan Koo · (dc-comix-tts - lakahaga)
-
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech,
arXiv, 2302.04215
, arxiv, pdf, citation: -1 · Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky · (MQTTS - b04901014)
-
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech,
arXiv, 2203.16852
, arxiv, pdf, citation: -1 · Dan Lim, Sunghee Jung, Eesung Kim · (jets - imdanboy)
· (imdanboy.github)
-
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models,
arXiv, 2306.07691
, arxiv, pdf, citation: -1 · Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani · (StyleTTS2 - yl4579)
· (styletts2.github)
-
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training,
arXiv, 2305.10763
, arxiv, pdf, citation: -1 · Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
-
PortaSpeech: Portable and High-Quality Generative Text-to-Speech,
arXiv, 2109.15166
, arxiv, pdf, citation: -1 · Yi Ren, Jinglin Liu, Zhou Zhao
-
PortaSpeech - keonlee9420
PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
-
NATSpeech - NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
-
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data,
arXiv, 2402.18932
, arxiv, pdf, citation: -1 · Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov
-
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech,
arXiv, 2305.19709
, arxiv, pdf, citation: -1 · Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen
- GitHub - VinAIResearch/XPhoneBERT: XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
- trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales.
-
Scaling Speech Technology to 1,000+ Languages,
arXiv, 2305.13516
, arxiv, pdf, citation: -1 · Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi
-
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale,
arXiv, 2306.15687
, arxiv, pdf, citation: -1 · Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar
-
UMM-TTS: Unified Multilingual Multispeaker Text to Speech Synthesis in Low Resource Setting
-
Adapting TTS models For New Speakers using Transfer Learning,
arXiv, 2110.05798
, arxiv, pdf, citation: -1 · Paarth Neekhara, Jason Li, Boris Ginsburg
-
Better speech synthesis through scaling,
arXiv, 2305.07243
, arxiv, pdf, citation: -1 · James Betker
-
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions,
arXiv, 2309.08140
, arxiv, pdf, citation: -1 · Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana
-
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt,
arXiv, 2301.13662
, arxiv, pdf, citation: -1 · Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng
-
PromptTTS: Controllable Text-to-Speech with Text Descriptions,
arXiv, 2211.12171
, arxiv, pdf, citation: -1 · Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan
-
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models,
arXiv, 2406.02328
, arxiv, pdf, citation: -1 · Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng
-
Speak While You Think: Streaming Speech Synthesis During Text Generation,
ICASSP 2024 - 2024 IEEE International Conference on Acoustics …, 2024
, arxiv, pdf, citation: 1 · Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory
-
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models,
arXiv, 2404.00569
, arxiv, pdf, citation: -1 · Xiang Li, Fan Bu, Ambuj Mehrish, Yingting Li, Jiale Han, Bo Cheng, Soujanya Poria
· (CM-TTS - XiangLi2022)
-
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations,
arXiv, 2311.01260
, arxiv, pdf, citation: -1 · Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu
-
E3 TTS: Easy End-to-End Diffusion-based Text to Speech,
arXiv, 2311.00945
, arxiv, pdf, citation: -1 · Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen
-
edge-tts - rany2
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
-
VoiceLDM: Text-to-Speech with Environmental Context,
arXiv, 2309.13664
, arxiv, pdf, citation: -1 · Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung · (voiceldm.github)
-
Large-Scale Automatic Audiobook Creation,
arXiv, 2309.03926
, arxiv, pdf, citation: -1 · Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman
-
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios,
arXiv, 2305.12200
, arxiv, pdf, citation: -1 · Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song
-
Controllable Speaking Styles Using a Large Language Model,
arXiv, 2305.10321
, arxiv, pdf, citation: -1 · Atli Thor Sigurgeirsson, Simon King
-
Awesome-ChatTTS - libukai
The officially recommended getting-started guide for ChatTTS, collecting and summarizing common questions and related resources
-
IMS-Toucan - DigitalPhonetics
Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
-
ChatTTS - 2noise
TTS
-
MassTTS - anyvoiceai
a TTS demo for training new characters.
-
parler-tts - huggingface
Inference and training library for high-quality TTS models.
· (huggingface)
-
StableTTS - KdaiP
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
-
MeloTTS - myshell-ai
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
-
piper - rhasspy
A fast, local neural text to speech system
-
xVA-Synth - DanRuta
Machine learning based speech synthesis Electron app, with voices from specific characters from video games · (huggingface)