Base paper

Text-to-Speech Synthesis by Paul Taylor http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.5905&rep=rep1&type=pdf

Experimental and theoretical advances in prosody: A review https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3216045/

Intonational Phonology by Ladd https://books.google.de/books?id=ys_jtGM5WjYC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

Adversarial Autoencoders https://arxiv.org/pdf/1511.05644.pdf

https://github.com/Naresh1318/Adversarial_Autoencoder

https://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf

IEMOCAP pdf https://sail.usc.edu/iemocap/Busso_2008_iemocap.pdf

Audio Google papers https://google.github.io/tacotron/

Base paper

paper	status	link/tag
Tacotron: Towards End-to-End Speech Synthesis	finished	https://arxiv.org/pdf/1703.10135.pdf
Uncovering Latent Style Factors for Expressive Speech Synthesis	finished	https://arxiv.org/pdf/1711.00520.pdf
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions	finished	https://arxiv.org/pdf/1712.05884.pdf https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron	finished	https://arxiv.org/pdf/1803.09047.pdf https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis	finished	https://arxiv.org/pdf/1803.09017.pdf https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html

Stylelayer

paper	status	link/tag
LEARNING LATENT REPRESENTATIONS FOR STYLE CONTROL AND TRANSFER IN END-TO-END SPEECH SYNTHESIS	ICASSP2019 finished	https://arxiv.org/pdf/1812.04342.pdf http://home.ustc.edu.cn/~zyj008/ICASSP2019/
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis	finished	https://arxiv.org/pdf/1808.01410.pdf
Hierarchical Generative Modeling for Controllable Speech Synthesis	finished	https://arxiv.org/pdf/1810.07217.pdf
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization	finished	https://openreview.net/pdf?id=Bkg9ZeBB37
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis	can use this	https://goo.gl/Jy8WvF
Neural Discrete Representation Learning	read again for clarity	https://arxiv.org/pdf/1711.00937.pdf
A Style Control Technique for HMM-Based Speech Synthesis	can't be extended	https://goo.gl/Y9caHX
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data	similar work done by google	https://arxiv.org/pdf/1709.07902.pdf
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder	Better implementations are present	https://arxiv.org/pdf/1804.02135.pdf
A Comparison of Expressive Speech Synthesis Approaches based on Neural Network	great paper. can be used	http://lxie.npu-aslp.org/papers/2018ASMMC-XLM.pdf
Investigating context features hidden in End-to-End TTS	good read but not relevant	https://arxiv.org/pdf/1811.01376.pdf
Improving Unsupervised Style Transfer in end-to-end Speech Synthesis with end-to-end Speech Recognition	finished	http://speech.ee.ntu.edu.tw/~tlkagk/paper/asr-guided-tacotron.pdf
Speech, Prosody, and Machines: Nine Challenges for Prosody Research	read again for lit review not for approach	https://www.isca-speech.org/archive/SpeechProsody_2018/pdfs/_Inv-5.pdf
Learning Latent Representations for Speech Generation and Transformation	finished	https://arxiv.org/pdf/1704.04222.pdf
Disentangled sequential autoencoder	finished	https://arxiv.org/pdf/1803.02991.pdf
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis	finished	https://arxiv.org/pdf/1807.11470.pdf
FEATURE BASED ADAPTATION FOR SPEAKING STYLE SYNTHESIS	not a great paper from my research viewpoint	https://goo.gl/f95mGb
NEURAL TTS STYLIZATION WITH ADVERSARIAL AND COLLABORATIVE GAMES (tts gan)	iclr 2019	https://openreview.net/pdf?id=ByzcS3AcYX https://researchdemopage.wixsite.com/tts-gan
ROBUST AND FINE-GRAINED PROSODY CONTROL OF END-TO-END SPEECH SYNTHESIS	icassp 2019	https://arxiv.org/pdf/1811.02122.pdf http://neosapience.com/en/research/2018-10-29-icassp/

Emotion

paper	status	link/tag
A Comparison of Expressive Speech Synthesis Approaches based on Neural Network	great paper. can be used	http://lxie.npu-aslp.org/papers/2018ASMMC-XLM.pdf
Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech	finished	https://arxiv.org/pdf/1706.00612.pdf
Emotional Statistical Parametric Speech Synthesis Using LSTM-RNNs	finished	https://ieeexplore.ieee.org/document/8282282
An Investigation to Transplant Emotional Expressions in DNN-based TTS Synthesis	can be used with paper 1	https://ieeexplore.ieee.org/document/8282231
Unsupervised Clustering of Emotion and Voice Styles for Expressive TTS	finished	https://ieeexplore.ieee.org/document/6288797
A DNN-based emotional speech synthesis by speaker adaptation	similar to other paper	http://www.apsipa.org/proceedings/2018/pdfs/0000633.pdf
Speaker Representations for Speaker Adaptation in Multiple Speakers BLSTM-RNN-based Speech Synthesis	not a great paper from my research viewpoint	https://goo.gl/LynbNz
Emotional transplant in statistical speech synthesis based on emotion additive model	finished	https://www.isca-speech.org/archive/interspeech_2015/papers/i15_0274.pdf
Emotional End-to-End Neural Speech synthesizer	finished	https://arxiv.org/pdf/1711.05447.pdf

Compare paper

paper	status	link/tag
VOICELOOP: VOICE FITTING AND SYNTHESIS VIA A PHONOLOGICAL LOOP	not imp	https://arxiv.org/pdf/1707.06588.pdf
CHAR2WAV: END-TO-END SPEECH SYNTHESIS	not imp	https://mila.quebec/wp-content/uploads/2017/02/end-end-speech.pdf
DEEP VOICE 3: SCALING TEXT-TO-SPEECH WITH CONVOLUTIONAL SEQUENCE LEARNING	not imp	https://arxiv.org/pdf/1710.07654.pdf

PhD thesis: Unsupervised Learning for Expressive Speech Synthesis http://veu.talp.cat/igor/PhD_Igor_Jauk-June2017.pdf

MSc thesis: Reproduction & Improvement of State-of-art TTS model https://github.com/FeiCoding/State_of_the_art_tacotron2_model_reproduction
