erishan6/paperlist

Text-to-Speech Synthesis by Paul Taylor http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.5905&rep=rep1&type=pdf

Experimental and theoretical advances in prosody: A review https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3216045/

Intonational Phonology by Ladd https://books.google.de/books?id=ys_jtGM5WjYC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

Adversarial Autoencoders https://arxiv.org/pdf/1511.05644.pdf

https://github.com/Naresh1318/Adversarial_Autoencoder

https://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf

IEMOCAP pdf https://sail.usc.edu/iemocap/Busso_2008_iemocap.pdf

Audio Google papers https://google.github.io/tacotron/

Base paper

| paper | status | link/tag |
| --- | --- | --- |
| Tacotron: Towards End-to-End Speech Synthesis | finished | https://arxiv.org/pdf/1703.10135.pdf |
| Uncovering Latent Style Factors for Expressive Speech Synthesis | finished | https://arxiv.org/pdf/1711.00520.pdf |
| Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions | finished | https://arxiv.org/pdf/1712.05884.pdf https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html |
| Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron | finished | https://arxiv.org/pdf/1803.09047.pdf https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html |
| Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | finished | https://arxiv.org/pdf/1803.09017.pdf https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html |

Stylelayer

| paper | status | link/tag |
| --- | --- | --- |
| LEARNING LATENT REPRESENTATIONS FOR STYLE CONTROL AND TRANSFER IN END-TO-END SPEECH SYNTHESIS ICASSP2019 | finished | https://arxiv.org/pdf/1812.04342.pdf http://home.ustc.edu.cn/~zyj008/ICASSP2019/ |
| Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis | finished | https://arxiv.org/pdf/1808.01410.pdf |
| Hierarchical Generative Modeling for Controllable Speech Synthesis | finished | https://arxiv.org/pdf/1810.07217.pdf |
| Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization | finished | https://openreview.net/pdf?id=Bkg9ZeBB37 |
| Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis | can use this | https://goo.gl/Jy8WvF |
| Neural Discrete Representation Learning | read again for clarity | https://arxiv.org/pdf/1711.00937.pdf |
| A Style Control Technique for HMM-Based Speech Synthesis | can't be extended | https://goo.gl/Y9caHX |
| Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data | similar work done by Google | https://arxiv.org/pdf/1709.07902.pdf |
| Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder | better implementations are present | https://arxiv.org/pdf/1804.02135.pdf |
| A Comparison of Expressive Speech Synthesis Approaches based on Neural Network | great paper, can be used | http://lxie.npu-aslp.org/papers/2018ASMMC-XLM.pdf |
| Investigating context features hidden in End-to-End TTS | good read but not relevant | https://arxiv.org/pdf/1811.01376.pdf |
| Improving Unsupervised Style Transfer in end-to-end Speech Synthesis with end-to-end Speech Recognition | finished | http://speech.ee.ntu.edu.tw/~tlkagk/paper/asr-guided-tacotron.pdf |
| Speech, Prosody, and Machines: Nine Challenges for Prosody Research | read again for lit review, not for approach | https://www.isca-speech.org/archive/SpeechProsody_2018/pdfs/_Inv-5.pdf |
| Learning Latent Representations for Speech Generation and Transformation | finished | https://arxiv.org/pdf/1704.04222.pdf |
| Disentangled sequential autoencoder | finished | https://arxiv.org/pdf/1803.02991.pdf |
| Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis | finished | https://arxiv.org/pdf/1807.11470.pdf |
| FEATURE BASED ADAPTATION FOR SPEAKING STYLE SYNTHESIS | not a great paper wrt my rs view | https://goo.gl/f95mGb |
| NEURAL TTS STYLIZATION WITH ADVERSARIAL AND COLLABORATIVE GAMES (tts gan) | ICLR 2019 | https://openreview.net/pdf?id=ByzcS3AcYX https://researchdemopage.wixsite.com/tts-gan |
| ROBUST AND FINE-GRAINED PROSODY CONTROL OF END-TO-END SPEECH SYNTHESIS | ICASSP 2019 | https://arxiv.org/pdf/1811.02122.pdf http://neosapience.com/en/research/2018-10-29-icassp/ |

Emotion

| paper | status | link/tag |
| --- | --- | --- |
| A Comparison of Expressive Speech Synthesis Approaches based on Neural Network | great paper, can be used | http://lxie.npu-aslp.org/papers/2018ASMMC-XLM.pdf |
| Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech | finished | https://arxiv.org/pdf/1706.00612.pdf |
| Emotional Statistical Parametric Speech Synthesis Using LSTM-RNNs | finished | https://ieeexplore.ieee.org/document/8282282 |
| An Investigation to Transplant Emotional Expressions in DNN-based TTS Synthesis | can be used with paper 1 | https://ieeexplore.ieee.org/document/8282231 |
| Unsupervised clustering of emotion and voice styles for expressive TTS | finished | https://ieeexplore.ieee.org/document/6288797 |
| A DNN-based emotional speech synthesis by speaker adaptation | similar to other paper | http://www.apsipa.org/proceedings/2018/pdfs/0000633.pdf |
| Speaker Representations for Speaker Adaptation in Multiple Speakers BLSTM-RNN-based Speech Synthesis | not a great paper wrt my rs view | https://goo.gl/LynbNz |
| Emotional transplant in statistical speech synthesis based on emotion additive model | finished | https://www.isca-speech.org/archive/interspeech_2015/papers/i15_0274.pdf |
| Emotional End-to-End Neural Speech synthesizer | finished | https://arxiv.org/pdf/1711.05447.pdf |

Compare paper

| paper | status | link/tag |
| --- | --- | --- |
| VOICELOOP: VOICE FITTING AND SYNTHESIS VIA A PHONOLOGICAL LOOP | not imp | https://arxiv.org/pdf/1707.06588.pdf |
| CHAR2WAV: END-TO-END SPEECH SYNTHESIS | not imp | https://mila.quebec/wp-content/uploads/2017/02/end-end-speech.pdf |
| DEEP VOICE 3: SCALING TEXT-TO-SPEECH WITH CONVOLUTIONAL SEQUENCE LEARNING | not imp | https://arxiv.org/pdf/1710.07654.pdf |

PhD thesis: Unsupervised Learning for Expressive Speech Synthesis http://veu.talp.cat/igor/PhD_Igor_Jauk-June2017.pdf

MSc thesis: Reproduction & Improvement of State-of-art TTS model https://github.com/FeiCoding/State_of_the_art_tacotron2_model_reproduction
