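This post summarizes the TTS and VC papers at ICASSP 2022, with a focus on TTS. It tallies the sessions, topics, and tasks/motivations, along with the number of papers in each.
- Among TTS topics, acoustic models account for the most papers, followed by expressiveness and prosody.
- Within the acoustic model topic, most papers concern model architecture, followed by duration modeling in the acoustic model (AM).
- Within the expressiveness topic, most papers concern disentanglement.
- Within the prosody topic, most papers concern control.
- Within the front-end topic, most papers concern G2P and polyphone disambiguation.
- Within the vocoder topic, most papers aim to improve efficiency.
- Within the multimodal topic, most papers concern dubbing.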

TTS sessions:

# | Sessions | #Sessions | #Papers |
---|---|---|---|
1 | Expressiveness/Adaptation | 4 | 24 |
2 | General topic | 2 | 12 |
3 | Novel acoustic model | 1 | 6 |
4 | Front-end | 1 | 6 |
5 | Vocoder and evaluation | 1 | 6 |
6 | Multi-lingual/Multimodal | 1 | 6 |
7 | Singing Voice and others | 1 | 5 |
Total | 7 | 11 | 65 |

TTS topics:

# | Topics | #Papers |
---|---|---|
1 | Acoustic model | 12 |
2 | Expressiveness | 10 |
3 | Prosody | 9 |
4 | Front-end | 8 |
5 | Vocoder | 6 |
6 | Adaptation | 5 |
7 | Multimodal | 5 |
8 | Multi-lingual/Cross-lingual | 4 |
9 | Singing voice synthesis | 2 |
10 | Speech editing | 2 |
11 | Others | 2 |
Total | 11 | 65 |
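
As a quick sanity check on the TTS topic counts above, the tally can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary is copied by hand from the table, and the names `tts_topic_counts` and `summarize` are hypothetical helpers introduced for illustration, not part of any tool used to produce these statistics.

```python
# Paper counts per TTS topic, copied by hand from the topic table above.
tts_topic_counts = {
    "Acoustic model": 12,
    "Expressiveness": 10,
    "Prosody": 9,
    "Front-end": 8,
    "Vocoder": 6,
    "Adaptation": 5,
    "Multimodal": 5,
    "Multi-lingual/Cross-lingual": 4,
    "Singing voice synthesis": 2,
    "Speech editing": 2,
    "Others": 2,
}


def summarize(counts):
    """Return the total paper count and topics ranked by count (descending).

    Each ranked entry is a (topic, count, share_of_total) tuple.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return total, [(name, n, n / total) for name, n in ranked]


total, ranked = summarize(tts_topic_counts)
print(f"topics: {len(tts_topic_counts)}, papers: {total}")
for name, n, share in ranked[:3]:
    print(f"{name}: {n} papers ({share:.1%})")
```

The same check can be applied to any of the per-topic task tables by swapping in the corresponding counts.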

Tasks within the Acoustic model topic:

# | Tasks | #Papers |
---|---|---|
1 | Model | 3 |
2 | Duration | 2 |
3 | Analysis | 2 |
4 | Input | 1 |
5 | Speaker | 1 |
6 | Efficiency | 1 |
7 | Noisy data | 1 |
8 | Incremental TTS | 1 |
Total | 8 | 12 |

Tasks within the Expressiveness topic:

# | Tasks | #Papers |
---|---|---|
1 | Disentanglement | 3 |
2 | Emotion | 2 |
3 | Low-quality | 1 |
4 | Adaptation | 1 |
5 | Reference selection | 1 |
6 | Conversational TTS | 1 |
7 | Low-resource | 1 |
Total | 7 | 10 |

Tasks within the Prosody topic:

# | Tasks | #Papers |
---|---|---|
1 | Control | 4 |
2 | Rich prosody | 2 |
3 | Cross-sentence context | 2 |
4 | Word-level prosody | 1 |
Total | 4 | 9 |

Tasks within the Front-end topic:

# | Tasks | #Papers |
---|---|---|
1 | G2P | 2 |
2 | Polyphone disambiguation | 2 |
3 | Prosodic structure prediction | 1 |
4 | POS model compression | 1 |
5 | End-to-end text normalization | 1 |
6 | Mathematical formulas | 1 |
Total | 6 | 8 |

Tasks within the Vocoder topic:

# | Tasks | #Papers |
---|---|---|
1 | Efficiency | 4 |
2 | New method | 2 |
Total | 2 | 6 |

Tasks within the Adaptation topic:

# | Tasks | #Papers |
---|---|---|
1 | Speaker generation | 1 |
2 | VC for postprocessing | 1 |
3 | Multimodal | 1 |
4 | Low-quality data | 1 |
5 | New structure | 1 |
Total | 5 | 5 |

Tasks within the Multimodal topic:

# | Tasks | #Papers |
---|---|---|
1 | Dubbing | 4 |
2 | Speech-to-animation | 1 |
Total | 2 | 5 |

Tasks within the Multi-lingual/Cross-lingual topic:

# | Tasks | #Papers |
---|---|---|
1 | Data augmentation | 1 |
2 | Lifelong learning | 1 |
3 | Triple loss | 1 |
4 | Improved structure | 1 |
Total | 4 | 4 |

Tasks within the Singing voice synthesis topic:

# | Tasks | #Papers |
---|---|---|
1 | End-to-end | 1 |
2 | Melody unsupervision | 1 |
Total | 2 | 2 |

Tasks within the Speech editing topic:

# | Tasks | #Papers |
---|---|---|
1 | Speech editing | 2 |
Total | 1 | 2 |

Tasks within the Others topic:

# | Tasks | #Papers |
---|---|---|
1 | MOS net | 1 |
2 | Phase reconstruction | 1 |
Total | 2 | 2 |

VC sessions:

# | Sessions | #Sessions | #Papers |
---|---|---|---|
1 | Conversion | 2 | 12 |
2 | Representation | 1 | 6 |
3 | Singing voice and others | 1 | 6 |
Total | 3 | 4 | 24 |

VC topics:

# | Topics | #Papers |
---|---|---|
1 | VC | 7 |
2 | One-shot/Representation | 6 |
3 | Singing VC | 3 |
4 | Dysarthric speech | 2 |
5 | Noise robust | 2 |
6 | Pronunciation robust | 1 |
7 | Streaming VC | 1 |
8 | Data augmentation | 1 |
9 | Tool | 1 |
Total | 9 | 24 |