ESLTTS

The full paper can be accessed here: arXiv, IEEE Xplore.

Abstract

With the progress made in speaker-adaptive TTS approaches, advanced approaches have shown a remarkable capacity to reproduce the speaker’s voice in the commonly used TTS datasets. However, mimicking voices characterized by substantial accents, such as non-native English speakers, is still challenging. Regrettably, the absence of a dedicated TTS dataset for speakers with substantial accents inhibits the research and evaluation of speaker-adaptive TTS models under such conditions. To address this gap, we developed a corpus of non-native speakers' English utterances.

We named this corpus “English as a Second Language TTS dataset ” (ESLTTS). The ESLTTS dataset consists of roughly 37 hours of 42,000 utterances from 134 non-native English speakers. These speakers represent a diversity of linguistic backgrounds spanning 31 native languages. For each speaker, the dataset includes an adaptation set lasting about 5 minutes for speaker adaptation, a test set comprising 10 utterances for speaker-adaptive TTS evaluation, and a development set for further research.

Dataset Structure

ESLTTS Dataset/
├─ Malayalam_3/     ------------ {Speaker Native Language}_{Speaker id}
│  ├─ ada_1.flac    ------------ {Subset Name}_{Utterance id}
│  ├─ ada_1.txt     ------------ Transcription for "ada_1.flac"
│  ├─ test_1.flac   ------------ {Subset Name}_{Utterance id}
│  ├─ test_1.txt    ------------ Transcription for "test_1.flac"
│  ├─ dev_1.flac    ------------ {Subset Name}_{Utterance id}
│  ├─ dev_1.txt     ------------ Transcription for "dev_1.flac"
│  ├─ ...
├─ Arabic_3/        ------------ {Speaker Native Language}_{Speaker id}
│  ├─ ada_1.flac    ------------ {Subset Name}_{Utterance id}
│  ├─ ...
├─ ...

Online Access

You can access this dataset through Google Driver or IEEE Dataport.

Citation

@article{wang2024usat,
  title={USAT: A Universal Speaker-Adaptive Text-to-Speech Approach},
  author={Wang, Wenbin and Song, Yang and Jha, Sanjay},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2024},
  publisher={IEEE}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LICENSE

LICENSE

README.md

README.md

Repository files navigation

ESLTTS

Abstract

Dataset Structure

Online Access

Citation

About

Releases

Packages

License

mushanshanshan/ESLTTS

Folders and files

Latest commit

History

LICENSE

LICENSE

README.md

README.md

Repository files navigation

ESLTTS

Abstract

Dataset Structure

Online Access

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages