So-to-Speak: an exploratory platform for investigating the interplay between style and prosody in TTS
Interspeech 2023 Demonstration
We introduce So-to-Speak, a customisable interface tailored for showcasing the capabilities of different controllable TTS systems. The interface allows for the generation, synthesis, and playback of hundreds of samples simultaneously, displayed on an interactive grid, with variation in both low-level prosodic features and high-level style controls. To offer insight into speech quality, an automatic MOS estimate is presented for each sample. So-to-Speak facilitates audiovisual exploration of the interaction between various speech features, which can be useful in a range of applications in speech technology.
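As a sketch of how such a grid of synthesis conditions might be enumerated, the snippet below crosses two low-level prosodic axes with a high-level style control, one dictionary per grid cell. The parameter names and value ranges are illustrative assumptions, not the tool's actual API.

```python
# Enumerate a grid of synthesis conditions: each cell combines a pitch
# scale, a rate scale, and a style label. Names and ranges are assumptions.
from itertools import product

pitch_scales = [0.8, 0.9, 1.0, 1.1, 1.2]  # low-level prosodic axis
rate_scales = [0.8, 0.9, 1.0, 1.1, 1.2]   # low-level prosodic axis
styles = ["neutral", "expressive"]        # high-level style control

grid = [
    {"pitch": p, "rate": r, "style": s}
    for p, r, s in product(pitch_scales, rate_scales, styles)
]
print(len(grid))  # 5 * 5 * 2 = 50 conditions
```

Each condition dictionary would then be passed to the synthesis engine, and the resulting samples laid out on the interactive grid.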
The implementation uses an adaptation of NVIDIA's Tacotron 2 as the synthesis engine, HiFi-GAN as the vocoder, and the code and trained model of an automatic MOS predictor.
Pre-trained checkpoints for the synthesis model and the HiFi-GAN vocoder are provided as assets in the release and should be placed in the models/tronduo and models/hifigan folders, respectively.
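The expected folder layout can be prepared as below. The folder names come from the instructions above; the checkpoint file names depend on the release assets and are left as placeholders.

```shell
# Create the expected model folders; the release checkpoints go inside them.
mkdir -p models/tronduo models/hifigan
# Place the release assets, e.g.:
#   models/tronduo/<synthesis-checkpoint>
#   models/hifigan/<hifigan-checkpoint>
ls models
```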
Running the script run_inference.py from the repository should download both a base wav2vec model and a fine-tuned version, both of which are required by the MOS prediction script. Both should be placed in the models/mos_ssl folder.
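A hedged sketch of this step follows; it assumes run_inference.py writes the two wav2vec checkpoints to the working directory, and the actual file names may differ, so the download and move commands are left commented.

```shell
# Prepare the MOS predictor folder; the two wav2vec checkpoints
# downloaded by run_inference.py should end up inside it.
mkdir -p models/mos_ssl
# python run_inference.py                        # downloads base + fine-tuned wav2vec
# mv <base-wav2vec> <finetuned-wav2vec> models/mos_ssl/   # names are placeholders
ls -d models/mos_ssl
```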
Run the code in the Jupyter notebook SoToSpeak_launch_interface.ipynb to start the demo; a CUDA-capable GPU is required for synthesis.
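Before launching the notebook, it can be useful to confirm that PyTorch and CUDA are usable. This is a hypothetical pre-flight helper, not part of the repository.

```python
# Pre-flight check: synthesis requires a CUDA-capable GPU, so report
# whether PyTorch is installed and whether CUDA is available.
import importlib.util

def cuda_status() -> str:
    """Return one of 'torch-missing', 'cuda-ok', or 'cuda-missing'."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch
    return "cuda-ok" if torch.cuda.is_available() else "cuda-missing"

print(cuda_status())
```

If the check prints anything other than "cuda-ok", the notebook's synthesis cells will not run.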